arxiv:2403.14520

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Published on Mar 21, 2024
· Submitted by AK on Mar 22, 2024
#3 Paper of the day
Authors: Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang

Abstract

AI-generated summary: Cobra, a linear-complexity multimodal large language model, integrates the Mamba language model with the visual modality, achieving competitive performance and faster inference than existing state-of-the-art models.

In recent years, the application of multimodal large language models (MLLMs) in various fields has achieved remarkable success. However, as the foundation model for many downstream tasks, current MLLMs are built on the well-known Transformer network, whose attention mechanism has a less efficient quadratic computational complexity. To improve the efficiency of these foundation models, we propose Cobra, an MLLM with linear computational complexity. Specifically, Cobra integrates the efficient Mamba language model into the visual modality. Moreover, we explore and study various modal fusion schemes to create an effective multi-modal Mamba. Extensive experiments demonstrate that (1) Cobra achieves highly competitive performance against current computationally efficient state-of-the-art methods, e.g., LLaVA-Phi, TinyLLaVA, and MobileVLM v2, and runs faster thanks to its linear sequential modeling; (2) interestingly, on challenging closed-set prediction benchmarks, Cobra performs well at overcoming visual illusions and judging spatial relationships; (3) notably, Cobra even achieves performance comparable to LLaVA with about 43% of the parameters. We will open-source all of Cobra's code and hope the proposed method can facilitate future research on complexity problems in MLLMs. Our project page is available at: https://sites.google.com/view/cobravlm.
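To make the fusion idea the abstract sketches more concrete, here is a minimal, hedged PyTorch sketch of a Cobra-style pipeline: a projector maps vision-encoder features into the language model's embedding space, the visual tokens are prepended to the text tokens, and a linear-time scan block stands in for the Mamba backbone. All class and parameter names here (`ScanBlockStub`, `CobraStyleVLM`, the dimensions) are illustrative assumptions, and the scan block is a toy recurrence, not the authors' selective state-space implementation.

```python
import torch
import torch.nn as nn

class ScanBlockStub(nn.Module):
    """Toy linear-time scan block standing in for a Mamba layer.
    A real Mamba layer uses input-dependent (selective) state-space
    parameters and a hardware-aware parallel scan."""
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_state)
        self.out_proj = nn.Linear(d_state, d_model)
        self.decay = nn.Parameter(torch.full((d_state,), 0.9))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); the loop is O(seq_len),
        # unlike self-attention's O(seq_len^2) pairwise interactions.
        u = self.in_proj(x)
        h = x.new_zeros(u.shape[0], u.shape[-1])
        states = []
        for t in range(u.shape[1]):
            h = self.decay * h + u[:, t]   # simple gated recurrence
            states.append(h)
        return x + self.out_proj(torch.stack(states, dim=1))

class CobraStyleVLM(nn.Module):
    """Illustrative wiring only: vision features -> projector ->
    prepended to text embeddings -> linear-time backbone -> LM head."""
    def __init__(self, d_vis=1024, d_model=512, vocab=32000, n_layers=4):
        super().__init__()
        self.projector = nn.Linear(d_vis, d_model)  # modal fusion: map vision to LLM space
        self.embed = nn.Embedding(vocab, d_model)
        self.backbone = nn.Sequential(*[ScanBlockStub(d_model) for _ in range(n_layers)])
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, vision_feats: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
        visual_tokens = self.projector(vision_feats)           # (B, N_img, d_model)
        text_tokens = self.embed(input_ids)                    # (B, N_txt, d_model)
        seq = torch.cat([visual_tokens, text_tokens], dim=1)   # prepend image tokens
        return self.lm_head(self.backbone(seq))

model = CobraStyleVLM()
logits = model(torch.randn(1, 196, 1024), torch.randint(0, 32000, (1, 32)))
print(logits.shape)  # torch.Size([1, 228, 32000])
```

Prepending visual tokens is only one of the fusion schemes the paper compares; the same skeleton accommodates other placements by changing where `visual_tokens` enter the sequence.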

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Boosting Efficiency: Cobra's Leap in Multi-Modal AI Inference

Video: https://cdn-uploads.huggingface.co/production/uploads/6186ddf6a7717cb375090c01/XbLUCrY4PDwOexGn5_Ucd.mp4

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix

Models citing this paper 1

Datasets citing this paper 0

No dataset links this paper yet.

Cite arxiv.org/abs/2403.14520 in a dataset README.md to link it from this page.
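For example (hypothetical dataset), adding a line such as `Built on [Cobra](https://arxiv.org/abs/2403.14520)` to a dataset's README.md is enough for the Hub to link that dataset here.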

Spaces citing this paper 1

Collections including this paper 7
