arxiv:2402.11248

CoLLaVO: Crayon Large Language and Vision mOdel

Published on Feb 17, 2024
· Submitted by AK on Feb 20, 2024

Abstract

The remarkable success of Large Language Models (LLMs) and instruction tuning drives the evolution of Vision Language Models (VLMs) towards a versatile general-purpose model. Yet, it remains unexplored whether current VLMs genuinely possess quality object-level image understanding capabilities, determined by questions such as 'what objects are in the image?' or 'which object corresponds to a specified bounding box?'. Our findings reveal that the image understanding capabilities of current VLMs are strongly correlated with their zero-shot performance on Vision Language (VL) tasks. This suggests that prioritizing basic image understanding is crucial for VLMs to excel at VL tasks. To enhance object-level image understanding, we propose Crayon Large Language and Vision mOdel (CoLLaVO), which incorporates instruction tuning with crayon prompt as a new visual prompt tuning scheme based on panoptic color maps. Furthermore, we present a learning strategy of Dual QLoRA to preserve object-level image understanding without forgetting it during visual instruction tuning, thereby achieving a significant leap on numerous zero-shot VL benchmarks.

AI-generated summary

The study proposes CoLLaVO, a Vision Language Model enhanced with crayon prompt tuning and Dual QLoRA to improve object-level image understanding and zero-shot performance in Vision Language tasks.
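The abstract names two components: a crayon prompt derived from panoptic color maps, and Dual QLoRA, which pairs adapters so that object-level understanding learned first is not overwritten during visual instruction tuning. The authors' actual implementation lives in the GitHub repository linked below; the sketch here is only a minimal, self-contained PyTorch illustration of the two ideas, where all names and shapes are illustrative assumptions and plain low-rank linear adapters stand in for true quantized LoRA.

```python
# Illustrative sketch only -- not the authors' implementation.
# Assumes a per-patch panoptic class ID map is available and shows one
# plausible way to fuse it with image features as a "crayon prompt",
# plus a toy analogue of Dual QLoRA (frozen + trainable adapters).
import torch
import torch.nn as nn

class CrayonPrompt(nn.Module):
    """Map per-patch panoptic class IDs to learnable color embeddings."""
    def __init__(self, num_classes: int, dim: int):
        super().__init__()
        self.color_embed = nn.Embedding(num_classes, dim)

    def forward(self, patch_feats: torch.Tensor, panoptic_ids: torch.Tensor):
        # patch_feats: (B, N, D) vision-encoder patch features
        # panoptic_ids: (B, N) panoptic class ID per patch
        return patch_feats + self.color_embed(panoptic_ids)

class DualLowRankAdapter(nn.Module):
    """Toy stand-in for Dual QLoRA: one frozen low-rank adapter preserves
    object-level knowledge; a second trainable adapter absorbs the
    subsequent visual instruction tuning."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.frozen_a = nn.Linear(dim, rank, bias=False)
        self.frozen_b = nn.Linear(rank, dim, bias=False)
        self.train_a = nn.Linear(dim, rank, bias=False)
        self.train_b = nn.Linear(rank, dim, bias=False)
        # Freeze the first adapter so its knowledge is not forgotten.
        for p in (*self.frozen_a.parameters(), *self.frozen_b.parameters()):
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.frozen_b(self.frozen_a(x)) + self.train_b(self.train_a(x))

# Smoke test with random tensors (133 = COCO panoptic class count).
B, N, D = 2, 16, 64
prompt = CrayonPrompt(num_classes=133, dim=D)
adapter = DualLowRankAdapter(dim=D)
feats = torch.randn(B, N, D)
ids = torch.randint(0, 133, (B, N))
print(adapter(prompt(feats, ids)).shape)  # torch.Size([2, 16, 64])
```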

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

You can access the code of CoLLaVO-7B at https://github.com/ByungKwanLee/CoLLaVO

@BK-Lee would you like to host the model and the demo on Hugging Face?

Paper author

Yes! I am preparing the code first and will then upload the hosted model to a Hugging Face Space. We are also preparing a follow-up large language and vision model with stronger performance, so we plan to upload them at the same time. Thanks for your interest!

Paper author

The CoLLaVO-7B model has been released at https://huggingface.co/BK-Lee/CoLLaVO-7B!
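As a minimal usage sketch (not taken from the model card), the released checkpoint can at least be fetched from the Hub with huggingface_hub; the repo ID comes from the link above, and the model card or GitHub README should be consulted for how to actually load and run the model.

```python
# Minimal sketch: download the released CoLLaVO-7B checkpoint files.
# See the model card / GitHub README for how to load and run the model.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="BK-Lee/CoLLaVO-7B")
print(f"Checkpoint files downloaded to: {local_dir}")
```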

@BK-Lee great initiative with the model card 🤩 looking forward to the demo!


Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2402.11248 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2402.11248 in a Space README.md to link it from this page.

Collections including this paper 2
