Papers
arxiv:2505.07062

Seed1.5-VL Technical Report

Published on May 11, 2025 · Submitted by Tianheng Cheng on May 13, 2025
#1 Paper of the day
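Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, et al.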

Abstract

AI-generated summary: Seed1.5-VL, a vision-language foundation model combining a vision encoder and a large MoE LLM, achieves state-of-the-art performance across various benchmarks and excels in multimodal reasoning tasks such as visual puzzles.

We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM with 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluation suites, achieving state-of-the-art performance on 38 out of 60 public benchmarks. Moreover, in agent-centric tasks such as GUI control and gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7. Beyond visual and video understanding, it also demonstrates strong reasoning abilities, making it particularly effective for multimodal reasoning challenges such as visual puzzles. We believe these capabilities will empower broader applications across diverse tasks. In this report, we mainly provide a comprehensive review of our experiences in building Seed1.5-VL across model design, data construction, and training at various stages, hoping that this report can inspire further research. Seed1.5-VL is now accessible at https://www.volcengine.com/ (Volcano Engine Model ID: doubao-1-5-thinking-vision-pro-250428).

Community

Paper author Paper submitter

Seed1.5-VL, a powerful and efficient vision-language foundation model designed for advanced general-purpose multimodal understanding and reasoning, achieves state-of-the-art performance on 38 out of 60 public benchmarks.

GitHub: https://github.com/ByteDance-Seed/Seed1.5-VL
API: https://www.volcengine.com/product/doubao


Will this model be open source in the future?

an audio overview for learning on the go: https://youtu.be/h-l7jqKs-Xg

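This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models (2025), https://huggingface.co/papers/2504.15271
* Kimi-VL Technical Report (2025), https://huggingface.co/papers/2504.07491
* Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains (2025), https://huggingface.co/papers/2504.20199
* Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency (2025), https://huggingface.co/papers/2504.18589
* JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse (2025), https://huggingface.co/papers/2503.16365
* VGRP-Bench: Visual Grid Reasoning Puzzle Benchmark for Large Vision-Language Models (2025), https://huggingface.co/papers/2503.23064
* NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks (2025), https://huggingface.co/papers/2504.19854

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend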


