arxiv:2506.07900

MiniCPM4: Ultra-Efficient LLMs on End Devices

Published on Jun 9
· Submitted by Chaojun XIAO on Jun 10
#3 Paper of the day
Authors: MiniCPM Team: Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, Jie Cai, Haotian Chen, Wentong Chen, Xin Cong, Ganqu Cui, Ning Ding, Shengdan Fan, Yewei Fang, Zixuan Fu, Wenyu Guan, Yitong Guan, Junshao Guo, Yufeng Han, Bingxiang He, et al.
Abstract

MiniCPM4, a highly efficient large language model for end-side devices, achieves superior performance using innovations in sparse attention, pre-training datasets, training algorithms, and inference systems.

AI-generated summary

This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and BitCPM, a data-efficient ternary LLM. Regarding inference systems, we propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Extensive evaluation results show that MiniCPM4 outperforms open-source models of similar size across multiple benchmarks, highlighting both its efficiency and effectiveness. Notably, MiniCPM4-8B demonstrates significant speed improvements over Qwen3-8B when processing long sequences. Through further adaptation, MiniCPM4 successfully powers diverse applications, including trustworthy survey generation and tool use with the Model Context Protocol, clearly showcasing its broad usability.
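The abstract describes InfLLM v2 as a trainable sparse attention mechanism that speeds up long-context prefilling and decoding. The general family it belongs to, block-sparse attention, can be sketched in a few lines: keys are grouped into blocks, each query ranks the blocks by a cheap pooled score, and full attention is computed only over the top-ranked blocks. The sketch below is a generic NumPy illustration under assumed parameters (`block_size`, `top_blocks`, mean-pooled block representatives), not the paper's actual InfLLM v2 algorithm:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_sparse_attention(q, k, v, block_size=4, top_blocks=2):
    """Each query attends only to its top-`top_blocks` key blocks, ranked by
    similarity to a mean-pooled block representative. Illustrative sketch of
    block-sparse attention in general, not InfLLM v2 itself."""
    n, d = k.shape
    n_blocks = n // block_size
    # Mean-pool the keys in each block to get one coarse representative per block.
    k_reps = k[: n_blocks * block_size].reshape(n_blocks, block_size, d).mean(axis=1)
    out = np.zeros((q.shape[0], v.shape[1]))
    for i, qi in enumerate(q):
        # Cheap block-level scores -> keep the most relevant blocks only.
        keep = np.argsort(k_reps @ qi)[-top_blocks:]
        idx = np.concatenate(
            [np.arange(b * block_size, (b + 1) * block_size) for b in keep]
        )
        # Exact attention restricted to the selected key/value positions.
        attn = softmax(qi @ k[idx].T / np.sqrt(d))
        out[i] = attn @ v[idx]
    return out
```

With `top_blocks` equal to the total number of blocks this reduces to dense attention; the savings come from keeping `top_blocks` small so each query touches only a fraction of the keys.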

Community

Paper author Paper submitter

This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and BitCPM, a data-efficient ternary LLM. Regarding inference systems, we propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Extensive evaluation results show that MiniCPM4 outperforms open-source models of similar size across multiple benchmarks, highlighting both its efficiency and effectiveness. Notably, MiniCPM4-8B demonstrates significant speed improvements over Qwen3-8B when processing long sequences. Through further adaptation, MiniCPM4 successfully powers diverse applications, including trustworthy survey generation and tool use with the Model Context Protocol, clearly showcasing its broad usability.
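CPM.cu is said to combine sparse attention, quantization, and speculative sampling. For readers unfamiliar with the last of these, the core draft-and-verify idea can be conveyed with a toy greedy loop: a cheap draft model proposes a few tokens, the target model verifies them, and the longest agreeing prefix is kept along with one target-corrected token. Everything below (the `target_next`/`draft_next` callables, the greedy accept rule) is an illustrative assumption, not CPM.cu's implementation, which uses proper stochastic acceptance and batched verification:

```python
def speculative_decode(target_next, draft_next, prompt, n_draft=4, n_tokens=12):
    """Greedy speculative-decoding sketch. `target_next` and `draft_next` are
    hypothetical callables mapping a token sequence to the next token."""
    seq = list(prompt)
    while len(seq) < len(prompt) + n_tokens:
        # 1) The cheap draft model proposes n_draft tokens autoregressively.
        ctx = seq[:]
        draft = []
        for _ in range(n_draft):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) The target model checks each draft position
        #    (in a real system: a single batched forward pass).
        for j, t in enumerate(draft):
            correction = target_next(seq + draft[:j])
            if correction != t:
                # Keep the agreeing prefix plus the target's corrected token.
                seq += draft[:j] + [correction]
                break
        else:
            seq += draft  # every draft token was accepted
    return seq[: len(prompt) + n_tokens]
```

The payoff: when the draft model agrees with the target often, several tokens are committed per target-model pass, which is where the decoding speedup comes from; when it never agrees, the loop degrades gracefully to ordinary one-token-at-a-time decoding.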

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Dear authors,

Thank you for your significant contribution, and for sharing this excellent model and your valuable insights. I have a few questions and a small comment regarding the technical report.

  1. I noticed that the report provides a comprehensive description of the 8B model, but details regarding the 0.5B model's training configuration are largely omitted. It would be very helpful if you could share some information on its training setup.

  2. I would also like to point out a potential point of confusion in section 5.1, "Pre-training Pipeline." The text initially introduces a "four-stage pipeline to pre-train MiniCPM4." However, the same paragraph later says, "Following three pre-training stages, we conduct supervised fine-tuning and reinforcement learning..." This suggests that the pre-training itself consists of three stages, followed by SFT/RL. For clarity, perhaps referring to it as a "three-stage pipeline to pre-train" would be more precise.

Thank you once again for your dedication and for sharing your work with the community.


Models citing this paper 13


Datasets citing this paper 0

No datasets link to this paper


Spaces citing this paper 1

Collections including this paper 14
