arxiv:2503.14476

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Published on Mar 18, 2025
Submitted by AK on Mar 19, 2025
#2 Paper of the day
Authors:
Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Tiantian Fan, Gaohong Liu, Lingjun Liu, Xin Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Guangming Sheng, Yuxuan Tong, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Jinhua Zhu, Jiaze Chen, Jiangjie Chen, Chengyi Wang, Hongli Yu, Weinan Dai, Yuxuan Song, Xiangpeng Wei, Hao Zhou, Jingjing Liu, Wei-Ying Ma, Ya-Qin Zhang, Lin Yan, Mu Qiao, Yonghui Wu, Mingxuan Wang
Abstract

Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in the OpenAI o1 blog and the DeepSeek R1 technical report), so the community still struggles to reproduce their RL training results. We propose the Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO) algorithm and fully open-source a state-of-the-art large-scale RL system that achieves 50 points on AIME 2024 using the Qwen2.5-32B base model. Unlike previous works that withhold training details, we introduce four key techniques of our algorithm that make large-scale LLM RL a success. In addition, we open-source our training code, which is built on the verl framework, along with a carefully curated and processed dataset. These components of our open-source system enhance reproducibility and support future research in large-scale LLM RL.

AI-generated summary

The DAPO algorithm, which includes decoupled clip and dynamic sampling policy optimization, enables open-source, high-performance reinforcement learning training for large-scale language models, enhancing reproducibility in the field.
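For readers who want a concrete picture of the two ideas in the algorithm's name, here is a minimal, illustrative Python sketch of decoupled clipping (separate lower and upper clip ranges on the importance ratio) and dynamic sampling (dropping prompts whose sampled group is uniformly correct or uniformly incorrect, since such groups carry no gradient signal). The function names, tensor shapes, and clip values below are assumptions for illustration, not code from the released verl-based training system.

```python
import torch

def dapo_policy_loss(logprobs_new, logprobs_old, advantages, response_mask,
                     eps_low=0.2, eps_high=0.28):
    """Token-level PPO-style loss with decoupled clip ranges (sketch).

    Using a larger eps_high than eps_low is the 'decoupled clip' idea:
    the upper bound is relaxed so low-probability tokens can still be
    up-weighted, which helps keep generation entropy from collapsing.
    The specific values here are illustrative defaults.
    """
    ratio = torch.exp(logprobs_new - logprobs_old)             # importance ratio
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    per_token = -torch.min(ratio * advantages, clipped * advantages)
    # Token-level aggregation: average over all response tokens in the batch,
    # rather than averaging per sample first.
    return (per_token * response_mask).sum() / response_mask.sum()

def keep_group(rewards_in_group):
    """Dynamic-sampling filter (sketch): keep a prompt only if its sampled
    group of responses has reward variance, i.e. it is neither all-correct
    nor all-wrong, so the group-normalized advantage is non-zero."""
    rewards = torch.as_tensor(rewards_in_group, dtype=torch.float32)
    return bool(rewards.std() > 0)
```

In a training loop one would sample a group of responses per prompt, score them with a rule-based reward, apply keep_group (resampling prompts until the batch is full), compute group-normalized advantages, and then call dapo_policy_loss.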

Community

Paper submitter

[Attached screenshot: Screenshot 2025-03-18 at 10.26.16 PM.png]

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling (2025): https://huggingface.co/papers/2501.11651
* MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning (2025): https://huggingface.co/papers/2503.07365
* Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning (2025): https://huggingface.co/papers/2503.09516
* Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models (2025): https://huggingface.co/papers/2503.06749
* Process Reinforcement through Implicit Rewards (2025): https://huggingface.co/papers/2502.01456
* R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model (2025): https://huggingface.co/papers/2503.05132
* Kimi k1.5: Scaling Reinforcement Learning with LLMs (2025): https://huggingface.co/papers/2501.12599

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

A video and written explanation - https://aipapersacademy.com/dapo/



Models citing this paper: 1

Datasets citing this paper: 5

Spaces citing this paper: 1

Collections including this paper: 25
