Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space.

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
\n","updatedAt":"2024-02-07T01:21:51.880Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":264}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7435516715049744},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}},{"id":"65c72687469efddc1ccea585","author":{"_id":"64bd167fb7375f6b8460f52d","avatarUrl":"/avatars/2036be85499a9d3002d38416a7cfaa31.svg","fullname":"zaid zameer shaikh","name":"zaidzameer010","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false},"createdAt":"2024-02-10T07:32:23.000Z","type":"comment","data":{"edited":true,"hidden":false,"latest":{"raw":"deepseek ain't no joke","html":"
deepseek ain't no joke
\n","updatedAt":"2024-02-10T07:32:42.741Z","author":{"_id":"64bd167fb7375f6b8460f52d","avatarUrl":"/avatars/2036be85499a9d3002d38416a7cfaa31.svg","fullname":"zaid zameer shaikh","name":"zaidzameer010","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false}},"numEdits":1,"identifiedLanguage":{"language":"en","probability":0.2873454689979553},"editors":["zaidzameer010"],"editorAvatarUrls":["/avatars/2036be85499a9d3002d38416a7cfaa31.svg"],"reactions":[{"reaction":"π","users":["DINESHKUMARM","wjmcat"],"count":2}],"isReport":false}},{"id":"664e9268f79b50be1443ab35","author":{"_id":"664422b5044a16758a21cc7f","avatarUrl":"/avatars/5701da5bb1a7189d2230350688530401.svg","fullname":"Yang","name":"fai1165","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false},"createdAt":"2024-05-23T00:48:40.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"So can we use GRPO to tune LLM models now? \nI mean is GRPO supported and open-source on Huggingface?","html":"
So can we use GRPO to tune LLM models now? I mean, is GRPO supported and open-source on Hugging Face?
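For what it's worth, GRPO has since been implemented in the open-source TRL library as a GRPOTrainer. A minimal sketch, assuming TRL's GRPOTrainer/GRPOConfig API; the dataset, model name, and toy reward function below are only illustrative, not the paper's setup:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Any prompt-style dataset works; this small public set is just a placeholder.
dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward: prefer completions close to 20 characters
# (a stand-in for a real math answer checker).
def reward_len(completions, **kwargs):
    return [-abs(20 - len(c)) for c in completions]

training_args = GRPOConfig(output_dir="qwen2-0.5b-grpo", logging_steps=10)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

In practice the reward function would be swapped for something task-specific, such as checking a final numeric answer against ground truth.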
\n","updatedAt":"2024-06-09T01:46:22.680Z","author":{"_id":"6186ddf6a7717cb375090c01","avatarUrl":"/avatars/716b6a7d1094c8036b2a8a7b9063e8aa.svg","fullname":"Julien BLANCHON","name":"blanchon","type":"user","isPro":true,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":142}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.5055437684059143},"editors":["blanchon"],"editorAvatarUrls":["/avatars/716b6a7d1094c8036b2a8a7b9063e8aa.svg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2402.03300","authors":[{"_id":"65c19a1f55c4f06fa9692c52","user":{"_id":"65db64f8b62d242ed8711701","avatarUrl":"/avatars/753e9f980eb6786c6b53b2f1becbf745.svg","isPro":false,"fullname":"Zhihong Shao","user":"ZhihongShao","type":"user"},"name":"Zhihong Shao","status":"claimed_verified","statusLastChangedAt":"2025-01-26T11:41:46.616Z","hidden":false},{"_id":"65c19a1f55c4f06fa9692c53","user":{"_id":"656873f33fd0bf1f82558695","avatarUrl":"/avatars/7a085da2e2a91d7f41988501a573ebf9.svg","isPro":false,"fullname":"PEIYI, WANG","user":"peiyiwang89","type":"user"},"name":"Peiyi Wang","status":"admin_assigned","statusLastChangedAt":"2024-02-13T08:13:15.437Z","hidden":false},{"_id":"65c19a1f55c4f06fa9692c54","user":{"_id":"63cd76b4374057a338e8e703","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63cd76b4374057a338e8e703/i4Qk5-0aYx3oRhC8b50aJ.jpeg","isPro":false,"fullname":"zhuqihao","user":"zqh11","type":"user"},"name":"Qihao Zhu","status":"admin_assigned","statusLastChangedAt":"2024-02-13T08:13:25.841Z","hidden":false},{"_id":"65c19a1f55c4f06fa9692c55","name":"Runxin Xu","hidden":false},{"_id":"65c19a1f55c4f06fa9692c56","user":{"_id":"6565a2dd131d13ccc5d8cb12","avatarUrl":"/avatars/f5c5441ba74791b64c9740911f952bac.svg","isPro":false,"fullname":"Junxiao Song","user":"haha-point","type":"user"},"name":"Junxiao Song","status":"admin_assigned","statusLastChangedAt":"2024-02-13T08:13:39.870Z","hidden":false},{"_id":"65c19a1f55c4f06fa9692c57","name":"Mingchuan Zhang","hidden":false},{"_id":"65c19a1f55c4f06fa9692c58","name":"Y. K. Li","hidden":false},{"_id":"65c19a1f55c4f06fa9692c59","name":"Y. Wu","hidden":false},{"_id":"65c19a1f55c4f06fa9692c5a","user":{"_id":"653df20eaa1f487614da4db1","avatarUrl":"/avatars/12b27ce2c59f53b7e464039deab36a5d.svg","isPro":false,"fullname":"Daya Guo","user":"guoday","type":"user"},"name":"Daya Guo","status":"admin_assigned","statusLastChangedAt":"2024-02-13T08:14:02.332Z","hidden":false}],"publishedAt":"2024-02-05T18:55:32.000Z","submittedOnDailyAt":"2024-02-06T00:57:34.722Z","title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open\n Language Models","submittedOnDailyBy":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},"summary":"Mathematical reasoning poses a significant challenge for language models due\nto its complex and structured nature. In this paper, we introduce DeepSeekMath\n7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B\nmath-related tokens sourced from Common Crawl, together with natural language\nand code data. 
DeepSeekMath 7B has achieved an impressive score of 51.7% on the\ncompetition-level MATH benchmark without relying on external toolkits and\nvoting techniques, approaching the performance level of Gemini-Ultra and GPT-4.\nSelf-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH.\nThe mathematical reasoning capability of DeepSeekMath is attributed to two key\nfactors: First, we harness the significant potential of publicly available web\ndata through a meticulously engineered data selection pipeline. Second, we\nintroduce Group Relative Policy Optimization (GRPO), a variant of Proximal\nPolicy Optimization (PPO), that enhances mathematical reasoning abilities while\nconcurrently optimizing the memory usage of PPO.","upvotes":129,"discussionId":"65c19a2055c4f06fa9692c9b","ai_summary":"DeepSeekMath 7B improves mathematical reasoning through enhanced data pre-training and Group Relative Policy Optimization, achieving high scores on MATH benchmark without external tools.","ai_keywords":["DeepSeekMath 7B","DeepSeek-Coder-Base-v1.5 7B","MATH benchmark","Group Relative Policy Optimization","Proximal Policy Optimization","mathematical reasoning","data selection pipeline"]},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"64b26c035e1230a79f897880","avatarUrl":"/avatars/5427b05b3ef627d4d8281f9a33bb98ab.svg","isPro":false,"fullname":"zhangwenbin","user":"ExceedZhang","type":"user"},{"_id":"6447384627a3e9e0b7c7b9aa","avatarUrl":"/avatars/5d0ae0d669554e5133c7b4f5ca83efb2.svg","isPro":false,"fullname":"Mr Jack Tung","user":"MrJackTung","type":"user"},{"_id":"647c84d9e07cf9bb2d467f69","avatarUrl":"/avatars/c66f2a4e0da3ba24aa7d4c050026fe6d.svg","isPro":false,"fullname":"Maksym Sutkovenko","user":"Subuday","type":"user"},{"_id":"60d418c8d7e9cf17e5265ae4","avatarUrl":"/avatars/9f1d9a44b79e0c81d594b50844f0aeda.svg","isPro":false,"fullname":"Qin","user":"Lonnie","type":"user"},{"_id":"6555125a4f361968f0e3aad7","avatarUrl":"/avatars/e7692d82804338f21ecdc6e731f5c5ea.svg","isPro":false,"fullname":"marinaretikof","user":"marinaretik","type":"user"},{"_id":"625026b7d2d191ac43320c5e","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/625026b7d2d191ac43320c5e/2ExzHlZ-Bk8SQMyBjeY6N.jpeg","isPro":false,"fullname":"Jingcheng Hu","user":"reign12","type":"user"},{"_id":"5e32d89653d2a72512789cdc","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/5e32d89653d2a72512789cdc/anNXBAYRGuh9jFw2eOUEb.jpeg","isPro":false,"fullname":"Arunkumar Venkataramanan","user":"ArunkumarVR","type":"user"},{"_id":"654448bb26612e5469a4d672","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/1O3niyiqaKPTNkGpPJSKU.png","isPro":false,"fullname":"Marat Babaev","user":"maratFC","type":"user"},{"_id":"636cee7acfb49b46822006a1","avatarUrl":"/avatars/928b1cc75ff8e768bfe0cbdd40d7b11f.svg","isPro":false,"fullname":"Attila LukΓ‘cs","user":"attilalukacs","type":"user"},{"_id":"6032802e1f993496bc14d9e3","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6032802e1f993496bc14d9e3/w6hr-DEQot4VVkoyRIBiy.png","isPro":false,"fullname":"Omar Sanseviero","user":"osanseviero","type":"user"},{"_id":"632289a7909ac44b572bf51a","avatarUrl":"/avatars/7ebca310419ea7c3e0f5a4b57fbbdf6e.svg","isPro":false,"fullname":"galen","user":"AlphaGalen","type":"user"},{"_id":"64ca7c04710645aa7bdbbfff","avatarUrl":"/avatars/c12f4cb6dc1ff0010edb3ef4cfcccd7c.svg","isPro":false,"fullname":"Lize 
Pirenne","user":"Inversta","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":1}">
AI-generated summary

DeepSeekMath 7B improves mathematical reasoning through enhanced data pre-training and Group Relative Policy Optimization, achieving high scores on the MATH benchmark without external tools.
Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
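To make the GRPO idea concrete, here is a minimal sketch (not the paper's implementation; the function name and tensor shapes are illustrative) of the group-relative advantage that replaces PPO's learned value model: for each question, a group of completions is sampled and scored, and each completion's advantage is its reward normalized by its own group's mean and standard deviation. Dropping the separate critic network is where the memory savings come from.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages under outcome supervision.

    rewards: shape (num_questions, group_size), one scalar reward per sampled
    completion. Each completion is scored only against its own group, so no
    learned value/critic network is required.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy example: 2 questions, a group of 4 sampled answers each,
# with binary rewards from an answer checker (1 = correct, 0 = wrong).
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
```

These advantages then feed a PPO-style clipped policy-gradient objective with a KL penalty toward a reference model, as detailed in the paper.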