Papers
arxiv:2410.22304

Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning

Published on Oct 29, 2024 · Submitted by Yihe Deng on Oct 30, 2024
Authors: Yihe Deng, Paul Mineiro

Abstract

A novel approach using online learning Flows with Direct Preference Optimization enhances the quality of reasoning traces for LLMs in mathematical reasoning tasks.

AI-generated summary

Mathematical reasoning is a crucial capability for Large Language Models (LLMs), yet generating detailed and accurate reasoning traces remains a significant challenge. This paper introduces a novel approach to produce high-quality reasoning traces for LLM fine-tuning using online learning Flows. Our method employs an incremental output production Flow, where component LLMs collaboratively construct solutions through iterative communication. We train the Flow using online Direct Preference Optimization (DPO) learning with rollouts, generating DPO pairs for each training example and updating models in real-time. We directly compare the quality of reasoning traces generated by our method with those produced through direct model inference, demonstrating the effectiveness of our approach in improving LLM performance in mathematical reasoning tasks.
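As a rough illustration of the training signal mentioned in the abstract, the sketch below computes the standard DPO loss for a single preference pair and builds a pair from a set of rollouts. The abstract does not say how rollouts are turned into pairs, so `build_pair` and its "first correct versus first incorrect" rule are hypothetical, as are the function names and the default `beta`. Only the loss formula itself is the usual DPO objective.

```python
import math
from typing import List, Optional, Tuple


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the trained policy and
    under a frozen reference model. beta=0.1 is an illustrative default.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) == softplus(-margin), computed stably.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def build_pair(rollouts: List[Tuple[str, bool]]) -> Optional[Tuple[str, str]]:
    """Hypothetical pairing rule (an assumption, not taken from the paper):
    pick the first rollout marked correct as 'chosen' and the first marked
    incorrect as 'rejected'. Returns None if either kind is missing.
    """
    chosen = next((text for text, ok in rollouts if ok), None)
    rejected = next((text for text, ok in rollouts if not ok), None)
    if chosen is None or rejected is None:
        return None
    return chosen, rejected
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls below that value as the policy raises the chosen response relative to the rejected one.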


