arxiv:2509.07980

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Published on Sep 9
· Submitted by Xinyu Yang on Sep 10
#2 Paper of the day
Authors:
Tong Zheng, Hongming Zhang, Wenhao Yu, Xiaoyang Wang, Xinyu Yang, Runpeng Dai, Rui Liu, Huiwen Bao, Chengsong Huang, Heng Huang, Dong Yu
Abstract

Parallel thinking has emerged as a novel approach for enhancing the reasoning capabilities of large language models (LLMs) by exploring multiple reasoning paths concurrently. However, activating such capabilities through training remains challenging, as existing methods predominantly rely on supervised fine-tuning (SFT) over synthetic data, which encourages teacher-forced imitation rather than exploration and generalization. Different from them, we propose Parallel-R1, the first reinforcement learning (RL) framework that enables parallel thinking behaviors for complex real-world reasoning tasks. Our framework employs a progressive curriculum that explicitly addresses the cold-start problem in training parallel thinking with RL. We first use SFT on prompt-generated trajectories from easier tasks to instill the parallel thinking ability, then transition to RL to explore and generalize this skill on harder problems. Experiments on various math benchmarks, including MATH, AMC23, and AIME, show that Parallel-R1 successfully instills parallel thinking, leading to 8.4% accuracy improvements over the sequential thinking model trained directly on challenging tasks with RL. Further analysis reveals a clear shift in the model's thinking behavior: at an early stage, it uses parallel thinking as an exploration strategy, while in a later stage, it uses the same capability for multi-perspective verification. Most significantly, we validate parallel thinking as a mid-training exploration scaffold, where this temporary exploratory phase unlocks a higher performance ceiling after RL, yielding a 42.9% improvement over the baseline on AIME25. Our model, data, and code will be open-source at https://github.com/zhengkid/Parallel-R1.

AI-generated summary

Parallel-R1, a reinforcement learning framework, enhances large language models' reasoning capabilities by enabling parallel thinking through a progressive curriculum, leading to significant performance improvements on math benchmarks.
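The two-stage curriculum described in the abstract (SFT on prompt-generated parallel-thinking trajectories from easier problems, followed by RL on harder problems) can be summarized in a short sketch. The code below is a minimal illustration only, assuming stub function names, a hypothetical trajectory format, and a generic rollout-and-reward RL step; it is not taken from the authors' released implementation at https://github.com/zhengkid/Parallel-R1.

```python
# Minimal sketch of a Parallel-R1-style progressive curriculum.
# All function bodies are illustrative stubs; the real training loop,
# reward design, and data pipeline live in the authors' repository.

from typing import Dict, List


def collect_parallel_trajectories(easy_problems: List[str]) -> List[Dict]:
    """Assumed stage 0: prompt a model to solve easy problems while branching
    into several reasoning paths, keeping trajectories with correct answers."""
    return [
        {"problem": p, "trajectory": f"<parallel> ...alternative paths for {p}... </parallel>"}
        for p in easy_problems
    ]


def sft(model, trajectories: List[Dict]):
    """Stage 1: supervised fine-tuning on prompt-generated trajectories to
    instill the parallel-thinking format (the cold-start fix)."""
    for ex in trajectories:
        pass  # teacher-forced next-token loss on ex["trajectory"]
    return model


def rl(model, hard_problems: List[str], steps: int = 1000):
    """Stage 2: reinforcement learning on harder problems (a GRPO/PPO-style
    loop is assumed here), rewarding correct final answers so the model learns
    when and how to branch rather than merely imitating demonstrations."""
    for _ in range(steps):
        pass  # sample rollouts, score with an answer-checking reward, update the policy
    return model


def progressive_curriculum(model, easy_problems: List[str], hard_problems: List[str]):
    model = sft(model, collect_parallel_trajectories(easy_problems))  # learn the format
    model = rl(model, hard_problems)                                  # explore and generalize
    return model
```

Per the abstract, the ordering is the key design choice: SFT only seeds the parallel-thinking format on easy data, while RL on hard problems drives exploration and the later shift toward multi-perspective verification.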

Community

Paper submitter

Parallel thinking has emerged as a novel approach for enhancing the reasoning capabilities of large language models (LLMs) by exploring multiple reasoning paths concurrently. However, activating such capabilities through training remains challenging, as existing methods predominantly rely on supervised fine-tuning (SFT) over synthetic data, which encourages teacher-forced imitation rather than exploration and generalization. Different from them, we propose Parallel-R1, the first reinforcement learning (RL) framework that enables parallel thinking behaviors for complex real-world reasoning tasks. Our framework employs a progressive curriculum that explicitly addresses the cold-start problem in training parallel thinking with RL. We first use SFT on prompt-generated trajectories from easier tasks to instill the parallel thinking ability, then transition to RL to explore and generalize this skill on harder problems. Experiments on various math benchmarks, including MATH, AMC23, and AIME, show that Parallel-R1 successfully instills parallel thinking, leading to 8.4% accuracy improvements over the sequential thinking model trained directly on challenging tasks with RL. Further analysis reveals a clear shift in the model’s thinking behavior: at an early stage, it uses parallel thinking as an exploration strategy, while in a later stage, it uses the same capability for multi-perspective verification. Most significantly, we validate parallel thinking as a mid-training exploration scaffold, where this temporary exploratory phase unlocks a higher performance ceiling after RL, yielding a 42.9% improvement over the baseline on AIME25.

done ~ - ~ without reading

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2509.07980 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2509.07980 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2509.07980 in a Space README.md to link it from this page.

Collections including this paper 16
