https://github.com/PeterGriffinJin/Search-R1
This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* [R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning](https://huggingface.co/papers/2503.05592) (2025)
* [RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision](https://huggingface.co/papers/2502.13957) (2025)
* [Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning](https://huggingface.co/papers/2503.06034) (2025)
* [C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation](https://huggingface.co/papers/2502.06205) (2025)
* [LLM Post-Training: A Deep Dive into Reasoning Large Language Models](https://huggingface.co/papers/2502.21321) (2025)
* [Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling](https://huggingface.co/papers/2501.11651) (2025)
* [Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search](https://huggingface.co/papers/2502.02508) (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out [this Space](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers).

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`
\n","updatedAt":"2025-03-14T01:34:55.167Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":264}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7192125916481018},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2503.09516","authors":[{"_id":"67d238ab7d0fc37e671feaa8","user":{"_id":"64b8222749bde5d948104627","avatarUrl":"/avatars/e179dc68aebae503dcd7ea6b65b4a4b7.svg","isPro":false,"fullname":"Bowen","user":"PeterJinGo","type":"user"},"name":"Bowen Jin","status":"claimed_verified","statusLastChangedAt":"2025-03-13T08:24:22.835Z","hidden":false},{"_id":"67d238ab7d0fc37e671feaa9","user":{"_id":"64395fa0d45e8db3e28667a4","avatarUrl":"/avatars/cd050f59e2ac79f02ee01dcbb3be00e5.svg","isPro":false,"fullname":"Hansi Zeng","user":"hzeng","type":"user"},"name":"Hansi Zeng","status":"admin_assigned","statusLastChangedAt":"2025-03-14T11:24:04.316Z","hidden":false},{"_id":"67d238ab7d0fc37e671feaaa","user":{"_id":"644c4eff9a7c19cfaf0c8f4a","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/2gHtfhPFjSsnwxVtBnxe2.jpeg","isPro":false,"fullname":"Zhenrui Yue","user":"yueeeeeeee2837","type":"user"},"name":"Zhenrui Yue","status":"admin_assigned","statusLastChangedAt":"2025-03-14T11:24:10.545Z","hidden":false},{"_id":"67d238ab7d0fc37e671feaab","name":"Dong Wang","hidden":false},{"_id":"67d238ab7d0fc37e671feaac","name":"Hamed Zamani","hidden":false},{"_id":"67d238ab7d0fc37e671feaad","name":"Jiawei Han","hidden":false}],"publishedAt":"2025-03-12T16:26:39.000Z","submittedOnDailyAt":"2025-03-13T19:51:30.764Z","title":"Search-R1: Training LLMs to Reason and Leverage Search Engines with\n Reinforcement Learning","submittedOnDailyBy":{"_id":"5f1158120c833276f61f1a84","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1608042047613-5f1158120c833276f61f1a84.jpeg","isPro":true,"fullname":"Niels Rogge","user":"nielsr","type":"user"},"summary":"Efficiently acquiring external knowledge and up-to-date information is\nessential for effective reasoning and text generation in large language models\n(LLMs). Retrieval augmentation and tool-use training approaches where a search\nengine is treated as a tool lack complex multi-turn retrieval flexibility or\nrequire large-scale supervised data. Prompting advanced LLMs with reasoning\ncapabilities during inference to use search engines is not optimal, since the\nLLM does not learn how to optimally interact with the search engine. This paper\nintroduces Search-R1, an extension of the DeepSeek-R1 model where the LLM\nlearns -- solely through reinforcement learning (RL) -- to autonomously\ngenerate (multiple) search queries during step-by-step reasoning with real-time\nretrieval. Search-R1 optimizes LLM rollouts with multi-turn search\ninteractions, leveraging retrieved token masking for stable RL training and a\nsimple outcome-based reward function. Experiments on seven question-answering\ndatasets show that Search-R1 improves performance by 26% (Qwen2.5-7B), 21%\n(Qwen2.5-3B), and 10% (LLaMA3.2-3B) over SOTA baselines. 
This paper further\nprovides empirical insights into RL optimization methods, LLM choices, and\nresponse length dynamics in retrieval-augmented reasoning. The code and model\ncheckpoints are available at https://github.com/PeterGriffinJin/Search-R1.","upvotes":36,"discussionId":"67d238ae7d0fc37e671feb7c","githubRepo":"https://github.com/PeterGriffinJin/Search-R1","ai_summary":"Search-R1, an extension of DeepSeek-R1, enhances large language models by autonomously generating multiple search queries during step-by-step reasoning using reinforcement learning, improving performance in retrieval-augmented question-answering tasks.","ai_keywords":["retrieval augmentation","reinforcement learning","search queries","step-by-step reasoning","real-time retrieval","outcome-based reward function","multi-turn search interactions","token masking"],"githubStars":3232},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"6434b6619bd5a84b5dcfa4de","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6434b6619bd5a84b5dcfa4de/h8Q6kPNjFNc03wmdboHzq.jpeg","isPro":false,"fullname":"Young-Jun Lee","user":"passing2961","type":"user"},{"_id":"6584ed47cf1596d00290dfb5","avatarUrl":"/avatars/05c5320d2f27e8e13cbde5ff45b638e7.svg","isPro":false,"fullname":"name","user":"MaximusAI","type":"user"},{"_id":"64b8222749bde5d948104627","avatarUrl":"/avatars/e179dc68aebae503dcd7ea6b65b4a4b7.svg","isPro":false,"fullname":"Bowen","user":"PeterJinGo","type":"user"},{"_id":"665b133508d536a8ac804f7d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/Uwi0OnANdTbRbHHQvGqvR.png","isPro":false,"fullname":"Paulson","user":"Pnaomi","type":"user"},{"_id":"646def60df618b303b419323","avatarUrl":"/avatars/97aa761d5255abf230304cfeade87835.svg","isPro":false,"fullname":"Lei Wang","user":"demolei","type":"user"},{"_id":"6358edff3b3638bdac83f7ac","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1666772404424-noauth.jpeg","isPro":false,"fullname":"Pratyay Banerjee","user":"Neilblaze","type":"user"},{"_id":"64d4615cf8082bf19b916492","avatarUrl":"/avatars/8e1b59565ec5e4b31090cf1b911781b9.svg","isPro":false,"fullname":"wongyukim","user":"wongyukim","type":"user"},{"_id":"65015e90f355a888f95f7a03","avatarUrl":"/avatars/5d1e9cd856772671a4f31e492b23f2b9.svg","isPro":false,"fullname":"M","user":"Aneerudh","type":"user"},{"_id":"64bbe9b236eb058cd9d6a5b9","avatarUrl":"/avatars/c7c01a3fa8809e73800392679abff6d5.svg","isPro":false,"fullname":"Kai Zuberbühler","user":"kaizuberbuehler","type":"user"},{"_id":"67a37810ad1ebf3c241496c2","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/azNl-spzRfn4YPsxkZpw7.png","isPro":false,"fullname":"Eric","user":"FightMilk69","type":"user"},{"_id":"64395fa0d45e8db3e28667a4","avatarUrl":"/avatars/cd050f59e2ac79f02ee01dcbb3be00e5.svg","isPro":false,"fullname":"Hansi Zeng","user":"hzeng","type":"user"},{"_id":"6581f9514adaee05cf640f81","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6581f9514adaee05cf640f81/sXvEEraq2QlSIyWHlSmpa.jpeg","isPro":false,"fullname":"Xi","user":"xi0v","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":3}">
AI-generated summary

Search-R1, an extension of DeepSeek-R1, enhances large language models by autonomously generating multiple search queries during step-by-step reasoning using reinforcement learning, improving performance in retrieval-augmented question-answering tasks.
Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs). Retrieval augmentation and tool-use training approaches, in which a search engine is treated as a tool, either lack the flexibility needed for complex multi-turn retrieval or require large-scale supervised data. Prompting advanced LLMs with reasoning capabilities to use search engines at inference time is also suboptimal, since the LLM does not learn how to interact optimally with the search engine. This paper introduces Search-R1, an extension of the DeepSeek-R1 model in which the LLM learns, solely through reinforcement learning (RL), to autonomously generate (multiple) search queries during step-by-step reasoning with real-time retrieval. Search-R1 optimizes LLM rollouts with multi-turn search interactions, leveraging retrieved-token masking for stable RL training and a simple outcome-based reward function. Experiments on seven question-answering datasets show that Search-R1 improves performance by 26% (Qwen2.5-7B), 21% (Qwen2.5-3B), and 10% (LLaMA3.2-3B) over SOTA baselines. The paper further provides empirical insights into RL optimization methods, LLM choices, and response-length dynamics in retrieval-augmented reasoning. The code and model checkpoints are available at https://github.com/PeterGriffinJin/Search-R1.
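For readers who want a concrete picture of the training setup the abstract describes, here is a minimal, illustrative sketch of a Search-R1-style multi-turn rollout with retrieved-token masking and an outcome-based reward. The tag names (`<search>`, `<information>`, `<answer>`), the `policy_generate` interface, and the `retrieve` helper are assumptions made for this example, not details taken from the Search-R1 codebase.

```python
# Illustrative sketch (not the authors' code): the policy interleaves reasoning
# with <search> queries, retrieved passages are injected inside <information>
# tags, and their spans are recorded so they can be masked out of the RL loss;
# the reward is a simple outcome-based check on the final answer.

import re
from typing import Callable, List, Tuple

def retrieve(query: str, top_k: int = 3) -> str:
    """Placeholder retriever; a real system would query a search engine or index."""
    return "\n".join(f"[doc {i}] ... passage related to '{query}' ..." for i in range(top_k))

def rollout(policy_generate: Callable[..., str], question: str,
            max_turns: int = 4) -> Tuple[str, List[Tuple[int, int]]]:
    """Run one multi-turn trajectory. Returns the full text plus character spans
    of retrieved content (later converted into token-level loss masks)."""
    text = f"Question: {question}\n"
    retrieved_spans: List[Tuple[int, int]] = []
    for _ in range(max_turns):
        # Assumed interface: generation stops after </search> or </answer>,
        # and the returned segment includes the stop tag itself.
        segment = policy_generate(text, stop=["</search>", "</answer>"])
        text += segment
        query_match = re.search(r"<search>(.*?)</search>\s*$", segment, re.S)
        if query_match is None:
            break  # the model emitted a final <answer>; stop searching
        passages = f"<information>{retrieve(query_match.group(1).strip())}</information>\n"
        retrieved_spans.append((len(text), len(text) + len(passages)))
        text += passages
    return text, retrieved_spans

def outcome_reward(trajectory: str, gold_answer: str) -> float:
    """Outcome-based reward: exact match on the content of the final <answer> tag."""
    match = re.search(r"<answer>(.*?)</answer>", trajectory, re.S)
    prediction = match.group(1).strip().lower() if match else ""
    return 1.0 if prediction == gold_answer.strip().lower() else 0.0
```

During the PPO/GRPO-style update, tokens falling inside `retrieved_spans` would be excluded from the policy-gradient loss so that only model-generated tokens receive gradients, which is the retrieved-token-masking idea the abstract refers to.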