
\n","updatedAt":"2025-03-13T21:21:30.784Z","author":{"_id":"5f1158120c833276f61f1a84","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1608042047613-5f1158120c833276f61f1a84.jpeg","fullname":"Niels Rogge","name":"nielsr","type":"user","isPro":true,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":978}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7829396724700928},"editors":["nielsr"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1608042047613-5f1158120c833276f61f1a84.jpeg"],"reactions":[],"isReport":false}},{"id":"67d387bf1eb6b21d39859085","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":264},"createdAt":"2025-03-14T01:34:55.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. \n\nThe following papers were recommended by the Semantic Scholar API \n\n* [R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning](https://huggingface.co/papers/2503.05592) (2025)\n* [RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision](https://huggingface.co/papers/2502.13957) (2025)\n* [Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning](https://huggingface.co/papers/2503.06034) (2025)\n* [C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation](https://huggingface.co/papers/2502.06205) (2025)\n* [LLM Post-Training: A Deep Dive into Reasoning Large Language Models](https://huggingface.co/papers/2502.21321) (2025)\n* [Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling](https://huggingface.co/papers/2501.11651) (2025)\n* [Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search](https://huggingface.co/papers/2502.02508) (2025)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

\n

The following papers were recommended by the Semantic Scholar API

\n\n

Please give a thumbs up to this comment if you found it helpful!

\n

If you want recommendations for any Paper on Hugging Face checkout this Space

\n

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: \n\n@librarian-bot\n\t recommend

\n","updatedAt":"2025-03-14T01:34:55.167Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":264}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7192125916481018},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2503.09516","authors":[{"_id":"67d238ab7d0fc37e671feaa8","user":{"_id":"64b8222749bde5d948104627","avatarUrl":"/avatars/e179dc68aebae503dcd7ea6b65b4a4b7.svg","isPro":false,"fullname":"Bowen","user":"PeterJinGo","type":"user"},"name":"Bowen Jin","status":"claimed_verified","statusLastChangedAt":"2025-03-13T08:24:22.835Z","hidden":false},{"_id":"67d238ab7d0fc37e671feaa9","user":{"_id":"64395fa0d45e8db3e28667a4","avatarUrl":"/avatars/cd050f59e2ac79f02ee01dcbb3be00e5.svg","isPro":false,"fullname":"Hansi Zeng","user":"hzeng","type":"user"},"name":"Hansi Zeng","status":"admin_assigned","statusLastChangedAt":"2025-03-14T11:24:04.316Z","hidden":false},{"_id":"67d238ab7d0fc37e671feaaa","user":{"_id":"644c4eff9a7c19cfaf0c8f4a","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/2gHtfhPFjSsnwxVtBnxe2.jpeg","isPro":false,"fullname":"Zhenrui Yue","user":"yueeeeeeee2837","type":"user"},"name":"Zhenrui Yue","status":"admin_assigned","statusLastChangedAt":"2025-03-14T11:24:10.545Z","hidden":false},{"_id":"67d238ab7d0fc37e671feaab","name":"Dong Wang","hidden":false},{"_id":"67d238ab7d0fc37e671feaac","name":"Hamed Zamani","hidden":false},{"_id":"67d238ab7d0fc37e671feaad","name":"Jiawei Han","hidden":false}],"publishedAt":"2025-03-12T16:26:39.000Z","submittedOnDailyAt":"2025-03-13T19:51:30.764Z","title":"Search-R1: Training LLMs to Reason and Leverage Search Engines with\n Reinforcement Learning","submittedOnDailyBy":{"_id":"5f1158120c833276f61f1a84","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1608042047613-5f1158120c833276f61f1a84.jpeg","isPro":true,"fullname":"Niels Rogge","user":"nielsr","type":"user"},"summary":"Efficiently acquiring external knowledge and up-to-date information is\nessential for effective reasoning and text generation in large language models\n(LLMs). Retrieval augmentation and tool-use training approaches where a search\nengine is treated as a tool lack complex multi-turn retrieval flexibility or\nrequire large-scale supervised data. Prompting advanced LLMs with reasoning\ncapabilities during inference to use search engines is not optimal, since the\nLLM does not learn how to optimally interact with the search engine. This paper\nintroduces Search-R1, an extension of the DeepSeek-R1 model where the LLM\nlearns -- solely through reinforcement learning (RL) -- to autonomously\ngenerate (multiple) search queries during step-by-step reasoning with real-time\nretrieval. Search-R1 optimizes LLM rollouts with multi-turn search\ninteractions, leveraging retrieved token masking for stable RL training and a\nsimple outcome-based reward function. Experiments on seven question-answering\ndatasets show that Search-R1 improves performance by 26% (Qwen2.5-7B), 21%\n(Qwen2.5-3B), and 10% (LLaMA3.2-3B) over SOTA baselines. 
This paper further\nprovides empirical insights into RL optimization methods, LLM choices, and\nresponse length dynamics in retrieval-augmented reasoning. The code and model\ncheckpoints are available at https://github.com/PeterGriffinJin/Search-R1.","upvotes":36,"discussionId":"67d238ae7d0fc37e671feb7c","githubRepo":"https://github.com/PeterGriffinJin/Search-R1","ai_summary":"Search-R1, an extension of DeepSeek-R1, enhances large language models by autonomously generating multiple search queries during step-by-step reasoning using reinforcement learning, improving performance in retrieval-augmented question-answering tasks.","ai_keywords":["retrieval augmentation","reinforcement learning","search queries","step-by-step reasoning","real-time retrieval","outcome-based reward function","multi-turn search interactions","token masking"],"githubStars":3232},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"6434b6619bd5a84b5dcfa4de","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6434b6619bd5a84b5dcfa4de/h8Q6kPNjFNc03wmdboHzq.jpeg","isPro":false,"fullname":"Young-Jun Lee","user":"passing2961","type":"user"},{"_id":"6584ed47cf1596d00290dfb5","avatarUrl":"/avatars/05c5320d2f27e8e13cbde5ff45b638e7.svg","isPro":false,"fullname":"name","user":"MaximusAI","type":"user"},{"_id":"64b8222749bde5d948104627","avatarUrl":"/avatars/e179dc68aebae503dcd7ea6b65b4a4b7.svg","isPro":false,"fullname":"Bowen","user":"PeterJinGo","type":"user"},{"_id":"665b133508d536a8ac804f7d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/Uwi0OnANdTbRbHHQvGqvR.png","isPro":false,"fullname":"Paulson","user":"Pnaomi","type":"user"},{"_id":"646def60df618b303b419323","avatarUrl":"/avatars/97aa761d5255abf230304cfeade87835.svg","isPro":false,"fullname":"Lei Wang","user":"demolei","type":"user"},{"_id":"6358edff3b3638bdac83f7ac","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1666772404424-noauth.jpeg","isPro":false,"fullname":"Pratyay Banerjee","user":"Neilblaze","type":"user"},{"_id":"64d4615cf8082bf19b916492","avatarUrl":"/avatars/8e1b59565ec5e4b31090cf1b911781b9.svg","isPro":false,"fullname":"wongyukim","user":"wongyukim","type":"user"},{"_id":"65015e90f355a888f95f7a03","avatarUrl":"/avatars/5d1e9cd856772671a4f31e492b23f2b9.svg","isPro":false,"fullname":"M","user":"Aneerudh","type":"user"},{"_id":"64bbe9b236eb058cd9d6a5b9","avatarUrl":"/avatars/c7c01a3fa8809e73800392679abff6d5.svg","isPro":false,"fullname":"Kai Zuberbühler","user":"kaizuberbuehler","type":"user"},{"_id":"67a37810ad1ebf3c241496c2","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/azNl-spzRfn4YPsxkZpw7.png","isPro":false,"fullname":"Eric","user":"FightMilk69","type":"user"},{"_id":"64395fa0d45e8db3e28667a4","avatarUrl":"/avatars/cd050f59e2ac79f02ee01dcbb3be00e5.svg","isPro":false,"fullname":"Hansi Zeng","user":"hzeng","type":"user"},{"_id":"6581f9514adaee05cf640f81","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6581f9514adaee05cf640f81/sXvEEraq2QlSIyWHlSmpa.jpeg","isPro":false,"fullname":"Xi","user":"xi0v","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":3}">
Papers
arxiv:2503.09516

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Published on Mar 12, 2025
· Submitted by Niels Rogge on Mar 13, 2025
#3 Paper of the day
Authors: Bowen Jin, Hansi Zeng, Zhenrui Yue, Dong Wang, Hamed Zamani, Jiawei Han

Abstract

AI-generated summary

Search-R1, an extension of DeepSeek-R1, enhances large language models by autonomously generating multiple search queries during step-by-step reasoning using reinforcement learning, improving performance in retrieval-augmented question-answering tasks.

Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs). Retrieval augmentation and tool-use training approaches where a search engine is treated as a tool lack complex multi-turn retrieval flexibility or require large-scale supervised data. Prompting advanced LLMs with reasoning capabilities during inference to use search engines is not optimal, since the LLM does not learn how to optimally interact with the search engine. This paper introduces Search-R1, an extension of the DeepSeek-R1 model where the LLM learns -- solely through reinforcement learning (RL) -- to autonomously generate (multiple) search queries during step-by-step reasoning with real-time retrieval. Search-R1 optimizes LLM rollouts with multi-turn search interactions, leveraging retrieved token masking for stable RL training and a simple outcome-based reward function. Experiments on seven question-answering datasets show that Search-R1 improves performance by 26% (Qwen2.5-7B), 21% (Qwen2.5-3B), and 10% (LLaMA3.2-3B) over SOTA baselines. This paper further provides empirical insights into RL optimization methods, LLM choices, and response length dynamics in retrieval-augmented reasoning. The code and model checkpoints are available at https://github.com/PeterGriffinJin/Search-R1.
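The abstract outlines a multi-turn rollout in which the policy interleaves step-by-step reasoning, live search-engine calls, and a final answer, with retrieved tokens masked out of the RL loss and a simple outcome-based reward. The Python sketch below illustrates that loop under stated assumptions: the `generate` and `search_engine` callables, the `<search>`/`<information>`/`<answer>` tags, and the word-level mask are illustrative stand-ins rather than the actual Search-R1 implementation or its API (see the linked repository for the real code).

```python
import re


def extract(text: str, tag: str) -> str | None:
    """Return the content of the last <tag>...</tag> span in text, if any."""
    matches = re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def rollout(generate, search_engine, question: str, gold_answer: str, max_turns: int = 4):
    """One multi-turn rollout: reason, optionally search, then answer.

    `generate(prompt, stop)` and `search_engine(query)` are caller-supplied
    stand-ins for the policy LLM and the retriever (illustrative assumptions).
    Returns (trajectory, loss_mask, reward).
    """
    trajectory = f"Question: {question}\n"
    loss_mask: list[int] = []  # 1 = model-generated token (trained), 0 = retrieved token (masked)

    for _ in range(max_turns):
        # The policy generates until it closes a <search> query or an <answer>.
        segment = generate(trajectory, stop=["</search>", "</answer>"])
        trajectory += segment
        loss_mask += [1] * len(segment.split())  # word-level here; token-level in practice

        if extract(segment, "answer") is not None:
            break  # final answer produced, stop searching

        query = extract(segment, "search")
        if query is None:
            break  # neither a query nor an answer: abandon this rollout

        # Real-time retrieval; results are appended as context for the next turn.
        passages = "\n".join(search_engine(query))
        info = f"\n<information>\n{passages}\n</information>\n"
        trajectory += info
        # Retrieved tokens are excluded from the policy-gradient loss.
        loss_mask += [0] * len(info.split())

    # Simple outcome-based reward: exact match against the gold answer.
    predicted = extract(trajectory, "answer") or ""
    reward = 1.0 if predicted.lower() == gold_answer.lower() else 0.0
    return trajectory, loss_mask, reward
```

In training, the returned mask and reward would feed a policy-gradient update (the paper compares RL optimization methods); masking the retrieved tokens keeps the policy from being rewarded or penalized for text it did not generate, which the abstract credits for stable RL training.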

Community

Paper submitter

https://github.com/PeterGriffinJin/Search-R1

Librarian Bot

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning (2025): https://huggingface.co/papers/2503.05592
* RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision (2025): https://huggingface.co/papers/2502.13957
* Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning (2025): https://huggingface.co/papers/2503.06034
* C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation (2025): https://huggingface.co/papers/2502.06205
* LLM Post-Training: A Deep Dive into Reasoning Large Language Models (2025): https://huggingface.co/papers/2502.21321
* Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling (2025): https://huggingface.co/papers/2501.11651
* Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search (2025): https://huggingface.co/papers/2502.02508

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper


Datasets citing this paper 0

No dataset linking this paper


Spaces citing this paper 0

No Space linking this paper


Collections including this paper 14
