
arxiv:2509.16990

Advancing Speech Understanding in Speech-Aware Language Models with GRPO

Published on Sep 21
· Submitted by Avishai Elmakies on Sep 25

Abstract

A Group Relative Policy Optimization (GRPO)-based method using BLEU as a reward signal outperforms standard SFT for open-format speech understanding tasks like Spoken Question Answering and Automatic Speech Translation.

AI-generated summary

In this paper, we introduce a Group Relative Policy Optimization (GRPO)-based method for training Speech-Aware Large Language Models (SALLMs) on open-format speech understanding tasks, such as Spoken Question Answering and Automatic Speech Translation. SALLMs have proven highly effective for speech understanding tasks. GRPO has recently gained traction for its efficiency in training LLMs, and prior work has explored its application to SALLMs, primarily in multiple-choice tasks. Building on this, we focus on open-format tasks that better reflect the generative abilities of the models. Our approach leverages GRPO with BLEU as the reward signal to optimize SALLMs, and we demonstrate empirically that it surpasses standard SFT across several key metrics. Finally, we explore the potential of incorporating off-policy samples within GRPO for these tasks, highlighting avenues for further improvement and further research.
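The abstract describes sampling groups of responses and scoring them with BLEU as the GRPO reward signal. As a rough illustration only, not the paper's implementation, the reward-and-advantage step at the heart of GRPO might be sketched as follows, using a simplified smoothed sentence-level BLEU and hypothetical helper names:

```python
# Sketch of GRPO's group-relative advantage computation with a BLEU reward.
# The BLEU here is a simplified add-1-smoothed sentence BLEU (uniform 1-4 gram
# weights, brevity penalty); real systems would use sacrebleu or similar.
import math
from collections import Counter

def ngram_counts(tokens, n):
    # Count all n-grams of length n in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
        overlap = sum(min(c, r[g]) for g, c in h.items())
        total = max(sum(h.values()), 1)
        log_prec += math.log((overlap + 1) / (total + 1))  # add-1 smoothing
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return bp * math.exp(log_prec / max_n)

def group_relative_advantages(rewards, eps=1e-6):
    # GRPO normalizes each sample's reward against its own group:
    # advantage_i = (r_i - mean(group)) / (std(group) + eps).
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

# A group of sampled model outputs for one prompt, scored against one reference.
samples = ["the cat sat on the mat", "a cat is on a mat", "dogs run fast"]
reference = "the cat sat on the mat"
rewards = [sentence_bleu(s, reference) for s in samples]
advantages = group_relative_advantages(rewards)
```

The advantages then weight the policy-gradient update for each sampled response; samples with above-group-average BLEU are reinforced, below-average ones are discouraged.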

Community

Paper author · Paper submitter

This paper presents a simple yet effective GRPO-based method for improving speech understanding tasks in Speech-Aware Large Language Models, showing very promising results with performance better than supervised fine-tuning (SFT).
It also investigates the effect of including off-policy samples inside the GRPO setup for those same tasks, showing the promise of mixed-policy GRPO, but also the need for further research and refinement of the method.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2509.16990 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2509.16990 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2509.16990 in a Space README.md to link it from this page.

Collections including this paper 1
