Advancing Speech Understanding in Speech-Aware Language Models with GRPO
Abstract
In this paper, we introduce a Group Relative Policy Optimization (GRPO)-based
method for training Speech-Aware Large Language Models (SALLMs) on open-format
speech understanding tasks, such as Spoken Question Answering and Automatic
Speech Translation. SALLMs have proven highly effective at such tasks. GRPO has recently gained traction for its efficiency in
training LLMs, and prior work has explored its application to SALLMs, primarily
in multiple-choice tasks. Building on this, we focus on open-format tasks that
better reflect the generative abilities of the models. Our approach leverages
GRPO with BLEU as the reward signal to optimize SALLMs, and we demonstrate
empirically that it surpasses standard SFT across several key metrics. Finally,
we explore the potential of incorporating off-policy samples within GRPO for
these tasks, highlighting avenues for further improvement and research.
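For intuition, here is a minimal sketch (not the authors' implementation) of how a sentence-level BLEU reward could plug into GRPO's group-relative advantage computation: each sampled completion in a group is scored against the reference, and the rewards are mean-centered and std-scaled within the group. The helper name, the use of sacrebleu, and the group size are illustrative assumptions.

```python
# Illustrative sketch of BLEU-as-reward for GRPO; not the paper's code.
import sacrebleu

def group_relative_advantages(completions, reference):
    """Score a group of sampled completions with sentence BLEU, then
    normalize within the group (mean-centered, std-scaled) as in GRPO."""
    rewards = [sacrebleu.sentence_bleu(c, [reference]).score for c in completions]
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    # Small epsilon guards against a zero-variance group.
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Example: one prompt, a group of G=4 sampled translations, one reference.
advantages = group_relative_advantages(
    ["the cat sat on the mat", "a cat sits on a mat",
     "the cat is on the mat", "dog runs in park"],
    "the cat sat on the mat",
)
print(advantages)  # higher-BLEU samples receive positive advantages
```

These advantages would then weight the policy-gradient update for each sampled completion, so group members that score above the group's average BLEU are reinforced.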