DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models
Authors: Jie Zhu, Qian Chen, Huaixia Dou, Junhui Li, Lifan Guo, Feng Chen, Chi Zhang
Published: 2025-04-22 (paper ID 2504.15716)
AI-generated summary: DianJin-R1 is a reasoning-enhanced framework that improves financial-domain reasoning in LLMs through reasoning-augmented supervision and reinforcement learning, outperforming non-reasoning counterparts on financial benchmarks.
Effective reasoning remains a core challenge for large language models (LLMs)
in the financial domain, where tasks often require domain-specific knowledge,
precise numerical calculations, and strict adherence to compliance rules. We
propose DianJin-R1, a reasoning-enhanced framework designed to address these
challenges through reasoning-augmented supervision and reinforcement learning.
Central to our approach is DianJin-R1-Data, a high-quality dataset constructed
from CFLUE, FinQA, and a proprietary compliance corpus (Chinese Compliance
Check, CCC), combining diverse financial reasoning scenarios with verified
annotations. Our models, DianJin-R1-7B and DianJin-R1-32B, are fine-tuned from
Qwen2.5-7B-Instruct and Qwen2.5-32B-Instruct using a structured format that
generates both reasoning steps and final answers. To further refine reasoning
quality, we apply Group Relative Policy Optimization (GRPO), a reinforcement
learning method that incorporates dual reward signals: one encouraging
structured outputs and another rewarding answer correctness. We evaluate our
models on five benchmarks: three financial datasets (CFLUE, FinQA, and CCC) and
two general reasoning benchmarks (MATH-500 and GPQA-Diamond). Experimental
results show that DianJin-R1 models consistently outperform their non-reasoning
counterparts, especially on complex financial tasks. Moreover, on the
real-world CCC dataset, our single-call reasoning models match or even surpass
the performance of multi-agent systems that incur significantly higher
computational cost. These findings demonstrate the effectiveness of DianJin-R1
in enhancing financial reasoning through structured supervision and
reward-aligned learning, offering a scalable and practical solution for
real-world applications.
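
The dual reward signal described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `<think>`/`<answer>` template, the 0/1 reward values, the unweighted sum, and all function names are assumptions made for illustration, since the abstract does not specify them. The last function shows the group-relative normalization that GRPO is commonly described as using, where each sampled output is scored against the mean and standard deviation of its own group.

```python
import re
from statistics import mean, pstdev

# Hypothetical output template: the abstract only says the format contains
# reasoning steps and a final answer, so these tag names are an assumption.
_TEMPLATE = re.compile(
    r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL
)


def format_reward(output: str) -> float:
    """Return 1.0 if the output follows the assumed template, else 0.0."""
    return 1.0 if _TEMPLATE.fullmatch(output) else 0.0


def accuracy_reward(output: str, gold: str) -> float:
    """Return 1.0 if the extracted answer equals the reference, else 0.0."""
    match = _TEMPLATE.fullmatch(output)
    if match is None:
        return 0.0
    return 1.0 if match.group(2).strip() == gold.strip() else 0.0


def total_reward(output: str, gold: str) -> float:
    """Combine both signals; an unweighted sum is an assumption."""
    return format_reward(output) + accuracy_reward(output, gold)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled outputs for one prompt whose reference answer is "20".
group = [
    "<think>10% of 200 is 20</think><answer>20</answer>",  # well-formed, correct
    "<think>rough guess</think><answer>25</answer>",       # well-formed, wrong
    "20",                                                  # right value, no structure
]
rewards = [total_reward(o, "20") for o in group]
advantages = group_relative_advantages(rewards)
print(rewards)  # [2.0, 1.0, 0.0]
print(advantages)
```

In this sketch the unstructured output earns nothing even though it contains the right value, because the answer is only read from the assumed template. That is one possible design choice; a real setup might score format and correctness independently.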