What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT Paper • 2509.19284 • Published 4 days ago • 19
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? Paper • 2509.16941 • Published 7 days ago • 19
Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels Paper • 2509.16596 • Published 7 days ago • 11
Developer-LLM Conversations: An Empirical Study of Interactions and Generated Code Quality Paper • 2509.10402 • Published 15 days ago • 4
Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation Paper • 2509.15185 • Published 9 days ago • 27
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation Paper • 2509.15194 • Published 9 days ago • 32
FlowRL: Matching Reward Distributions for LLM Reasoning Paper • 2509.15207 • Published 9 days ago • 100
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning Paper • 2509.13761 • Published 11 days ago • 14
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving Paper • 2509.12603 • Published 12 days ago • 8
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward Paper • 2509.07430 • Published 19 days ago • 3
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning Paper • 2509.08755 • Published 17 days ago • 54
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published 17 days ago • 167
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding Paper • 2509.06923 • Published 19 days ago • 21
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search Paper • 2509.07969 • Published 18 days ago • 59
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published 18 days ago • 96
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers Paper • 2509.06493 • Published 19 days ago • 10