TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them Paper • 2509.21117 • Published 5 days ago • 25
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML Paper • 2509.06806 • Published 22 days ago • 63
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published Aug 6 • 127
AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators Paper • 2508.09101 • Published Aug 12 • 8
Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study Paper • 2508.09776 • Published Aug 13 • 3
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models Paper • 2508.09968 • Published Aug 13 • 15
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models Paper • 2508.09138 • Published Aug 12 • 36
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL Paper • 2508.07976 • Published Aug 11 • 49
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens Paper • 2508.01191 • Published Aug 2 • 236
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22 • 119
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR Paper • 2507.15778 • Published Jul 21 • 20
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data Paper • 2507.07095 • Published Jul 9 • 54