Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients • Paper • 2606.18216 • Published 9 days ago • 60
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling • Paper • 2606.18023 • Published 9 days ago • 203
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research • Paper • 2605.26114 • Published about 1 month ago • 65