pangpangxuan

pangxuan

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 10 hours ago

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Paper • 2606.26790 • Published 3 days ago • 40

upvoted a paper 1 day ago

DanceOPD: On-Policy Generative Field Distillation

Paper • 2606.27377 • Published 3 days ago • 64

upvoted 2 papers 3 days ago

OPERA: Aligning Open-Ended Reasoning via Objective Perplexity-based Reinforcement Learning

Paper • 2606.25757 • Published 4 days ago • 1

NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?

Paper • 2606.24530 • Published 5 days ago • 61

upvoted a paper 4 days ago

Qwen-AgentWorld: Language World Models for General Agents

Paper • 2606.24597 • Published 5 days ago • 133

upvoted 2 papers 23 days ago

GrepSeek: Training Search Agents for Direct Corpus Interaction

Paper • 2605.29307 • Published about 1 month ago • 115

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Paper • 2606.02437 • Published 27 days ago • 235

upvoted 2 papers 26 days ago

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

Paper • 2605.31264 • Published 30 days ago • 120

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Paper • 2605.31584 • Published 30 days ago • 43

upvoted 2 papers 28 days ago

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

Paper • 2605.28424 • Published May 27 • 32

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Paper • 2605.29801 • Published about 1 month ago • 144

upvoted 9 papers about 1 month ago

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

Paper • 2605.25874 • Published May 25 • 103

Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation

Paper • 2605.19833 • Published May 19 • 137

π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

Paper • 2605.14678 • Published May 19 • 108

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining

Paper • 2605.14747 • Published May 14 • 147

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Paper • 2605.11609 • Published May 12 • 196

OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond

Paper • 2605.19660 • Published May 19 • 40

Code as Agent Harness

Paper • 2605.18747 • Published May 18 • 223

MMSkills: Towards Multimodal Skills for General Visual Agents

Paper • 2605.13527 • Published May 14 • 122

Self-Distilled Agentic Reinforcement Learning

Paper • 2605.15155 • Published May 14 • 116