Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information • Paper • 2605.11609 • Published 19 days ago • 195
🧬 Carbon • Collection • Carbon 500M, 3B, 8B genomic models and GGUF variants for llama.cpp • 7 items • Updated 2 days ago • 38
Stabilizing Efficient Reasoning with Step-Level Advantage Selection • Paper • 2604.24003 • Published Apr 27 • 8