Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR
Abstract
Transfer-Aware Curriculum (TAC) improves multi-domain reinforcement learning by prioritizing domains that provide broad benefits to other domains, using gradient-geometry alignment to estimate cross-domain transferability.
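The summary above says cross-domain transferability is estimated through gradient-geometry alignment. The paper's exact projection and similarity measure are not given here. The sketch below assumes alignment means cosine similarity between per-domain policy gradients after a shared Gaussian random projection. All function names are hypothetical and are not from the authors' code.

```python
import math
import random

def random_projection(dim, k, seed=0):
    """Hypothetical helper: dense Gaussian projection (k x dim), scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)] for _ in range(k)]

def project(P, g):
    """Map a full gradient vector g to the k-dimensional sketch P @ g."""
    return [sum(p_i * g_i for p_i, g_i in zip(row, g)) for row in P]

def cosine(u, v, eps=1e-12):
    """Cosine similarity; eps guards against division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + eps)

def alignment_matrix(grads, k=64, seed=0):
    """Pairwise cosine alignment between projected per-domain gradients.

    grads: dict mapping domain name -> flat gradient vector (same length).
    Returns a dict keyed by (domain_a, domain_b).
    """
    names = list(grads)
    dim = len(next(iter(grads.values())))
    P = random_projection(dim, k, seed)  # one shared projection for all domains
    proj = {n: project(P, grads[n]) for n in names}
    return {(a, b): cosine(proj[a], proj[b]) for a in names for b in names}

# Toy usage: "b" points the same way as "a", "c" points the opposite way.
g = [1.0, 2.0, -0.5, 3.0, 0.0, 1.5]
grads = {
    "a": g,
    "b": [2.0 * x for x in g],
    "c": [-x for x in g],
    "d": [0.5, -1.0, 2.0, 0.0, 1.0, -0.3],
}
M = alignment_matrix(grads, k=16)
```

Because the projection is linear, parallel gradients keep a cosine of about 1 and opposite gradients keep about -1. Positive alignment suggests that a step on one domain also moves the other in a helpful direction. A dense projection matrix like this one is only practical at toy sizes. The abstract attributes TAC's low overhead (<1% wall-clock) to reusing projected gradients from the GRPO step that is already being computed.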
Reinforcement learning with verifiable rewards (RLVR) has been extended from single-domain training to multi-domain reasoning suites spanning mathematics, programming, and science. However, the training curriculum (how often each domain is sampled) is typically fixed or hand-tuned, even though reasoning skills transfer unevenly across domains. Existing learnability-based curricula adapt to where the policy is currently improving, but are blind to whether a gradient step on the selected domain benefits the remaining domains. In this paper, we propose Transfer-Aware Curriculum (TAC), a bandit-style online curriculum that prioritizes domains whose updates broadly benefit the rest of the training suite. TAC repurposes signals already produced by RL training: per-domain advantages capture local learnability, and projected gradients, taken from the GRPO step being computed, estimate cross-domain transferability via gradient-geometry alignment, at negligible cost (<1% wall-clock overhead). Across a six-domain reasoning suite, TAC achieves the best macro-averaged accuracy on both Qwen3-1.7B and Llama3.2-3B, outperforming proportional random sampling, a hand-designed schedule, and a learnability-only bandit, and improving over the last of these by up to 2.8 points (10% relative). Ablations show performance degrades sharply when the transferability term is removed, and TAC remains robust on imbalanced training mixtures where learnability-only curricula over-commit to dominant domains. Our findings establish cross-domain transferability as a key signal for curriculum design in multi-domain RLVR.
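The abstract combines two signals: per-domain advantages, which capture learnability, and gradient alignment, which captures transferability. Both feed a bandit-style choice of which domain to train on next. The sketch below is a hedged illustration and not TAC's actual update rule. It makes three assumptions. First, learnability is the mean absolute GRPO advantage. Second, transferability is a domain's mean alignment with every other domain. Third, sampling uses a softmax over the combined score, mixed with uniform exploration (one common bandit-style choice). The weight `lam`, the temperature `temp`, the `explore` rate, and all function names are hypothetical.

```python
import math
import random
from statistics import mean

def learnability(advantages):
    """Proxy for local learnability: mean |advantage| of a domain's rollouts.

    GRPO advantages are ~0 when every rollout in a group receives the same
    reward (all solved or all failed), so such a domain yields little signal.
    """
    return mean(abs(a) for a in advantages) if advantages else 0.0

def transferability(domain, align):
    """Mean alignment of `domain`'s gradient with every *other* domain's gradient."""
    others = [v for (a, b), v in align.items() if a == domain and b != domain]
    return mean(others) if others else 0.0

def sampling_probs(domains, advantages, align, lam=1.0, temp=0.5, explore=0.1):
    """Softmax over (learnability + lam * transferability), mixed with uniform
    exploration so that no domain's probability drops below explore / n."""
    scores = [learnability(advantages[d]) + lam * transferability(d, align) for d in domains]
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp((s - m) / temp) for s in scores]
    z = sum(w)
    n = len(domains)
    return {d: (1 - explore) * wi / z + explore / n for d, wi in zip(domains, w)}

def sample_domain(probs, rng):
    """Draw the next training domain according to the curriculum probabilities."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]

# Toy signals (illustrative values only, not results from the paper).
domains = ["A", "B", "C"]
advantages = {d: [0.5, -0.5, 0.5, -0.5] for d in domains}  # equal learnability
align = {
    ("A", "A"): 1.0, ("A", "B"): 0.4, ("A", "C"): 0.2,
    ("B", "A"): 0.4, ("B", "B"): 1.0, ("B", "C"): -0.1,
    ("C", "A"): 0.2, ("C", "B"): -0.1, ("C", "C"): 1.0,
}
probs = sampling_probs(domains, advantages, align)
learn_only = sampling_probs(domains, advantages, align, lam=0.0)
next_domain = sample_domain(probs, random.Random(0))
```

With `lam=0.0`, the sketch reduces to a learnability-only curriculum. In this toy case, where all domains are equally learnable, that curriculum cannot tell them apart. Adding the transferability term shifts probability toward the domain whose gradients align best with the rest of the suite. A full bandit would also keep running estimates of these signals across steps instead of reacting to a single step.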
Community
TAC studies general reasoning through the lens of transferability: instead of asking whether post-training improves performance on its source domain, we ask how well the learned behavior transfers across held-out domains.
Across 14 benchmarks spanning six domains and two backbones, TAC improves macro-averaged accuracy and reveals a surprising pattern: math, often treated as a central RLVR domain, is among the least transferable.
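The headline metric is macro-averaged accuracy over 14 benchmarks grouped into six domains. The page does not state whether the macro average is taken over domains or over benchmarks. The sketch below assumes it is taken over domains: benchmarks are first averaged within each domain, and the domain means are then averaged, so each domain counts equally. A plain (micro) mean over benchmarks is included for contrast. The function names and scores are illustrative only.

```python
from statistics import mean

def macro_average(scores_by_domain):
    """Average benchmarks within each domain, then average the domain means.
    Every domain gets equal weight, however many benchmarks it has."""
    return mean(mean(bench.values()) for bench in scores_by_domain.values())

def micro_average(scores_by_domain):
    """Plain mean over all benchmarks; domains with more benchmarks weigh more."""
    return mean(v for bench in scores_by_domain.values() for v in bench.values())

# Toy scores (illustrative only): domain X has three benchmarks, Y has one.
scores = {
    "X": {"x1": 80.0, "x2": 60.0, "x3": 70.0},
    "Y": {"y1": 30.0},
}
macro = macro_average(scores)  # mean(70.0, 30.0)
micro = micro_average(scores)  # (80 + 60 + 70 + 30) / 4
```

The two averages can diverge noticeably when domains contain different numbers of benchmarks. This is one reason a per-domain macro average is a natural choice for comparing curricula on a multi-domain suite.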
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning (2026)
- Harmony in Diversity: Multi-domain Contrastive Policy Optimization for Large Reasoning Models (2026)
- CARE-RL: Capability-Aware Reinforcement Learning for Mitigating Cross-Domain Conflicts (2026)
- TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL (2026)
- General Preference Reinforcement Learning (2026)
- A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL (2026)
- What Training Data Teaches RL Memory Agents: An Empirical Study of Curriculum Effects in Memory-Augmented QA (2026)
Get this paper in your agent:
hf papers read 2606.25178
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash