Hi everyone,
I am seeking an arXiv endorsement for cs.LG (Machine Learning) to submit my first paper on RL fine-tuning for vision-language models.
Background:
MS in AI (Purdue), working on RL + VLM training systems.
Paper:
A Case Study of Staged Metric-Gated GRPO for Visual Numeric Reasoning
PDF:
https://github.com/kgaero/RL_GSPO_Qwen2.5VLM/blob/main/paper/staged_metric_gated_grpo.pdf
Short summary:
- Staged RL fine-tuning pipeline for VLMs (GRPO-based)
- Curriculum over MathVista subsets
- Metric-gated reward adaptation (structure → correctness)
- Checkpoint-aware continuation via alias-based selection
Main result:
Exact-match improves from 0.375 to 0.75, with stable structure, under constrained compute.
If you’re eligible to endorse for cs.LG or a related category, I’d greatly appreciate your help.
Happy to share endorsement details via DM.
Thanks!