TRL documentation
This page documents TRL v0.25.0. A newer version, v1.5.1, is available.
GSPO-token
In the paper Group Sequence Policy Optimization, the authors propose a token-level variant of the GSPO objective, called GSPO-token. To use GSPO-token, import the GRPOTrainer class from trl.experimental.gspo_token.
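For reference, the GSPO-token objective can be written as follows. This is a sketch in the paper's notation as best recalled here, not a restatement checked against the TRL source: \(G\) is the group size, \(y_i\) the \(i\)-th sampled completion for prompt \(x\), \(\varepsilon\) the clipping range, and \(\mathrm{sg}[\cdot]\) a stop-gradient operator.

```latex
\begin{aligned}
\mathcal{J}_{\text{GSPO-token}}(\theta)
  &= \mathbb{E}\left[ \frac{1}{G} \sum_{i=1}^{G} \frac{1}{|y_i|} \sum_{t=1}^{|y_i|}
     \min\Big( s_{i,t}(\theta)\,\hat{A}_{i,t},\;
               \operatorname{clip}\big(s_{i,t}(\theta),\, 1-\varepsilon,\, 1+\varepsilon\big)\,\hat{A}_{i,t} \Big) \right], \\
s_{i,t}(\theta)
  &= \mathrm{sg}\big[s_i(\theta)\big] \cdot
     \frac{\pi_\theta(y_{i,t} \mid x, y_{i,<t})}{\mathrm{sg}\big[\pi_\theta(y_{i,t} \mid x, y_{i,<t})\big]}, \\
s_i(\theta)
  &= \left( \frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\text{old}}}(y_i \mid x)} \right)^{1/|y_i|}.
\end{aligned}
```

Numerically, \(s_{i,t}(\theta)\) equals the sequence-level ratio \(s_i(\theta)\); only the gradient path differs, since it flows through the individual token probability rather than through the whole sequence.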
Usage
from trl.experimental.gspo_token import GRPOTrainer
from trl import GRPOConfig
training_args = GRPOConfig(
    importance_sampling_level="sequence_token",
    ...
)

To leverage GSPO-token, you need to provide a per-token advantage \( \hat{A}_{i,t} \) for each token \( t \) in sequence \( i \), that is, make \( \hat{A}_{i,t} \) vary with \( t \). This is not the case here, where \( \hat{A}_{i,t} = \hat{A}_{i} \) for every token. Otherwise, the GSPO-token gradient is equivalent to that of the original GSPO implementation.
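The equivalence noted above can be checked numerically on toy data. The following is a minimal PyTorch sketch, not the TRL implementation: the helper names (`sequence_log_weight`, `gspo_weights`, `gspo_token_weights`, `surrogate`, `grad_of`) and the random log-probabilities are illustrative assumptions, and clipping is left out to keep the comparison short. It compares the gradient of the sequence-level weighting with that of the token-level weighting, first with one advantage shared by every token in a completion, then with advantages that differ per token.

```python
import torch

torch.manual_seed(0)

# Toy per-token log-probabilities: G=2 completions, T=3 tokens each.
# `logps` stands in for the current policy, `old_logps` for the sampling policy.
logps = torch.randn(2, 3, requires_grad=True)
old_logps = logps.detach() + 0.1 * torch.randn(2, 3)
mask = torch.ones(2, 3)  # 1 for real completion tokens, 0 for padding


def sequence_log_weight(logps, old_logps, mask):
    # log s_i = (1/|y_i|) * sum_t log(pi_theta / pi_old)
    log_ratio = logps - old_logps
    return (log_ratio * mask).sum(-1) / mask.sum(-1).clamp(min=1.0)


def gspo_weights(logps, old_logps, mask):
    # Sequence-level: one weight s_i per completion, broadcast over its tokens.
    s = torch.exp(sequence_log_weight(logps, old_logps, mask))
    return s.unsqueeze(-1).expand_as(logps)


def gspo_token_weights(logps, old_logps, mask):
    # Token-level: s_{i,t} = sg[s_i] * pi_theta(y_{i,t}) / sg[pi_theta(y_{i,t})]
    seq = sequence_log_weight(logps, old_logps, mask).detach().unsqueeze(-1)
    return torch.exp(logps - logps.detach() + seq)


def surrogate(weights, advantages, mask):
    # Unclipped surrogate: mean over tokens, then mean over the group.
    per_seq = (weights * advantages * mask).sum(-1) / mask.sum(-1).clamp(min=1.0)
    return per_seq.mean()


def grad_of(weight_fn, advantages):
    loss = surrogate(weight_fn(logps, old_logps, mask), advantages, mask)
    (grad,) = torch.autograd.grad(loss, logps)
    return grad


# Same advantage for every token of a completion: gradients coincide.
shared = torch.tensor([[1.0], [-0.5]]).expand(2, 3)
same = torch.allclose(grad_of(gspo_weights, shared),
                      grad_of(gspo_token_weights, shared), atol=1e-6)

# Advantage varies across tokens: gradients differ.
varying = torch.tensor([[1.0, 0.0, -1.0], [0.5, 0.5, 2.0]])
differ = not torch.allclose(grad_of(gspo_weights, varying),
                            grad_of(gspo_token_weights, varying), atol=1e-6)

print(same, differ)
```

With a shared advantage, the token-level weighting adds nothing over the sequence-level one, so the sequence_token setting only pays off once the advantages actually differ from token to token.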