Hi. Why was assistant_only_loss deprecated? It forces the use of completion_only_loss for Jinja2 chat templates, and thus the conversion of datasets to prompt/completion format. I’m just interested to know, for educational purposes. Thanks!
This is mainly a TRL SFTTrainer story, not a plain “Transformers deprecated a flag” story. In current TRL docs, assistant_only_loss is still supported for conversational datasets. What changed is the preferred training path: TRL now centers prompt-completion datasets and completion_only_loss, while assistant_only_loss is treated as a narrower option that only works when the chat template can return assistant-token masks via {% generation %} / {% endgeneration %}. (Hugging Face)
The idea in one sentence
The stack moved from:
- old: “figure out the trainable region from rendered chat text afterward”
to:
- new: “define the trainable region explicitly in the dataset when possible” (GitHub)
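To make that shift concrete, here is a minimal, hypothetical helper (plain Python, not a TRL API) that makes the trainable region explicit by converting a conversational example into TRL’s prompt-completion shape:

```python
def to_prompt_completion(example):
    """Split a conversational example into prompt-completion form.

    Hypothetical helper, not part of TRL: it assumes the final message is
    the assistant turn to supervise, so everything before it is the prompt.
    """
    messages = example["messages"]
    if messages[-1]["role"] != "assistant":
        raise ValueError("expected the final turn to be from the assistant")
    return {"prompt": messages[:-1], "completion": messages[-1:]}

example = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 = 4."},
    ]
}
pc = to_prompt_completion(example)
# pc["prompt"] holds the earlier turns, pc["completion"] the assistant turn
```

With data in this shape, the trainable span no longer has to be inferred from rendered chat text; it is read straight off the schema.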
Chronological timeline
| Period | What changed | Why it mattered | Sources |
|---|---|---|---|
| 2023 | The common pattern was still the old collator-based masking workflow, especially DataCollatorForCompletionOnlyLM. Around this time, users were already hitting practical limitations such as incompatibility with packing=True. | This showed that masking “after formatting” did not fit cleanly with modern efficient SFT pipelines. | (Hugging Face) |
| Early 2024 | A Transformers feature request asked for apply_chat_template(..., tokenize=True) to return token masks so users could compute loss only on assistant tokens in multi-message chats. | This was the first clear signal that delimiter-based masking was too weak for real chat data with multiple turns. | (GitHub) |
| Mid to late 2024 | Transformers added assistant-token-mask support in chat templating, but only for templates that support it. In practice, tokenizer/template bugs appeared for some models, including Llama 3 and Qwen2.5, and truncation could also break assistant masks. | The feature existed, but it proved fragile because it depended on template markup, tokenization, and truncation all lining up correctly. | (Hugging Face) |
| 2025 | TRL formalized dataset-type-aware SFT: conversational datasets can use assistant_only_loss; prompt-completion datasets use completion_only_loss, which is the default for prompt-completion data unless overridden. | This is the architectural pivot: the training target moved from “infer it from text” to “read it from the dataset schema.” | (Hugging Face) |
| Late 2025 | The old DataCollatorForCompletionOnlyLM was removed, and a TRL maintainer explicitly told users to switch to completion_only_loss=True with a prompt-completion dataset. Around the same period, users reported prompt-completion labeling issues while migrating, which led to fixes and clearer warnings. | This is the practical migration point most users noticed: the old masking tool was gone, and the new expected path was explicit prompt/completion supervision. | (GitHub) |
| 2026 / current state | Current TRL docs still support assistant_only_loss=True, but only for conversational datasets whose templates can return assistant-token masks. They also say completion-only training is compatible with assistant-only training when using a conversational prompt-completion dataset. | So the correct reading is not “assistant_only_loss disappeared” but “it became a specialized, template-dependent option, while prompt/completion became the safer default.” | (Hugging Face) |
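Under the hood, both loss modes come down to the same label trick: tokens outside the supervised span get the ignore index so they contribute nothing to the loss. A schematic sketch (plain Python, not TRL internals, which operate on batched tensors):

```python
IGNORE_INDEX = -100  # PyTorch cross-entropy skips labels with this value

def mask_labels(input_ids, supervised_mask):
    """Return labels that keep only the supervised span.

    Schematic illustration of what completion_only_loss and
    assistant_only_loss amount to; only the difference between the two is
    *where* the mask comes from (dataset schema vs. chat template).
    """
    return [tok if keep else IGNORE_INDEX
            for tok, keep in zip(input_ids, supervised_mask)]

# four tokens, only the last two belong to the completion/assistant span
labels = mask_labels([101, 102, 103, 104], [0, 0, 1, 1])
# labels == [-100, -100, 103, 104]
```

For prompt-completion data the mask boundary is the prompt/completion split itself; for conversational data it has to be recovered from the template’s assistant-token mask, which is exactly where things can go wrong.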
Old workflow vs new workflow vs why the warning appears
| Topic | Old workflow | New workflow | Why the warning appears | Sources |
|---|---|---|---|---|
| Where the target is defined | The target span was often inferred from rendered text using templates or delimiters. | The target span is preferably defined in the dataset itself as prompt + completion. | If TRL cannot reliably recover assistant spans from the template, it warns or errors instead of silently guessing. | (GitHub) |
| Typical data shape | Often messages or already-rendered chat text. | Prefer {"prompt": ..., "completion": ...} or conversational prompt-completion. | A plain conversational dataset does not automatically make assistant masking reliable; the template must expose assistant spans. | (Hugging Face) |
| Loss mode | Old code often used DataCollatorForCompletionOnlyLM to mask labels after formatting. | completion_only_loss is the intended path for prompt-completion datasets; assistant_only_loss remains available for conversational datasets. | The warning often appears when users expect assistant-only masking to work on a template that does not support return_assistant_tokens_mask. | (GitHub) |
| Dependency on Jinja template | High, but often hidden: the boundary was recovered indirectly from formatting. | Still relevant for conversational data, but less central for prompt-completion because the boundary is already explicit in the dataset. | If the template lacks {% generation %}, assistant masks can be empty, and TRL will complain. | (Hugging Face) |
| Common failure modes | Delimiter mismatch, tokenization quirks, packing incompatibility. | Fewer boundary-inference problems for prompt-completion, though template-related issues still exist for assistant masks. | The warning can also appear if truncation causes all assistant tokens to fall outside the retained sequence. | (GitHub) |
| What the library now prefers | Heuristic masking on top of rendered text. | Explicit prompt/completion supervision, then optional assistant masking when the template supports it. | The warning is TRL nudging you away from a fragile path toward the explicit one. | (Hugging Face) |
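A quick way to see which camp a given model falls into is to check its chat template for the generation keyword. A rough heuristic sketch (the substring check is illustrative, not authoritative, since real templates may use whitespace-control variants such as {%- generation -%}):

```python
def template_supports_assistant_masks(chat_template: str) -> bool:
    """Rough heuristic: assistant-token masks require the template to wrap
    assistant turns in generation/endgeneration markup.

    Illustrative only; real templates vary in tag spelling and whitespace
    control, so treat a True here as a hint, not a guarantee.
    """
    return "generation" in chat_template and "endgeneration" in chat_template

plain = "{% for m in messages %}{{ m.content }}{% endfor %}"
masked = ("{% for m in messages %}{% if m.role == 'assistant' %}"
          "{% generation %}{{ m.content }}{% endgeneration %}"
          "{% else %}{{ m.content }}{% endif %}{% endfor %}")
# plain lacks the markup, masked has it
```

If the template looks like plain above, assistant_only_loss has nothing to work with, and converting to prompt/completion is the safer route.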
What the warning usually means
In plain English, the warning usually means one of these:
- Your chat template does not emit assistant masks.
  Transformers’ tokenizer docs say return_assistant_tokens_mask=True only works for chat templates that support it via {% generation %}. TRL’s docs say the same thing for assistant_only_loss=True. (Hugging Face)
- The assistant mask came back empty.
  TRL has an error path that says: if assistant_only_loss=True but an example has no assistant tokens, that usually means the template does not generate assistant masks and may be missing {% generation %}. (GitHub)
- Truncation can also cause it.
  There are issues showing that assistant masks can become effectively empty after truncation, even when the template conceptually supports them. (GitHub)
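The truncation case is easy to reproduce on paper: if max_length cuts the sequence off before the assistant span begins, the surviving mask is all zeros and there is nothing left to compute loss on. A toy illustration:

```python
def truncate_mask(assistant_mask, max_length):
    """Truncate an assistant-token mask the way sequence truncation would,
    and report whether any supervised tokens survive (toy illustration)."""
    kept = assistant_mask[:max_length]
    return kept, any(kept)

# a long prompt followed by a short assistant reply at the end
mask = [0, 0, 0, 0, 0, 1, 1]

kept, has_targets = truncate_mask(mask, max_length=4)
# kept == [0, 0, 0, 0]; has_targets is False: every assistant token was cut off
```

This is why the warning can appear even with a template that conceptually supports assistant masks: the mask was fine before truncation and empty after it.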
Bottom line
The clean takeaway is:
- Not really deprecated: assistant_only_loss still exists and is documented. (Hugging Face)
- What actually changed: the old collator-centric workflow was removed, and TRL now prefers prompt/completion data with completion_only_loss. (GitHub)
- Why: explicit dataset boundaries are more robust than inferring assistant spans from Jinja-rendered chat text. (GitHub)
Brilliant! thank you so much @John6666 for your amazing explanation. I truly appreciate it! It makes perfect sense now. I will convert my code to prompt / completion.