Hi. Why was assistant_only_loss deprecated? It forces the use of completion_only_loss for Jinja2 chat templates, and thus the conversion of datasets to prompt/completion format. I’m just interested to know, for educational purposes. Thanks!
This is mainly a TRL SFTTrainer story, not a plain “Transformers deprecated a flag” story. In current TRL docs, assistant_only_loss is still supported for conversational datasets. What changed is the preferred training path: TRL now centers prompt-completion datasets and completion_only_loss, while assistant_only_loss is treated as a narrower option that only works when the chat template can return assistant-token masks via {% generation %} / {% endgeneration %}. (Hugging Face)
The idea in one sentence
The stack moved from:
- old: “figure out the trainable region from rendered chat text afterward”
to:
- new: “define the trainable region explicitly in the dataset when possible” (GitHub)
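To make that shift concrete, here is a minimal, hypothetical helper (plain Python, not a TRL API) that makes the trainable region explicit by converting a conversational example into TRL’s prompt-completion shape:

```python
def to_prompt_completion(example):
    """Split a conversational example into prompt-completion form.

    Hypothetical helper, not part of TRL: it assumes the final message is
    the assistant turn to supervise, so everything before it is the prompt.
    """
    messages = example["messages"]
    if messages[-1]["role"] != "assistant":
        raise ValueError("expected the final turn to be from the assistant")
    return {"prompt": messages[:-1], "completion": messages[-1:]}

example = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 = 4."},
    ]
}
pc = to_prompt_completion(example)
# pc["prompt"] holds the earlier turns, pc["completion"] the assistant turn
```

With data in this shape, the trainable span no longer has to be inferred from rendered chat text; it is read straight off the schema.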
Chronological timeline
| Period | What changed | Why it mattered | Sources |
|---|---|---|---|
| 2023 | The common pattern was still the old collator-based masking workflow, especially DataCollatorForCompletionOnlyLM. Around this time, users were already hitting practical limitations such as incompatibility with packing=True. | This showed that masking “after formatting” did not fit cleanly with modern efficient SFT pipelines. | (Hugging Face) |
| Early 2024 | A Transformers feature request asked for apply_chat_template(..., tokenize=True) to return token masks so users could compute loss only on assistant tokens in multi-message chats. | This was the first clear signal that delimiter-based masking was too weak for real chat data with multiple turns. | (GitHub) |
| Mid to late 2024 | Transformers added assistant-token-mask support in chat templating, but only for templates that support it. In practice, tokenizer/template bugs appeared for some models, including Llama 3 and Qwen2.5, and truncation could also break assistant masks. | The feature existed, but it proved fragile because it depended on template markup, tokenization, and truncation all lining up correctly. | (Hugging Face) |
| 2025 | TRL formalized dataset-type-aware SFT: conversational datasets can use assistant_only_loss; prompt-completion datasets use completion_only_loss, which is the default for prompt-completion data unless overridden. | This is the architectural pivot: the training target moved from “infer it from text” to “read it from the dataset schema.” | (Hugging Face) |
| Late 2025 | The old DataCollatorForCompletionOnlyLM was removed, and a TRL maintainer explicitly told users to switch to completion_only_loss=True with a prompt-completion dataset. Around the same period, users reported prompt-completion labeling issues while migrating, which led to fixes and clearer warnings. | This is the practical migration point most users noticed: the old masking tool was gone, and the new expected path was explicit prompt/completion supervision. | (GitHub) |
| 2026 / current state | Current TRL docs still support assistant_only_loss=True, but only for conversational datasets whose templates can return assistant-token masks. They also say completion-only training is compatible with assistant-only training when using a conversational prompt-completion dataset. | So the correct reading is not “assistant_only_loss disappeared” but “it became a specialized, template-dependent option, while prompt/completion became the safer default.” | (Hugging Face) |
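Under the hood, both loss modes come down to the same label trick: tokens outside the supervised span get the ignore index so they contribute nothing to the loss. A schematic sketch (plain Python, not TRL internals, which operate on batched tensors):

```python
IGNORE_INDEX = -100  # PyTorch cross-entropy skips labels with this value

def mask_labels(input_ids, supervised_mask):
    """Return labels that keep only the supervised span.

    Schematic illustration of what completion_only_loss and
    assistant_only_loss amount to; only the difference between the two is
    *where* the mask comes from (dataset schema vs. chat template).
    """
    return [tok if keep else IGNORE_INDEX
            for tok, keep in zip(input_ids, supervised_mask)]

# four tokens, only the last two belong to the completion/assistant span
labels = mask_labels([101, 102, 103, 104], [0, 0, 1, 1])
# labels == [-100, -100, 103, 104]
```

For prompt-completion data the mask boundary is the prompt/completion split itself; for conversational data it has to be recovered from the template’s assistant-token mask, which is exactly where things can go wrong.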
Old workflow vs new workflow vs why the warning appears
| Topic | Old workflow | New workflow | Why the warning appears | Sources |
|---|---|---|---|---|
| Where the target is defined | The target span was often inferred from rendered text using templates or delimiters. | The target span is preferably defined in the dataset itself as prompt + completion. | If TRL cannot reliably recover assistant spans from the template, it warns or errors instead of silently guessing. | (GitHub) |
| Typical data shape | Often messages or already-rendered chat text. | Prefer {"prompt": ..., "completion": ...} or conversational prompt-completion. | A plain conversational dataset does not automatically make assistant masking reliable; the template must expose assistant spans. | (Hugging Face) |
| Loss mode | Old code often used DataCollatorForCompletionOnlyLM to mask labels after formatting. | completion_only_loss is the intended path for prompt-completion datasets; assistant_only_loss remains available for conversational datasets. | The warning often appears when users expect assistant-only masking to work on a template that does not support return_assistant_tokens_mask. | (GitHub) |
| Dependency on Jinja template | High, but often hidden: the boundary was recovered indirectly from formatting. | Still relevant for conversational data, but less central for prompt-completion because the boundary is already explicit in the dataset. | If the template lacks {% generation %}, assistant masks can be empty, and TRL will complain. | (Hugging Face) |
| Common failure modes | Delimiter mismatch, tokenization quirks, packing incompatibility. | Fewer boundary-inference problems for prompt-completion, though template-related issues still exist for assistant masks. | The warning can also appear if truncation causes all assistant tokens to fall outside the retained sequence. | (GitHub) |
| What the library now prefers | Heuristic masking on top of rendered text. | Explicit prompt/completion supervision, then optional assistant masking when the template supports it. | The warning is TRL nudging you away from a fragile path toward the explicit one. | (Hugging Face) |
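A quick way to see which camp a given model falls into is to check its chat template for the generation keyword. A rough heuristic sketch (the substring check is illustrative, not authoritative, since real templates may use whitespace-control variants such as {%- generation -%}):

```python
def template_supports_assistant_masks(chat_template: str) -> bool:
    """Rough heuristic: assistant-token masks require the template to wrap
    assistant turns in generation/endgeneration markup.

    Illustrative only; real templates vary in tag spelling and whitespace
    control, so treat a True here as a hint, not a guarantee.
    """
    return "generation" in chat_template and "endgeneration" in chat_template

plain = "{% for m in messages %}{{ m.content }}{% endfor %}"
masked = ("{% for m in messages %}{% if m.role == 'assistant' %}"
          "{% generation %}{{ m.content }}{% endgeneration %}"
          "{% else %}{{ m.content }}{% endif %}{% endfor %}")
# plain lacks the markup, masked has it
```

If the template looks like plain above, assistant_only_loss has nothing to work with, and converting to prompt/completion is the safer route.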
What the warning usually means
In plain English, the warning usually means one of these:
- Your chat template does not emit assistant masks.
  Transformers’ tokenizer docs say return_assistant_tokens_mask=True only works for chat templates that support it via {% generation %}. TRL’s docs say the same thing for assistant_only_loss=True. (Hugging Face)
- The assistant mask came back empty.
  TRL has an error path that says: if assistant_only_loss=True but an example has no assistant tokens, that usually means the template does not generate assistant masks and may be missing {% generation %}. (GitHub)
- Truncation can also cause it.
  There are issues showing that assistant masks can become effectively empty after truncation, even when the template conceptually supports them. (GitHub)
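The truncation case is easy to reproduce on paper: if max_length cuts the sequence off before the assistant span begins, the surviving mask is all zeros and there is nothing left to compute loss on. A toy illustration:

```python
def truncate_mask(assistant_mask, max_length):
    """Truncate an assistant-token mask the way sequence truncation would,
    and report whether any supervised tokens survive (toy illustration)."""
    kept = assistant_mask[:max_length]
    return kept, any(kept)

# a long prompt followed by a short assistant reply at the end
mask = [0, 0, 0, 0, 0, 1, 1]

kept, has_targets = truncate_mask(mask, max_length=4)
# kept == [0, 0, 0, 0]; has_targets is False: every assistant token was cut off
```

This is why the warning can appear even with a template that conceptually supports assistant masks: the mask was fine before truncation and empty after it.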
Bottom line
The clean takeaway is:
- Not really deprecated: assistant_only_loss still exists and is documented. (Hugging Face)
- What actually changed: the old collator-centric workflow was removed, and TRL now prefers prompt/completion data with completion_only_loss. (GitHub)
- Why: explicit dataset boundaries are more robust than inferring assistant spans from Jinja-rendered chat text. (GitHub)
Brilliant! thank you so much @John6666 for your amazing explanation. I truly appreciate it! It makes perfect sense now. I will convert my code to prompt / completion.