SFTTrainer and MPS (validation loss nan)

NiallRooney · March 26, 2024, 3:37pm

When I fine-tune a TinyLlama model on a sample of the Alpaca data, training runs fine in Colab. However, when I run the same code locally on a MacBook (macOS Ventura 13.6.4) using MPS, the validation loss is already nan at the first step. Any idea why?
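To pin down exactly where the nan first appears, the logged metrics can be scanned. This is a minimal sketch. It assumes the standard `Trainer` behaviour of appending each logged dict (with a `"step"` key, and evaluation metrics prefixed with `eval_`) to `trainer.state.log_history`. The helper `first_nan_eval_step` is hypothetical, and the history below is synthetic, not taken from this run:

```python
import math
from typing import Any, Optional


def first_nan_eval_step(log_history: list[dict[str, Any]]) -> Optional[int]:
    """Return the step of the first logged eval_loss that is NaN, or None."""
    for entry in log_history:
        loss = entry.get("eval_loss")
        # Training-loss entries have no "eval_loss" key, so they are skipped.
        if loss is not None and math.isnan(loss):
            return entry.get("step")
    return None


# Synthetic history shaped like Trainer log entries (values are made up).
history = [
    {"loss": 1.92, "step": 1},
    {"eval_loss": float("nan"), "step": 1},
    {"loss": 1.85, "step": 2},
]
print(first_nan_eval_step(history))  # 1
```

On the real run, calling `first_nan_eval_step(trainer.state.log_history)` after `trainer.train()` would return the step to inspect.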

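import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer

# Load TinyLlama in half precision directly onto the Apple-silicon GPU (MPS)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    device_map="mps",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
)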
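The model above is loaded with `torch_dtype=torch.float16`. One plausible suspect, offered as a hypothesis and not confirmed in this thread, is half-precision overflow. The largest finite float16 value is 65504, anything larger becomes inf, and inf minus inf is nan. The sketch below only illustrates that range using Python's `struct` "e" (IEEE 754 binary16) format; it does not touch PyTorch or MPS, and `fits_in_float16` is a hypothetical helper:

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_float16(x: float) -> bool:
    """True if x can be stored as a finite half-precision float."""
    if not math.isfinite(x):
        return False  # inf and nan are never "finite" float16 values
    try:
        # The "e" format code packs IEEE 754 binary16 (half precision)
        # and raises OverflowError for finite values that are too large.
        struct.pack("<e", x)
    except OverflowError:
        return False
    return True


print(fits_in_float16(FP16_MAX))  # True
print(fits_in_float16(1e5))       # False

# Once a value has overflowed to inf, ordinary arithmetic can produce nan.
overflowed = math.inf
print(math.isnan(overflowed - overflowed))  # True
```

If overflow is the cause, loading with `torch_dtype=torch.float32` on MPS would be a natural experiment; treat that as something to test, not a confirmed fix.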

# LoRA adapters on the attention projections only
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=None,
)
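For reference, here is a rough count of what this LoRA config trains. The sketch assumes TinyLlama-1.1B's published shapes (22 layers, hidden size 2048, 32 attention heads, 4 key/value heads, so `k_proj` and `v_proj` output 256 features) and PEFT's default scaling of `lora_alpha / r`. These constants are assumptions, not values from the post:

```python
# Assumed TinyLlama-1.1B shapes (from its published config, not from this thread).
HIDDEN = 2048
NUM_LAYERS = 22
NUM_HEADS = 32
NUM_KV_HEADS = 4
HEAD_DIM = HIDDEN // NUM_HEADS    # 64
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # 256, output width of k_proj / v_proj

R = 16
LORA_ALPHA = 16

# (in_features, out_features) of each targeted linear layer
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(in_f: int, out_f: int, r: int) -> int:
    # LoRA adds A with shape (r, in_f) and B with shape (out_f, r).
    return r * in_f + out_f * r


scaling = LORA_ALPHA / R
per_layer = sum(lora_params(i, o, R) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
print(scaling)    # 1.0
print(per_layer)  # 204800
print(total)      # 4505600
```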

training_args = TrainingArguments(
    output_dir="./alpaca_output/",
    report_to="none",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    # eval_steps is not set, so it falls back to logging_steps (evaluate every step)
    evaluation_strategy="steps",
    # logging strategies
    logging_strategy="steps",
    logging_steps=1,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    seed=1,
    save_strategy="epoch",
)
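With `lr_scheduler_type="cosine"` and no warmup configured (`warmup_steps` and `warmup_ratio` both default to 0), the first optimizer step already uses the full 2e-4. Below is a minimal sketch of that schedule, written to mirror the formula in `transformers`' `get_cosine_schedule_with_warmup` as I understand it. The function `cosine_lr` and the 100-step total are hypothetical:

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 2e-4,
              warmup_steps: int = 0) -> float:
    """Linear warmup, then half-cycle cosine decay from base_lr to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total = 100  # hypothetical number of optimizer steps
print(cosine_lr(0, total))    # 0.0002 (no warmup: full rate immediately)
print(cosine_lr(100, total))  # 0.0
```

Setting `warmup_steps` or `warmup_ratio` in `TrainingArguments` would replace the immediate jump to the full rate with a linear ramp; whether that matters for the nan here is untested.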

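# train_dataset, eval_dataset and create_alpaca_prompt are defined elsewhere (not shown in the post)
trainer = SFTTrainer(
    model,
    peft_config=peft_config,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    packing=True,
    max_seq_length=1024,
    args=training_args,
    formatting_func=create_alpaca_prompt,
)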
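With `packing=True`, examples are not padded one by one. Instead, the formatted and tokenized examples are concatenated (each followed by an EOS token) and cut into blocks of `max_seq_length` tokens. This is a simplified sketch of that idea, not TRL's actual `ConstantLengthDataset` code; `pack_sequences` and the token ids are made up:

```python
from itertools import chain


def pack_sequences(token_lists: list[list[int]], eos_id: int,
                   seq_length: int) -> list[list[int]]:
    """Concatenate examples (each followed by EOS) and cut fixed-length blocks.

    Any trailing remainder shorter than seq_length is dropped.
    """
    stream = list(chain.from_iterable(tokens + [eos_id] for tokens in token_lists))
    n_blocks = len(stream) // seq_length
    return [stream[i * seq_length:(i + 1) * seq_length] for i in range(n_blocks)]


examples = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
blocks = pack_sequences(examples, eos_id=2, seq_length=4)
print(blocks)  # [[5, 6, 7, 2], [8, 9, 2, 10], [11, 12, 13, 2]]
```

In this sketch, a trailing remainder shorter than one block is discarded. With `max_seq_length=1024` and a small `eval_dataset`, that could leave very few evaluation blocks, which may be worth checking.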
