ValueError: Mixed precision training with AMP or APEX (`--fp16`) and FP16 evaluation can only be used on CUDA devices

I am trying to fine-tune a Wav2Vec2 model on a dataset on my local machine using my CPU (I don’t have a GPU or Google Colab Pro). I am using this as my reference. When I try to execute

from transformers import TrainingArguments

training_args = TrainingArguments(
  # output_dir="/content/gdrive/MyDrive/wav2vec2-base-timit-demo",
  output_dir="./wav2vec2-medical",
  group_by_length=True,
  per_device_train_batch_size=32,
  evaluation_strategy="steps",
  num_train_epochs=30,
  fp16=True,
  save_steps=500,
  eval_steps=500,
  logging_steps=500,
  learning_rate=1e-4,
  weight_decay=0.005,
  warmup_steps=1000,
  save_total_limit=2,
)

I am getting the following error:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-f9014a6221db> in <module>
      1 from transformers import TrainingArguments
      2 
----> 3 training_args = TrainingArguments(
      4   # output_dir="/content/gdrive/MyDrive/wav2vec2-base-timit-demo",
      5   output_dir="./wav2vec2-medical",

~/Library/Python/3.8/lib/python/site-packages/transformers/training_args.py in __init__(self, output_dir, overwrite_output_dir, do_train, do_eval, do_predict, evaluation_strategy, prediction_loss_only, per_device_train_batch_size, per_device_eval_batch_size, per_gpu_train_batch_size, per_gpu_eval_batch_size, gradient_accumulation_steps, eval_accumulation_steps, learning_rate, weight_decay, adam_beta1, adam_beta2, adam_epsilon, max_grad_norm, num_train_epochs, max_steps, lr_scheduler_type, warmup_ratio, warmup_steps, logging_dir, logging_strategy, logging_first_step, logging_steps, save_strategy, save_steps, save_total_limit, no_cuda, seed, fp16, fp16_opt_level, fp16_backend, fp16_full_eval, local_rank, tpu_num_cores, tpu_metrics_debug, debug, dataloader_drop_last, eval_steps, dataloader_num_workers, past_index, run_name, disable_tqdm, remove_unused_columns, label_names, load_best_model_at_end, metric_for_best_model, greater_is_better, ignore_data_skip, sharded_ddp, deepspeed, label_smoothing_factor, adafactor, group_by_length, length_column_name, report_to, ddp_find_unused_parameters, dataloader_pin_memory, skip_memory_metrics, use_legacy_prediction_loop, push_to_hub, resume_from_checkpoint, mp_parameters)

~/Library/Python/3.8/lib/python/site-packages/transformers/training_args.py in __post_init__(self)
    609 
    610         if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
--> 611             raise ValueError(
    612                 "Mixed precision training with AMP or APEX (`--fp16`) and FP16 evaluation can only be used on CUDA devices."
    613             )

ValueError: Mixed precision training with AMP or APEX (`--fp16`) and FP16 evaluation can only be used on CUDA devices.

I understand that the error occurs because I am not using a GPU, as mixed precision cannot be carried out without one. I want to run training on my CPU. How can I resolve the error?

You should remove fp16=True then.
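If the same script should run both on a CPU-only machine and on a CUDA machine, one option is to derive the flag from what PyTorch reports instead of hard-coding it. This is only a sketch: `cuda_available` and `precision_kwargs` are helper names made up for this post, and the only library call assumed is `torch.cuda.is_available()`.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


def precision_kwargs(has_cuda: bool) -> dict:
    """Hypothetical helper: enable fp16 only when a CUDA device is present."""
    return {"fp16": has_cuda}


kwargs = precision_kwargs(cuda_available())
print(kwargs)
```

The result can then be unpacked into the call from the question in place of the hard-coded `fp16=True`, for example `TrainingArguments(output_dir="./wav2vec2-medical", **kwargs)` together with the other arguments.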

Issue solved, thank you for replying

How? Did you just remove fp16=True, or did you make some other changes?

I changed the fp16 parameter to False, but as a consequence the training process took much longer…

Me too. Is there any other way?

It’s settled. The cause was that I was not in my CUDA environment. You need to activate it first by running conda activate <your environment>

I got the same error running on a MacBook Pro with an M1 chip. Here the device is “mps”, not “cuda”. I presume that’s the root cause of the error. The device is correctly set up, but doesn’t seem to be supported by PyTorch yet. There is a discussion around this here:

Not sure how to fix the problem, though. Setting both fp16 and fp16_full_eval to False didn’t help.
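Going by the check shown in the traceback above, the error is raised only when the device type is not “cuda” and fp16 or fp16_full_eval is true, so it may help to print what PyTorch actually detects before building the arguments. A rough sketch, assuming a PyTorch version that has `torch.backends.mps` (guarded with `getattr` in case it does not); `detect_backends` and `pick_device_type` are names invented here.

```python
import importlib.util


def detect_backends() -> tuple:
    """Return (has_cuda, has_mps) as reported by PyTorch, or (False, False) without it."""
    if importlib.util.find_spec("torch") is None:
        return False, False
    import torch

    has_cuda = torch.cuda.is_available()
    # torch.backends.mps may be missing on older PyTorch builds, hence the getattr guard.
    mps = getattr(torch.backends, "mps", None)
    has_mps = bool(mps is not None and mps.is_available())
    return has_cuda, has_mps


def pick_device_type(has_cuda: bool, has_mps: bool) -> str:
    """Hypothetical helper: prefer cuda, then mps, then cpu."""
    if has_cuda:
        return "cuda"
    if has_mps:
        return "mps"
    return "cpu"


has_cuda, has_mps = detect_backends()
device_type = pick_device_type(has_cuda, has_mps)
# Per the check in the traceback, fp16 is only accepted when the device type is "cuda".
print(device_type, {"fp16": device_type == "cuda"})
```

If this prints “mps” or “cpu” and the error still appears, something else is probably still setting fp16 or fp16_full_eval to True, for example an earlier notebook cell or a command-line flag.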

yes working now, thank you

This might be a bit late to answer the question now, but it could be useful for others.
I have a bash script which cleans up the cache on the GPUs:

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

After running these commands, the fp16 error disappears.