ValueError: Mixed precision training with AMP or APEX (`--fp16`) and FP16 evaluation can only be used on CUDA devices

ayushmehta · June 17, 2021, 2:07am

I am trying to fine-tune a Wav2Vec2 model on a dataset on my local machine using only the CPU (I don’t have a GPU or Google Colab Pro). I am using this as my reference. When I try to execute

from transformers import TrainingArguments

training_args = TrainingArguments(
  # output_dir="/content/gdrive/MyDrive/wav2vec2-base-timit-demo",
  output_dir="./wav2vec2-medical",
  group_by_length=True,
  per_device_train_batch_size=32,
  evaluation_strategy="steps",
  num_train_epochs=30,
  fp16=True,
  save_steps=500,
  eval_steps=500,
  logging_steps=500,
  learning_rate=1e-4,
  weight_decay=0.005,
  warmup_steps=1000,
  save_total_limit=2,
)

I get the following error:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-f9014a6221db> in <module>
      1 from transformers import TrainingArguments
      2 
----> 3 training_args = TrainingArguments(
      4   # output_dir="/content/gdrive/MyDrive/wav2vec2-base-timit-demo",
      5   output_dir="./wav2vec2-medical",

~/Library/Python/3.8/lib/python/site-packages/transformers/training_args.py in __init__(self, output_dir, overwrite_output_dir, do_train, do_eval, do_predict, evaluation_strategy, prediction_loss_only, per_device_train_batch_size, per_device_eval_batch_size, per_gpu_train_batch_size, per_gpu_eval_batch_size, gradient_accumulation_steps, eval_accumulation_steps, learning_rate, weight_decay, adam_beta1, adam_beta2, adam_epsilon, max_grad_norm, num_train_epochs, max_steps, lr_scheduler_type, warmup_ratio, warmup_steps, logging_dir, logging_strategy, logging_first_step, logging_steps, save_strategy, save_steps, save_total_limit, no_cuda, seed, fp16, fp16_opt_level, fp16_backend, fp16_full_eval, local_rank, tpu_num_cores, tpu_metrics_debug, debug, dataloader_drop_last, eval_steps, dataloader_num_workers, past_index, run_name, disable_tqdm, remove_unused_columns, label_names, load_best_model_at_end, metric_for_best_model, greater_is_better, ignore_data_skip, sharded_ddp, deepspeed, label_smoothing_factor, adafactor, group_by_length, length_column_name, report_to, ddp_find_unused_parameters, dataloader_pin_memory, skip_memory_metrics, use_legacy_prediction_loop, push_to_hub, resume_from_checkpoint, mp_parameters)

~/Library/Python/3.8/lib/python/site-packages/transformers/training_args.py in __post_init__(self)
    609 
    610         if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
--> 611             raise ValueError(
    612                 "Mixed precision training with AMP or APEX (`--fp16`) and FP16 evaluation can only be used on CUDA devices."
    613             )

ValueError: Mixed precision training with AMP or APEX (`--fp16`) and FP16 evaluation can only be used on CUDA devices.

I understand that the error occurs because I am not using a GPU, since mixed precision cannot be used without one. I want to run this on my CPU. How can I resolve the error?

sgugger · June 17, 2021, 12:10pm

You should remove fp16=True then.
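If the same script has to run on machines with and without a GPU, one option is to enable fp16 only when PyTorch reports a CUDA device instead of deleting the argument by hand. This is a minimal sketch, not an official recipe: the helper names `cuda_is_available` and `build_training_kwargs` are hypothetical, and the argument names and values are copied from the question, so they may differ in other transformers releases.

```python
def cuda_is_available() -> bool:
    """Return True only if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def build_training_kwargs(use_cuda: bool) -> dict:
    """Keyword arguments for TrainingArguments; fp16 is enabled only on CUDA."""
    return {
        "output_dir": "./wav2vec2-medical",
        "group_by_length": True,
        "per_device_train_batch_size": 32,
        "evaluation_strategy": "steps",
        "num_train_epochs": 30,
        "fp16": use_cuda,  # False on CPU, which avoids the ValueError
        "save_steps": 500,
        "eval_steps": 500,
        "logging_steps": 500,
        "learning_rate": 1e-4,
        "weight_decay": 0.005,
        "warmup_steps": 1000,
        "save_total_limit": 2,
    }


kwargs = build_training_kwargs(cuda_is_available())
print("fp16 enabled:", kwargs["fp16"])

# With transformers installed, the dictionary can be unpacked directly:
# from transformers import TrainingArguments
# training_args = TrainingArguments(**kwargs)
```

On a CPU-only machine this resolves to fp16=False, which is equivalent to removing the argument, because False is its default.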

ayushmehta · June 28, 2021, 5:01am

Issue solved, thank you for replying

NikhilKhodake · March 8, 2023, 5:09am

How? Did you remove fp16=True, or did you make some other changes?

indrasary · May 31, 2023, 12:03am

I changed the fp16 parameter to False, but as a consequence the training process took much longer…

happywill · July 22, 2023, 3:48am

Me too. Is there any other way?

happywill · July 22, 2023, 3:55am

It’s settled. The problem was that I wasn’t in a CUDA environment. You need to run conda activate <your environment> first.
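To confirm whether the active environment is the one with CUDA support, it can help to print which interpreter is running and what PyTorch reports. The sketch below assumes only that PyTorch may or may not be installed; the function name `describe_environment` and the dictionary keys are hypothetical.

```python
import sys


def describe_environment() -> dict:
    """Collect the facts that decide whether fp16=True can work here."""
    info = {
        "python_executable": sys.executable,  # shows which environment is active
        "torch_installed": False,
        "torch_version": None,
        "torch_cuda_build": None,
        "cuda_available": False,
    }
    try:
        import torch
    except ImportError:
        return info
    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    info["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If cuda_available is False even though the machine has an NVIDIA GPU, the interpreter path usually shows whether the wrong environment is active or a CPU-only PyTorch build is installed.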

martinwunderlich · July 25, 2023, 7:31am

I got the same error running on a MacBook Pro with an M1 chip. Here the device is “mps”, not “cuda”, which I presume is the root cause of the error. The device is set up correctly, but it doesn’t seem to be supported by PyTorch yet. There is a discussion about this here:

github.com/huggingface/transformers

TrainingArguments does not support `mps` device (Mac M1 GPU)

opened 06:55PM - 30 Jun 22 UTC

closed 11:04AM - 16 Aug 22 UTC

saattrupdan

bug

### System Info

- `transformers` version: 4.21.0.dev0
- Platform: macOS-12.4-a…rm64-arm-64bit
- Python version: 3.8.9
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.12.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no

### Who can help?

@sgugger

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

```bash
export TASK_NAME=wnli

python run_glue.py \
  --model_name_or_path bert-base-cased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /tmp/$TASK_NAME/
```

### Expected behavior

When running the `Trainer.train` on a machine with an MPS GPU, it still just uses the CPU. I expected it to use the MPS GPU. This is supported by `torch` in the newest version 1.12.0, and we can check if the MPS GPU is available using `torch.backends.mps.is_available()`.

It seems like the issue lies in the [`TrainingArguments._setup_devices` method](https://github.com/huggingface/transformers/blob/49cd736a288a315d741e5c337790effa4c9fa689/src/transformers/training_args.py#L1266), which doesn't appear to allow for the case where `device = "mps"`.

I’m not sure how to fix the problem, though. Setting both fp16 and fp16_full_eval to False didn’t help.
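The check in the traceback at the top of this thread (`self.device.type != "cuda"`) rejects every non-CUDA device, so in that transformers version an “mps” device is treated like a CPU as far as fp16 is concerned. A small sketch of detecting the device and deriving the fp16 flag from it is shown below. It relies on `torch.backends.mps.is_available()`, which the quoted issue mentions; the helper names `pick_device` and `detect_device` are hypothetical, and the `getattr` guard is an assumption to cover PyTorch builds that lack the mps backend.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; report "cpu" if it is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no torch.backends.mps attribute at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


device = detect_device()
# The check in the traceback only accepts CUDA, so fp16 stays off elsewhere.
fp16_allowed = device == "cuda"
print(f"device={device}, fp16_allowed={fp16_allowed}")
```

If the error still appears with fp16_allowed set to False, it is worth checking that no other code path (for example a command-line flag or a config file) turns fp16 or fp16_full_eval back on.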

himanshusrivastava · August 9, 2023, 5:47am

Yes, it’s working now, thank you.

mehti · April 24, 2024, 7:25pm

Might be a bit late to answer this question now, but could be useful for others.
I have a bash script that resets the GPU state by reloading the nvidia_uvm kernel module:

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

After running these commands, the fp16 error disappears.
