Whisper medium finetuning RTX 4090 mostly stays idle

truefriend2004 · December 1, 2024, 5:00pm

I am fine-tuning Whisper medium using this guide. [Important detail: the GPU is connected through Thunderbolt 4.] The data is Mozilla Common Voice 17 (approx. 9000 test, 4000 train). The GPU mostly stays idle and the CPU (Core i7, 13th gen) runs at 10-30 percent all the time.
Does that mean I do not have enough CPU resources to feed the GPU?
Should I set something like dataloader_num_workers to 2 or 4, as suggested in this post?
Does a Thunderbolt bottleneck have something to do with it?
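
For reference, the data-loading settings asked about above could be sketched as follows. `dataloader_num_workers` and `dataloader_pin_memory` are existing `TrainingArguments` fields; the values are guesses to experiment with, not recommendations from the guide, and the helper name `with_loader_overrides` is made up for illustration.

```python
# Hypothetical overrides to merge into the guide's training arguments.
LOADER_OVERRIDES = {
    "dataloader_num_workers": 4,    # worker processes preparing batches (0 means main process only)
    "dataloader_pin_memory": True,  # page-locked host memory for host-to-GPU copies
}


def with_loader_overrides(base_kwargs):
    """Return a copy of base_kwargs with the loader overrides applied."""
    merged = dict(base_kwargs)
    merged.update(LOADER_OVERRIDES)
    return merged


# The merged dict can then be unpacked into Seq2SeqTrainingArguments(**kwargs)
# together with output_dir and the other settings from the guide.
kwargs = with_loader_overrides({"per_device_train_batch_size": 16, "fp16": True})
print(kwargs)
```

On Windows, worker processes are started with "spawn", so a training script that uses workers generally needs to run under an `if __name__ == "__main__":` guard, and notebooks may behave differently.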

bozden · December 1, 2024, 5:30pm

I don’t know anything about Thunderbolt specifically, but here are some ideas:

The “medium” model is compute-intensive, so each batch loads quickly compared with the time the GPU should spend computing on it. Data loading should therefore not be the problem.
The dataset is quite small and the GPU is quite powerful, so with a reasonable batch size one epoch should finish relatively quickly.
Thunderbolt 4 has a 40 Gbps theoretical limit, and you can easily get 2 GB/s transfers in practice, which is more than enough for your case.
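
The bandwidth point can be checked with a rough calculation. This is a sketch, assuming a batch of 16 inputs and float32 log-mel features of shape 80 x 3000 (the 30-second padded input Whisper medium expects); only the 2 GB/s figure comes from the post above. For comparison, 40 Gbps corresponds to 5 GB/s.

```python
# Back-of-the-envelope: how long does one batch take to cross the link?
N_MELS = 80             # log-mel bins used by Whisper medium
N_FRAMES = 3000         # frames in a 30-second padded input
BYTES_PER_VALUE = 4     # float32 features (assumption)
BATCH_SIZE = 16         # assumed batch size
LINK_BYTES_PER_S = 2e9  # the 2 GB/s practical figure quoted above


def batch_bytes(batch_size):
    """Size in bytes of one batch of input features."""
    return batch_size * N_MELS * N_FRAMES * BYTES_PER_VALUE


def transfer_ms(batch_size, bytes_per_s=LINK_BYTES_PER_S):
    """Milliseconds needed to move one batch over the link."""
    return batch_bytes(batch_size) / bytes_per_s * 1000


print(f"{batch_bytes(BATCH_SIZE) / 1e6:.2f} MB per batch")        # 15.36 MB per batch
print(f"{transfer_ms(BATCH_SIZE):.2f} ms per batch at 2 GB/s")    # 7.68 ms per batch at 2 GB/s
```

A few milliseconds per batch is small next to the time a forward and backward pass through a model of this size is expected to take, which supports the point that the link is unlikely to be the limit.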

I think we cannot deduce more without the following info:

Which language is it (language code)? Which splits do you use? Default ones?
What are your training parameters?
Are you sure you are using the GPU (CUDA) build of PyTorch?
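
The last question can be answered from Python. A minimal sketch follows; the function name `describe_torch_build` is made up for illustration, while the `torch` calls it uses are standard ones.

```python
def describe_torch_build():
    """Summarize whether the installed PyTorch can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"installed": False}

    info = {
        "installed": True,
        "version": torch.__version__,           # a "+cu..." suffix marks a CUDA build
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


print(describe_torch_build())
```

If `cuda_available` is False, training would be running on the CPU, which would show up as an idle GPU.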

truefriend2004 · December 1, 2024, 6:17pm

Thanks for your reply.
The language is Urdu, and the training parameters are exactly as in the original guide. Pasting them here as well:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-ur",  # change to a repo name of your choice
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,  # increase by 2x for every 2x decrease in batch size
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=5000,
    gradient_checkpointing=True,
    fp16=True,
    evaluation_strategy="steps",
    per_device_eval_batch_size=8,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=1000,
    eval_steps=1000,
    logging_steps=25,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    push_to_hub=False,
)

I tried increasing the batch size to 32, and also tried raising gradient_accumulation_steps (for example to 4) to increase the effective batch size, but progress stalled and the estimated time went up after every such attempt. Since I do not fully understand these settings, I kept them as they were. With these default settings, it has taken 7.5 hours to reach 93% of training so far (Whisper medium). My GPU occasionally shows a spike and that’s it (the graph in Task Manager is mostly empty; this is Windows 11), 19.5 of 24 GB VRAM is in use, and the CPU is constantly at 10-30% (mostly around 20%). The PyTorch version is ‘2.5.1+cu124’ (I installed it by selecting CUDA 12.4 on the “Start Locally” page).
Edit: my dataset split:

from datasets import load_dataset, DatasetDict

common_voice = DatasetDict()

common_voice["train"] = load_dataset("mozilla-foundation/common_voice_17_0", "ur", split="train+validation", trust_remote_code=True)

common_voice["test"] = load_dataset("mozilla-foundation/common_voice_17_0", "ur", split="test", trust_remote_code=True)
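
One likely reason the estimated time grew after raising the batch size or gradient_accumulation_steps: max_steps counts optimizer updates, so with max_steps fixed at 5000 every increase in effective batch size multiplies the total number of samples processed. A rough sketch of the arithmetic, where `N_TRAIN` is a hypothetical placeholder and not the real size of the Urdu train+validation split:

```python
def samples_processed(batch_size, grad_accum, max_steps):
    """Total training samples seen when max_steps optimizer updates are run."""
    return batch_size * grad_accum * max_steps


def epochs_covered(batch_size, grad_accum, max_steps, n_train):
    """How many passes over the training set that corresponds to."""
    return samples_processed(batch_size, grad_accum, max_steps) / n_train


N_TRAIN = 10_000  # hypothetical dataset size, for illustration only
MAX_STEPS = 5000  # from the training arguments above

for batch_size, grad_accum in [(16, 1), (32, 1), (16, 4)]:
    total = samples_processed(batch_size, grad_accum, MAX_STEPS)
    n_epochs = epochs_covered(batch_size, grad_accum, MAX_STEPS, N_TRAIN)
    print(f"batch={batch_size} accum={grad_accum}: {total} samples, {n_epochs:.0f} epochs")
```

Under this assumption, keeping the total work constant would mean lowering max_steps by the same factor the effective batch size was raised.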

bozden · December 7, 2024, 10:58am

Sorry for the late reply. My guess is that you are looking at the general “utilization” figure.

Win 11 Task Manager does not show CUDA usage by default.
AFAIK the “utilization” figure does not take CUDA usage into account.
By default it shows the summary view.

So that we are speaking of the same measure:

Disable the summary view: right-click the GPU on the left.
Disable HW acceleration.

Now you can select CUDA to see the actual utilization. IIRC the activity usually shows up under Cuda, Copy 1 and Copy 2, so select them at the top left of the graphs.

A better tool is nvidia-smi of course…
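
A minimal sketch of polling nvidia-smi from Python follows. The `--query-gpu` fields and `--format` options are standard nvidia-smi flags; the function names and the sample line are made up for illustration, and the parser assumes every field is numeric (some fields can report `[N/A]` on certain setups).

```python
import shutil
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_line(line):
    """Parse one 'util, used, total' CSV line (noheader, nounits)."""
    util, used, total = (int(field.strip()) for field in line.split(","))
    return {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}


def query_gpus():
    """Return one dict per GPU, or None if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


# Made-up sample line, only to show the parsed shape.
print(parse_gpu_line("97, 19968, 24564"))
```

Calling `query_gpus()` in a loop while training runs gives a utilization trace that does not depend on which Task Manager graph is selected. Running `nvidia-smi -l 1` in a terminal gives a similar live view, refreshed every second.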

truefriend2004 · December 7, 2024, 6:23pm

Thanks. That’s what is happening. This is the CUDA utilization while I am running the training:
