Resuming training BERT from scratch with run_mlm.py

I started training BERT from scratch with run_mlm.py as follows:

python run_mlm.py --model_type bert `
--train_file ./data/mk.txt --output_dir ./models/bert-base-uncased `
--overwrite_output_dir --tokenizer_name ./models/bert-base-uncased `
--line_by_line True --do_train `
--per_device_train_batch_size 4 --num_train_epochs 100 `
--save_steps 100000 --save_total_limit 500 `
--max_seq_length 512 --logging_steps 500 `
--use_fast_tokenizer --report_to wandb `
--disable_tqdm True

Training stopped because of a power outage, after the latest checkpoint had been saved:
.\models\bert-base-uncased\checkpoint-1700000
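Before relaunching, it can help to confirm how far the saved run got. A Trainer checkpoint folder normally contains, next to the model weights and the optimizer/scheduler state, a trainer_state.json file with global_step and epoch fields. The minimal sketch below reads just those two fields; read_checkpoint_progress is a hypothetical helper name, and the exact file layout may differ slightly between transformers versions.

```python
import json
import os
from typing import Tuple


def read_checkpoint_progress(checkpoint_dir: str) -> Tuple[int, float]:
    """Return (global_step, epoch) from a Trainer checkpoint's trainer_state.json.

    Hypothetical helper; assumes the standard trainer_state.json file is present.
    """
    state_path = os.path.join(checkpoint_dir, "trainer_state.json")
    with open(state_path, encoding="utf-8") as f:
        state = json.load(f)
    # global_step counts optimizer updates completed before the checkpoint was written;
    # epoch is fractional, e.g. 12.5 means halfway through the 13th epoch.
    return int(state["global_step"]), float(state["epoch"])
```

Since the Trainer names checkpoint folders after the global step, calling it on the checkpoint-1700000 folder should report a global_step of 1700000.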

Given the initial command, what is the most appropriate command to resume training from the last saved checkpoint while preserving all of the parameters above?

Hi @striki-ai,

If you remove the --overwrite_output_dir option and run the same command again, the script will detect the last checkpoint and resume training from there.
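When --overwrite_output_dir is absent, recent versions of run_mlm.py call transformers' get_last_checkpoint on --output_dir and pass the result to trainer.train(resume_from_checkpoint=...), which restores the optimizer, scheduler and step counter saved in that folder. The minimal sketch below reproduces only the detection step in plain Python, assuming the default checkpoint-<step> folder naming, so you can check which folder would be picked up before relaunching. find_last_checkpoint is a hypothetical helper, not a transformers API.

```python
import os
import re
from typing import Optional

# Matches folder names like "checkpoint-1700000" and captures the step number.
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint-<step> subfolder of output_dir with the highest step, or None.

    Hypothetical helper that mimics the idea behind transformers' get_last_checkpoint.
    """
    if not os.path.isdir(output_dir):
        return None
    candidates = []
    for name in os.listdir(output_dir):
        match = CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Only directories count; a stray file with a matching name is ignored.
        if match is not None and os.path.isdir(path):
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare steps as integers: a string comparison would rank
    # "checkpoint-900000" above "checkpoint-1700000".
    return max(candidates)[1]
```

Pointed at ./models/bert-base-uncased after the outage, it should return the checkpoint-1700000 folder, assuming no later checkpoint was written there.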

Related to this question, I’ve been trying to continue training, but with a new/lower learning rate. How do I do that?