We started compiling a wiki of how different models were pre-trained. Please add your knowledge there, thanks!
stas