I’m pretraining Longformer on my own dataset. The loss quickly dropped to around 8, but it hasn’t decreased further even after a long time (about 12 hours of training). I’ve tried several learning rate schedulers and lowered the learning rate, but the result seems to be the same. Can someone tell me what I can do to decrease the loss?

