ModernBERT Pretraining using HuggingFace API

I am trying to pretrain a ModernBERT model using the HF stack. Since this is for a new language, I will also train the tokenizer (which is the easy part). ModernBERT uses dynamic padding and alternating global/local attention, and I haven’t seen examples of support for either in HF. Am I missing something, or should I wait?

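  • Dynamic Padding: This is already supported by the Hugging Face library. Data collators such as DataCollatorWithPadding and DataCollatorForLanguageModeling pad each batch to its longest sequence rather than to a fixed maximum, so you don’t need to worry about it.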
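  • Global/Local Attention: When this reply was written, there was no direct out-of-the-box support for this in Hugging Face, so the only route was to subclass and modify the attention mechanism yourself. Recent transformers releases include a ModernBERT implementation (ModernBertConfig, ModernBertForMaskedLM) with alternating global and local attention built in, so check your installed version before writing your own.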
  • Training a Tokenizer: This is the easy part, as you’ve already pointed out. Ensure that the tokenizer fits your new language and handle any specific tokenization nuances that come with it.
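
To make the first two points concrete, here is a minimal pure-Python sketch of the ideas, not the Hugging Face implementation. The helper names (pad_batch, is_global_layer, attention_mask_for_layer) are hypothetical, and the defaults (a 128-token local window, global attention on every 3rd layer) are assumptions you should check against the ModernBERT config you actually use.

```python
from typing import List, Tuple


def pad_batch(batch: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Dynamic padding: pad to the longest sequence in THIS batch,
    not to a fixed global maximum length."""
    longest = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in batch]
    # 1 marks a real token, 0 marks padding.
    attention_mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in batch]
    return input_ids, attention_mask


def is_global_layer(layer_idx: int, global_every: int = 3) -> bool:
    """Assumption: layers 0, 3, 6, ... attend globally; the rest attend locally."""
    return layer_idx % global_every == 0


def attention_mask_for_layer(
    seq_len: int, layer_idx: int, window: int = 128, global_every: int = 3
) -> List[List[bool]]:
    """Return mask[i][j], True when token i may attend to token j.
    Padding is ignored here; in practice you would combine this with
    the padding mask returned by pad_batch."""
    if is_global_layer(layer_idx, global_every):
        # Global layer: every token sees every other token.
        return [[True] * seq_len for _ in range(seq_len)]
    # Local layer: a sliding window of window // 2 tokens on each side.
    half = window // 2
    return [[abs(i - j) <= half for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    ids, mask = pad_batch([[5, 6, 7], [8]])
    print(ids)   # [[5, 6, 7], [8, 0, 0]]
    print(mask)  # [[1, 1, 1], [1, 0, 0]]
    # Small window so the local pattern is visible on a short sequence.
    for row in attention_mask_for_layer(seq_len=5, layer_idx=1, window=2):
        print(row)
```

With a window of 2, each token in the local layer sees only itself and one neighbour on each side, while layer 0 produces a full mask. In a real model these masks would be tensors and would be applied inside the attention computation.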

Any updates on this?

This page?

Blog article from last year.