ModernBERT Pretraining using HuggingFace API

akhooli · January 5, 2025, 11:32am

I am trying to pretrain a ModernBERT model using the HF stack. Since this is for a new language, I will also train the tokenizer (which is the easy part). ModernBERT uses dynamic padding and global/local attention, but I haven't seen examples of such support in HF. Am I missing something, or should I wait?

ValdeJunior · January 6, 2025, 7:25am

Dynamic Padding: This is already supported by the Hugging Face library, so you don’t need to worry about it.
Global/Local Attention: There is no direct out-of-the-box support for this in Hugging Face at the moment. You will need to subclass and modify the attention mechanism yourself.
Training a Tokenizer: This is the easy part, as you’ve already pointed out. Ensure that the tokenizer fits your new language and handle any specific tokenization nuances that come with it.
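
To make the two mechanisms in this reply concrete, here is a minimal pure-Python sketch. It is not the transformers implementation, and the helper names (`pad_batch`, `layer_attention_types`, `local_attention_mask`) are hypothetical. The defaults assume the pattern described for ModernBERT: global attention every third layer, and a 128-token sliding window elsewhere. Recent transformers releases ship a ModernBERT implementation whose config, as far as I know, exposes similar settings (`local_attention`, `global_attn_every_n_layers`), so check the documentation for your installed version before subclassing anything.

```python
from typing import Dict, List


def pad_batch(sequences: List[List[int]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Dynamic padding: pad only up to the longest sequence in this batch."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def layer_attention_types(num_layers: int, global_every: int = 3) -> List[str]:
    """Assumed pattern: layer i is global when i % global_every == 0, local otherwise."""
    return ["global" if i % global_every == 0 else "local" for i in range(num_layers)]


def local_attention_mask(seq_len: int, window: int = 128) -> List[List[bool]]:
    """Sliding window: token i may attend to token j only if |i - j| <= window // 2."""
    half = window // 2
    return [[abs(i - j) <= half for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    batch = pad_batch([[5, 6, 7], [8]])
    print(batch["input_ids"])
    print(layer_attention_types(6))
    print(local_attention_mask(5, window=2)[2])
```

In the HF stack, per-batch padding of this kind is what the data collators do (for masked language modeling, `DataCollatorForLanguageModeling`), so only the attention pattern needs checking against the model config.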

Padajno · March 17, 2025, 8:26am

Any updates on this?

John6666 · March 17, 2025, 11:34am

This page?

Blog article from last year.
