Training ModernBERT + GPT2

Hello,
I am trying to train an encoder-decoder model that uses ModernBERT as the encoder and GPT2 as the decoder. I had hoped this would be straightforward using the HF-provided classes/trainers for Seq2Seq, but I have run into an error I have not been able to debug. Currently I do the following:

from transformers import (AutoTokenizer, DataCollatorForSeq2Seq,
                          EncoderDecoderModel, GPT2Tokenizer,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

# Tokenizers run on the CPU; device_map only applies to the model.
tokenizer_MBert = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "answerdotai/ModernBERT-base", "gpt2",
    pad_token_id=tokenizer_MBert.eos_token_id,
    device_map='cuda:0')
# The decoder cache is incompatible with gradient checkpointing.
model.decoder.config.use_cache = False
model.gradient_checkpointing_enable()

# Map ModernBERT's special tokens onto the BOS/EOS/PAD roles.
tokenizer_MBert.bos_token = tokenizer_MBert.cls_token
tokenizer_MBert.eos_token = tokenizer_MBert.sep_token
tokenizer_MBert.pad_token = tokenizer_MBert.unk_token

# GPT2's tokenizer does not add special tokens on its own, so patch it to
# wrap every sequence in BOS ... EOS.
def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
    return [self.bos_token_id] + token_ids_0 + [self.eos_token_id]

GPT2Tokenizer.build_inputs_with_special_tokens = build_inputs_with_special_tokens
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2_tokenizer.pad_token = gpt2_tokenizer.unk_token

model.config.decoder_start_token_id = gpt2_tokenizer.bos_token_id
model.config.pad_token_id = tokenizer_MBert.pad_token_id
model.config.eos_token_id = gpt2_tokenizer.eos_token_id
# Generation settings need to live on the config; assigning them directly
# to the model object (e.g. model.early_stopping) has no effect.
model.config.no_repeat_ngram_size = 3
model.config.early_stopping = True
model.config.length_penalty = 3.0
model.config.num_beams = 2

# DataCollatorForSeq2Seq pads inputs with the tokenizer's pad token and pads
# labels with -100 so the padding is ignored by the loss.
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer_MBert, model=model)
optimizer = 'adamw_torch'
lr_scheduler = 'linear'

training_args = Seq2SeqTrainingArguments(
    output_dir="./MBert_GPT2",
    eval_strategy="steps",
    eval_steps=2000,
    save_strategy="steps",
    save_steps=2000,
    logging_steps=100,
    max_steps=10000,
    do_eval=True,
    optim=optimizer,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={'use_reentrant':False},
    learning_rate=2e-5,
    log_level="debug",
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    lr_scheduler_type=lr_scheduler,
    bf16=True,
    report_to="wandb",
    run_name="MBert_GPT2",
    seed=42,
    predict_with_generate=True,
    generation_max_length=300
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['test'],
    tokenizer=tokenizer_MBert,
    data_collator=data_collator,
)
trainer.train()

The tokenized_dataset contains the input_ids and labels.
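
For context, the preprocessing that produces it looks roughly like this (a sketch; raw_dataset, the "article"/"summary" column names, and the max lengths stand in for my actual data):

def preprocess(batch):
    model_inputs = tokenizer_MBert(batch["article"], truncation=True,
                                   max_length=512)
    labels = gpt2_tokenizer(batch["summary"], truncation=True, max_length=300)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_dataset = raw_dataset.map(preprocess, batched=True)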
This is also using the latest version of transformers, installed directly from their GitHub page. The training is started using notebook_launcher from accelerate, roughly as in the sketch below.
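
(The train_fn wrapper name here is just illustrative.)

from accelerate import notebook_launcher

def train_fn():
    trainer.train()

notebook_launcher(train_fn, num_processes=1)

It then fails with this error: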

TypeError: ModernBertModel.forward() got an unexpected keyword argument 'inputs_embeds'

I have looked at the ModernBERT forward code and confirmed that it indeed does not accept inputs_embeds as an argument, but I was under the impression that, since I was providing input_ids, no inputs_embeds should have been passed through during training. I am not sure whether ModernBERT is simply not meant to be used in an encoder-decoder setup or whether I have implemented something incorrectly. Any help would be appreciated.
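
Digging further, EncoderDecoderModel.forward passes inputs_embeds to the encoder unconditionally, even when it is None, which is what triggers the TypeError. For now I can avoid the crash with a monkey-patch along these lines (just a sketch on my side, not an official fix):

# Workaround sketch: drop inputs_embeds=None before it reaches
# ModernBertModel.forward. To be removed once fixed upstream.
_orig_encoder_forward = model.encoder.forward

def _patched_encoder_forward(*args, inputs_embeds=None, **kwargs):
    if inputs_embeds is not None:
        raise TypeError("ModernBertModel does not accept inputs_embeds")
    return _orig_encoder_forward(*args, **kwargs)

model.encoder.forward = _patched_encoder_forward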

It looks like the following will be quicker for questions about ModernBERT.

Can you please try again?

Thank you for the update. With the resolution of the git issue, the error I was facing has also been resolved.