Hello @lewtun, how do I tokenize the corpus after adding the new tokens? This is related to another issue I am having: [HELP] How to include emojis in masked language modelling?