Tokenizers, check!

Great job finishing this chapter!

After this deep dive into tokenizers, you should:

- Be able to train a new tokenizer using an old one as a template
- Understand how to use offsets to map tokens’ positions to their original span of text
- Know the differences between BPE, WordPiece, and Unigram
- Be able to mix and match the blocks provided by the 🤗 Tokenizers library to build your own tokenizer
- Be able to use that tokenizer inside the 🤗 Transformers library
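
As a quick refresher on the offsets idea from the list above, here is a minimal sketch that uses only the Python standard library. It is a toy illustration, not the 🤗 Tokenizers API: the `tokenize_with_offsets` helper and its regex-based splitting rule are assumptions made for this example. It records a `(start, end)` character span for each token, so slicing the original text with that span gives back the token.

```python
import re


def tokenize_with_offsets(text):
    """Split text into word and punctuation tokens (hypothetical toy rule),
    pairing each token with its (start, end) character offsets."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return [(m.group(), m.span()) for m in re.finditer(r"\w+|[^\w\s]", text)]


text = "Tokenizers, check!"
encoded = tokenize_with_offsets(text)

for token, (start, end) in encoded:
    # Slicing the original text with the offsets recovers the token's span.
    print(f"{token!r} -> text[{start}:{end}] = {text[start:end]!r}")
```

Tokenizers from the library work on subwords rather than whole words, but the offsets serve the same purpose: each token points back to the span of the original text it came from, which is what makes tasks such as token classification and question answering possible to post-process.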