@patrickvonplaten Thanks for your reply and advice. I also found your discussion with pcueng about handling out-of-vocabulary (non-Spanish) characters in the transcriptions for Spanish ASR. I linked it in case other people are interested. Thank you for the discussion.
Su-Youn