How does XLSR-Wav2Vec2 behave on noisy data?

I would like to train my own ASR system for an environment that is very noisy.

If anyone has experience with this, it would be great to hear from you here.

I'd like to share some of my own results on this subject.

Together with Dmytro Chaplynsky, we added noise to Common Voice 10, and I successfully trained a model on the resulting data.
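For anyone who wants to noise their own corpus, a common approach is to mix a noise recording into each utterance at a chosen signal-to-noise ratio (SNR). The sketch below assumes 1-D float waveforms in [-1, 1] at the same sample rate; the function name `add_noise_at_snr` is hypothetical, and this is not necessarily how the Common Voice 10 data mentioned here was noised.

```python
import numpy as np


def add_noise_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Hypothetical helper: mix `noise` into `speech` at the requested SNR in dB."""
    speech = speech.astype(np.float64)
    noise = noise.astype(np.float64)

    # Repeat the noise if it is shorter than the utterance, then trim to length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        # Silent noise clip: nothing to mix in.
        return speech.copy()

    # Scale the noise so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    mixed = speech + scale * noise

    # Rescale the whole mix if it would clip; this keeps the SNR unchanged
    # because speech and noise are scaled by the same factor.
    peak = np.max(np.abs(mixed))
    if peak > 1.0:
        mixed = mixed / peak
    return mixed
```

In practice one might draw `snr_db` at random per utterance (for example somewhere between 0 and 20 dB) so the model sees a range of noise levels; the exact range is an assumption and should match the target environment.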

The published model: Yehor/wav2vec2-xls-r-300m-uk-with-small-lm-noisy (on Hugging Face)

The noised data: egorsmkv/speech-recognition-uk on GitHub (Speech Recognition for Ukrainian)

This model is trained for Ukrainian.

I have posted the metrics on the Hugging Face model page.