Nov 3 – 4, 2022
Europe/Prague timezone

Using large-scale pre-trained models for building speech applications

Nov 4, 2022, 9:00 AM
atrium (IT4Innovations)
Studentská 6231/1B, 708 00 Ostrava-Poruba
Keynote Users' talks Keynote III


Dr Oldrich Plchot (Brno University of Technology)


Recently, self-supervised Transformer-based models have become an integral part of state-of-the-art speech modeling and are being integrated into many speech applications such as Automatic Speech Recognition (ASR), Speaker Verification (SV), Language Identification (LID), and emotion detection. These models are trained on datasets comprising tens or even hundreds of thousands of hours of speech and can reach several hundred million parameters. In my talk, I will give a brief overview of their architecture and of a self-supervised training paradigm based on masked speech prediction. I will then describe a use case in speaker verification in which we take these pre-trained models and fine-tune them to serve as powerful feature extractors for speaker embedding extraction. I will also discuss methods for fine-tuning such large models when only a relatively small amount of labeled target-domain data is available.
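
As a rough illustration of the masked speech prediction idea mentioned in the abstract, the sketch below samples a span mask over a sequence of frame indices; during pre-training, a model would be asked to predict targets at the masked positions from the unmasked context. This is not taken from the talk: the function names, the start probability, and the span length are hypothetical choices made only for this example, and no actual model is involved.

```python
import random


def sample_span_mask(num_frames, start_prob=0.065, span_len=10, seed=0):
    """Return a list of booleans, True where a frame is masked.

    Each frame is chosen as a span start with probability `start_prob`
    (an illustrative value), and the following `span_len` frames are
    masked. Spans may overlap and are clipped at the sequence end.
    """
    rng = random.Random(seed)
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < start_prob:
            for t in range(start, min(start + span_len, num_frames)):
                mask[t] = True
    return mask


def masked_positions(mask):
    """Indices of frames whose targets the model would have to predict."""
    return [t for t, is_masked in enumerate(mask) if is_masked]


mask = sample_span_mask(num_frames=200)
targets = masked_positions(mask)
print(f"{len(targets)} of {len(mask)} frames masked")
```

In a real system the mask would be applied to latent frame representations rather than to bare indices, and the prediction loss would be computed only at the positions returned by `masked_positions`.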

Primary author

Dr Oldrich Plchot (Brno University of Technology)
