23 January 2019
End-to-end speech synthesis
Seminar Room, Wohllebengasse 12-14 / Ground Floor
Personalized (almost) end-to-end speech synthesis - Markus Toman
Text-to-speech (TTS) systems traditionally encode linguistic and acoustic domain knowledge in the form of vast codebases, hand-crafted rules, and statistical models. Recent advances in machine learning have led to the gradual replacement of individual components of such systems with neural networks. This talk highlights the most important aspects of this shift towards end-to-end synthesis, where (almost) the whole process of generating waveforms from text is performed by a neural network that infers domain knowledge exclusively from data. The mechanics of prominent model architectures such as WaveNet and Tacotron are presented, and specific challenges of personalized speech synthesis, such as speaker adaptation and multi-speaker models, are also addressed.