Intro

Outline

Text-to-speech as sequence-to-sequence mapping

Speech production process

Typical flow of TTS system

Speech synthesis approaches

Probabilistic formulation of TTS

Approximation (2)

Representation - Linguistic features

Representation - Acoustic features

Representation - Mapping

HMM-based generative acoustic model for TTS

Alternative acoustic model

FFNN-based acoustic model for TTS [6]

NN-based generative acoustic model for TTS

NN-based generative model for TTS

Learned features

WaveNet: A generative model for raw audio

WaveNet - Causal dilated convolution

WaveNet - Architecture

Softmax

WaveNet vs conventional audio generative models

Relax approximation

Generative model-based text-to-speech synthesis

Beyond text-to-speech synthesis

Beyond generative TTS

Description:

Explore generative model-based approaches to speech synthesis in this 38-minute conference talk by Heiga Zen from Google. Gain insight into the significant improvements in the naturalness of synthesized speech, learn the probabilistic formulation of text-to-speech systems, and compare acoustic models, including HMM-based, FFNN-based, and NN-based generative models. Study the architecture of WaveNet, a generative model for raw audio, and its advantages over conventional audio generative models. Finally, consider possible directions beyond text-to-speech synthesis and beyond generative TTS.

Generative Model-Based Text-to-Speech Synthesis

MITCBMM

#Computer Science #Artificial Intelligence #Generative AI #Generative Modeling #Machine Learning #Neural Networks