Intro

Why is text-to-audio hard?

Comparison with VQ-GAN

Comparison with SoundStream

AudioGen overview

Deep dive: audio representation, LSTM

Losses explained

Complex-valued STFTs

Audio Language Modeling

Multi-stream audio inputs

Data and augmentations

Results

Outro

Description:

A video explanation of the paper "AudioGen: Textually Guided Audio Generation". It covers why text-to-audio generation is hard, compares AudioGen with VQ-GAN and SoundStream, and then goes through the model in detail: the audio representation, the use of LSTMs, the losses, and complex-valued STFTs. The later chapters cover audio language modeling, multi-stream audio inputs, data and augmentations, and the paper's results.

AudioGen: Textually Guided Audio Generation - Paper Explained

Aleksa Gordić - The AI Epiphany

#Computer Science #Artificial Intelligence #Neural Networks #Generative Adversarial Networks (GAN) #Recurrent Neural Networks (RNN) #Long short-term memory (LSTM) #Machine Learning #Data Augmentation #Art & Design #Music #Music Production #Audio Production #Sound Engineering #Audio generation