Chapters:
1. Intro
2. Paper overview
3. Collecting a large-scale weakly supervised dataset
4. Evaluation metric issues (WER)
5. Effective robustness
6. Scaling laws in progress
7. Decoding is hacky
8. Code walk-through
9. Model architecture: diagram vs. code
10. Transcription task
11. Loading the audio, mel spectrograms
12. Language detection
13. Transcription task continued
14. Suppressing token logits
15. Voice activity detection
16. Decoding and heuristics
17. Outro
Description:
Dive into a comprehensive video lecture on OpenAI's Whisper, a robust automatic speech recognition system trained with large-scale weak supervision. Examine the paper's key findings: how the large multilingual training dataset was collected, problems with the word error rate (WER) evaluation metric, effective robustness, and scaling laws. Then follow a walk-through of the code: the model architecture, the transcription task, audio loading and mel spectrograms, language detection, token logit suppression, voice activity detection, and the heuristics used during decoding.

OpenAI Whisper - Robust Speech Recognition via Large-Scale Weak Supervision

Aleksa Gordić - The AI Epiphany