Intro

Paper overview

Collecting a large scale weakly supervised dataset

Evaluation metric issues (WER)

Effective robustness

Scaling laws in progress

Decoding is hacky

Code walk-through

Model architecture diagram vs code

Transcription task

Loading the audio, mel spectrograms

Language detection

Transcription task continued

Suppressing token logits

Voice activity detection

Decoding and heuristics

Outro

Description:

A video lecture on OpenAI's Whisper, a robust speech recognition system trained with large-scale weak supervision. The first part covers the paper: collecting a large, weakly supervised multilingual dataset, problems with word error rate (WER) as an evaluation metric, effective robustness, and scaling laws. The second part walks through the code: the model architecture, the transcription task, audio loading and mel spectrograms, language detection, token logit suppression, voice activity detection, and the decoding heuristics.

OpenAI Whisper - Robust Speech Recognition via Large-Scale Weak Supervision

Aleksa Gordić - The AI Epiphany

#Computer Science #Artificial Intelligence #Natural Language Processing (NLP) #Machine Learning #Scaling Laws #Language Detection
