Behind the scenes: how Tri got started with Flash Attention
Motivation: modelling long sequences
Brief recap of attention
Memory bottleneck, IO awareness
Flash Attention 2.0 improvements
Behind the scenes of the Flash Attention 2.0 refactor using CUTLASS 3
Future directions
Q&A
Description:
Dive into a comprehensive Discord server talk featuring Tri Dao from Stanford, discussing his groundbreaking work on Flash Attention 2.0. Explore the motivation behind modeling long sequences, gain insights into attention mechanisms, and understand the memory bottleneck and IO awareness challenges. Learn about the improvements in Flash Attention 2.0, including the behind-the-scenes refactor built on CUTLASS 3, and discover future directions in this field. Engage with an informative Q&A session to deepen your understanding of this cutting-edge technology in machine learning systems.
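To make the memory bottleneck concrete, here is a minimal NumPy sketch of standard (naive) attention — not Tri Dao's implementation — showing the full N x N score matrix that must be materialized, which grows quadratically with sequence length and is exactly what Flash Attention avoids storing in GPU high-bandwidth memory:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Standard softmax attention. Materializes the full (N, N) score
    matrix, the quadratic memory cost that Flash Attention sidesteps
    by computing attention block-by-block in on-chip SRAM."""
    d = Q.shape[-1]
    scores = (Q @ K.T) / np.sqrt(d)               # shape (N, N): quadratic in N
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # shape (N, d)

# Illustrative sizes: at N = 1024, the score matrix already holds ~1M entries.
N, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = naive_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```

Doubling the sequence length quadruples the score-matrix memory, which is why long-sequence modeling motivates the IO-aware approach discussed in the talk.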
Flash Attention 2.0 with Tri Dao - Discord Server Talks