Transformers, parallel computation, and logarithmic depth
Description:
Explore the computational power of transformers in this 57-minute lecture by Daniel Hsu from Columbia University. Delve into the relationship between self-attention layers and communication rounds in Massively Parallel Computation (MPC). Discover how logarithmic depth enables transformers to efficiently solve computational tasks that challenge other neural sequence models and sub-quadratic transformer approximations, and gain insight into parallelism as a crucial distinguishing feature of transformers. The lecture covers joint work with Clayton Sanford (Google) and Matus Telgarsky (NYU), showing that a constant number of self-attention layers can simulate, and be simulated by, a constant number of communication rounds in Massively Parallel Computation.
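
To make the logarithmic-depth idea concrete, here is a small illustrative sketch (not material from the lecture itself): a k-hop pointer-chasing task that a sequential pass solves in k steps, but that parallel pointer doubling solves in roughly log2(k) composition rounds — the same kind of round/depth saving the talk relates to self-attention layers and MPC rounds. The task, function names, and round structure below are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: a k-hop pointer-chasing task and the
# "pointer doubling" trick that solves it in O(log k) parallel rounds.
# This is a stand-in example, not code from the lecture or paper.

import math
import random


def k_hop_sequential(ptr: list[int], start: int, k: int) -> int:
    """Follow the pointer chain one hop at a time (k rounds)."""
    pos = start
    for _ in range(k):
        pos = ptr[pos]
    return pos


def k_hop_doubling(ptr: list[int], start: int, k: int) -> int:
    """Same answer using O(log k) rounds of jump-table composition.

    After round r, jump[i] gives the position reached after 2**r hops
    from i; composing the tables for the set bits of k yields the k-hop
    answer.  Each composition round doubles the reachable hop distance,
    mirroring how depth (layers/rounds) can grow hop length exponentially.
    """
    n = len(ptr)
    jump = list(ptr)          # positions after 2**0 = 1 hop
    pos = start
    bit = 1
    while bit <= k:
        if k & bit:           # consume this power-of-two block of hops
            pos = jump[pos]
        # one parallel round: compose the table with itself (double the hops)
        jump = [jump[jump[i]] for i in range(n)]
        bit <<= 1
    return pos


if __name__ == "__main__":
    random.seed(0)
    n, k = 64, 13
    ptr = [random.randrange(n) for _ in range(n)]
    assert k_hop_sequential(ptr, 0, k) == k_hop_doubling(ptr, 0, k)
    print("both methods agree; doubling uses about",
          math.floor(math.log2(k)) + 1, "composition rounds instead of", k)
```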

Simons Institute