Transformers, parallel computation, and logarithmic depth
Description:
Explore the computational power of transformers in this 57-minute lecture by Daniel Hsu from Columbia University. Delve into the relationship between self-attention layers and communication rounds in Massively Parallel Computation (MPC). Discover how logarithmic depth enables transformers to efficiently solve computational tasks that challenge other neural sequence models and sub-quadratic transformer approximations, and gain insight into parallelism as a crucial distinguishing feature of transformers. The lecture covers joint work with Clayton Sanford (Google) and Matus Telgarsky (NYU), showing that a constant number of self-attention layers can simulate, and be simulated by, a constant number of communication rounds in Massively Parallel Computation.
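
To make the logarithmic-depth idea concrete, here is a small illustrative sketch (not material from the lecture itself): a k-hop pointer-chasing task that a sequential pass solves in k steps, but that parallel pointer doubling solves in roughly log2(k) composition rounds — the same kind of round/depth saving the talk relates to self-attention layers and MPC rounds. The task, function names, and round structure below are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: a k-hop pointer-chasing task and the
# "pointer doubling" trick that solves it in O(log k) parallel rounds.
# This is a stand-in example, not code from the lecture or paper.

import math
import random


def k_hop_sequential(ptr: list[int], start: int, k: int) -> int:
    """Follow the pointer chain one hop at a time (k rounds)."""
    pos = start
    for _ in range(k):
        pos = ptr[pos]
    return pos


def k_hop_doubling(ptr: list[int], start: int, k: int) -> int:
    """Same answer using O(log k) rounds of jump-table composition.

    After round r, jump[i] gives the position reached after 2**r hops
    from i; composing the tables for the set bits of k yields the k-hop
    answer.  Each composition round doubles the reachable hop distance,
    mirroring how depth (layers/rounds) can grow hop length exponentially.
    """
    n = len(ptr)
    jump = list(ptr)          # positions after 2**0 = 1 hop
    pos = start
    bit = 1
    while bit <= k:
        if k & bit:           # consume this power-of-two block of hops
            pos = jump[pos]
        # one parallel round: compose the table with itself (double the hops)
        jump = [jump[jump[i]] for i in range(n)]
        bit <<= 1
    return pos


if __name__ == "__main__":
    random.seed(0)
    n, k = 64, 13
    ptr = [random.randrange(n) for _ in range(n)]
    assert k_hop_sequential(ptr, 0, k) == k_hop_doubling(ptr, 0, k)
    print("both methods agree; doubling uses about",
          math.floor(math.log2(k)) + 1, "composition rounds instead of", k)
```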

Simons Institute