Explore a technical video lecture that delves into Ring Attention, a breakthrough technique enabling context lengths of 1 million tokens in Large Language Models (LLMs) and Vision Language Models (VLMs). Learn about the Blockwise Parallel Transformer concept developed at UC Berkeley, from theoretical foundations to practical implementation. Understand the three approaches to achieving effectively infinite context lengths, the mechanics of query, key, and value (Q, K, V) operations in attention libraries, and the mathematical principles behind blockwise parallel transformers. Examine the symmetries that ring attention exploits, detailed explanations of the ring attention mechanism, and its implementation in JAX code. Discover how this technology is applied in real-world systems such as Google's Gemini 1.5 Pro on Vertex AI, and get insights into future developments with Google's Infini-attention. The comprehensive breakdown includes practical code examples and mathematical explanations, making complex concepts accessible to technical audiences interested in deepening their understanding of attention mechanisms in AI models.
Ring Attention and Blockwise Transformers for Extended Context Length in Language Models
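Since the lecture covers Q, K, V mechanics and a JAX implementation of blockwise parallel transformers, here is a minimal, hedged sketch (not the lecture's own code) of the core blockwise-attention idea both methods build on: streaming over key/value blocks with a running max and softmax denominator so the full attention matrix is never materialized. The function name `blockwise_attention` and the block size are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def blockwise_attention(q, k, v, block_size=128):
    """q, k, v: [seq_len, d]. Attention computed over k/v blocks of `block_size`."""
    seq_len, d = q.shape
    scale = 1.0 / jnp.sqrt(d)

    num_blocks = seq_len // block_size
    k_blocks = k.reshape(num_blocks, block_size, d)
    v_blocks = v.reshape(num_blocks, block_size, d)

    # Running statistics: unnormalized output, softmax denominator,
    # and row-wise max for numerical stability.
    out = jnp.zeros((seq_len, d))
    denom = jnp.zeros((seq_len, 1))
    row_max = jnp.full((seq_len, 1), -jnp.inf)

    def body(carry, kv):
        out, denom, row_max = carry
        k_blk, v_blk = kv
        scores = (q @ k_blk.T) * scale                    # [seq_len, block_size]
        new_max = jnp.maximum(row_max, scores.max(-1, keepdims=True))
        correction = jnp.exp(row_max - new_max)           # rescale previous stats
        p = jnp.exp(scores - new_max)
        out = out * correction + p @ v_blk
        denom = denom * correction + p.sum(-1, keepdims=True)
        return (out, denom, new_max), None

    (out, denom, _), _ = jax.lax.scan(body, (out, denom, row_max),
                                      (k_blocks, v_blocks))
    return out / denom

# Example: 1,024 tokens, 64-dim heads, processed 128 keys at a time.
q, k, v = jax.random.normal(jax.random.PRNGKey(0), (3, 1024, 64))
print(blockwise_attention(q, k, v).shape)  # (1024, 64)
```

Ring attention distributes this same loop across devices: each device holds one query block and passes its key/value block around a ring, so the per-block update above is all that ever runs locally.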