1. Llama 3 inference and finetuning
2. New Language Model Dev
3. Local Attention
4. Linear complexity of RNN
5. Gated Recurrent Unit - GRU
6. Linear Recurrent Unit - LRU
7. GRIFFIN architecture
8. Real-Gated Linear Recurrent Unit - RG-LRU
9. Griffin Key Features
10. RecurrentGemma
11. GitHub code
12. Performance benchmark
Description:
Explore a comprehensive technical video on Google's Griffin architecture for recurrent language models, a significant shift away from traditional transformer-based designs. Learn about the RecurrentGemma-2B model, which reaches a throughput of around 6000 tokens per second while matching the performance of the transformer-based Gemma 2B. Discover the technical details of the Griffin and Hawk architectures, with explanations of their advantages over state space models such as Mamba (S6). Master concepts including local attention, linear recurrences, the GRU (Gated Recurrent Unit), the LRU (Linear Recurrent Unit), and the RG-LRU (Real-Gated Linear Recurrent Unit). Gain insight into the model's fixed-size recurrent state, which offers better memory efficiency on long sequences than a transformer's growing key-value cache. Examine performance benchmarks and practical GitHub code examples, and understand how this architecture maintains high throughput regardless of sequence length while using 33% fewer training tokens than its transformer counterpart.
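The RG-LRU recurrence covered in chapters 5-8 fits in a few lines of code. The sketch below is a minimal NumPy illustration of the gated recurrence as described in the Griffin paper, not the official RecurrentGemma implementation; the parameter names (`W_r`, `W_i`, `log_lambda`) and the constant `c = 8` follow the paper's notation but are assumptions here. The key point it shows is the fixed-size state: memory stays constant no matter how long the sequence grows, unlike a transformer's key-value cache.

```python
# Minimal NumPy sketch of the RG-LRU recurrence from the Griffin paper
# (De et al., 2024). Illustrative only, not the official RecurrentGemma code.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rg_lru(x, W_r, b_r, W_i, b_i, log_lambda, c=8.0):
    """Run a Real-Gated Linear Recurrent Unit over a sequence.

    x:          (seq_len, d) input sequence
    W_r, b_r:   recurrence-gate parameters, shapes (d, d) and (d,)
    W_i, b_i:   input-gate parameters, shapes (d, d) and (d,)
    log_lambda: (d,) learnable parameter; a = sigmoid(log_lambda) keeps
                the per-channel decay strictly inside (0, 1)
    Returns the (seq_len, d) hidden states. The recurrent state h is a
    fixed-size (d,) vector, unlike a KV cache that grows with seq_len.
    """
    a = sigmoid(log_lambda)                  # per-channel decay base in (0, 1)
    h = np.zeros(x.shape[1])                 # fixed-size recurrent state
    outputs = []
    for x_t in x:
        r_t = sigmoid(x_t @ W_r + b_r)       # recurrence gate
        i_t = sigmoid(x_t @ W_i + b_i)       # input gate
        a_t = a ** (c * r_t)                 # input-dependent decay
        h = a_t * h + np.sqrt(1.0 - a_t**2) * (i_t * x_t)
        outputs.append(h)
    return np.stack(outputs)
```

Because the state is a single vector per layer, inference cost per token is constant in sequence length, which is what drives the throughput numbers quoted in the description.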

RecurrentGemma: Moving Past Transformers with Griffin Architecture for Long Context Length

Discover AI