1. Llama 3 inference and finetuning
2. New Language Model Dev
3. Local Attention
4. Linear complexity of RNN
5. Gated Recurrent Unit - GRU
6. Linear Recurrent Unit - LRU
7. GRIFFIN architecture
8. Real-Gated Linear Recurrent Unit - RG-LRU
9. Griffin Key Features
10. RecurrentGemma
11. GitHub code
12. Performance benchmark
Description:
Explore a comprehensive technical video on Google's Griffin architecture for recurrent language models, a significant shift away from traditional transformer-based designs. Learn about the RecurrentGemma-2B model, which reaches a throughput of around 6000 tokens per second while matching the performance of the transformer-based Gemma 2B. Discover the technical details of the Griffin and Hawk architectures, with explanations of their advantages over state space models such as Mamba (S6). Master concepts including local attention, linear recurrences, the GRU (Gated Recurrent Unit), the LRU (Linear Recurrent Unit), and the RG-LRU (Real-Gated Linear Recurrent Unit). Gain insight into the model's fixed-size recurrent state, which offers better memory efficiency on long sequences than a transformer's growing key-value cache. Examine performance benchmarks and practical GitHub code examples, and understand how this architecture maintains high throughput regardless of sequence length while using 33% fewer training tokens than its transformer counterpart.
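The RG-LRU recurrence covered in chapters 5-8 fits in a few lines of code. The sketch below is a minimal NumPy illustration of the gated recurrence as described in the Griffin paper, not the official RecurrentGemma implementation; the parameter names (`W_r`, `W_i`, `log_lambda`) and the constant `c = 8` follow the paper's notation but are assumptions here. The key point it shows is the fixed-size state: memory stays constant no matter how long the sequence grows, unlike a transformer's key-value cache.

```python
# Minimal NumPy sketch of the RG-LRU recurrence from the Griffin paper
# (De et al., 2024). Illustrative only, not the official RecurrentGemma code.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rg_lru(x, W_r, b_r, W_i, b_i, log_lambda, c=8.0):
    """Run a Real-Gated Linear Recurrent Unit over a sequence.

    x:          (seq_len, d) input sequence
    W_r, b_r:   recurrence-gate parameters, shapes (d, d) and (d,)
    W_i, b_i:   input-gate parameters, shapes (d, d) and (d,)
    log_lambda: (d,) learnable parameter; a = sigmoid(log_lambda) keeps
                the per-channel decay strictly inside (0, 1)
    Returns the (seq_len, d) hidden states. The recurrent state h is a
    fixed-size (d,) vector, unlike a KV cache that grows with seq_len.
    """
    a = sigmoid(log_lambda)                  # per-channel decay base in (0, 1)
    h = np.zeros(x.shape[1])                 # fixed-size recurrent state
    outputs = []
    for x_t in x:
        r_t = sigmoid(x_t @ W_r + b_r)       # recurrence gate
        i_t = sigmoid(x_t @ W_i + b_i)       # input gate
        a_t = a ** (c * r_t)                 # input-dependent decay
        h = a_t * h + np.sqrt(1.0 - a_t**2) * (i_t * x_t)
        outputs.append(h)
    return np.stack(outputs)
```

Because the state is a single vector per layer, inference cost per token is constant in sequence length, which is what drives the throughput numbers quoted in the description.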

RecurrentGemma: Moving Past Transformers with Griffin Architecture for Long Context Length

Discover AI