1. Intro: Green grasshoppers
2. What do attention heads focus on?
3. Long context: factuality by retrieval heads
4. Needle in a Haystack benchmark
5. How many retrieval heads are in an LLM?
6. What is a retrieval head?
7. Retrieval heatmap consistent with the pre-trained base model
8. Retrieval heads and Chain-of-Thought reasoning
9. Retrieval heads explain why LLMs hallucinate
10. How to generate more retrieval heads in LLMs?
Description:
Learn about groundbreaking research from MIT and Peking University in this 31-minute video exploring retrieval heads, a newly discovered element of the transformer architecture. Dive into how these specialized attention heads significantly affect RAG performance, chain-of-thought reasoning, and retrieval quality over long contexts. Explore practical applications, including building better RAG systems, improving retrieval across long context windows, and reducing factual hallucination in LLMs. Examine five detailed real-world use cases spanning legal document summarization, financial market analysis, educational technology, content moderation, and scientific research. Follow along with a structured breakdown of topics: attention head focus areas, factuality in long contexts, the Needle in a Haystack benchmark, retrieval head mechanics, heatmap analysis, and the relationship of retrieval heads to Chain-of-Thought reasoning. Based on the arXiv pre-print "Retrieval Head Mechanistically Explains Long-Context Factuality" by Wenhao Wu and colleagues, the video offers insights into why LLMs hallucinate and into methods for generating additional retrieval heads in language models.

Understanding Retrieval Heads in Large Language Models - From Discovery to Applications

Discover AI