1. Intro
2. Outline for Efficient Transformer
3. Introduction
4. Transformer for Sequential Modeling
5. Transformer with Long Sequence
6. Contributions
7. High-level Approach
8. Weighted Exponential KDE
9. Adaptive KDE Algorithm
10. Algorithm Summary
11. Experiments
12. Conclusion
13. Future Work
Description:
Explore efficient Transformer acceleration techniques in this Google TechTalk presented by Insu Han. Dive into the challenges of processing long sequences with dot-product attention mechanisms and discover innovative solutions using kernel density estimation (KDE). Learn about the KDEformer approach, which approximates attention in sub-quadratic time with provable spectral norm bounds. Examine experimental results comparing KDEformer's performance to other attention approximations in terms of accuracy, memory usage, and runtime on various pre-trained models. Gain insights into the potential applications and future directions of this research in accelerating large language models and sequence modeling tasks.
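To make the KDE connection concrete, here is a minimal NumPy sketch. It shows only the general idea the talk builds on: each row of softmax attention is a ratio of exponential-kernel sums over the keys, so approximating attention reduces to estimating those kernel sums from a subset of keys. The uniform-subsampling estimator below is an illustrative assumption of mine, not the KDEformer algorithm, which uses adaptive, KDE-guided sampling with spectral norm guarantees.

```python
import numpy as np

def exact_attention(Q, K, V):
    """Exact softmax attention: O(n^2 d) time for sequence length n.

    Each output row is a ratio of weighted exponential-kernel sums:
        out_i = sum_j exp(<q_i, k_j>) v_j / sum_j exp(<q_i, k_j>)
    i.e. both numerator and denominator are kernel density sums.
    """
    S = np.exp(Q @ K.T)                              # exponential kernel values
    return (S @ V) / S.sum(axis=1, keepdims=True)

def sampled_attention(Q, K, V, m, seed=None):
    """Illustrative sub-quadratic estimator via uniform key subsampling.

    NOTE: this is NOT KDEformer; it only demonstrates that estimating the
    kernel sums from m << n keys cuts the cost from O(n^2 d) to O(n m d).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(K.shape[0], size=m, replace=False)   # sample m keys/values
    S = np.exp(Q @ K[idx].T)                               # kernel values on the sample
    return (S @ V[idx]) / S.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    n, d, m = 2048, 64, 256
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((n, d)) / d**0.25 for _ in range(3))
    exact = exact_attention(Q, K, V)
    approx = sampled_attention(Q, K, V, m, seed=1)
    print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

The point of the sketch is the reformulation, not the estimator: once attention is written as kernel sums, any fast KDE machinery (such as the weighted exponential KDE and adaptive sampling discussed in the talk) can replace the naive quadratic computation.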

Accelerating Transformers via Kernel Density Estimation - Google TechTalk

Google TechTalks