Intro

Outline for Efficient Transformer

Introduction

Transformer for Sequential Modeling

Transformer with Long Sequence

Contributions

High-level Approach

Weighted Exponential KDE

Adaptive KDE Algorithm

Algorithm Summary

Experiments

Conclusion

Future Work

Description:

Explore efficient Transformer acceleration techniques in this Google TechTalk presented by Insu Han. Dive into the challenges of processing long sequences with dot-product attention mechanisms and discover innovative solutions using kernel density estimation (KDE). Learn about the KDEformer approach, which approximates attention in sub-quadratic time with provable spectral norm bounds. Examine experimental results comparing KDEformer's performance to other attention approximations in terms of accuracy, memory usage, and runtime on various pre-trained models. Gain insights into the potential applications and future directions of this research in accelerating large language models and sequence modeling tasks.

Accelerating Transformers via Kernel Density Estimation - Google TechTalk

Google TechTalks

#Computer Science #Machine Learning #Transformers #Algorithms #Computational Complexity #Deep Learning #Attention Mechanisms #Mathematics #Algebra #Linear Algebra #Matrix Multiplication #Algorithm Optimization #Sequence Modeling
