Coverage • Problem: Neural models tend to drop or repeat information
Incorporating Markov Properties (Cohn et al. 2015)
Bidirectional Training
Hard Attention
Summary of the Transformer (Vaswani et al. 2017)
Attention Tricks
Training Tricks
Masking for Training • We want to perform training in as few operations as possible (see the sketch below)
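As a rough illustration of the loss-masking idea above, here is a minimal NumPy sketch. The sentence lengths, random per-token losses, and array shapes are made-up stand-ins, not the lecture's actual setup; the point is only that a 0/1 mask zeroes out padded positions so a batch of variable-length sentences can be trained with one set of batched array operations instead of a per-sentence loop.

```python
import numpy as np

# Toy stand-ins: 3 sentences of lengths 5, 3, 4, padded to the max length.
# In a real model the per-token losses come from the decoder; here they
# are random numbers just to show the masking arithmetic.
lengths = np.array([5, 3, 4])
max_len = int(lengths.max())
rng = np.random.default_rng(0)
token_losses = rng.random((len(lengths), max_len))   # shape (batch, time)

# 0/1 mask: 1 where a real token exists, 0 at padding positions.
mask = (np.arange(max_len)[None, :] < lengths[:, None]).astype(token_losses.dtype)

# Masked average loss: padded positions contribute nothing, so the whole
# batch is handled with big batched operations rather than per-sentence loops.
loss = (token_losses * mask).sum() / mask.sum()
print(loss)
```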
Description:
Explore a comprehensive lecture on attention mechanisms in neural networks for natural language processing. Delve into the fundamentals of attention, including what to attend to, improvements to attention techniques, and specialized attention varieties. Examine a detailed case study on the "Attention is All You Need" paper. Learn about encoder-decoder models, sentence representations, and the basic idea behind attention as proposed by Bahdanau et al. in 2015. Discover various attention score functions, hierarchical structures, and techniques for handling multiple sources. Address common problems in neural models, such as dropping or repeating information, and explore solutions like incorporating Markov properties and bidirectional training. Gain insights into hard attention, the Transformer architecture, and various attention tricks. Conclude with training techniques, including masking for efficient training operations.
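To make the basic attention idea and the notion of an attention score function concrete, here is a small NumPy sketch of dot-product and additive (MLP-style, as in Bahdanau et al. 2015) scoring followed by the softmax-weighted context vector. The hidden size, source length, random weights, and softmax helper are illustrative assumptions, not the lecture's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
hidden, src_len = 8, 6          # assumed toy dimensions

# Stand-ins for encoder states (one vector per source word) and the
# current decoder state; in a real model these come from the network.
H = rng.normal(size=(src_len, hidden))   # encoder states h_1..h_J
s = rng.normal(size=(hidden,))           # decoder query state

# Dot-product score: a(s, h_j) = s . h_j
dot_scores = H @ s

# Additive / MLP score: a(s, h_j) = v . tanh(W1 s + W2 h_j)
W1 = rng.normal(size=(hidden, hidden))
W2 = rng.normal(size=(hidden, hidden))
v = rng.normal(size=(hidden,))
add_scores = np.tanh(s @ W1.T + H @ W2.T) @ v

# Either set of scores becomes attention weights, then a context vector
# as a weighted sum of the encoder states.
alpha = softmax(dot_scores)
context = alpha @ H
print(alpha.round(3), context.shape)
```

Dot-product scoring needs no extra parameters, while the additive form adds a small learned network over the query and each encoder state; both produce a score per source position that softmax turns into attention weights.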