1. Transfer Learning via Pre-training
2. Pre-trained Contextualized Representations
3. BERT (Devlin et al., 2018)
4. How can we do better?
5. Span-based Efficient Pre-training
6. Pre-training Span Representations
7. Why is this more efficient?
8. Random subword masks can be too easy
9. Which spans to mask?
10. Why SBO?
11. Single-sequence Inputs
12. Evaluation
13. Baselines
14. Extractive QA: SQuAD
15. GLUE
16. RoBERTa: Scaling BERT
17. The RoBERTa Recipe
18. What is still hard?
19. Next Big Thing: Few-Shot Learning?
20. Next Big Thing: Non-parametric Memories?
Description:
Explore transfer learning and pre-trained contextualized representations in this 20-minute conference talk from KDD2020. Dive into BERT and its improvements, including span-based efficient pre-training and RoBERTa. Learn about extractive QA, GLUE, and the challenges that remain in the field. Discover potential future directions such as few-shot learning and non-parametric memories. Gain insights from Mandar Joshi on advancing natural language processing techniques through innovative pre-training approaches and model architectures.
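
The span-based pre-training covered in the talk masks contiguous spans of tokens rather than independent subwords. As a rough illustration only (not the speaker's exact recipe), the sketch below masks spans whose lengths are drawn from a clipped geometric distribution until roughly 15% of the tokens are covered; the distribution parameter, the masking budget, and the function names are assumptions made for this example.

```python
import random

def sample_span_length(p=0.2, max_len=10):
    """Draw a span length from a geometric distribution clipped at max_len (assumed values)."""
    length = 1
    while random.random() > p and length < max_len:
        length += 1
    return length

def span_mask(tokens, mask_budget=0.15, mask_token="[MASK]"):
    """Mask contiguous spans until about mask_budget of the tokens are covered."""
    n = len(tokens)
    budget = max(1, int(n * mask_budget))
    masked = set()
    while len(masked) < budget:
        span_len = min(sample_span_length(), n)
        start = random.randrange(0, n - span_len + 1)
        masked.update(range(start, start + span_len))
    return [mask_token if i in masked else tok for i, tok in enumerate(tokens)]

if __name__ == "__main__":
    sentence = "an american football game was played in santa clara california".split()
    print(" ".join(span_mask(sentence)))
```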

KDD2020 - Transfer Learning Joshi

Association for Computing Machinery (ACM)