Explore transfer learning and pre-trained contextualized representations in this 20-minute conference talk from KDD 2020. Dive into BERT and its improvements, including efficient span-based pre-training and RoBERTa. Learn about extractive question answering (QA), the GLUE benchmark, and the challenges that remain in the field. Discover potential future directions such as few-shot learning and non-parametric memories. Gain insights from Mandar Joshi on advancing natural language processing through innovative pre-training approaches and model architectures.