
Introduction

About me

Outline

Why multilingual data

Tasks associated with language systems

Syntax mixing

Transliterated text

Language identification

Language identification in practice

Other examples

langid

Blanked

Python

Limitations

Data augmentation

Simple example

The Transformer

Multiheaded attention

State-of-the-art (SOTA)

Why is it special

WordPiece Processing

Statistics of Languages

BERT Masked Language Model

Prediction Function

Code-Switched Example

Lyrics Example

Task Evaluation

Generation Evaluation

Summary

Description:

Explore the challenges and solutions for multilingual Natural Language Processing (NLP) models in this 45-minute PyCon US talk by Shreya Khurana. Dive into the complexities of language identification, transliterated and code-switched text, and the use of multilingual BERT models. Learn about existing Python frameworks for language identification and their limitations. Discover approaches to handling the lack of annotated datasets for transliterated and code-switched text using web crawlers and self-generated datasets. Examine the performance of Google's multilingual BERT model, trained on 104 languages, through practical examples. Gain insights into evaluating NLP models for various tasks in a multilingual context. Access additional resources and code examples on GitHub to further enhance your understanding of multilingual NLP techniques.
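To make the code-switching problem concrete, here is a rough sketch of character-trigram language identification. It is a hypothetical toy, not the speaker's code and not how langid or any other library is implemented. It scores text against trigram profiles built from two tiny, made-up English and Spanish samples. It can label a whole string or tag each word separately, and the per-word mode is one crude way to surface mixed-language input.

```python
from collections import Counter


def char_trigrams(text: str) -> Counter:
    """Count character trigrams, padding with spaces so word edges count."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# Hypothetical tiny "training" samples (assumption for illustration only);
# real identifiers are trained on far larger corpora.
SAMPLES = {
    "en": "the cat is on the table and the dog is in the garden",
    "es": "el gato está en la mesa y el perro está en el jardín",
}
PROFILES = {lang: char_trigrams(text) for lang, text in SAMPLES.items()}


def identify(text: str) -> str:
    """Return the language whose trigram profile overlaps most with `text`.

    Ties fall back to the first language listed in PROFILES.
    """
    grams = char_trigrams(text)

    def overlap(profile: Counter) -> int:
        # Shared trigram mass: min of the two counts for each trigram.
        return sum(min(count, profile[g]) for g, count in grams.items())

    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))


def identify_per_word(text: str) -> list[tuple[str, str]]:
    """Tag each whitespace-separated token on its own (crude code-switch handling)."""
    return [(word, identify(word)) for word in text.split()]


if __name__ == "__main__":
    print(identify("the dog is in the garden"))  # en
    print(identify_per_word("the perro is in the jardín"))
    # [('the', 'en'), ('perro', 'es'), ('is', 'en'), ('in', 'en'), ('the', 'en'), ('jardín', 'es')]
```

A single sentence-level label cannot describe a mixed sentence like "the perro is in the jardín". Tagging word by word recovers the mix, but with profiles this small, short or unseen words are easily misclassified. This is one reason the talk points to larger, self-generated datasets for code-switched and transliterated text.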

How Multilingual Is Your NLP Model?

PyCon US


#Conference Talks #PyCon US #Computer Science #Machine Learning #Transformer Models #Data Augmentation
