1. Intro
2. What are sparse expert models?
3. Start of Interview
4. What do you mean by sparse experts?
5. How does routing work in these models?
6. What is the history of sparse experts?
7. What does an individual expert learn?
8. When are these models appropriate?
9. How comparable are sparse to dense models?
10. How does the pathways system connect to this?
11. What improvements did GLAM make?
12. The "designing sparse experts" paper
13. Can experts be frozen during training?
14. Can the routing function be improved?
15. Can experts be distributed beyond data centers?
16. Are there sparse experts for other domains than NLP?
17. Are sparse and dense models in competition?
18. Where do we go from here?
19. How can people get started with this?
Description:
Explore the world of Sparse Expert Models in this comprehensive interview with Google Brain researchers Barret Zoph and William Fedus. Delve into the fundamentals, history, strengths, and weaknesses of these models, including Switch Transformers and GLAM, which can scale up to trillions of parameters. Learn how sparse expert models distribute parts of Transformers across large arrays of machines, using routing functions to activate only specific parts of the model for each input. Discover the advantages of this approach, its applications in natural language processing, and potential future developments. Gain insights into the comparison between sparse and dense models, the improvements made by GLAM, and the possibilities of distributing experts beyond data centers. Whether you're a machine learning enthusiast or a seasoned researcher, this in-depth discussion provides valuable knowledge on the current state of the art in sparse expert models and their potential impact on the field of artificial intelligence.
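To make the routing idea in the description concrete, here is a minimal single-device sketch of a Switch-style top-1 routed feed-forward layer, assuming PyTorch. The class name `SwitchFFN`, the shapes, and the per-expert Python loop are illustrative choices, not code from the papers or the interview; real systems shard experts across many devices and add capacity limits and load-balancing losses.

```python
# Minimal sketch of a Switch-style top-1 routed feed-forward layer
# (single-device illustration; names and shapes are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchFFN(nn.Module):
    """A feed-forward block whose tokens are each routed to exactly one expert."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # the routing function
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_ff),
                    nn.ReLU(),
                    nn.Linear(d_ff, d_model),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). Only one expert's weights are used per
        # token, so the compute per token stays roughly constant even as
        # num_experts (and therefore total parameters) grows.
        gate_probs = F.softmax(self.router(x), dim=-1)   # (num_tokens, num_experts)
        top_prob, top_idx = gate_probs.max(dim=-1)       # top-1 ("switch") routing
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Scale by the gate probability so the router receives gradients.
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out


# Usage: 8 experts, but each of the 16 tokens only touches one expert's weights.
layer = SwitchFFN(d_model=64, d_ff=256, num_experts=8)
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```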

Sparse Expert Models - Switch Transformers, GLAM, and More With the Authors

Yannic Kilcher