- Petastorm: Data Access for Deep Learning Training
- Challenges of Training on Large Datasets
- Spark 3.0: Resource-Aware Scheduling
- What if my Spark cluster doesn't have GPUs? Horovod Lambda: run data processing on CPUs with Spark
- Online Prediction
- Neuropod: Out-of-Process Execution
- Workflow Authoring: Can we ideate, define, evaluate, and deploy a deep learning model all within a single script?
- Feature Engineering
- Model Construction
- Model Deployment
- Elastic Horovod: Control Flow
Description:
Explore distributed deep learning techniques and reliable MLOps practices at Uber in this 30-minute conference talk by Travis Addair. Dive into the early adoption of Horovod, understand distributed deep learning concepts, and compare parameter servers with the Allreduce technique. Examine benchmarking results, learn about deep learning applications in research and production environments, and discover feature stores for efficient model training. Investigate preprocessing techniques, Spark ML pipelines, and Petastorm for data access in deep learning. Address challenges of training on large datasets, explore Spark 3.0's resource-aware scheduling, and learn about Horovod Lambda for CPU-based data processing. Gain insights into online prediction using Neuropod, workflow authoring, and the process of ideating, defining, evaluating, and deploying deep learning models within a single script. Conclude with an overview of feature engineering, model construction, deployment, and Elastic Horovod's control flow capabilities.
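The comparison of parameter servers with the Allreduce technique is central to the talk: in Horovod-style training, every worker computes gradients locally and the workers collectively average them with a ring allreduce, rather than funneling updates through a central server. As a minimal illustration of that idea, here is a pure-Python simulation of ring allreduce over in-memory "workers". This is a sketch of the algorithm only, not Horovod's actual MPI/NCCL-based implementation; the function name and setup are hypothetical.

```python
def ring_allreduce(grads):
    """Simulate ring allreduce: average the workers' gradient vectors.

    `grads` is a list of equal-length gradient vectors, one per worker.
    The vector is split into one chunk per worker; a scatter-reduce pass
    accumulates partial sums around the ring, then an allgather pass
    circulates the completed chunks so every worker ends with the average.
    Assumes len(grads[0]) is divisible by the number of workers.
    """
    n = len(grads)          # number of workers in the ring
    dim = len(grads[0])     # gradient length
    chunk = dim // n        # one chunk per worker
    bufs = [list(g) for g in grads]  # each worker's local buffer

    # Scatter-reduce: at step s, worker i sends chunk (i - s) mod n to
    # worker i+1, which adds it into its buffer. After n-1 steps,
    # worker i holds the fully reduced chunk (i + 1) mod n.
    for step in range(n - 1):
        for i in range(n):
            dst = (i + 1) % n
            c = (i - step) % n
            for j in range(c * chunk, (c + 1) * chunk):
                bufs[dst][j] += bufs[i][j]

    # Allgather: circulate the completed chunks; at step s, worker i
    # forwards chunk (i + 1 - s) mod n, which overwrites at the receiver.
    for step in range(n - 1):
        for i in range(n):
            dst = (i + 1) % n
            c = (i + 1 - step) % n
            for j in range(c * chunk, (c + 1) * chunk):
                bufs[dst][j] = bufs[i][j]

    # Average the summed gradients, as Horovod's averaging allreduce does.
    return [[v / n for v in buf] for buf in bufs]


# Three workers, each with a length-6 gradient; all end with the average.
result = ring_allreduce([[1.0] * 6, [2.0] * 6, [3.0] * 6])
print(result[0])  # every worker now holds [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
```

The appeal over a parameter server, as discussed in the talk, is that each worker sends and receives a fixed amount of data per step regardless of worker count, so bandwidth use stays balanced instead of bottlenecking at a central node.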
Horovod - Distributed Deep Learning for Reliable MLOps