- Petastorm: Data Access for Deep Learning Training
- Challenges of Training on Large Datasets
- Spark 3.0: Resource-Aware Scheduling
- What if my Spark cluster doesn't have GPUs? Horovod Lambda: run data processing on CPUs with Spark
- Online Prediction
- Neuropod: Out-of-Process Execution
- Workflow Authoring: Can we ideate, define, evaluate, and deploy a deep learning model all within a single script?
- Feature Engineering
- Model Construction
- Model Deployment
- Elastic Horovod: Control Flow
Description:
Explore distributed deep learning techniques and reliable MLOps practices at Uber in this 30-minute conference talk by Travis Addair. Dive into the early adoption of Horovod, understand distributed deep learning concepts, and compare parameter servers with the Allreduce technique. Examine benchmarking results, learn about deep learning applications in research and production environments, and discover feature stores for efficient model training. Investigate preprocessing techniques, Spark ML pipelines, and Petastorm for data access in deep learning. Address challenges of training on large datasets, explore Spark 3.0's resource-aware scheduling, and learn about Horovod Lambda for CPU-based data processing. Gain insights into online prediction using Neuropod, workflow authoring, and the process of ideating, defining, evaluating, and deploying deep learning models within a single script. Conclude with an overview of feature engineering, model construction, deployment, and Elastic Horovod's control flow capabilities.
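The comparison of parameter servers with the Allreduce technique is central to the talk: in Horovod-style training, every worker computes gradients locally and the workers collectively average them with a ring allreduce, rather than funneling updates through a central server. As a minimal illustration of that idea, here is a pure-Python simulation of ring allreduce over in-memory "workers". This is a sketch of the algorithm only, not Horovod's actual MPI/NCCL-based implementation; the function name and setup are hypothetical.

```python
def ring_allreduce(grads):
    """Simulate ring allreduce: average the workers' gradient vectors.

    `grads` is a list of equal-length gradient vectors, one per worker.
    The vector is split into one chunk per worker; a scatter-reduce pass
    accumulates partial sums around the ring, then an allgather pass
    circulates the completed chunks so every worker ends with the average.
    Assumes len(grads[0]) is divisible by the number of workers.
    """
    n = len(grads)          # number of workers in the ring
    dim = len(grads[0])     # gradient length
    chunk = dim // n        # one chunk per worker
    bufs = [list(g) for g in grads]  # each worker's local buffer

    # Scatter-reduce: at step s, worker i sends chunk (i - s) mod n to
    # worker i+1, which adds it into its buffer. After n-1 steps,
    # worker i holds the fully reduced chunk (i + 1) mod n.
    for step in range(n - 1):
        for i in range(n):
            dst = (i + 1) % n
            c = (i - step) % n
            for j in range(c * chunk, (c + 1) * chunk):
                bufs[dst][j] += bufs[i][j]

    # Allgather: circulate the completed chunks; at step s, worker i
    # forwards chunk (i + 1 - s) mod n, which overwrites at the receiver.
    for step in range(n - 1):
        for i in range(n):
            dst = (i + 1) % n
            c = (i + 1 - step) % n
            for j in range(c * chunk, (c + 1) * chunk):
                bufs[dst][j] = bufs[i][j]

    # Average the summed gradients, as Horovod's averaging allreduce does.
    return [[v / n for v in buf] for buf in bufs]


# Three workers, each with a length-6 gradient; all end with the average.
result = ring_allreduce([[1.0] * 6, [2.0] * 6, [3.0] * 6])
print(result[0])  # every worker now holds [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
```

The appeal over a parameter server, as discussed in the talk, is that each worker sends and receives a fixed amount of data per step regardless of worker count, so bandwidth use stays balanced instead of bottlenecking at a central node.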
Horovod - Distributed Deep Learning for Reliable MLOps