1. Training can take a long time
2. Data parallelism
3. Mirrored Variables
4. Ring All-reduce
5. Synchronous training
6. Performance on Multi-GPU
7. Setting up multi-node Environment
8. Deploy your Kubernetes cluster
9. Hierarchical All-Reduce
10. Model Code is Automatically Distributed
11. Configuring Cluster
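
Items 2-6 above cover single-machine, multi-GPU training. A minimal sketch (not taken from the talk, using current tf.distribute naming rather than the 2018-era tf.contrib.distribute) of that setup with tf.distribute.MirroredStrategy, which keeps a copy of every variable on each GPU and applies gradients in lockstep after an all-reduce:

```python
import tensorflow as tf

# Mirror variables across all visible GPUs; gradients are combined with an
# all-reduce (NCCL by default on GPUs) and applied synchronously.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Model and optimizer must be built inside the strategy scope so their
# variables become mirrored variables.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

# The global batch is split evenly across the replicas each step.
model.fit(x_train, y_train, epochs=2,
          batch_size=64 * strategy.num_replicas_in_sync)
```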
Description:
Learn distributed TensorFlow training using Keras high-level APIs in this 33-minute conference talk from the O'Reilly AI Conference in San Francisco. Explore TensorFlow's distributed architecture, set up a distributed cluster with Kubeflow and Kubernetes, and discover how to distribute Keras models. Dive into concepts like data parallelism, mirrored variables, ring all-reduce, and synchronous training. Understand performance on multi-GPU setups and learn to configure and deploy Kubernetes clusters. Gain insights into hierarchical all-reduce and how model code is automatically distributed. Access additional resources on distribution strategies and APIs to enhance your understanding of distributed TensorFlow training.
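
The cluster-setup portion (Kubeflow, Kubernetes, configuring the cluster) maps onto TensorFlow's multi-worker strategy. A hedged sketch, assuming current TF 2.x naming and hypothetical worker host names; on Kubernetes, an operator such as a Kubeflow TFJob would inject a TF_CONFIG like this into each worker pod, changing only the task index:

```python
import json
import os

import tensorflow as tf

# Hypothetical two-worker cluster; real addresses come from your cluster spec.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["worker-0:2222", "worker-1:2222"]},
    "task": {"type": "worker", "index": 0},  # each pod gets its own index
})

# Collective all-reduce across workers (and across GPUs inside each worker).
# Every worker runs this identical script and waits until all peers join.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

# The model code itself is unchanged; Keras and the strategy shard the
# dataset and average gradients across the cluster.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([256, 10]), tf.random.normal([256, 1]))
).batch(32)
model.fit(dataset, epochs=2)
```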

Distributed TensorFlow - TensorFlow at O'Reilly AI Conference, San Francisco '18

TensorFlow