Inside TensorFlow - tf.distribute.Strategy

TensorFlow

1. Intro
2. A class with multiple implementations
3. Data parallelism
4. Parameter servers and workers
5. Central Storage
6. Mirrored Variables
7. All-reduce algorithm
8. Ring all-reduce
9. Hierarchical all-reduce
10. OneDevice Strategy
11. Parallel input preprocessing: coming
12. What changes when you switch strategies?
13. # Training with Keras
14. # Training with Estimator
15. Concept: Mirrored vs. per-replica values
16. Support computations following this pattern
17. setup
18. loss, optimizer
19. # Custom training loop, part 3: each replica
20. Concept: Modes
21. all replicas
22. outer loop
23. Default Strategy
24. # Average loss using the global batch size
25. # Optimizer implementation, part 1
26. merge_call(fn, args) is our secret weapon
27. # Optimizer implementation, part 2
28. Concept: Replica vs. variable locality
29. One standard pattern for updating state
30. # Example: Mean metric
31. Questions?

Description:
Dive into an in-depth technical session on TensorFlow's tf.distribute.Strategy, presented by TensorFlow Software Engineer Josh Levenberg. Explore the design principles behind this powerful feature, which aims to simplify distribution across various use cases. Learn about data parallelism, parameter servers, central storage, mirrored variables, and all-reduce algorithms. Understand the differences between strategies, including OneDevice and Default, and how they affect training with Keras and Estimator. Discover key concepts such as mirrored vs. per-replica values, replica vs. variable locality, and the implementation of custom training loops. Gain insights into optimizer implementations, loss averaging, and metric calculations in distributed environments. Perfect for developers and researchers looking to leverage TensorFlow's distributed computing capabilities effectively.
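
As a companion to the chapters on custom training loops, per-replica values, and averaging loss over the global batch size, here is a minimal sketch (not taken from the talk) of that pattern, assuming a TensorFlow 2.x runtime where tf.distribute.MirroredStrategy is available; names such as GLOBAL_BATCH_SIZE and replica_step are illustrative only.

import tensorflow as tf

# Mirrors variables across the local GPUs (falls back to a single device if none).
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Variables created inside the scope become mirrored variables.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

GLOBAL_BATCH_SIZE = 64
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([1024, 10]), tf.random.normal([1024, 1]))
).batch(GLOBAL_BATCH_SIZE)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

# Keep per-example losses so we can reduce manually below.
loss_fn = tf.keras.losses.MeanSquaredError(
    reduction=tf.keras.losses.Reduction.NONE)

@tf.function
def train_step(dist_inputs):
    def replica_step(inputs):
        features, labels = inputs
        with tf.GradientTape() as tape:
            predictions = model(features, training=True)
            per_example_loss = loss_fn(labels, predictions)
            # Average over the *global* batch size, not the per-replica batch,
            # so gradients combine correctly across replicas after all-reduce.
            loss = tf.nn.compute_average_loss(
                per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss

    # Run the step on every replica (on older TF 2.x this was
    # strategy.experimental_run_v2), then reduce the per-replica losses.
    per_replica_losses = strategy.run(replica_step, args=(dist_inputs,))
    return strategy.reduce(
        tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)

for batch in dist_dataset:
    print(float(train_step(batch)))

For the Keras path covered in the talk, the same strategy object is used by building and compiling the model inside strategy.scope() and calling model.fit as usual; switching to a different strategy (for example OneDeviceStrategy) is meant to leave the rest of the code unchanged.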
