Inside TensorFlow - tf.distribute.Strategy

TensorFlow

1. Intro
2. A class with multiple implementations
3. Data parallelism
4. Parameter servers and workers
5. Central Storage
6. Mirrored Variables
7. All-reduce algorithm
8. Ring all-reduce
9. Hierarchical all-reduce
10. OneDevice Strategy
11. Parallel input preprocessing: coming
12. What changes when you switch strategies?
13. # Training with Keras
14. # Training with Estimator
15. Concept: Mirrored vs. per-replica values
16. Support computations following this pattern
17. setup
18. loss, optimizer
19. # Custom training loop, part 3: each replica
20. Concept: Modes
21. all replicas
22. outer loop
23. Default Strategy
24. # Average loss using the global batch size
25. # Optimizer implementation, part 1
26. merge_call(fn, args) is our secret weapon
27. # Optimizer implementation, part 2
28. Concept: Replica vs. variable locality
29. One standard pattern for updating state
30. # Example: Mean metric
31. Questions?

Description:
Dive into an in-depth technical session on TensorFlow's tf.distribute.Strategy, presented by TensorFlow Software Engineer Josh Levenberg. Explore the design principles behind this powerful feature, which aims to simplify distribution across various use cases. Learn about data parallelism, parameter servers, central storage, mirrored variables, and all-reduce algorithms. Understand the differences between strategies, including OneDevice and Default, and how they affect training with Keras and Estimator. Discover key concepts such as mirrored vs. per-replica values, replica vs. variable locality, and the implementation of custom training loops. Gain insights into optimizer implementations, loss averaging, and metric calculations in distributed environments. Perfect for developers and researchers looking to leverage TensorFlow's distributed computing capabilities effectively.
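
As a companion to the chapters on custom training loops, per-replica values, and averaging loss over the global batch size, here is a minimal sketch (not taken from the talk) of that pattern, assuming a TensorFlow 2.x runtime where tf.distribute.MirroredStrategy is available; names such as GLOBAL_BATCH_SIZE and replica_step are illustrative only.

import tensorflow as tf

# Mirrors variables across the local GPUs (falls back to a single device if none).
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Variables created inside the scope become mirrored variables.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

GLOBAL_BATCH_SIZE = 64
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([1024, 10]), tf.random.normal([1024, 1]))
).batch(GLOBAL_BATCH_SIZE)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

# Keep per-example losses so we can reduce manually below.
loss_fn = tf.keras.losses.MeanSquaredError(
    reduction=tf.keras.losses.Reduction.NONE)

@tf.function
def train_step(dist_inputs):
    def replica_step(inputs):
        features, labels = inputs
        with tf.GradientTape() as tape:
            predictions = model(features, training=True)
            per_example_loss = loss_fn(labels, predictions)
            # Average over the *global* batch size, not the per-replica batch,
            # so gradients combine correctly across replicas after all-reduce.
            loss = tf.nn.compute_average_loss(
                per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss

    # Run the step on every replica (on older TF 2.x this was
    # strategy.experimental_run_v2), then reduce the per-replica losses.
    per_replica_losses = strategy.run(replica_step, args=(dist_inputs,))
    return strategy.reduce(
        tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)

for batch in dist_dataset:
    print(float(train_step(batch)))

For the Keras path covered in the talk, the same strategy object is used by building and compiling the model inside strategy.scope() and calling model.fit as usual; switching to a different strategy (for example OneDeviceStrategy) is meant to leave the rest of the code unchanged.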
