1. Intro
2. CUDA Development Ecosystem
3. Powering the Deep Learning Ecosystem
4. Tesla Universal Acceleration Platform
5. Accelerated Computing Is Full-Stack Optimization
6. Introducing CUDA 10.0
7. 16 GPUs with 32 GB Memory Each
8. NVSwitch: All-to-All Connectivity
9. Unified Memory + DGX-2
10. 2x Higher Performance with NVSwitch
11. New Programming Model Features
12. Asynchronous Task Graphs
13. New Execution Mechanism
14. Execution Optimizations
15. Performance Impact
16. The Path to Fusion Energy
17. Volta Tensor Core
18. New Turing Tensor Core
19. New Turing Warp Matrix Functions
20. CUTLASS 1.1
21. NVIDIA NGX: DL for Creative Applications
22. In Adobe Photoshop
23. cuDNN: GPU-Accelerated Deep Learning
24. Improved Heuristics for Convolutions
25. Persistent RNN Speedup on V100
26. Strided Activation Gradients
27. Tensor Cores with FP32 Models
28. More Tensor Core Performance Improvements
29. General Performance Improvements
30. Future Updates
Description:
Explore improvements to NVIDIA's core AI technologies in this session from the NVIDIA AI Tech Workshop at NeurIPS Expo 2018. Dive into advancements in CUDA Graphs, WMMA, and cuDNN, including Tensor Core performance over time, FP32 fast math, small-batch improvements, and attention support. Gain insights into the CUDA development ecosystem, deep learning acceleration, and the Tesla Universal Acceleration Platform. Discover new programming model features, including asynchronous task graphs and execution optimizations. Examine the evolution of Tensor Cores from the Volta to the Turing architecture, and see how NVIDIA NGX enhances creative applications. Finally, review cuDNN's improved convolution heuristics, persistent RNN speedups, and Tensor Core performance for both FP16 and FP32 models.

Improvements to NVIDIA CUDA and Deep Learning Libraries - Session 1

Nvidia