1. Intro
2. CUDA Development Ecosystem
3. Powering the Deep Learning Ecosystem
4. Tesla Universal Acceleration Platform
5. Accelerated Computing Is Full-Stack Optimization
6. Introducing CUDA 10.0
7. 16 GPUs with 32 GB Memory Each
8. NVSwitch: All-to-All Connectivity
9. Unified Memory + DGX-2
10. 2x Higher Performance with NVSwitch
11. New Programming Model Features
12. Asynchronous Task Graphs
13. New Execution Mechanism
14. Execution Optimizations
15. Performance Impact
16. The Path to Fusion Energy
17. Volta Tensor Core
18. New Turing Tensor Core
19. New Turing Warp Matrix Functions
20. CUTLASS 1.1
21. NVIDIA NGX: DL for Creative Applications
22. In Adobe Photoshop
23. cuDNN: GPU-Accelerated Deep Learning
24. Improved Heuristics for Convolutions
25. Persistent RNN Speedup on V100
26. Strided Activation Gradients
27. Tensor Cores with FP32 Models
28. More Tensor Core Performance Improvements
29. General Performance Improvements
30. Future Updates
Description:
Explore improvements to NVIDIA's core AI technologies in this session from the NVIDIA AI Tech Workshop at NeurIPS Expo 2018. Dive into advancements in CUDA Graphs, WMMA, and cuDNN, including Tensor Core performance over time, FP32 fast math, small-batch improvements, and attention support. Gain insights into the CUDA development ecosystem, deep learning acceleration, and the Tesla Universal Acceleration Platform. Discover new programming model features, including asynchronous task graphs and execution optimizations. Examine the evolution of Tensor Cores from the Volta to the Turing architecture, and see how NVIDIA NGX enhances creative applications. Finally, review cuDNN's improved convolution heuristics, persistent RNN speedups, and Tensor Core performance for both FP16 and FP32 models.

Improvements to NVIDIA CUDA and Deep Learning Libraries - Session 1

Nvidia