Explore improvements to NVIDIA's core AI technologies in this session from the NVIDIA AI Tech Workshop at NeurIPS Expo 2018. Dive into advancements in CUDA Graphs, WMMA, and cuDNN, while learning about Tensor Core performance over time, FP32 fast math, small-batch improvements, and attention support. Gain insights into the CUDA development ecosystem, deep learning acceleration, and the Tesla Universal Acceleration Platform. Discover new programming model features, including asynchronous task graphs and execution optimizations. Examine the evolution of Tensor Cores from the Volta to the Turing architecture, and understand how NVIDIA NGX enhances creative applications. Investigate cuDNN's improvements in convolution heuristics, persistent RNN speedup, and Tensor Core performance for both FP16 and FP32 models.
Improvements to NVIDIA CUDA and Deep Learning Libraries - Session 1
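To make the "asynchronous task graphs" mentioned in the description concrete, here is a minimal sketch of CUDA Graphs stream capture: a short sequence of kernel launches is recorded into a graph, instantiated once, and then replayed with a single launch call to amortize per-launch CPU overhead. The kernel, sizes, and scaling factor are illustrative, and the calls assume a CUDA 10.1-or-later toolkit; this is not code from the session itself.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical kernel used only to give the captured stream some work.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Record a short sequence of launches into a graph instead of
    // submitting each one to the GPU individually.
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    for (int step = 0; step < 3; ++step)
        scale<<<(n + 255) / 256, 256, 0, stream>>>(d_x, 1.01f, n);
    cudaStreamEndCapture(stream, &graph);

    // Instantiate once, then launch the whole task graph with one call.
    cudaGraphExec_t graphExec;
    cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);
    cudaGraphLaunch(graphExec, stream);
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(graphExec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(d_x);
    printf("graph replayed\n");
    return 0;
}
```

In a real workload the instantiated graph would be launched repeatedly (e.g. once per training iteration), which is where the reduction in launch overhead pays off.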