Inference and Quantization for AI - Session 3
Nvidia

Outline:
1. Intro
2. Outline
3. 4-Bit Quantization
4. Quantization for Inference
5. Binary Neural Networks
6. Using Tensor Cores
7. Quantized Network Accuracy
8. Maintaining Speed at Best Accuracy
9. Scale-Only Quantization
10. Per-Channel Scaling
11. Training for Quantization
12. Conclusion
13. Post-Training Calibration
14. Mixed Precision Networks
15. The Root Cause
16. Bring Your Own Calibration
17. Summary
18. INT Performance
19. Also in TensorRT
20. TF-TRT Relative Performance
21. Object Detection - NMS
22. Using the New NMS Op
23. Now Available on GitHub
24. TensorRT Hyperscale Inference Platform
25. Inefficiency Limits Innovation
26. NVIDIA TensorRT Inference Server
27. Current Features
28. Available Metrics
29. Dynamic Batching
30. Concurrent Model Execution - ResNet-50
31. NVIDIA Research AI Playground
32. NV Learn More and Download to Use
33. Additional Resources
Description:
Explore advanced techniques for AI inference and quantization in this session from the NVIDIA AI Tech Workshop at NeurIPS Expo 2018. Dive into quantized inference, NVIDIA TensorRT™ 5 and TensorFlow integration, and the TensorRT Inference Server. Learn about 4-bit quantization, binary neural networks, tensor cores, and strategies for maintaining speed while optimizing accuracy. Discover post-training calibration techniques, mixed precision networks, and the benefits of per-channel scaling. Gain insights into object detection with NMS, the TensorRT hyperscale inference platform, and the NVIDIA TensorRT Inference Server's features including dynamic batching and concurrent model execution. Access additional resources and tools to enhance your AI inference capabilities.
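
As a taste of the scale-only, per-channel quantization the session covers, below is a minimal NumPy sketch. It is not taken from the talk or from TensorRT's API; the function name quantize_per_channel and the random example weights are illustrative. The idea: each output channel gets one symmetric scale derived from its maximum absolute weight, and values are rounded into the signed integer range.

    import numpy as np

    def quantize_per_channel(weights, num_bits=8):
        """Symmetric, scale-only quantization with one scale per output channel.

        weights: float32 tensor of shape (out_channels, ...).
        Returns (integer weights, per-channel scales) such that
        weights ~= q * scale, broadcast over each channel.
        """
        qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for INT8
        flat = weights.reshape(weights.shape[0], -1)
        # One scale per output channel, from that channel's max magnitude.
        scales = np.abs(flat).max(axis=1) / qmax
        scales = np.where(scales == 0, 1.0, scales)  # guard all-zero channels
        q = np.clip(np.round(flat / scales[:, None]), -qmax, qmax).astype(np.int8)
        return q.reshape(weights.shape), scales

    # Example: quantize a conv weight tensor and check reconstruction error.
    w = np.random.randn(64, 32, 3, 3).astype(np.float32)
    q, s = quantize_per_channel(w)
    w_hat = q.astype(np.float32) * s[:, None, None, None]
    print("max abs error:", np.abs(w - w_hat).max())

Per-channel scaling like this keeps a single outlier channel from degrading the resolution of every other channel, which is one of the accuracy strategies the session discusses.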
