Explore advanced techniques for AI inference and quantization in this session from the NVIDIA AI Tech Workshop at NeurIPS Expo 2018. Dive into quantized inference, NVIDIA TensorRT™ 5 and TensorFlow integration, and the TensorRT Inference Server. Learn about 4-bit quantization, binary neural networks, tensor cores, and strategies for preserving accuracy while optimizing for speed. Discover post-training calibration techniques, mixed-precision networks, and the benefits of per-channel scaling. Gain insights into object detection with NMS, the TensorRT hyperscale inference platform, and the NVIDIA TensorRT Inference Server's features, including dynamic batching and concurrent model execution. Access additional resources and tools to enhance your AI inference capabilities.
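To see why per-channel scaling matters, consider symmetric integer quantization of a weight matrix. The sketch below (a minimal NumPy illustration, not TensorRT's actual implementation; the function names and the toy weight data are invented for this example) compares one scale factor for the whole tensor against one scale factor per output channel. When channels have very different dynamic ranges, a single per-tensor scale is dominated by the largest channel and crushes the small ones, while per-channel scales adapt to each range:

```python
import numpy as np

def fake_quantize_per_tensor(w, num_bits=8):
    # One scale for the whole tensor: scale = max|w| / (2^(b-1) - 1).
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale  # dequantize back to float ("fake quantization")

def fake_quantize_per_channel(w, num_bits=8):
    # One scale per output channel (rows of w), so each channel's
    # range maps onto the full integer grid independently.
    qmax = 2 ** (num_bits - 1) - 1
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scales), -qmax, qmax)
    return q * scales

rng = np.random.default_rng(0)
# Toy weights: four output channels whose magnitudes span three
# orders of magnitude -- the case where per-channel scaling helps most.
w = rng.normal(size=(4, 64)) * np.array([[0.01], [0.1], [1.0], [10.0]])

err_tensor = np.abs(w - fake_quantize_per_tensor(w)).mean()
err_channel = np.abs(w - fake_quantize_per_channel(w)).mean()
print(f"per-tensor  mean abs error: {err_tensor:.6f}")
print(f"per-channel mean abs error: {err_channel:.6f}")
```

On weights like these, the per-channel error is far smaller, which is why per-channel (sometimes called per-axis) scaling is a standard way to recover accuracy in low-precision inference without giving up speed.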