Unlocking the Potential of Large Models in Production - Yuan Tang, Red Hat & Adam Tetelman, NVIDIA
Description:
Learn about the challenges and solutions involved in deploying large language models (LLMs) in production through this conference talk presented by Yuan Tang of Red Hat and Adam Tetelman of NVIDIA. Explore best practices for building scalable inference platforms with cloud native technologies such as Kubernetes, Kubeflow, KServe, and Knative. Discover practical solutions for benchmarking LLMs, implementing efficient storage and caching mechanisms for fast autoscaling, optimizing models for specialized accelerators, managing A/B testing under limited compute resources, and establishing effective monitoring. Using KServe as a case study, gain insight into the LLMOps challenges that arise in the transition from traditional machine learning to generative AI and large language models in production.
Unlocking the Potential of Large Models in Production - Best Practices and Solutions
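To ground the KServe case study, the sketch below shows how an LLM might be deployed as a KServe InferenceService using the KServe Python SDK. This is a minimal illustration, not code from the talk: the Hugging Face runtime, model URI, namespace, GPU count, replica bounds, and canary percentage are all assumed example values.

```python
# Minimal sketch (not from the talk): deploying an LLM as a KServe
# InferenceService via the KServe Python SDK. Model URI, namespace,
# replica bounds, and canary percentage are illustrative assumptions.
from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1ModelSpec,
    V1beta1ModelFormat,
    constants,
)

isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_V1BETA1,
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(name="llm-demo", namespace="default"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            # Autoscaling bounds; how fast new replicas become ready
            # depends on model storage and caching choices.
            min_replicas=1,
            max_replicas=4,
            # On an update, routes this share of traffic to the new
            # revision (canary/A-B testing); value is an assumed example.
            canary_traffic_percent=10,
            model=V1beta1ModelSpec(
                model_format=V1beta1ModelFormat(name="huggingface"),
                storage_uri="hf://meta-llama/Llama-3.1-8B-Instruct",  # assumed model
                resources=client.V1ResourceRequirements(
                    requests={"nvidia.com/gpu": "1"},
                    limits={"nvidia.com/gpu": "1"},
                ),
            ),
        )
    ),
)

KServeClient().create(isvc)  # requires a cluster with KServe installed
```

The min/max replica bounds drive KServe's Knative-backed autoscaling, which is where efficient model storage and caching pay off, while canaryTrafficPercent lets a new revision receive a small slice of traffic for A/B testing without doubling compute.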