Building Your Own ChatGPT-style LLM AI Infrastructure with Kubernetes

Chapters:
The Use of TimescaleDB for Storing Time-Series Data and Vectors
Migrating to an Open-Source LLM Inference Engine
Deploying Kubernetes and Setting Up Node Groups
Choosing vLLM as the Inference Engine
The Migration Process: Deploying Kubernetes and Setting Up Node Groups
Choosing the Right Level of Abstraction
Challenges in Evaluating Language Model Performance
Considerations for Adopting Kubernetes in Startups
Description:
Explore how to build a ChatGPT-style LLM AI infrastructure on Kubernetes in this video featuring John McBride. The talk covers the challenges and solutions of deploying open-source AI technologies at scale, with Kubernetes as the platform for running compute-intensive workloads. Learn why TimescaleDB was chosen for storing both time-series data and vectors, and how the team migrated from OpenAI to an open-source large language model inference engine. Along the way, the talk stresses choosing the right level of abstraction, understanding the trade-offs involved, and the difficulty of evaluating language model performance. It also walks through practical steps: deploying Kubernetes, setting up node groups with GPUs, and serving models with vLLM as the inference engine. Whether you're a startup weighing Kubernetes adoption or an experienced developer optimizing AI infrastructure, the talk offers concrete takeaways on building and managing AI-enabled applications at scale.
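As a rough illustration of the storage decision discussed in the talk, here is a minimal sketch of a single Postgres table holding both time-series rows and embedding vectors via the timescaledb and pgvector extensions. The connection string, table name, columns, and 1536-dimension embedding size are all assumptions for illustration, not details from the talk.

```python
# Minimal sketch: one table for time-series data and vectors in TimescaleDB.
# The DSN, schema, and embedding dimension below are hypothetical.
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@timescale.internal:5432/app")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb;")
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")  # pgvector
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            time      TIMESTAMPTZ NOT NULL,
            source    TEXT,
            payload   JSONB,
            embedding VECTOR(1536)
        );
    """)
    # Turn the plain table into a hypertable partitioned on the time column.
    cur.execute("SELECT create_hypertable('events', 'time', if_not_exists => TRUE);")
```

On the inference side, vLLM exposes an OpenAI-compatible HTTP API, so at the client level a migration away from OpenAI can amount to repointing the SDK at a self-hosted endpoint. The in-cluster service URL and model name below are likewise assumptions for illustration:

```python
# Sketch of calling a self-hosted vLLM server through the OpenAI SDK.
# Start vLLM on a GPU node first, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.2
from openai import OpenAI

client = OpenAI(
    base_url="http://vllm.internal:8000/v1",  # hypothetical in-cluster service URL
    api_key="not-needed",                     # vLLM does not check API keys by default
)

resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed model, not from the talk
    messages=[{"role": "user", "content": "Summarize the last week of repo activity."}],
)
print(resp.choices[0].message.content)
```

The appeal of this pattern is that application code written against the OpenAI API can keep working largely unchanged while inference moves onto your own GPU node groups.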