Building Your Own ChatGPT-style LLM AI Infrastructure with Kubernetes

Chapters:
The Use of TimescaleDB for Storing Time-Series Data and Vectors
Migrating to an Open-Source LLM Inference Engine
Deploying Kubernetes and Setting Up Node Groups
Choosing vLLM as the Inference Engine
The Migration Process: Deploying Kubernetes and Setting Up Node Groups
Choosing the Right Level of Abstraction
Challenges in Evaluating Language Model Performance
Considerations for Adopting Kubernetes in Startups
Description:
Explore how to build a ChatGPT-style LLM AI infrastructure on Kubernetes in this video featuring John McBride. The talk covers the challenges and solutions of deploying open-source AI technologies at scale, with Kubernetes as the platform for running compute-intensive workloads. Learn why TimescaleDB was chosen for storing both time-series data and vectors, and how the team migrated from OpenAI to an open-source large language model inference engine. Along the way, the talk stresses choosing the right level of abstraction, understanding the trade-offs involved, and the difficulty of evaluating language model performance. It also walks through practical steps: deploying Kubernetes, setting up node groups with GPUs, and serving models with vLLM as the inference engine. Whether you're a startup weighing Kubernetes adoption or an experienced developer optimizing AI infrastructure, the talk offers concrete takeaways on building and managing AI-enabled applications at scale.
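As a rough illustration of the storage decision discussed in the talk, here is a minimal sketch of a single Postgres table holding both time-series rows and embedding vectors via the timescaledb and pgvector extensions. The connection string, table name, columns, and 1536-dimension embedding size are all assumptions for illustration, not details from the talk.

```python
# Minimal sketch: one table for time-series data and vectors in TimescaleDB.
# The DSN, schema, and embedding dimension below are hypothetical.
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@timescale.internal:5432/app")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb;")
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")  # pgvector
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            time      TIMESTAMPTZ NOT NULL,
            source    TEXT,
            payload   JSONB,
            embedding VECTOR(1536)
        );
    """)
    # Turn the plain table into a hypertable partitioned on the time column.
    cur.execute("SELECT create_hypertable('events', 'time', if_not_exists => TRUE);")
```

On the inference side, vLLM exposes an OpenAI-compatible HTTP API, so at the client level a migration away from OpenAI can amount to repointing the SDK at a self-hosted endpoint. The in-cluster service URL and model name below are likewise assumptions for illustration:

```python
# Sketch of calling a self-hosted vLLM server through the OpenAI SDK.
# Start vLLM on a GPU node first, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.2
from openai import OpenAI

client = OpenAI(
    base_url="http://vllm.internal:8000/v1",  # hypothetical in-cluster service URL
    api_key="not-needed",                     # vLLM does not check API keys by default
)

resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed model, not from the talk
    messages=[{"role": "user", "content": "Summarize the last week of repo activity."}],
)
print(resp.choices[0].message.content)
```

The appeal of this pattern is that application code written against the OpenAI API can keep working largely unchanged while inference moves onto your own GPU node groups.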