[] A huge shout-out to Comet ML for sponsoring this episode!
[] Please like, share, leave a review, and subscribe to our MLOps channels!
[] Evaluation metrics in AI
[] LLM evaluation in practice
[] LLM testing methodologies
[] LLM as a judge
[] Opik track function overview
[] Tracking user response value
[] Exploring AI metrics integration
[] Experiment tracking and LLMs
[] Micro and macro collaboration in AI
[] RAG pipeline reproducibility snapshot
[] Collaborative experiment tracking
[] Feature flags in CI/CD
[] Labeling challenges and solutions
[] LLM output quality alerts
[] Anomaly detection in model outputs
[] Wrap up
Description:
Explore a wide-ranging podcast episode featuring Gideon Mendels, CEO of Comet, on systematically testing and evaluating LLM applications. Gain insights into hybrid approaches that combine ML and software engineering best practices, defining evaluation metrics, and tracking experimentation during LLM app development. Learn about unit testing strategies that support confident deployment, and discover how to manage machine learning workflows from experimentation to production. Delve into topics such as LLM evaluation methodologies, AI metrics integration, experiment tracking, collaborative workflows, and anomaly detection in model outputs. Benefit from Mendels' background in NLP, speech recognition, and ML research as he shares practical guidance for developers building LLM applications.
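
The chapters on Opik's track function and LLM-as-a-judge evaluation can be illustrated with a minimal sketch. This is not code from the episode: the rubric, threshold, and stubbed model calls are illustrative assumptions; only the `@track` decorator comes from the Opik SDK, and Opik needs to be configured (API key or a local deployment) before traces are actually recorded.

```python
# Minimal sketch: tracing an LLM call with Opik's track decorator and gating a
# release on a simple LLM-as-a-judge score. Model calls are stubbed out.
from opik import track  # Opik SDK: pip install opik


@track  # records inputs, outputs, and latency for this call as a trace
def answer_question(question: str) -> str:
    # Placeholder for the real LLM call (e.g. an OpenAI or local model client).
    return "LLM apps are evaluated with metrics such as relevance and groundedness."


@track
def judge_answer(question: str, answer: str) -> int:
    # LLM-as-a-judge: a second model grades the answer against a rubric.
    # The judge call is stubbed here; in practice this would be another LLM
    # request that returns a 1-5 score.
    rubric = f"Rate 1-5 how well this answers '{question}': {answer}"
    _ = rubric  # sent to the judge model in a real setup
    return 4


if __name__ == "__main__":
    q = "How are LLM apps evaluated?"
    a = answer_question(q)
    score = judge_answer(q, a)
    # Hypothetical release gate: fail the test run if the judge score is low.
    assert score >= 3, "LLM-as-a-judge score below the release threshold"
```

Run as a plain script or inside a unit test; each decorated call shows up as a trace in the Opik UI once the SDK is configured.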
How to Systematically Test and Evaluate LLM Apps - MLOps Podcast