Chapters:
1. Shahul's preferred coffee
2. Takeaways
3. Please like, share, and subscribe to our MLOps channels!
4. Shahul's definition of evaluation
5. Evaluation metrics and benchmarks
6. Gamed leaderboards
7. Best open-source models for summarizing long text
8. Benchmarks
9. Recommended evaluation process
10. LLMs evaluating other LLMs
11. Debugging failed evaluation models
12. Prompt injection
13. Alignment
14. Open Assistant
15. Garbage in, garbage out
16. Ragas
17. Valuable use cases besides OpenAI
18. Fine-tuning LLMs
19. Connect with Shahul if you need help with Ragas: @Shahules786 on Twitter
20. Wrap up
Description:
Dive into a comprehensive 51-minute podcast episode featuring Shahul Es, a data science expert and Kaggle Grandmaster, as he explores the intricacies of evaluating Large Language Model (LLM) applications. Learn about debugging techniques, troubleshooting strategies, and the challenges associated with benchmarks in open-source models. Gain valuable insights on custom data distributions, the significance of fine-tuning in improving model performance, and the Ragas Project. Discover the importance of evaluation metrics, the impact of gamed leaderboards, and strategies for recommending effective evaluation processes. Explore topics such as prompt injection, alignment, and the concept of "garbage in, garbage out" in LLM applications. Connect with the MLOps community through various channels and access additional resources, including job boards and merchandise.

Evaluating LLM Applications - Insights from Shahul Es

MLOps.community