How to Evaluate LLM Performance for Domain-Specific Use Cases

Snorkel AI
1. Agenda
2. Why do we need LLM evaluation?
3. Common evaluation axes
4. Why eval is more critical in Gen AI use cases
5. Why enterprises are often blocked on effective LLM evaluation
6. Common approaches to LLM evaluation
7. OSS benchmarks + metrics
8. LLM-as-a-judge
9. Annotation strategies
10. How can we do better than manual annotation strategies?
11. How data slices enable better LLM evaluation
12. How does LLM eval work with Snorkel?
13. Building a quality model
14. Using fine-grained benchmarks for next steps
15. Workflow overview review
16. Workflow: starting with the model
17. Workflow: using an LLM as a judge
18. Workflow: the quality model
19. Chatbot demo
20. Annotating data in Snorkel Flow demo
21. Building labeling functions in Snorkel Flow demo
22. LLM evaluation in Snorkel Flow demo
23. Snorkel Flow Jupyter notebook demo
24. Data slices in Snorkel Flow demo
25. Recap
26. Snorkel eval offer!
27. Q&A
Description:
Explore the critical aspects of evaluating Large Language Model (LLM) performance for enterprise use cases in this comprehensive 57-minute video presentation. Delve into the nuances of LLM evaluation, learn techniques for assessing response accuracy at scale, and discover methods for identifying areas requiring additional fine-tuning. Gain insights into common challenges and approaches in LLM evaluation, understand the importance of data-centric evaluation methods, and see practical demonstrations of evaluation techniques using Snorkel AI's platform. Follow along as experts discuss topics ranging from OSS benchmarks and metrics to using LLMs as judges, and explore how data slices can enhance evaluation processes. Witness real-world applications through demos of chatbot evaluation, data annotation, and quality model building in Snorkel Flow.
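
The description mentions two of the techniques covered in the talk: using an LLM as a judge and breaking evaluation data into slices. As a rough illustration only (not Snorkel Flow's API or the presenters' exact method), the minimal Python sketch below grades hypothetical question/answer pairs with a judge prompt and averages the scores per slice. The names EXAMPLES, JUDGE_PROMPT, and toy_judge are invented for this sketch; in practice toy_judge would be replaced by a call to a real LLM provider.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation set: (user question, chatbot answer, slice label).
EXAMPLES = [
    ("How do I reset my password?", "Click 'Forgot password' on the login page.", "account"),
    ("What is your refund policy?", "We offer refunds within 30 days of purchase.", "billing"),
    ("What is your refund policy?", "I'm not sure, please check the website.", "billing"),
]

# Prompt template sent to the judge model.
JUDGE_PROMPT = (
    "You are grading a chatbot answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Rate the answer from 1 (unacceptable) to 5 (excellent). Reply with the number only."
)

def toy_judge(prompt: str) -> str:
    """Stand-in for a real LLM call; swap in your provider's client here."""
    return "2" if "not sure" in prompt else "5"

def evaluate(examples, judge=toy_judge):
    """Score each example with the judge and aggregate mean scores per data slice."""
    by_slice = defaultdict(list)
    for question, answer, slice_name in examples:
        raw = judge(JUDGE_PROMPT.format(question=question, answer=answer))
        try:
            score = int(raw.strip())
        except ValueError:
            continue  # skip judge outputs that aren't a bare number
        by_slice[slice_name].append(score)
    return {name: mean(scores) for name, scores in by_slice.items()}

if __name__ == "__main__":
    for slice_name, avg in evaluate(EXAMPLES).items():
        print(f"{slice_name}: mean judge score {avg:.1f}")
```

Reporting per-slice averages rather than a single overall score is what lets this kind of evaluation point to the specific areas (e.g., billing questions) that may need further fine-tuning.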
