How to Evaluate LLM Performance for Domain-Specific Use Cases

Snorkel AI
1. Agenda
2. Why do we need LLM evaluation?
3. Common evaluation axes
4. Why eval is more critical in Gen AI use cases
5. Why enterprises are often blocked on effective LLM evaluation
6. Common approaches to LLM evaluation
7. OSS benchmarks + metrics
8. LLM-as-a-judge
9. Annotation strategies
10. How can we do better than manual annotation strategies?
11. How data slices enable better LLM evaluation
12. How does LLM eval work with Snorkel?
13. Building a quality model
14. Using fine-grained benchmarks for next steps
15. Workflow overview review
16. Workflow: starting with the model
17. Workflow: using an LLM as a judge
18. Workflow: the quality model
19. Chatbot demo
20. Annotating data in Snorkel Flow demo
21. Building labeling functions in Snorkel Flow demo
22. LLM evaluation in Snorkel Flow demo
23. Snorkel Flow Jupyter notebook demo
24. Data slices in Snorkel Flow demo
25. Recap
26. Snorkel eval offer!
27. Q&A
Description:
Explore the critical aspects of evaluating Large Language Model (LLM) performance for enterprise use cases in this comprehensive 57-minute video presentation. Delve into the nuances of LLM evaluation, learn techniques for assessing response accuracy at scale, and discover methods for identifying areas requiring additional fine-tuning. Gain insights into common challenges and approaches in LLM evaluation, understand the importance of data-centric evaluation methods, and see practical demonstrations of evaluation techniques using Snorkel AI's platform. Follow along as experts discuss topics ranging from OSS benchmarks and metrics to using LLMs as judges, and explore how data slices can enhance evaluation processes. Witness real-world applications through demos of chatbot evaluation, data annotation, and quality model building in Snorkel Flow.
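
The description mentions two of the techniques covered in the talk: using an LLM as a judge and breaking evaluation data into slices. As a rough illustration only (not Snorkel Flow's API or the presenters' exact method), the minimal Python sketch below grades hypothetical question/answer pairs with a judge prompt and averages the scores per slice. The names EXAMPLES, JUDGE_PROMPT, and toy_judge are invented for this sketch; in practice toy_judge would be replaced by a call to a real LLM provider.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation set: (user question, chatbot answer, slice label).
EXAMPLES = [
    ("How do I reset my password?", "Click 'Forgot password' on the login page.", "account"),
    ("What is your refund policy?", "We offer refunds within 30 days of purchase.", "billing"),
    ("What is your refund policy?", "I'm not sure, please check the website.", "billing"),
]

# Prompt template sent to the judge model.
JUDGE_PROMPT = (
    "You are grading a chatbot answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Rate the answer from 1 (unacceptable) to 5 (excellent). Reply with the number only."
)

def toy_judge(prompt: str) -> str:
    """Stand-in for a real LLM call; swap in your provider's client here."""
    return "2" if "not sure" in prompt else "5"

def evaluate(examples, judge=toy_judge):
    """Score each example with the judge and aggregate mean scores per data slice."""
    by_slice = defaultdict(list)
    for question, answer, slice_name in examples:
        raw = judge(JUDGE_PROMPT.format(question=question, answer=answer))
        try:
            score = int(raw.strip())
        except ValueError:
            continue  # skip judge outputs that aren't a bare number
        by_slice[slice_name].append(score)
    return {name: mean(scores) for name, scores in by_slice.items()}

if __name__ == "__main__":
    for slice_name, avg in evaluate(EXAMPLES).items():
        print(f"{slice_name}: mean judge score {avg:.1f}")
```

Reporting per-slice averages rather than a single overall score is what lets this kind of evaluation point to the specific areas (e.g., billing questions) that may need further fine-tuning.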
