1. Intro
2. Case Study
3. Data Validation
4. Common Validations
5. Great Expectations - Detailed Results
6. Great Expectations - Data Documentation
7. Pandera - Sample Code
8. Comparison of Validation Frameworks
9. Fugue - Basic Code
10. Combining Fugue and Pandera
11. Example Data - Food Sloth's Pricing
12. Validation by Partition
Description:
Explore data validation techniques for large-scale data pipelines in this 22-minute Databricks conference talk. Learn about the importance of data validation in interconnected data pipelines and compare popular frameworks like Great Expectations with lightweight alternatives. Discover how to extend Pandas-based validation libraries to Spark workflows using Fugue, an open-source framework. Gain insights into applying different validation rules for each partition in big data scenarios, addressing a common deficiency in current frameworks. Follow along with an interactive demo that combines Fugue and Pandera to create a flexible and efficient data validation solution for Spark. Understand the trade-offs between robust features and performance, and learn how to tailor your validation approach to your specific needs.
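
The idea of applying a different validation rule to each partition can be illustrated with a small, library-free sketch. It deliberately does not use the Fugue or Pandera APIs shown in the talk; it only mirrors the concept in plain Python. The sample rows, the location codes, and the price ranges are hypothetical values chosen for illustration, and `validate_by_partition` is a made-up helper name.

```python
from collections import defaultdict

# Hypothetical sample rows: (location, price). Values are illustrative only.
ROWS = [
    ("FL", 8.0),
    ("FL", 11.5),
    ("CA", 15.0),
    ("CA", 27.0),
]

# Hypothetical per-partition rules: allowed (min, max) price for each location.
PRICE_RANGES = {"FL": (7.0, 12.0), "CA": (14.0, 22.0)}


def validate_by_partition(rows, ranges):
    """Group rows by location, then check each group against its own range."""
    partitions = defaultdict(list)
    for location, price in rows:
        partitions[location].append(price)

    failures = {}
    for location, prices in partitions.items():
        low, high = ranges[location]
        # Collect every price that falls outside this partition's range.
        bad = [p for p in prices if not low <= p <= high]
        if bad:
            failures[location] = bad
    return failures


failures = validate_by_partition(ROWS, PRICE_RANGES)
print(failures)
```

In a distributed setting, the grouping step would be handled by the execution engine and the per-group check by a validation library; this sketch only shows why a single global rule is not enough when each partition has its own acceptable range.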

Fully Utilizing Spark for Data Validation with Fugue and Pandera

Databricks