1. Intro
2. Data Quality in the Modern Data Stack
3. Three Approaches to Data Quality Monitoring
4. Ticket Sales Data
5. Setup Monitoring in Anomalo
6. Anomalo Monitoring
7. Chaos Library
8. Check Log
9. Visualizations: Severity & Explanation
10. Visualizations: Distribution
11. Visualizations: Root Cause Analysis
12. Encode Features Automatically
13. Build a Supervised Model
14. Generate Visualizations Using SHAP Values
15. Challenges
16. Testing
17. Get Started in Databricks
18. DATA+AI SUMMIT 2022
Description:
This 37-minute conference talk shows how unsupervised machine learning can be used to monitor data quality at scale in Databricks. It covers the limitations of traditional rules-based and metrics-based approaches, then introduces a set of fully unsupervised algorithms, explaining how they work, where they are strong and weak, and how they are tested and calibrated. A worked example uses ticket sales data and walks through setting up monitoring in Anomalo, followed by visualizations of severity, explanation, distribution, and root cause analysis. The talk also covers encoding features automatically, building a supervised model, and generating visualizations using SHAP values, and closes with implementation challenges, testing, and how to get started in Databricks.

Unsupervised Machine Learning for Scaling Data Quality Monitoring in Databricks

Databricks