Overview and Importance of Data Quality for Machine Learning Tasks
Acknowledgements
Data Preparation in Machine Learning
Challenges with Data Preparation
Data Quality Analysis can help...
Different personas in enterprise setting...
To put it all together
To summarize
Data Quality Metrics
Common Data Cleaning Techniques
Is data cleaning always helpful for ML pipeline?
Insights: Impact of different cleaning techniques
In conclusion
Why it happens?
Why Imbalanced Classification is Hard?
Evaluation Metrics for Imbalanced Datasets: Accuracy Paradox
Factors affecting class imbalance
Affecting Factor: Imbalance Ratio
Affecting Factor: Overlap
Affecting Factor: Smaller sub-concepts
Affecting Factor: Dataset Size
Affecting Factor: Combined Effect
Modelling Strategies: Types
Resampling Techniques
Bayes Impact index
Description:
Explore the critical role of data quality in machine learning tasks through this comprehensive conference talk from KDD 2020. Delve into data preparation challenges, quality analysis techniques, and their impact on enterprise settings. Examine common data cleaning methods and their effectiveness in ML pipelines. Investigate the complexities of imbalanced classification, including evaluation metrics, affecting factors, and modeling strategies. Gain valuable insights from IBM Research experts on navigating data quality issues to improve machine learning outcomes.