Data Flywheel: an initial manually labeled dataset enables self-improvement with user data
Roadmap
Training the annotators is crucial
Sources of Labor
Service Companies
Software
Data Storage
Database: scalable storage and retrieval of structured data
Data Lake
What goes where
Data Versioning
Level 2
Motivational Example: we have to train a photo popularity predictor every night
Task Dependencies
Makefile limitations
Luigi and Airflow
Description:
Explore data management essentials for deep learning projects in this comprehensive lecture. Delve into data labeling, storage, versioning, and processing techniques. Learn about the data flywheel concept, annotator training, labor sources, and service comparisons. Discover database scalability, data lake organization, and versioning strategies. Examine task dependencies and workflow management tools like Luigi and Airflow. Gain practical insights for implementing efficient data pipelines in machine learning projects.
Data Management - Full Stack Deep Learning - March 2019
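
The outline's nightly photo popularity predictor is a task-dependency problem: training can only start once its inputs are ready. The sketch below illustrates the general idea with Python's standard-library graphlib. It is not taken from the lecture, and it does not use Luigi or Airflow. The task names and the dependency graph are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks for a nightly training job (assumed, not from the lecture).
# Each key is a task; its value is the set of tasks that must finish first.
DEPENDENCIES = {
    "train_predictor": {"compute_image_features", "aggregate_metadata"},
    "compute_image_features": {"download_photos"},
    "aggregate_metadata": {"export_db_snapshot"},
    "download_photos": set(),
    "export_db_snapshot": set(),
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(DEPENDENCIES)
for step, task in enumerate(order, start=1):
    print(step, task)
```

Workflow managers such as Luigi and Airflow build on the same dependency-graph idea. This sketch only computes an order and does not run, schedule, or retry anything.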