1. Intro
2. Data Ecosystem
3. Data Scientists
4. Data Infrastructure
5. Data Analysts
6. Bumper Rail Model
7. Don't Build Your Own!!
8. What's in it for the Data Engineers?
9. Submitting a Spark Job
10. Can Abstract Many Spark System Configurations
11. Data Engineers Can Create Custom Operators
12. What's in it for the Analysts?
13. Building a Data Science Pipeline
14. Experiment
15. Jupyter Notebooks + Airflow
16. Parameterize
17. Getting involved with Apache Airflow
Description:
Explore the process of transforming a data science idea into a production-ready model using Apache Airflow in this 22-minute conference talk from Databricks. Learn how data engineers can build a flexible platform that satisfies the needs of various stakeholders, including data scientists, infrastructure engineers, and product owners. Discover how Apache Airflow serves as a collaborative tool between data scientists and infrastructure engineers, offering a Pythonic interface that abstracts system complexities. Follow the journey of a single-machine notebook evolving into a cross-service Spark + TensorFlow pipeline, culminating in a canary-tested, hyperparameter-tuned model deployed on Google Cloud Functions. Gain insights into Airflow's ability to connect the different layers of a data team, enabling rapid results and efficient collaboration. Understand the benefits for both data engineers and analysts, including custom operator creation, job submission, and pipeline building. Delve into topics such as the data ecosystem, bumper rail models, and the advantages of using established tools over building from scratch.
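The "bumper rail" idea the talk describes — a platform layer that hides Spark system configuration behind a small Pythonic interface — can be sketched roughly as follows. This is an illustrative sketch only, not code from the talk: all names (`CLUSTER_DEFAULTS`, `build_spark_submit`) are hypothetical, and a real platform would wire this into an Airflow operator rather than build argv lists by hand.

```python
# Hypothetical sketch of the "bumper rail" pattern from the talk:
# the platform team owns the cluster defaults, and analysts supply
# only the per-job settings that actually vary.

CLUSTER_DEFAULTS = {
    "master": "yarn",
    "deploy-mode": "cluster",
    "executor-memory": "4g",
    "num-executors": "10",
}

def build_spark_submit(app_path, job_conf=None):
    """Merge per-job overrides into the cluster defaults and return
    the spark-submit argv that a scheduler task would execute."""
    conf = {**CLUSTER_DEFAULTS, **(job_conf or {})}
    argv = ["spark-submit"]
    for key, value in sorted(conf.items()):
        argv += [f"--{key}", value]
    argv.append(app_path)
    return argv

# An analyst overrides one setting; everything else is defaulted:
cmd = build_spark_submit("train_model.py", {"executor-memory": "8g"})
```

In Airflow terms, a custom operator would call something like this in its `execute` method, so data scientists submit jobs without ever touching cluster flags directly.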

From Idea to Model: Productionizing Data Pipelines with Apache Airflow

Databricks