Explore the process of transforming a data science idea into a production-ready model using Apache Airflow in this 22-minute conference talk from Databricks. Learn how data engineers can build a flexible platform that serves the needs of diverse stakeholders, including data scientists, infrastructure engineers, and product owners. Discover how Apache Airflow acts as a collaborative bridge between data scientists and infrastructure engineers, offering a Pythonic interface that abstracts away system complexities. Follow the journey of a single-machine notebook as it evolves into a cross-service Spark + TensorFlow pipeline, culminating in a canary-tested, hyperparameter-tuned model deployed on Google Cloud Functions. Gain insights into Airflow's ability to connect the different layers of a data team, enabling rapid results and efficient collaboration. Understand the benefits for both data engineers and analysts, including custom operator creation, job submission, and pipeline building. Delve into topics such as the data ecosystem, bumper rail models, and the advantages of using established tools over building from scratch.
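To give a flavor of the "tasks as operators, wired into a DAG" pattern the talk builds on, here is a toy, standard-library-only sketch. It deliberately does not use the real Airflow API; the `Operator` class, `run_dag` helper, and the extract/train/deploy task names are illustrative stand-ins for how Airflow lets engineers define units of work and their dependencies in plain Python.

```python
# Toy stand-in (NOT the real Airflow API) for the operator/DAG pattern:
# each task is a named unit of work, and the scheduler runs tasks in an
# order that respects their upstream dependencies.
from graphlib import TopologicalSorter


class Operator:
    """Minimal Airflow-style operator: a task id plus a callable to run."""

    def __init__(self, task_id, fn):
        self.task_id = task_id
        self.fn = fn

    def execute(self):
        return self.fn()


def run_dag(operators, deps):
    """Execute operators in dependency order.

    `deps` maps a task id to the set of task ids it depends on,
    mirroring how Airflow resolves upstream/downstream relationships.
    """
    order = TopologicalSorter(deps).static_order()
    return [operators[task_id].execute() for task_id in order]


# Hypothetical three-step pipeline: extract data, train, then deploy.
ops = {
    "extract": Operator("extract", lambda: "raw rows"),
    "train": Operator("train", lambda: "model"),
    "deploy": Operator("deploy", lambda: "deployed"),
}
deps = {"train": {"extract"}, "deploy": {"train"}}

results = run_dag(ops, deps)
print(results)  # tasks ran in dependency order
```

In real Airflow, the same idea is expressed by subclassing `BaseOperator` (or using built-in operators) and declaring dependencies between task instances inside a DAG definition file.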
From Idea to Model: Productionizing Data Pipelines with Apache Airflow