Introduction

About Rustam

Pipeline definition

Data overview

Tabular form

Parallel processing

Paralysis

File Structure

Local Runner

Cloud Runner

Data Flow

File System

Build Success

Takeaways

Description:

Explore the evolution of data pipelines in this 48-minute conference talk from Devoxx. Learn how open-source technologies like Apache Beam and Apache Airflow have changed data processing, offering more flexibility and lower cost than traditional monolithic stacks. Discover how to schedule and run both streaming and batch data processing jobs using the same underlying code. Follow a practical demonstration of building a data pipeline that connects Apache Kafka, Apache Flink, and Hive, and see how to move it to Pub/Sub, Dataflow, and BigQuery by modifying a few lines of Java in Apache Beam. Gain insights into deploying these solutions across various cloud platforms, including Oracle Cloud. Presented by Rustam Mehmandarov, a Java Champion and Google Developers Expert for Cloud, this talk covers pipeline definition, data overview, parallel processing, file structures, local and cloud runners, and key takeaways for implementing flexible, scalable data pipelines using open-source technologies.

Using Open Source Tech to Swap Out Components of Your Data Pipeline

Devoxx

#Conference Talks #Devoxx #Programming #Programming Languages #Java #Cloud Computing #Apache Airflow #Microservices #Apache Kafka #Data Science #Big Data #Apache Beam #Data Processing #Data Engineering #Data Pipelines
