Introduction

Demo Overview

Han Wang Introduction

First Example

Spark

Transformation

Fugue Code

Model

Field Workflow

Results

Physical

Prediction

Pandas vs Spark

Lazy evaluation of Spark

Partitioning

Testing

Fugue

Decouple logic and execution

Demo

Notebook extension

Conclusion

Recap

Description:

Explore scaling machine learning workflows to big data using Fugue in this 29-minute conference talk from KubeCon + CloudNativeCon Europe 2022. Learn how to transition from Pandas to distributed computing frameworks like Spark or Dask without reimplementing code. Discover Fugue's open-source abstraction layer that allows data scientists to write framework-agnostic and scale-agnostic code. Follow along as the speakers demonstrate porting native Python code to Spark or Dask with minimal changes, and witness the scaling of data compute from a single machine to a Spark cluster on Kubernetes. Gain insights into lazy evaluation, partitioning, testing, and decoupling logic from execution in big data workflows.
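The idea of decoupling logic from execution can be sketched in plain Python. This is only an illustration of the concept and does not use Fugue's API: the function names `add_mean_by_group`, `partition_by`, `run_local`, and `run_threaded` are hypothetical, and a thread pool stands in for a distributed backend such as Spark or Dask. The logic is written once against a single partition, and the choice of runner decides how the partitions are executed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Partition = List[Row]


def add_mean_by_group(rows: Partition) -> Partition:
    """Logic only: processes one partition and knows nothing about the runner."""
    mean = sum(r["value"] for r in rows) / len(rows)
    return [{**r, "group_mean": mean} for r in rows]


def partition_by(rows: List[Row], key: str) -> List[Partition]:
    """Split rows into one partition per distinct value of `key`."""
    ordered = sorted(rows, key=itemgetter(key))
    return [list(group) for _, group in groupby(ordered, key=itemgetter(key))]


def run_local(fn: Callable[[Partition], Partition],
              partitions: List[Partition]) -> List[Row]:
    """Execution choice 1: apply the logic to each partition sequentially."""
    return [row for part in partitions for row in fn(part)]


def run_threaded(fn: Callable[[Partition], Partition],
                 partitions: List[Partition]) -> List[Row]:
    """Execution choice 2: a thread pool as a stand-in for a cluster backend."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(fn, partitions))  # map preserves input order
    return [row for part in results for row in part]


data = [
    {"group": "a", "value": 1.0},
    {"group": "a", "value": 3.0},
    {"group": "b", "value": 10.0},
]
parts = partition_by(data, "group")

# Same logic, two execution strategies, identical results.
local_result = run_local(add_mean_by_group, parts)
threaded_result = run_threaded(add_mean_by_group, parts)
```

Because `add_mean_by_group` depends only on its input rows, it can be unit tested on a small in-memory sample before being handed to a larger backend, which is the testing benefit the talk description alludes to.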

Scaling Machine Learning Workflows to Big Data with Fugue

CNCF [Cloud Native Computing Foundation]

#Conference Talks #Data Science #Big Data #Computer Science #Machine Learning #Programming #Programming Languages #Python #pandas #Data Processing #Data Transformation #Functional Programming #Lazy Evaluation #Dask