Intro

Pandas Limitations

How To Scale Out?

Distributed Computing Frameworks

Reducing Barrier to Entry

Introduction to Fugue

Fugue Transform

Bringing it to Spark

The DataFrame For Tests

Pandas Assumes Data Is Physically Together

Pandas Assumes Data Shuffle is Cheap

Pandas Assumes Eager Evaluation

Eager vs Lazy Evaluation

Expectation vs Reality

A Spark Solution Based On Traditional SQL Syntax

Fugue SQL

Leveraging Python

SQL Code Size & Execution Time

Description:

Explore the different approaches to scaling Python and Pandas code in this PyCon US talk. Learn about Fugue, an open-source unified interface for Pandas, Spark, and Dask that enables scale-agnostic compute workflows. Discover how to decouple logic and execution, allowing you to code in familiar languages like Python, Pandas, or SQL, and choose your preferred execution engine. Dive into the transform() function, which facilitates distributed execution of single functions. Understand Pandas limitations, distributed computing frameworks, and how Fugue reduces the barrier to entry for distributed computing. Compare eager and lazy evaluation, examine expectations versus reality in data processing, and explore Spark solutions using traditional SQL syntax. Gain insights into how combining Python and SQL affects code size and execution time in large-scale data processing tasks.
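The eager versus lazy distinction mentioned above can be illustrated with a minimal sketch in plain Python. This is not code from the talk and uses neither Pandas, Spark, nor Fugue; the names `expensive`, `eager_pipeline`, and `lazy_pipeline` are hypothetical and exist only for illustration. The idea it shows is that an eager pipeline does all of its work as soon as it is defined, while a lazy one defers work until a result is requested.

```python
from typing import Iterable, Iterator, List

# Records which rows were actually processed, so the two styles can be compared.
calls: List[int] = []


def expensive(x: int) -> int:
    """Hypothetical per-row work; logs every invocation."""
    calls.append(x)
    return x * 2


def eager_pipeline(rows: Iterable[int]) -> List[int]:
    # Eager: every row is processed immediately, before the function returns.
    return [expensive(r) for r in rows]


def lazy_pipeline(rows: Iterable[int]) -> Iterator[int]:
    # Lazy: only a recipe is returned; rows are processed when consumed.
    return (expensive(r) for r in rows)


rows = range(5)

eager_result = eager_pipeline(rows)
eager_calls = len(calls)  # all rows were processed up front

calls.clear()
lazy_result = lazy_pipeline(rows)
calls_before_consume = len(calls)  # nothing has been processed yet
first = next(lazy_result)
calls_after_first = len(calls)  # only the requested row was processed
```

Under these assumptions, the lazy version touches only the rows a consumer asks for, which is roughly the behavior that lets a lazy engine avoid unnecessary work on large data.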

Comparing the Different Ways to Scale Python and Pandas Code

PyCon US

#Conference Talks #PyCon US #Programming #Programming Languages #Python #Domain-Specific Languages (DSL) #SQL #pandas #Data Science #Data Processing #Data Transformation #Computer Science #Distributed Computing #Dask