1. Intro
2. Pandas Limitations
3. How To Scale Out?
4. Distributed Computing Frameworks
5. Reducing Barrier to Entry
6. Introduction to Fugue
7. Fugue Transform
8. Bringing it to Spark
9. The DataFrame For Tests
10. Pandas Assumes Data Is Physically Together
11. Pandas Assumes Data Shuffle is Cheap
12. Pandas Assumes Eager Evaluation
13. Eager vs Lazy Evaluation
14. Expectation vs Reality
15. A Spark Solution Based On Traditional SQL Syntax
16. Fugue SQL
17. Leveraging Python
18. SQL Code Size & Execution Time
Description:
Explore the different approaches to scaling Python and Pandas code in this PyCon US talk. Learn about Fugue, an open-source unified interface for Pandas, Spark, and Dask that enables scale-agnostic compute workflows. Discover how to decouple logic and execution, allowing you to code in familiar languages like Python, Pandas, or SQL, and choose your preferred execution engine. Dive into the transform() function, which facilitates distributed execution of single functions. Understand Pandas limitations, distributed computing frameworks, and how Fugue reduces the barrier to entry for distributed computing. Compare eager and lazy evaluation, examine expectations versus reality in data processing, and explore Spark solutions using traditional SQL syntax. Gain insights into leveraging Python and SQL for efficient code size and execution time in large-scale data processing tasks.
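
The eager versus lazy distinction mentioned above can be sketched in plain Python. This is only an illustration using a list comprehension and a generator. It is not code from the talk, it does not use Pandas, Spark, or Fugue, and the names `double` and `log` are hypothetical.

```python
# Hypothetical illustration of eager vs lazy evaluation (not from the talk).
log = []  # records each element at the moment it is actually processed


def double(x):
    log.append(x)
    return x * 2


data = [1, 2, 3]

# Eager: the list comprehension processes every element immediately.
eager = [double(x) for x in data]
eager_calls = len(log)

# Lazy: creating the generator only describes the work; nothing runs yet.
log.clear()
lazy = (double(x) for x in data)
calls_before_consume = len(log)

# Work happens one element at a time, only when a result is requested.
first = next(lazy)
calls_after_first = len(log)

print(eager, eager_calls, calls_before_consume, first, calls_after_first)
```

In the eager case all of the work is done as soon as the expression is evaluated. In the lazy case, work is deferred until a result is requested, which is broadly the style of evaluation the talk contrasts with Pandas.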

Comparing the Different Ways to Scale Python and Pandas Code

PyCon US