Intro

A typical data science workflow

Data Processing in Python

Challenges with Data Processing

Task 1: Toast 100 slices of bread

Sequential Processing

Parallel Processing

Task 2: Brew coffee

Synchronous Execution

Practical Considerations

Amdahl's Law and Parallelism

Multiprocessing vs Multithreading

Initialize Submission List

Using ThreadPoolExecutor

Initialize Python modules

Initialize image resize process

Initialize File List in Directory

Using List Comprehensions

Using ProcessPoolExecutor

Description:

Explore techniques to accelerate data processing in this 30-minute EuroPython 2020 conference talk. Learn about common bottlenecks in data science workflows and how to overcome them using parallel and asynchronous programming with Python's concurrent.futures module. Discover the differences between sequential and parallel processing, synchronous and asynchronous execution, and when to apply these concepts in network I/O operations and computation-driven workloads. Gain practical insights into implementing parallelism and asynchronous programming to optimize data processing pipelines, allowing more focus on extracting value from data. Through real-life analogies, understand concepts like Amdahl's Law, multiprocessing vs multithreading, and practical implementations using ThreadPoolExecutor and ProcessPoolExecutor. Suitable for data scientists, engineers, and anyone with basic Python knowledge interested in improving data processing efficiency.
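As a rough illustration of the ideas listed above, the following minimal sketch contrasts sequential processing with thread-based parallelism from `concurrent.futures`, and includes a small helper for the Amdahl's Law formula. It is not taken from the talk: the function names (`toast_slice`, `toast_sequentially`, `toast_in_parallel`, `amdahl_speedup`), the simulated delay, and the worker count are all hypothetical, chosen only to echo the toast analogy in the chapter list.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def toast_slice(slice_id, seconds=0.01):
    """Hypothetical stand-in for an I/O-bound task (e.g. a network request)."""
    time.sleep(seconds)  # simulate waiting on I/O
    return f"slice {slice_id} toasted"


def toast_sequentially(slice_ids):
    """Process one slice at a time, in order."""
    return [toast_slice(i) for i in slice_ids]


def toast_in_parallel(slice_ids, max_workers=4):
    """Process slices concurrently in a thread pool.

    executor.map() returns results in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(toast_slice, slice_ids))


def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup under Amdahl's Law: 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)
```

Threads tend to suit I/O-bound work like the simulated wait here. For computation-heavy work, `ProcessPoolExecutor` exposes the same `map()` and `submit()` interface, though the worker function must then be importable at module level and the pool is best created under an `if __name__ == "__main__":` guard.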

Speed Up Your Data Processing

EuroPython Conference

#Conference Talks #EuroPython #Programming #Programming Languages #Python #Computer Science #Concurrent Programming #Parallel Programming #Data Science #Data Processing #High Performance Computing #Parallel Computing #Multiprocessing #Asynchronous Programming
