1. Intro
2. A typical data science workflow
3. Data Processing in Python
4. Challenges with Data Processing
5. Task 1: Toast 100 slices of bread
6. Sequential Processing
7. Parallel Processing
8. Task 2: Brew coffee
9. Synchronous Execution
10. Practical Considerations
11. Amdahl's Law and Parallelism
12. Multiprocessing vs Multithreading
13. Initialize Submission List
14. Using ThreadPoolExecutor
15. Initialize Python modules
16. Initialize image resize process
17. Initialize File List in Directory
18. Using List Comprehensions
19. Using ProcessPoolExecutor
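Chapters 13 to 19 cover submitting tasks with ThreadPoolExecutor and ProcessPoolExecutor, applied to an image-resize job over a list of files in a directory. The following is a minimal, stdlib-only sketch of that pattern, not the speaker's code. Assumptions: downscale is a hypothetical stand-in for a real image-library resize (it halves a grayscale image held as a 2D list so the example runs without extra packages), and the extension filter in list_image_files is an illustrative choice. Threads suit I/O-bound steps such as reading and writing files; processes suit CPU-bound steps because each worker process runs its own interpreter with its own GIL. For pure-Python work like downscale, the thread pool will not beat a plain loop because of the GIL; it is shown for the submission-list pattern, which is identical for I/O-bound tasks.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Assumption: which file extensions count as images.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_image_files(directory):
    """Build the file list with a list comprehension, keeping only image files."""
    return sorted([
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    ])


def downscale(pixels, factor=2):
    """Hypothetical stand-in for an image resize: nearest-neighbour downsampling
    of a 2D list of grayscale values, keeping every `factor`-th row and column."""
    return [row[::factor] for row in pixels[::factor]]


def resize_all_threaded(images, max_workers=4):
    """Thread pool: submit one task per image and keep the futures in a submission list."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        submissions = [(name, executor.submit(downscale, pixels))
                       for name, pixels in images.items()]
        # result() blocks until each task finishes; order follows the submission list
        return {name: future.result() for name, future in submissions}


def resize_all_processes(images, max_workers=4):
    """Process pool: same work, but each worker is a separate interpreter, so
    CPU-bound tasks are not serialized by the GIL. map() preserves input order."""
    names = list(images)
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        resized = list(executor.map(downscale, [images[n] for n in names]))
    return dict(zip(names, resized))


if __name__ == "__main__":
    # The __main__ guard is required for ProcessPoolExecutor on platforms that spawn workers.
    grid = [[x + 4 * y for x in range(4)] for y in range(4)]
    images = {"a.png": grid, "b.png": grid}
    print(resize_all_threaded(images))
    print(resize_all_processes(images))
```

Swapping one executor class for the other is the only change between the two variants, which is what makes concurrent.futures convenient for trying both on a real workload. In a real pipeline, each path from list_image_files would be opened with an image library before being resized.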
Description:
Explore techniques for speeding up data processing in this 30-minute EuroPython 2020 conference talk. Learn about common bottlenecks in data science workflows and how to overcome them with parallel and asynchronous programming using Python's concurrent.futures module. Discover the differences between sequential and parallel processing and between synchronous and asynchronous execution, and when each applies: network I/O-bound operations versus computation-heavy workloads. Gain practical insights into applying parallelism and asynchronous programming to data processing pipelines, so that less time goes into waiting on processing and more into extracting value from data. Real-life analogies (toasting bread, brewing coffee) illustrate concepts such as Amdahl's Law and multiprocessing versus multithreading, followed by practical implementations using ThreadPoolExecutor and ProcessPoolExecutor. Suitable for data scientists, engineers, and anyone with basic Python knowledge who wants to make data processing more efficient.
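Amdahl's Law caps the speedup available from parallelism: if a fraction p of a job can run in parallel and the rest is serial, N workers give a speedup of at most S(N) = 1 / ((1 - p) + p / N), which approaches 1 / (1 - p) as N grows. The small helper below makes that ceiling concrete; its name and the 90% example fraction are illustrative choices, not taken from the talk.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup S(N) = 1 / ((1 - p) + p / N) under Amdahl's Law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Example: 90% of the work parallelizes, so the serial 10% limits speedup to 10x.
    for n in (1, 2, 4, 8, 64):
        print(f"{n:>3} workers: {amdahl_speedup(0.9, n):.2f}x")
```

With p = 0.9, four workers give about a 3x speedup and even 64 workers stay below 9x, which is why shrinking the serial part of a pipeline often matters more than adding workers.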

Speed Up Your Data Processing

EuroPython Conference