1. Introduction
2. The Task
3. Understanding the Task
4. Network Latency
5. File Size
6. The API
7. The Get API
8. Disclaimers
9. Synchronous
10. Multithreading
11. Coding
12. Main Loop
13. Performance
14. Why is this happening
15. Things to keep in mind
16. Multiprocessing
17. Multiprocessing code
18. Iterating over pages
19. Downloader
20. Speed Improvements
21. Async IO
22. List Call
23. Async IO Task
24. Different Libraries
25. uvloop
26. Setup
27. aiohttp
28. ItAll Files
29. Download Files
30. Summary
31. Multiprocessing
32. Threading
33. Workflow
34. Interprocess communication overhead
35. Pagination token
36. Combo results
37. The real summary
38. Lessons learned
39. Thank you
Description:
Explore efficient strategies for downloading a billion small files using Python in this EuroPython 2019 conference talk. Dive into three concurrent downloading mechanisms: multithreading, multiprocessing, and asyncio. Learn design best practices, debugging techniques, error handling, and performance comparisons for each approach. Gain insights into network latency, file size considerations, and API interactions. Examine code examples and performance metrics to understand the trade-offs between different methods. Discover how to optimize your workflow, handle pagination, and improve download speeds. Apply lessons learned to choose the most suitable library for large-scale file downloading tasks.
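The talk's own code is not reproduced on this page. The block below is a minimal, hypothetical sketch of the three mechanisms the description names. It fans the same list of keys out through a thread pool, a process pool, and an asyncio event loop. `fetch_sync`, `fetch_async`, the `LATENCY` value, and the worker counts are all assumptions for illustration: each "download" is simulated with a fixed sleep instead of a real GET, so only the concurrency structure carries over. This is not the speaker's implementation.

```python
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Assumed per-request latency in seconds; a stand-in for a real network round trip.
LATENCY = 0.01


def fetch_sync(key: str) -> int:
    """Hypothetical blocking GET of one small file; returns its 'size' in bytes."""
    time.sleep(LATENCY)  # the caller blocks here, like waiting on a socket
    return len(key)  # pretend the object body is len(key) bytes long


async def fetch_async(key: str) -> int:
    """Hypothetical non-blocking GET; yields to the event loop while 'waiting'."""
    await asyncio.sleep(LATENCY)
    return len(key)


def download_threads(keys, workers=16):
    # Threads overlap the waits; time.sleep (like socket I/O) releases the GIL.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(fetch_sync, keys))


def download_processes(keys, workers=4):
    # Keys and results are pickled between processes (IPC overhead);
    # chunksize batches keys per task to amortize that cost.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(fetch_sync, keys, chunksize=32))


async def download_asyncio(keys, limit=100):
    sem = asyncio.Semaphore(limit)  # cap the number of in-flight requests

    async def one(key):
        async with sem:
            return await fetch_async(key)

    results = await asyncio.gather(*(one(k) for k in keys))
    return sum(results)


def timed(label, fn, keys):
    # Run one strategy and report total bytes and wall-clock time.
    start = time.perf_counter()
    total = fn(keys)
    print(f"{label:>10}: {total} bytes in {time.perf_counter() - start:.2f}s")


if __name__ == "__main__":
    keys = [f"file-{i:05d}" for i in range(400)]
    timed("threads", download_threads, keys)
    timed("processes", download_processes, keys)
    timed("asyncio", lambda k: asyncio.run(download_asyncio(k)), keys)
```

Because the simulated work is pure waiting, threads and asyncio both overlap the latency cheaply. The process pool pays extra to ship work between processes. Real timings depend on the network, the API, and file sizes, and this simulation does not predict them.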

Downloading a Billion Files in Python

EuroPython Conference