Introduction

The Task

Understanding the Task

Network Latency

File Size

The API

The Get API

Disclaimers

Synchronous

Multithreading

Coding

Main Loop

Performance

Why is this happening

Things to keep in mind

Multiprocessing

Multiprocessing code

Iterating over pages

Downloader

Speed Improvements

Async IO

List Call

Async IO Task

Different Libraries

uvloop

Setup

aiohttp

ItAll Files

Download Files

Summary

Multiprocessing

Threading

Workflow

Interprocess communication overhead

Pagination token

Combo results

The real summary

Lessons learned

Thank you

Description:

Explore efficient strategies for downloading a billion small files using Python in this EuroPython 2019 conference talk. Dive into three concurrent downloading mechanisms: multithreading, multiprocessing, and asyncio. Learn design best practices, debugging techniques, error handling, and performance comparisons for each approach. Gain insights into network latency, file size considerations, and API interactions. Examine code examples and performance metrics to understand the trade-offs between different methods. Discover how to optimize your workflow, handle pagination, and improve download speeds. Apply lessons learned to choose the most suitable library for large-scale file downloading tasks.
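
As a rough illustration of the trade-offs the talk compares, the sketch below runs the same set of simulated downloads three ways: synchronously, with a thread pool, and with asyncio. It is not code from the talk. The names `fetch_blocking`, `fetch_async`, and `SIMULATED_LATENCY` are hypothetical, and the network round trip is faked with a sleep so the example runs without a server. Multiprocessing is left out for brevity.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: a fixed delay stands in for real network latency.
SIMULATED_LATENCY = 0.01  # seconds


def fetch_blocking(name: str) -> bytes:
    """Hypothetical stand-in for a blocking HTTP GET of one small file."""
    time.sleep(SIMULATED_LATENCY)
    return name.encode()


async def fetch_async(name: str) -> bytes:
    """Hypothetical stand-in for a non-blocking HTTP GET of one small file."""
    await asyncio.sleep(SIMULATED_LATENCY)
    return name.encode()


def download_synchronous(names):
    # One request at a time: total time is roughly len(names) * latency.
    return [fetch_blocking(n) for n in names]


def download_threaded(names, workers=10):
    # Threads overlap the waiting; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_blocking, names))


async def _gather_all(names):
    # All coroutines wait concurrently on a single thread.
    return await asyncio.gather(*(fetch_async(n) for n in names))


def download_asyncio(names):
    return list(asyncio.run(_gather_all(names)))


if __name__ == "__main__":
    names = [f"file-{i}" for i in range(50)]
    for fn in (download_synchronous, download_threaded, download_asyncio):
        start = time.perf_counter()
        results = fn(names)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {len(results)} files in {elapsed:.3f}s")
```

With a real HTTP client the relative timings depend on latency, file size, and per-request overhead, so the numbers printed here only indicate the general shape of the comparison.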

Downloading a Billion Files in Python

EuroPython Conference

#Conference Talks #EuroPython #Programming #Programming Languages #Python #Computer Science #Concurrent Programming #High Performance Computing #Parallel Computing #Multiprocessing #Multithreading #Asyncio