1. Intro
2. Outline
3. Introduction
4. Data Science Project Stages
5. What is Distributed Web Scraping
6. Setting the Stage
7. Iteration - A Single Request
8. Looping Requests
9. Iteration 1 - Issues
10. Intermediate improvements
11. Iteration 2 - Issues
12. Distributed - Mental Model
13. Distributed - Controller
14. Distributed - Scraping Node
15. Distributed - Advantages
16. Distributed - Disadvantages
17. Distributed - Queues
18. Distributed - Message Brokers
19. Code Management
20. Useful Python Packages
21. Considerations
22. Conclusion
Description:
Explore distributed web scraping techniques in Python through this 24-minute PyCon US talk. Learn how to build a scalable and robust distributed web scraper to optimize large batch scraping jobs, reduce processing times, and enhance code durability. Discover the evolution from single requests to distributed systems, understand the advantages and disadvantages of distributed scraping, and gain insights into useful Python packages and considerations for implementation. Follow the speaker's journey through various iterations, addressing issues and implementing improvements along the way. Access accompanying slides for a comprehensive overview of the distributed web scraping process, from mental models to practical implementation using controllers, scraping nodes, queues, and message brokers.
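
As a rough illustration of the controller, scraping node, and queue roles named above, here is a minimal single-machine sketch. It is not the speaker's code. It assumes an in-process `queue.Queue` as a stand-in for a real message broker and threads as a stand-in for separate scraping machines. The names `fetch`, `scraping_node`, and `controller` are hypothetical, and `fetch` returns canned text instead of making a network request.

```python
import queue
import threading


def fetch(url):
    # Hypothetical stand-in for a real HTTP request; returns canned text
    # so the sketch runs without network access.
    return f"<html>content of {url}</html>"


def scraping_node(tasks, results):
    """Worker: pull URLs off the task queue until a None sentinel arrives."""
    while True:
        url = tasks.get()
        if url is None:
            break
        try:
            results.put((url, fetch(url), None))
        except Exception as exc:
            # Report the failure instead of crashing the node.
            results.put((url, None, exc))


def controller(urls, num_nodes=3):
    """Controller: enqueue the work, start the nodes, collect the results."""
    tasks = queue.Queue()    # assumption: stands in for a broker queue
    results = queue.Queue()

    nodes = [
        threading.Thread(target=scraping_node, args=(tasks, results))
        for _ in range(num_nodes)
    ]
    for node in nodes:
        node.start()

    for url in urls:
        tasks.put(url)
    for _ in nodes:
        tasks.put(None)      # one shutdown sentinel per node

    for node in nodes:
        node.join()

    scraped = {}
    while not results.empty():
        url, body, error = results.get()
        scraped[url] = body if error is None else f"failed: {error}"
    return scraped


pages = controller([f"https://example.com/page/{i}" for i in range(5)])
```

In a genuinely distributed setup the two queues would live in a message broker and each scraping node would run as its own process or machine. The division of labor stays the same: the controller only enqueues work and gathers results, and the nodes only consume tasks.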

Distributed Web Scraping in Python

PyCon US