1. About me
2. What is Frontera
3. What is Terra
4. Motivation
5. Single-threaded
6. Single integration
7. Real time
8. Unique content
9. Metadata storage
10. Architecture
11. Scraping
12. Simple spider
13. Use cases
14. Distributed architecture
15. Features
16. Requirements
17. Quick start
18. Spanish crawl
19. Future plans
20. Questions
Description:
Explore the open-source Frontera framework for large-scale web crawling in this EuroPython 2015 conference talk. Discover how to build real-time distributed web crawlers and website-focused ones using Frontera's customizable URL metadata storage, crawling strategies management, and transport layer abstraction. Learn about integrating Frontera with Scrapy, Kafka, and HBase to create a powerful distributed crawler. Gain insights into the framework's architecture, features, and use cases, including a demonstration of collecting statistics from the Spanish internet. Understand the motivation behind Frontera, its single-threaded and real-time capabilities, and future development plans. Perfect for developers interested in advanced web crawling techniques and large-scale data collection.

Frontera - Open Source Large-Scale Web Crawling Framework

EuroPython Conference