1. Video overview
2. Check out DataCamp! (sponsored)
3. Setup
4. Task #1: Scrape the infobox from the Toy Story 3 Wikipedia page and save it in a Python dictionary
5. Task #2: Scrape the infobox for every movie in the List of Disney Films and save them as a list of dictionaries
6. Robots.txt: are you allowed to scrape a site?
7. Task #2 (continued): Scrape the infobox for every movie in the List of Disney Films and save them as a list of dictionaries
8. Save & load a dataset checkpoint as a JSON file
9. Task #3: Clean our data!
10. Task #3.1: Strip out all references ([1], [2], etc.) from the HTML
11. Task #3.2: Split up the long strings
12. Task #3.3: Examine the errors we are getting
13. Task #3.4: Convert the “Running time” field to an integer
14. Task #3.5: Convert the “Budget” and “Box office” fields to floats
15. Task #3.6: Convert dates into datetime objects
16. Save our data again using Pickle
17. Task #4: Attach IMDB, Metascore, and Rotten Tomatoes scores to the dataset (working with APIs)
18. Task #5: Save the final dataset as a JSON file and as a CSV file
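The cleaning chapters (Task #3.1, #3.4 and #3.5) mostly come down to regular-expression work on infobox strings. Below is a minimal sketch using only the standard `re` module. It is not the code from the video: the helper names, the assumed input shapes (strings like `"103 minutes"` or `"$200 million[2]"`, or lists whose first entry is the main value), and the choice to return `None` for unparseable values are all assumptions.

```python
import re

# Footnote markers such as "[1]" or "[23]" left in Wikipedia infobox text.
REFERENCE_PATTERN = re.compile(r"\[\d+\]")

# Word multipliers assumed to follow dollar amounts, e.g. "$200 million".
MULTIPLIERS = {"thousand": 1e3, "million": 1e6, "billion": 1e9}


def strip_references(text):
    """Task #3.1 sketch: remove reference markers like [1] and [2] from a string."""
    return REFERENCE_PATTERN.sub("", text).strip()


def running_time_to_minutes(value):
    """Task #3.4 sketch: turn '103 minutes' into 103; None if no leading number."""
    if isinstance(value, list):  # assumption: the first entry is the primary runtime
        value = value[0]
    match = re.match(r"\s*(\d+)", value)
    return int(match.group(1)) if match else None


def money_to_float(text):
    """Task #3.5 sketch: turn '$200 million' into 200000000.0; None if no $ amount."""
    if isinstance(text, list):  # assumption: the first entry is the main figure
        text = text[0]
    text = strip_references(text)
    # Ranges such as "$150-200 million" are NOT handled: only "$150" matches
    # and the unit word is lost, so the result is 150.0.
    match = re.search(
        r"\$\s*(\d[\d,]*(?:\.\d+)?)\s*(thousand|million|billion)?",
        text,
        re.IGNORECASE,
    )
    if match is None:
        return None
    amount = float(match.group(1).replace(",", ""))
    unit = match.group(2)
    if unit:
        amount *= MULTIPLIERS[unit.lower()]
    return amount
```

Returning `None` instead of raising keeps one malformed infobox from halting a pass over the whole dataset; the inputs that come back as `None` are exactly the ones worth inspecting, in the spirit of Task #3.3.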
Description:
Learn how to solve real-world data science tasks using Python and Beautiful Soup in this comprehensive tutorial. Scrape Wikipedia pages to create a dataset on Disney movies while covering a wide range of Python and data science topics. Master web scraping with BeautifulSoup, clean data effectively, test code using Pytest, implement pattern matching with regular expressions, work with dates using the datetime library, save and load data with the Pickle library, and access data from APIs using the Requests library. Follow along with hands-on tasks, including scraping movie information, cleaning and processing data, and integrating external movie ratings. By the end of this tutorial, gain practical experience in creating a robust movie dataset from scratch using various Python libraries and data science techniques.
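The description mentions converting dates with the datetime library and saving data with Pickle. The two fit together: `json.dumps` cannot serialize `datetime` objects by default, while pickle round-trips them unchanged. Below is a minimal sketch, assuming release dates arrive as strings like `"June 18, 2010 (United States)"`. The date formats, function names, dictionary keys and file name are hypothetical, not taken from the video.

```python
import os
import pickle
import re
import tempfile
from datetime import datetime

# Date layouts assumed to occur in the infobox: "June 18, 2010" and "18 June 2010".
DATE_FORMATS = ("%B %d, %Y", "%d %B %Y")


def parse_release_date(value):
    """Task #3.6 sketch: return a datetime for a recognized date string, else None."""
    if isinstance(value, list):  # assumption: the first listed date is the one to keep
        value = value[0]
    # Drop a trailing parenthetical such as "(United States)".
    cleaned = re.split(r"\s*\(", value, maxsplit=1)[0].strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # try the next layout
    return None


def save_pickle(path, data):
    """Write any picklable object (datetimes included) to a binary file."""
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_pickle(path):
    """Read back an object written by save_pickle. Only load files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    movies = [{"title": "Toy Story 3", "Release date": "June 18, 2010 (United States)"}]
    for movie in movies:
        movie["Release date (datetime)"] = parse_release_date(movie["Release date"])

    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "disney_movies.pickle")  # hypothetical file name
        save_pickle(path, movies)
        print(load_pickle(path) == movies)  # True: the datetime survives the round trip
```

Unpickling can execute arbitrary code, so this checkpoint format suits files you wrote yourself. JSON or CSV remain the better choice for sharing the final dataset.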

Solving Real World Data Science Tasks With Python Beautiful Soup - Movie Dataset Creation

Keith Galli