Introduction

Naive Bayes: A Little History

Naive Bayes: Advantages and Disadvantages

About the dataset: YouTube Spam Collection

Pre-requisites

Naive Bayes: An example

Naive Bayes: The Equation

Loading the Dataset

Train/test split

Feature extraction: Bag of words approach

Bag of Words Approach: Training

Bag of Words Approach: Testing and Evaluation

Feature Extraction: TF-IDF Approach

TF-IDF Approach: Training

TF-IDF Approach: Testing and Evaluation

Tuning parameters: Laplace smoothing

Description:

Explore a 30-minute EuroPython Conference talk on building a Naive Bayes text classifier using scikit-learn. Learn about the algorithm's simplicity and effectiveness in classifying large, sparse datasets like text documents. Discover preprocessing techniques such as text normalization and feature extraction. Follow along as the speaker demonstrates model construction using the spam/ham YouTube comment dataset from the UCI repository. Gain insights into the Naive Bayes algorithm's history, advantages, and disadvantages. Dive into practical examples, equations, and implementation steps, including dataset loading, train/test splitting, and feature extraction using bag-of-words and TF-IDF approaches. Conclude with techniques for model evaluation and parameter tuning through Laplace smoothing.
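As a rough illustration of the workflow described above, the following minimal sketch trains a multinomial Naive Bayes classifier with scikit-learn on bag-of-words and TF-IDF features. It is not the speaker's code: the comments are toy data invented here (not the YouTube Spam Collection), the helper name `fit_and_predict` is hypothetical, and the held-out comments are written by hand where a real run would typically use `train_test_split` on the loaded dataset. The `alpha` parameter of `MultinomialNB` is the Laplace (additive) smoothing term.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy comments invented for illustration; NOT the YouTube Spam Collection.
train_texts = [
    "check out my channel",
    "subscribe to my channel please",
    "free gift cards click here",
    "click here to win free money",
    "love this song",
    "this song is so good",
    "great video and great music",
    "best music video ever",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = spam, 0 = ham

# Hand-written held-out comments standing in for a test split.
test_texts = ["click here for free gift", "love this music video"]


def fit_and_predict(vectorizer, alpha=1.0):
    """Fit vectorizer + MultinomialNB on the toy data and label the test comments.

    alpha=1.0 is Laplace smoothing: every word count gets +1, so a word
    never seen in one class does not force that class's probability to zero.
    """
    model = make_pipeline(vectorizer, MultinomialNB(alpha=alpha))
    model.fit(train_texts, train_labels)
    return model.predict(test_texts).tolist()


bow_predictions = fit_and_predict(CountVectorizer())    # raw word counts
tfidf_predictions = fit_and_predict(TfidfVectorizer())  # TF-IDF weights
print("bag of words:", bow_predictions)
print("TF-IDF:      ", tfidf_predictions)
```

Swapping `CountVectorizer` for `TfidfVectorizer` changes only the feature extraction step; the classifier and the smoothing parameter stay the same, which makes it easy to compare the two approaches on the same split.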

Building a Naive Bayes Text Classifier with scikit-learn

EuroPython Conference

#Conference Talks #EuroPython #Computer Science #Machine Learning #Programming #Programming Languages #Python #scikit-learn #Feature Extraction #Text Classification #Bag of Words #Naive Bayes Classifier #Spam Detection #Data Science #Text Mining #TF-IDF
