Introduction

Why Open Datasets

Amnesty

Security Lacks Datasets

Malware Classification

Ember

The Name

The Dataset

The Training Set

The Data

Two Types of Features

Calculating Features

Categories of Features

Section Information

Strings

File Size

Feature Vectorization

Training a Model

Scoring the Model

Disclaimer

Code Base

Python Notebook

Feature Engineering

Semisupervised Learning

Offensive Research

Demo Time

Hat

Download Data

Packed Samples

Metadata

Description:

This BSidesSF 2018 conference talk presents an open-source malware classifier and dataset. Machine learning for static malware detection has been held back by the scarcity of public datasets. The talk introduces Ember, a new open-source dataset of labels for a diverse set of Windows PE files, together with feature vectors for model building and a pre-trained model for research. It explains the reasoning behind feature selection and labeling and shows the model's performance on real-world samples. Topics include the origin of the Ember name, the composition of the training set, the two types of features and how they are calculated, and feature categories such as section information, strings, and file size. The talk then covers feature vectorization, model training, and scoring, followed by a tour of the code base and Python notebook and a discussion of feature engineering, semisupervised learning, and offensive research applications. It closes with a live demonstration covering data download, packed samples, and metadata.
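
To make the feature vectorization step named in the description more concrete, here is a minimal sketch of how raw per-file features (file size, string statistics, section information) could be turned into a fixed-length numeric vector. It is an illustration only, not the talk's or the Ember code base's implementation: the function names, the dictionary layout, the choice of MD5 for bucketing, and the vector width of 8 are all assumptions made for this example.

```python
import hashlib
from typing import Dict, List


def hash_bucket(token: str, dim: int) -> int:
    """Map a token to a stable bucket index in [0, dim) (hypothetical helper)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % dim


def vectorize(raw: Dict, section_dim: int = 8) -> List[float]:
    """Turn raw features into a fixed-length vector (illustrative layout)."""
    # Numeric features are copied through as floats.
    numeric = [
        float(raw["file_size"]),
        float(raw["num_strings"]),
        float(raw["avg_string_len"]),
    ]
    # Section names vary from file to file, so each (name, size) pair is
    # hashed into one of a fixed number of buckets (the "hashing trick").
    hashed = [0.0] * section_dim
    for name, size in raw["sections"]:
        hashed[hash_bucket(name, section_dim)] += float(size)
    return numeric + hashed


# Made-up raw features for a single file, for demonstration only.
sample = {
    "file_size": 4096,
    "num_strings": 12,
    "avg_string_len": 7.5,
    "sections": [(".text", 2048), (".data", 1024), (".rsrc", 512)],
}

vector = vectorize(sample)
print(len(vector))
```

Because every file maps to a vector of the same length regardless of how many sections it has, vectors built this way can be stacked into a matrix and passed to a standard classifier for training and scoring.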

An Open Source Malware Classifier and Dataset

Security BSides San Francisco

#Conference Talks #Security BSides #Information Security (InfoSec) #Cybersecurity #Computer Science #Machine Learning #Feature Engineering #Model Training #Malware Classification