1. Introduction
2. Why Open Datasets
3. Amnesty
4. Security Locks Datasets
5. Malware Classification
6. Ember
7. The Name
8. The Dataset
9. The Training Set
10. The Data
11. Two Types of Features
12. Calculating Features
13. Categories of Features
14. Section Information
15. Strings
16. File Size
17. Feature Vectorization
18. Training a Model
19. Scoring the Model
20. Disclaimer
21. Code Base
22. Python Notebook
23. Feature Engineering
24. Semisupervised Learning
25. Offensive Research
26. Demo Time
27. Hat
28. Download Data
29. Packed Samples
30. Metadata
Description:
This BSidesSF 2018 conference talk presents an open-source malware classifier and dataset. Machine learning for static malware detection has been held back by the lack of public datasets. The talk introduces a new open-source dataset of labels for a diverse set of Windows PE files, with feature vectors for building models and a pre-trained model for research. It explains the reasoning behind feature selection and labeling and shows the model's performance on real-world samples. Topics include the Ember dataset, its name, and the composition of the training set; the two types of features, how they are calculated, and feature categories such as section information, strings, and file size; feature vectorization, model training, and scoring; the code base, Python notebook, and feature engineering; and semisupervised learning and offensive research applications. The talk closes with a live demonstration covering data download, packed samples, and metadata.

An Open Source Malware Classifier and Dataset

Security BSides San Francisco