Some definitions: training data, test data, prediction, generalization
Why clustering matters
Classification
Why regression matters
Visualization design principles
The presence of an adversary
The false positive problem
A machine-learning-based detection failure due to false positives
The need for interpretability
Another potential mitigation
Description:
Explore the critical intersection of security and data science in this 52-minute Black Hat conference talk by Joshua Saxe. Delve into the challenges and opportunities of applying data science to cybersecurity, including machine learning, data visualization, and scalable storage technologies. Learn about state-of-the-art data visualization techniques and the three main machine learning tasks: classification, clustering, and regression. Discover how these methods can be applied to attack detection, threat intelligence, malware analysis, and scalable malware analytics. Address security-specific data science challenges, such as detecting malicious activity in vast amounts of benign data and training machine learning models without access to zero-day attack data. Examine statistical methods designed to generalize to new attacks and minimize false positives. Investigate security data visualization techniques, including log visualization, malware analysis visualization, and threat intelligence visualization. Understand how machine learning approaches can bridge the semantic gap between low-level security data and high-level activity of interest. Gain insights into the emerging field of security data science, its potential applications, and effective approaches to overcome its unique challenges.
Why Security Data Science Matters and How It's Different