USENIX Enigma 2019 - When the Magic Wears Off: Flaws in ML for Security Evaluations
Description:
Explore a critical analysis of machine learning-based malware classification in this conference talk from USENIX Enigma 2019. Delve into the endemic issue of inflated results caused by two biases in experimental design: spatial bias, where the ratio of malware to goodware in the data is unrealistic, and temporal bias, where training and testing sets are split without respecting the order in which samples appeared. Learn about the proposed space and time constraints for sounder experiment design, and discover a new metric for evaluating classifier performance over time. Examine TESSERACT, an open-source evaluation framework that enables fair comparison of malware classifiers in realistic settings. Gain insights from a case study in which two well-known malware classifiers are tested on a dataset of 129,000 applications, revealing how experimental bias distorts results and how tuning can improve classifier performance.