Garbage In, Garbage Out: How Purportedly Great Machine Learning Models Can Be Screwed Up by Bad Data
Description:
Explore a 28-minute Black Hat conference talk on how data quality affects machine learning models for malicious URL detection. Examine sensitivity results from a deep learning model trained and tested across three different URL data sources. Analyze surface differences between the data sets and investigate higher-level feature activations that the neural network identifies in some data sets but not in others. Gain insight into how seemingly robust machine learning models can be undermined by poor-quality input data, and why data integrity matters in cybersecurity applications.
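The cross-source sensitivity test mentioned above can be pictured as a train-on-one, test-on-every matrix. The following is only a minimal sketch of that idea, not the speaker's method: it swaps the talk's deep learning model for a tiny character n-gram naive Bayes classifier so that it runs with the standard library alone. The source names, the URLs, and every function name here are hypothetical placeholders invented for illustration.

```python
from collections import Counter
from itertools import product
import math


def char_ngrams(url, n=3):
    """Split a URL into overlapping character n-grams."""
    url = url.lower()
    return [url[i:i + n] for i in range(max(len(url) - n + 1, 1))]


def train(samples):
    """Fit a tiny naive Bayes model on (url, label) pairs; label 1 = malicious."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for url, label in samples:
        counts[label].update(char_ngrams(url))
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab


def predict(model, url):
    """Return 1 (malicious) or 0 (benign) for a single URL."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    scores = {}
    for label in (0, 1):
        total = sum(counts[label].values())
        # Laplace-smoothed log prior plus log likelihood of each n-gram.
        score = math.log((docs[label] + 1) / (total_docs + 2))
        for gram in char_ngrams(url):
            score += math.log((counts[label][gram] + 1) / (total + len(vocab) + 1))
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0


def accuracy(model, samples):
    """Fraction of samples the model labels correctly."""
    hits = sum(predict(model, url) == label for url, label in samples)
    return hits / len(samples)


def cross_source_matrix(sources):
    """Train one model per source, then score it on every source."""
    models = {name: train(data) for name, data in sources.items()}
    return {
        (train_name, test_name): accuracy(models[train_name], sources[test_name])
        for train_name, test_name in product(sources, repeat=2)
    }


# Made-up toy data standing in for three URL feeds (assumption, not real data).
SOURCES = {
    "source_a": [
        ("http://example.com/index.html", 0),
        ("http://example.com/docs/guide", 0),
        ("http://login-verify.example.net/update.php?id=1", 1),
        ("http://secure-account.example.net/confirm.php", 1),
    ],
    "source_b": [
        ("https://shop.example.org/cart", 0),
        ("https://shop.example.org/products/42", 0),
        ("http://192.0.2.10/a/b/c.exe", 1),
        ("http://192.0.2.77/dl/setup.exe", 1),
    ],
    "source_c": [
        ("https://news.example.com/today", 0),
        ("https://news.example.com/sports", 0),
        ("http://free-prize.example.test/win?u=abc", 1),
        ("http://free-gift.example.test/claim?u=xyz", 1),
    ],
}

if __name__ == "__main__":
    for (train_name, test_name), acc in sorted(cross_source_matrix(SOURCES).items()):
        print(f"train={train_name} test={test_name} accuracy={acc:.2f}")
```

In a matrix like this, the diagonal entries (train and test on the same source) show how well a model fits its own data, and the off-diagonal entries show how well it transfers. A large gap between the two is the kind of data-dependent sensitivity the talk examines.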