you inherit the complexity of the data-set by virtue. so the more of a mess it is, the more cleaning up you need to do.
4
normalization, wild-cards Vs domain:key and the 'Big Lesson learned
5
when we've been used in red-teaming, the tool has kicked ass!
6
I've been using the Chris Roberts' metric to track the growth of the tool.
7
the elephant in the room
8
just check the email-field of any recent SQL dump, to verify that.
9
731,308,683 Unique Documents Indexed
10
of course 42 people used the zip code for the State of Michigan Department of Treasury'.
Description:
Explore a conference talk that delves into the complexities of handling third-party data sets, focusing on normalization techniques, wildcard usage versus domain:key approaches, and lessons learned from red-teaming experiences. Learn about the growth of a data analysis tool, measured using Chris Roberts' metric, and discover intriguing insights about data quality issues, including the prevalence of unusual entries in large-scale SQL dumps. Gain valuable knowledge on inheriting and managing complex data sets, cleaning processes, and the importance of data integrity in cybersecurity and information analysis.