Explore data privacy techniques for protecting personally identifiable information (PII) in this 27-minute talk from Databricks. Compare offensive and defensive approaches, and learn about k-anonymity, quasi-identifiers, and methods such as suppression, perturbation, obfuscation, encryption, tokenization, and watermarking. Discover elementary code examples for implementing these techniques when third-party products are unavailable. Examine approaches to minimizing data exfiltration risk and see how Databricks Delta can help make datasets privacy-ready. Gain insight into the long-term implications of different privacy methods, including their effects on statistical usefulness, re-identification risk, data schema, format preservation, and read/write performance.
Data Privacy Techniques with Apache Spark - Defensive and Offensive Approaches
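
As a rough illustration of the kind of elementary code the description mentions, the sketch below shows a k-anonymity check, column suppression, and a keyed-hash stand-in for tokenization. It is not taken from the talk: it uses plain Python rather than Spark so it runs anywhere, and the function names (`k_anonymity`, `suppress`, `tokenize`), the sample records, and the choice of HMAC-SHA256 as the token function are all assumptions made for illustration.

```python
import hashlib
import hmac
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group of rows that share
    the same combination of quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def suppress(rows, column, mask="*"):
    """Suppression: replace every value in a column with a fixed mask."""
    return [{**row, column: mask} for row in rows]


def tokenize(rows, column, secret_key):
    """Keyed-hash pseudonymization used here as a simple stand-in for
    tokenization (assumption: no token vault; tokens are not reversible)."""
    out = []
    for row in rows:
        value = str(row[column]).encode("utf-8")
        digest = hmac.new(secret_key, value, hashlib.sha256).hexdigest()
        out.append({**row, column: digest[:16]})
    return out


# Hypothetical sample data: zip and age_band act as quasi-identifiers.
records = [
    {"name": "Ann", "zip": "94105", "age_band": "30-39"},
    {"name": "Bob", "zip": "94107", "age_band": "30-39"},
    {"name": "Cy", "zip": "94105", "age_band": "40-49"},
    {"name": "Dee", "zip": "94107", "age_band": "40-49"},
]

quasi = ["zip", "age_band"]
k_before = k_anonymity(records, quasi)                  # every row is unique
k_after = k_anonymity(suppress(records, "zip"), quasi)  # rows now group by age_band
tokens = tokenize(records, "name", b"demo-key")         # names replaced by tokens
```

With this sample data, suppressing `zip` raises k from 1 to 2, which also illustrates the trade-off the description points to: re-identification risk falls, but the column loses its statistical usefulness.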