2010 Census: Summary of Publications (approximate counts)
We performed a database reconstruction and re-identification attack on all 308,745,538 people in the 2010 Census
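The core of a reconstruction attack can be shown at toy scale: published statistics for a small block are often consistent with very few possible person records. The sketch below is a hypothetical brute-force illustration (the block, its statistics, and the function name `reconstruct_ages` are invented for this example). It is not the method used against the 2010 data, which operated at national scale.

```python
from itertools import combinations_with_replacement
from statistics import median

def reconstruct_ages(count, mean_age, median_age, seniors=None, max_age=100):
    """Return every sorted age tuple consistent with the (hypothetical) published statistics."""
    matches = []
    for ages in combinations_with_replacement(range(max_age + 1), count):
        if sum(ages) != mean_age * count or median(ages) != median_age:
            continue
        # Optional extra table: number of people aged 60 or older.
        if seniors is not None and sum(1 for a in ages if a >= 60) != seniors:
            continue
        matches.append(ages)
    return matches

# A hypothetical three-person block: mean age 30, median age 29, one person aged 60+.
print(reconstruct_ages(count=3, mean_age=30, median_age=29, seniors=1))
```

With only the mean and median published, 30 age combinations fit; adding one more tabulation narrows the block to two candidates. Each additional published table adds constraints of this kind, which is why many exact tables together can pin down individual records.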
The basic idea of differential privacy: Uncertainty (noise) protects privacy
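A minimal illustration of "noise protects privacy" is the textbook Laplace mechanism: add random noise whose scale is the query's sensitivity divided by the privacy-loss parameter epsilon. It is chosen here for simplicity and is not taken from the Census Bureau's system; the function names are hypothetical.

```python
import random

def laplace_noise(scale, rng=random):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Laplace mechanism: release true_count plus noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(2020)
print(noisy_count(42, epsilon=1.0, rng=rng))
```

Because one person can change a count by at most 1 (sensitivity 1), the noise masks whether any particular individual is in the data, while the released value stays close to the truth on average.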
The Census Bureau is using differential privacy for the 2020 Census.
How much noise do we add? That's a policy decision.
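The policy trade-off can be stated numerically. Assuming, purely for illustration, the Laplace mechanism with sensitivity 1, a smaller epsilon means stronger protection (a tighter bound e^epsilon on how much one person's data can shift the odds of any output) but larger expected error per count. The helper names are hypothetical.

```python
import math

def laplace_scale(epsilon, sensitivity=1.0):
    # Expected absolute error of a Laplace-noised count equals this scale.
    return sensitivity / epsilon

def max_likelihood_ratio(epsilon):
    # epsilon-DP bounds the ratio of output probabilities with and without any one person.
    return math.exp(epsilon)

for eps in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"epsilon={eps:<5} noise scale={laplace_scale(eps):.2f} "
          f"ratio bound={max_likelihood_ratio(eps):.2f}")
```

Neither column is "correct" on its own; choosing a point on this curve is the policy decision.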
We planned to create a Disclosure Avoidance System that would drop into the Census production system.
The Disclosure Avoidance System allows the Census Bureau to enforce global confidentiality protections
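One way to picture a global confidentiality protection is a single privacy-loss budget that every release draws from. The sketch below is hypothetical (the `PrivacyBudget` class and query names are invented) and uses basic sequential composition, under which the epsilons of separate releases add up; a production system may use a different accounting method.

```python
class PrivacyBudget:
    """Track one global privacy-loss budget under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0
        self.ledger = []  # (query name, epsilon) pairs, kept for auditing

    def charge(self, name, epsilon):
        """Record a release and return the remaining budget, refusing overspending."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError(f"query {name!r} would exceed the global budget")
        self.spent += epsilon
        self.ledger.append((name, epsilon))
        return self.total_epsilon - self.spent

budget = PrivacyBudget(total_epsilon=2.0)
print(budget.charge("hypothetical table A", 1.5))  # remaining budget after the first release
```

Centralizing the ledger is what makes the protection global: no individual table can quietly spend more privacy loss than the overall policy allows.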
Our DP mechanism protects histograms of person types. Census "block"
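A histogram of person types for one block can be protected by adding integer noise to every cell, including empty ones, so the output does not reveal which types are present. This sketch assumes the two-sided geometric mechanism (a discrete analogue of Laplace noise) and a hypothetical two-attribute person type; the actual attributes and noise distribution used for the 2020 Census are not specified here.

```python
import math
import random
from collections import Counter

def two_sided_geometric(epsilon, rng):
    """Integer noise with P(z) proportional to exp(-epsilon * |z|)."""
    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1]
        return math.floor(-math.log(u) / epsilon)
    return geometric() - geometric()

def noisy_histogram(people, person_types, epsilon, rng):
    """Noise every cell of one block's histogram; adding or removing a person changes one cell by 1."""
    counts = Counter(people)
    return {t: counts[t] + two_sided_geometric(epsilon, rng) for t in person_types}

# Hypothetical person types: (age group, sex).
types = [(age, sex) for age in ("under18", "18plus") for sex in ("F", "M")]
block = [("18plus", "F"), ("18plus", "M"), ("under18", "F")]
print(noisy_histogram(block, types, epsilon=1.0, rng=random.Random(7)))
```

Noisy cells can come out negative or fail to add up to known totals; repairing that is the job of the post-processing step.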
Running the block-by-block algorithm with Spark
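Because each block can be processed independently, the work has the shape of a parallel map over blocks. In PySpark this could be written along the lines of `sc.parallelize(blocks).map(protect_block)`; to keep the sketch runnable without a cluster, it uses a local thread pool as a stand-in. `protect_block` and the block data are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def protect_block(block):
    """Hypothetical per-block task; here it only totals the block's histogram."""
    block_id, histogram = block
    return block_id, sum(histogram.values())

blocks = [("block-0001", {"A": 3, "B": 1}), ("block-0002", {"A": 0, "B": 5})]

# Local stand-in for distributing independent per-block tasks across workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(protect_block, blocks))
print(results)
```

The design point is that no task needs data from another block, so adding workers scales the job without coordination between tasks.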
In 2018 we invented the TopDown Algorithm (TDA)
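The "top-down" idea is to fix counts at the top of the geographic hierarchy first (nation, then states, then lower levels) and make each level's noisy child counts nonnegative, integer, and consistent with the already-fixed parent. The real system solves optimization problems for this (the description mentions mixed integer linear programs); the sketch below is a simplified stand-in for one count vector, using a least-squares projection followed by largest-remainder rounding. All function names are hypothetical.

```python
import math

def project_to_total(noisy, total):
    """Least-squares closest nonnegative vector whose entries sum to `total`."""
    if total <= 0:
        return [0.0] * len(noisy)
    theta = 0.0
    cumulative = 0.0
    # Standard sort-based Euclidean projection onto {x >= 0, sum(x) == total}.
    for j, value in enumerate(sorted(noisy, reverse=True), start=1):
        cumulative += value
        candidate = (cumulative - total) / j
        if value - candidate > 0:
            theta = candidate
    return [max(v - theta, 0.0) for v in noisy]

def round_preserving_total(values, total):
    """Round to integers that still sum to `total` (largest-remainder method)."""
    floors = [math.floor(v) for v in values]
    shortfall = total - sum(floors)
    order = sorted(range(len(values)), key=lambda i: values[i] - floors[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors

def fit_children_to_parent(parent_total, noisy_children):
    """One top-down step: make noisy child counts consistent with a fixed parent."""
    return round_preserving_total(project_to_total(noisy_children, parent_total), parent_total)

# Hypothetical example: a fixed state total of 100 and noisy county counts.
print(fit_children_to_parent(100, [40.7, -3.2, 65.1]))
```

Applying the same step level by level, with each level's fitted counts becoming the next level's parents, gives the top-down structure while keeping every published level additive.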
Key challenges in monitoring Spark
We created our own monitoring framework
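The talk's slides do not detail the framework's internals. As a hypothetical sketch of the general pattern, each node could periodically send a heartbeat, and a central monitor keeps the latest one per node, which is enough to produce a cluster list and flag nodes that have gone silent. The `Heartbeat` and `Monitor` classes and their fields are invented for illustration.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Heartbeat:
    cluster: str
    node: str
    timestamp: float   # seconds since the epoch
    load_avg: float
    free_mem_gb: float

@dataclass
class Monitor:
    stale_after_s: float = 60.0
    latest: dict = field(default_factory=dict)  # (cluster, node) -> most recent Heartbeat

    def record(self, hb):
        self.latest[(hb.cluster, hb.node)] = hb

    def stale_nodes(self, now=None):
        """Nodes whose last heartbeat is older than `stale_after_s`."""
        now = time.time() if now is None else now
        return sorted(key for key, hb in self.latest.items()
                      if now - hb.timestamp > self.stale_after_s)

    def cluster_list(self):
        return sorted({cluster for cluster, _ in self.latest})

monitor = Monitor()
monitor.record(Heartbeat("cluster-a", "node-1", 0.0, 3.2, 120.0))
monitor.record(Heartbeat("cluster-b", "node-7", 100.0, 15.9, 8.5))
print(monitor.cluster_list(), monitor.stale_nodes(now=120.0))
```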
Cluster List
Each DAS run is a "mission"
Mission Report
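If each DAS run is tracked as a "mission", a report can be built by aggregating mission records. The record fields and the `mission_report` summary below are hypothetical, chosen only to show the idea of per-run bookkeeping across many runs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mission:
    mission_id: str
    cluster: str
    status: str                  # e.g. "running", "finished", "failed"
    start: float                 # seconds since the epoch
    end: Optional[float] = None  # unset while the mission is still running

def mission_report(missions):
    """Summarize missions by status and mean runtime of completed ones."""
    status_counts = Counter(m.status for m in missions)
    runtimes = [m.end - m.start for m in missions if m.end is not None]
    mean_runtime = sum(runtimes) / len(runtimes) if runtimes else None
    return {"by_status": dict(status_counts), "mean_runtime_s": mean_runtime}

missions = [
    Mission("m-001", "cluster-a", "finished", start=0.0, end=100.0),
    Mission("m-002", "cluster-a", "failed", start=50.0, end=350.0),
    Mission("m-003", "cluster-b", "running", start=400.0),
]
print(mission_report(missions))
```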
System Load
Free Memory
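On Linux nodes, system load and free memory are exposed in `/proc/loadavg` and `/proc/meminfo`. One hypothetical way a collector could turn that text into the numbers behind System Load and Free Memory panels is sketched below; it parses the file contents as strings so it can be tested off-node, and is not a description of the Bureau's actual collector.

```python
def parse_loadavg(text):
    """First three fields of /proc/loadavg: 1-, 5-, and 15-minute load averages."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen

def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values (mostly in kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields

def free_memory_fraction(meminfo):
    # MemAvailable estimates memory usable without swapping, a better signal than MemFree.
    return meminfo["MemAvailable"] / meminfo["MemTotal"]

sample_meminfo = "MemTotal:  16000000 kB\nMemFree:  2000000 kB\nMemAvailable:  4000000 kB\n"
print(parse_loadavg("0.52 0.58 0.59 1/467 12345"))
print(free_memory_fraction(parse_meminfo(sample_meminfo)))
```

On a live node, the same functions could be fed `open("/proc/loadavg").read()` and `open("/proc/meminfo").read()`.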
In Summary
Description:
Explore the innovative use of Apache Spark and differential privacy to protect respondent confidentiality in the 2020 US Census in this 29-minute talk. Dive into the challenge of balancing data accuracy against privacy protection when the data are used to distribute $675 billion in federal funds and to apportion the US House of Representatives. Learn about the custom-built Spark application that performs millions of optimizations using mixed integer linear programs on a massive cluster. Discover the design of this differential privacy application and the monitoring systems implemented in Amazon's GovCloud to oversee multiple clusters and thousands of application runs. Gain insights into the TopDown Algorithm (TDA), the key challenges in monitoring Spark, and the custom monitoring framework built to address them. Understand the importance of the Disclosure Avoidance System in enforcing global confidentiality protections for census data.
Using Apache Spark and Differential Privacy for 2020 Census Data Protection