
A. Differential Privacy
Differential privacy is a field pioneered by researchers at Microsoft, alongside a handful of academics, and later adopted by companies such as Apple. The animating principle behind differential privacy, as articulated by its original proponent Cynthia Dwork, is that responses to dataset queries should not provide enough information to identify any individual included in the dataset.12
Differential privacy is ultimately a mathematical definition of privacy that considers whether a particular person’s data has a significant impact on the answer to a dataset query; if it does not, then the data will not identify the person it describes.13 The identifiability of information is (as we have undoubtedly discovered)14 not a binary question, but a probabilistic one. How much of an impact the data must have on the query to be excluded — and by extension how likely it is that a query would lead to personal identification — depends on a “privacy budget” set by the holder of the data, which defines how much information leakage is considered acceptable.15
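For readers who want the formal statement behind this intuition, a minimal sketch of the standard definition (following Dwork, supra note 12) is the following, where the parameter ε is the quantity that the privacy budget caps:

\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]

Here \mathcal{M} is the randomized query-answering mechanism, D and D' are any two datasets that differ in a single person’s record, and S is any set of possible answers. The smaller ε is, the less any one person’s data can shift the distribution of answers, and the harder it is to tell from a query result whether that person was in the dataset at all.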
Setting an appropriate privacy budget is therefore crucial to the proper use of differential privacy techniques. And because of the way that differential privacy works, there is an inherent tradeoff between the level of privacy afforded to data subjects and the accuracy of the query results. This is because differential privacy is performed primarily by injecting noise (randomness) into a dataset in such a way that the outputs or conclusions generated by the data are minimally impacted while privacy protection is enhanced.16 The amount of noise introduced will depend on the specified amount of acceptable data leakage and the way the data will be used. Just as data leakage will never reach zero, neither will the amount of error introduced by the noise.
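To make that tradeoff concrete, the sketch below illustrates the Laplace mechanism, one common way of injecting noise. It is a simplified illustration, not any particular company’s implementation; the function name and example figures are invented for exposition. The noise added to a true count is drawn from a Laplace distribution whose scale is the query’s sensitivity divided by the privacy budget ε, so a tighter budget yields noisier, less accurate answers.

# A minimal sketch of the Laplace mechanism for a counting query.
# Assumption (not from the article): a counting query has sensitivity 1,
# because adding or removing one person changes the count by at most 1.
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0):
    # The noise scale grows as the privacy budget epsilon shrinks.
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0.0, scale=scale)
    return true_count + noise

# The same query under a loose and a tight privacy budget.
true_count = 1000
print(laplace_count(true_count, epsilon=1.0))    # typically within a few units of 1000
print(laplace_count(true_count, epsilon=0.01))   # routinely off by 100 or more

The expected size of the error is the sensitivity divided by ε, which is why, as noted above, the error never reaches zero: pushing ε toward zero for ever-stronger privacy pushes the noise, and therefore the error, up without bound.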
Apple has developed more sophisticated differential privacy techniques that incorporate hashing and subsampling into its methodology as
12. See Cynthia Dwork, Differential Privacy, 33 INT’L COLLOQUIUM ON AUTOMATA, LANGUAGES AND PROGRAMMING 1 (2006).
13. Matthew Green, What Is Differential Privacy?, A FEW THOUGHTS ON CRYPTOGRAPHIC ENGINEERING (June 15, 2016), https://blog.cryptographyengineering.com/2016/06/15/what-is-differential-privacy/ [https://perma.cc/73YU-RKJZ].
14. For example, Netflix’s publicly released viewing dataset for an algorithmic design contest turned out to be insufficiently anonymized because researchers discovered that the dataset could be used to re-identify certain viewers when combined with publicly-available data. This led to inquiries by the FTC and a California class-action lawsuit against Netflix. See Andrew Chin & Anne Klinefelter, Differential Privacy as a Response to the Reidentification Threat: The Facebook Advertiser Case Study, 90 N.C. L. REV. 1417, 1424 (2012). In another case, Latanya Sweeney published a study in which she merged supposedly anonymized Massachusetts worker hospital records with easily acquired voter registration records, and found she was able to identify the health records of then-Governor William Weld; she later published “a broader study finding that 87% of the 1990 U.S. Census population could be identified using only gender, zip code, and full date of birth.” Id. at 1425.
15. Green, supra note 13.
16. Id.