Programming exercise 1.
The goal of this exercise is to write a program implementing the
(exponential time) reconstruction attack we saw in class which you can
find described in Chapter 8 of the book DR linked from the course
webpage - you can also look at the original paper linked here:
http://www.wisdom.weizmann.ac.il/%7Edinuri/mypapers/db_privacy.pdf
More in details, the goal of the exercise is to define a
function "attack" which takes in input a dataset D consisting
of n records with 1 binary attribute and generate a database D’ which
is at hamming distance less than 4E from D. The function attack can
access the database D only using queries that are protected via
uniform noise with perturbation E=sqrt(n).
Test your attack by defining a function which checks that the dataset
that the function attack outputs is at most at hamming
distance 4E from D. I suggest that you repeat
this test on several datasets D. To make it feasible I suggest you to
use very small values of n, for example n=10 and n=12.
Please, summarize the results of the test with some plot that can be
used to visualize them (e.g. you could print the hamming distance for
each testcase, etc.)
Programming exercise 2.
The goal of this exercise is to write a program implementing the
algorithm RandomizedResponse that I presented - you can find it on the
slides. I remind you that in this algorithm the input q:X→{0,1} is a
predicate, and we are interested in the average number of record
satisfying this predicate.
Write a plotting function visualizing a bar plot comparing several runs of
this algorithm on two different datasets. In this case we want to see
that as we consider more and more runs of the algorithm the
privacy_loss variable tends to be experimentally bound by ϵ.
Additionally, we want some plotting function that could help us
visualize the accuracy guarantee. To do this, write a plotting
function that plot several output of the algorithm on a single
database, and a single query, set a threshold alpha based on values
of β accordingly to the accuracy theorem for differential privacy, and
use this threshold to show how many points are above or below the
threshold (counting them can also be helpful). Repeat several tests
for different values of the parameters: n, ϵ, β .
NOTE: I am aware that the two programming exercises are not fully
specified, especially in the plotting part, and this is
intentional. Part of the exercise is trying to understand the concepts
of privacy and accuracy guarantees and how to manipulate them and I
think the best way to do this is by trial and error.
Requirements for the submission: please submit a zip file containing
both your code and few plots visualizing both the privacy and accuracy
guarantees.
Theoretical exercise 1
Suppose that we answer n counting queries q_1,...,q_n which are
eps-differentially private, using laplace noise and sequential
composition. Let's denote a_1,...,a_n the result of these queries on
the dataset D.
Prove that
Pr[max_i (|q_i(D) - a_i|> ln(k/b)*(n/eps) ] <= b
Theoretical exercise 2
Consider a noise addition mechanism similar to the laplace mechanism but where
we sample the noise variable Y from a distribution P_x defined on a bounded interval
[-x,+x] for some number x. That is:
Mech(q,D)= q(D) + y where y~P_x (y is an element in [-x,x] with probability described by P_x)
Prove that this mechanism is not (eps,0)-differentially private for any real number eps.