Private Convex
Empirical Risk Minimization and High-dimensional Regression
Daniel Kifer, Adam Smith, Abhradeep Thakurta
In Conference on Learning Theory (COLT), 2012.
Abstract:
We consider differentially private algorithms for convex
empirical risk minimization (ERM). Differential privacy is a
recently introduced notion of privacy that guarantees an
algorithm's output distribution does not depend too heavily on the
data of any single individual in the dataset. This is crucial in
fields that handle sensitive data, such as genomics, collaborative
filtering, and economics. Our motivation is the design of private
algorithms for sparse learning problems, in which one aims to find
solutions (e.g., regression parameters) with few non-zero
coefficients. To this end:
- We significantly extend the analysis of the ``objective
perturbation'' algorithm of (Chaudhuri et al., 2010) for convex ERM
problems. We show that their method can be modified to use less
noise (and hence be more accurate), and to apply to problems with
hard constraints and non-differentiable regularizers. We also give
a tighter, data-dependent analysis of the additional error their
method introduces. (A schematic sketch of objective perturbation
appears after the abstract.)
A key tool in our analysis is a new, nontrivial limit theorem for
differential privacy, which is of independent interest: if a sequence
of differentially private algorithms converges, in a weak sense, then
the limit algorithm is also differentially private. (A simple
instance of this phenomenon is worked out after the abstract.)
In particular, our methods give the best known algorithms for
differentially private linear regression. These methods work in settings where the
number of parameters $p$ is less than the number of samples $n$.
- We give the first two private algorithms for sparse regression
problems in high-dimensional settings, where $p$ is much larger
than $n$. We analyze their performance for linear regression:
under standard assumptions on the data, our algorithms have
vanishing empirical risk for $n = \mathrm{poly}(s, \log p)$ when
there exists a good regression vector with $s$ nonzero
coefficients.
Our algorithms demonstrate that randomized
algorithms for sparse regression problems can be both stable
and accurate, a combination that is impossible for
deterministic algorithms.
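To make the ``objective perturbation'' approach concrete, here is a
minimal sketch for the special case of ridge-regularized linear
regression, using only numpy. The function name, the squared-loss
instantiation, and the way the noise scale is tied to a single
parameter eps are illustrative assumptions; the calibration needed to
turn this into a formal differential-privacy guarantee is precisely
what the paper's analysis supplies and is omitted here.

```python
import numpy as np

def objective_perturbation_ridge(X, y, lam, eps, rng=None):
    """Illustrative sketch of objective perturbation (hypothetical helper,
    not the paper's exact algorithm).

    Minimizes  (1/n) * ||X @ theta - y||^2 + (lam/2) * ||theta||^2
               + (1/n) * b @ theta,
    where b is random noise with density proportional to
    exp(-eps * ||b|| / 2).  NOTE: `eps` here only sets the noise scale;
    a formal privacy guarantee requires the regularity conditions and
    calibration worked out in the paper, which this sketch omits.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape

    # Sample b with density proportional to exp(-eps * ||b|| / 2):
    # uniformly random direction, Gamma(p, 2/eps)-distributed norm.
    direction = rng.standard_normal(p)
    direction /= np.linalg.norm(direction)
    b = rng.gamma(shape=p, scale=2.0 / eps) * direction

    # With squared loss the perturbed objective is quadratic, so the
    # minimizer has a closed form: set the gradient to zero and solve.
    A = (2.0 / n) * X.T @ X + lam * np.eye(p)
    rhs = (2.0 / n) * X.T @ y - b / n
    return np.linalg.solve(A, rhs)

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.standard_normal((n, p))
theta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ theta_true + 0.1 * rng.standard_normal(n)
print(objective_perturbation_ridge(X, y, lam=0.1, eps=1.0, rng=1))
```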
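The limit theorem is easiest to appreciate in the following simple
instance, stated as our paraphrase in LaTeX; the paper's actual
theorem uses a weaker notion of convergence, which makes the argument
more delicate.

```latex
% A simple instance of the limit phenomenon (our paraphrase, not the
% paper's exact statement): pure eps-differential privacy survives
% pointwise convergence of output distributions.
Suppose each algorithm $\mathcal{A}_k$ is $\epsilon$-differentially
private; that is, for all neighboring datasets $D, D'$ and all
measurable sets $S$,
\[
  \Pr[\mathcal{A}_k(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{A}_k(D') \in S].
\]
If $\Pr[\mathcal{A}_k(D) \in S] \to \Pr[\mathcal{A}(D) \in S]$ as
$k \to \infty$, for every dataset $D$ and every $S$, then taking
limits on both sides preserves the inequality:
\[
  \Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{A}(D') \in S],
\]
so the limit algorithm $\mathcal{A}$ is also
$\epsilon$-differentially private.
```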
Available files:
- COLT Proceedings version: pdf
Last modified: Tue Nov 6 16:01:08 EST 2012