Project
For the very same reasons that big data is transforming modern life, it also presents a profound threat to privacy and to the control of personal information. A major challenge associated with big data is to enable statistical analysis of complex data sets without compromising the privacy of the individuals whose data they contain. Addressing this challenge is both necessary, since access to many data sources is restricted because of privacy concerns, and difficult, as numerous attacks on supposedly anonymized data demonstrate. This project will investigate the design and limitations of algorithms for the private, continual analysis of time-varying data sets, that is, algorithms that release information about a data set while it is still being collected (for example, a data stream from the web or a long-term sociological study). The research will advance the state of the art in the private analysis of "big" (massive, complex, time-varying) data. If successful, the project will provide enabling technologies that facilitate research in areas where access to sensitive data is limited by confidentiality concerns.
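To make "continual release" concrete, here is a minimal sketch of the standard binary (tree) mechanism for privately releasing a running count over a 0/1 stream under differential privacy. It is a textbook technique from the continual-observation literature, shown only for illustration and not as this project's algorithm. The function names `laplace` and `binary_mechanism` are hypothetical, and the sketch assumes event-level privacy (neighboring streams differ in one element) and a stream length known in advance.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def binary_mechanism(stream, epsilon, rng=None):
    """Release a noisy running count after each element of a 0/1 stream.

    Every element falls into exactly one dyadic interval per level, so it
    affects at most `levels` interval sums. Adding Laplace(levels / epsilon)
    noise to each interval sum therefore makes the entire sequence of
    outputs epsilon-differentially private with respect to changing one
    element (event-level privacy; stream length assumed known up front).
    """
    rng = rng or random.Random()
    levels = max(1, len(stream).bit_length())  # enough levels for every t <= len(stream)
    scale = levels / epsilon
    exact = [0] * levels    # exact sum of the current dyadic interval at each level
    noisy = [0.0] * levels  # its noisy counterpart; only these values feed the outputs
    outputs = []
    for t, x in enumerate(stream, start=1):
        i = (t & -t).bit_length() - 1  # index of the lowest set bit of t
        # The level-i interval ending at time t absorbs all lower levels plus x.
        exact[i] = sum(exact[:i]) + x
        for j in range(i):
            exact[j], noisy[j] = 0, 0.0
        noisy[i] = exact[i] + laplace(scale, rng)
        # The prefix [1, t] is the union of the intervals for the set bits of t.
        outputs.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return outputs


# Example: noisy running counts after each of eight stream elements.
counts = binary_mechanism([1, 0, 1, 1, 0, 1, 1, 1], epsilon=1.0, rng=random.Random(0))
```

Each released count sums at most `levels` noisy values, each with Laplace scale of about log2(T)/epsilon for a stream of length T, so the error grows only polylogarithmically in T. By contrast, adding independent noise to each of the T prefix counts would, by basic composition, require a noise scale of about T/epsilon per release, because one element affects every later count.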
The project will focus on the design of algorithms that satisfy differential privacy, a rigorous notion of privacy that is widely studied in computer science and related fields. The privacy implications of sequential releases are still poorly understood, and relatively few of the algorithms developed in the extensive recent literature on private data analysis allow for sequential releases with high accuracy. The project has two major thrusts: (1) algorithms for the "continual release" model, and (2) algorithms for the "local" model, in which individuals randomize their own data before it reaches the data collector and which therefore offers even stronger privacy guarantees. The work will provide novel algorithmic design techniques and a better understanding of the complexity-theoretic limitations of algorithms in these models. The research will entail advances in related areas such as learning theory, statistical inference and streaming algorithms. The project will also include educational, outreach and workforce training activities designed to broaden the impact of the research.
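The local model can be illustrated with randomized response, the classic local-model mechanism: each individual perturbs a private bit before reporting it, and the collector, who never sees raw data, debiases the aggregate. This is a minimal sketch for illustration only, not the project's method. The function names `randomize` and `estimate_fraction` and the example population are hypothetical, and the sketch assumes `epsilon > 0` and one private bit per individual.

```python
import math
import random


def randomize(bit: int, epsilon: float, rng: random.Random) -> int:
    """Runs on each individual's device: keep the true bit with probability
    e^eps / (1 + e^eps), otherwise flip it. The ratio of output probabilities
    for inputs 0 and 1 is at most e^eps, so each report is epsilon-DP."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep else 1 - bit


def estimate_fraction(reports, epsilon: float) -> float:
    """Runs at the (untrusted) collector: unbiased estimate of the fraction
    of individuals whose true bit is 1, computed from randomized reports only."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    mean = sum(reports) / len(reports)
    # E[mean] = f * keep + (1 - f) * (1 - keep); solve for f.
    return (mean - (1.0 - keep)) / (2.0 * keep - 1.0)


# Hypothetical population: 30% of 10,000 individuals hold a 1.
rng = random.Random(1)
true_bits = [1] * 3000 + [0] * 7000
reports = [randomize(b, 1.0, rng) for b in true_bits]
estimate = estimate_fraction(reports, 1.0)
```

The debiasing step divides by `2 * keep - 1`, which shrinks toward zero as epsilon decreases, so the estimate's error grows as the privacy guarantee tightens. This is one concrete form of the accuracy cost of the stronger guarantees the local model provides.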
This material is based upon work supported by the National Science Foundation under Grant No. IIS-1447700. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.