The popularity of cloud computing has made outsourced databases prevalent in real-world applications. To protect data security, numerous encrypted outsourced databases have been proposed for this paradigm. However, the maintenance of encrypted databases has scarcely been addressed. In this paper, we focus on a typical maintenance task — functional dependency (FD) discovery. We develop novel FD protocols in encrypted databases while guaranteeing minimal leakages: nothing is revealed besides the database size and the actual discovered FDs. As far as we know, we are the first to formally define secure FD discovery with minimal leakage.
We present two oblivious FD protocols and prove them secure in the presence of the persistent adversary (monitoring processes on the server). The first protocol leverages Oblivious RAM (ORAM) and is suitable for dynamic databases. The second protocol relies on oblivious sorting and is more practical in static databases due to high parallelism. We also present a thorough experimental evaluation of the proposed methods.
As organizations struggle with processing vast amounts of information, outsourcing sensitive data to third parties becomes a necessity. Various cryptographic techniques are used in outsourced database systems to ensure data privacy while allowing for efficient querying. This thesis proposes a definition and components of a new secure and efficient outsourced database system, which answers various types of queries, with different privacy guarantees in different security models.
This work starts with the survey of five order-preserving and order-revealing encryption schemes that can be used directly in many database indices, such as the B+ tree, and five range query protocols with various tradeoffs in terms of security and efficiency. The survey systematizes the state-of-the-art range query solutions in a snapshot adversary setting and offers some non-obvious observations regarding the efficiency of the constructions.
The thesis then proceeds with $\mathcal{E}\text{psolute}$ - an efficient range query engine in a persistent adversary model. In $\mathcal{E}\text{psolute}$, security is achieved in a setting with a much stronger adversary where she can continuously observe everything on the server, and leaking even the result size can enable a reconstruction attack. $\mathcal{E}\text{psolute}$ proposes a definition, construction, analysis, and experimental evaluation of a system that provably hides both access pattern and communication volume while remaining efficient.
The dissertation concludes with $k\text{-a}n\text{o}n$ - a secure similarity search engine in a snapshot adversary model. The work presents a construction in which the security of $k\text{NN}$ queries is achieved similarly to OPE / ORE solutions - encrypting the input with an approximate Distance Comparison Preserving Encryption scheme so that the inputs, the points in a hyperspace, are perturbed, but the query algorithm still produces accurate results. Analyzing the solution, we run a series of experiments to observe the tradeoff between search accuracy and attack effectiveness. We use TREC datasets and queries for the search, and track the rank quality metrics such as MRR and nDCG. For the attacks, we build an LSTM model that trains on the correlation between a sentence and its embedding and then predicts words from the embedding. We conclude on viability and practicality of the solution.
As organizations struggle with processing vast amounts of information, outsourcing sensitive data to third parties becomes a necessity. To protect the data, various cryptographic techniques are used in outsourced database systems to ensure data privacy, while allowing efficient querying. A rich collection of attacks on such systems has emerged. Even with strong cryptography, just communication volume or access pattern is enough for an adversary to succeed.
In this work we present a model for differentially private outsourced database system and a concrete construction, $\mathcal{E}\text{psolute}$, that provably conceals the aforementioned leakages, while remaining efficient and scalable. In our solution, differential privacy is preserved at the record level even against an untrusted server that controls data and queries. $\mathcal{E}\text{psolute}$ combines Oblivious RAM and differentially private sanitizers to create a generic and efficient construction.
We go further and present a set of improvements to bring the solution to efficiency and practicality necessary for real-world adoption. We describe the way to parallelize the operations, minimize the amount of noise, and reduce the number of network requests, while preserving the privacy guarantees. We have run an extensive set of experiments, dozens of servers processing up to 10 million records, and compiled a detailed result analysis proving the efficiency and scalability of our solution. While providing strong security and privacy guarantees we are less than an order of magnitude slower than range query execution of a non-secure plain-text optimized RDBMS like MySQL and PostgreSQL.
This work has also been presented to Brown Security Reading Group. See presentation video.
In permissioned blockchain systems, participants are admitted to the network by receiving a credential from a certification authority. Each transaction processed by the network is required to be authorized by a valid participant who authenticates via her credential. Use case settings where privacy is a concern thus require proper privacy-preserving authentication and authorization mechanisms.
Anonymous credential schemes allow a user to authenticate while showing only those attributes necessary in a given setting. This makes them a great tool for authorizing transactions in permissioned blockchain systems based on the user's attributes. As in most setups of such systems where there is one distinct certification authority for each organization in the network, the use of plain anonymous credential schemes still leaks the association of a user to her issuing organization. Camenisch, Drijvers and Dubovitskaya (CCS 2017) therefore suggest the use of a delegatable anonymous credential scheme to also hide that remaining piece of information.
In this paper we improve the Camenisch et al. scheme and extend it with revocation and auditability; two functionalities that are necessary for real-world adoption. We present a complete protocol and provide its production-grade open-source implementation including the scheme and the proposed extensions, ready to be integrated with Hyperledger Fabric. Our distributed-setting performance measurements show that the integration of the scheme with Hyperledger Fabric, while incurring an overhead in comparison to the less privacy-preserving solutions, is practical for settings with stringent privacy requirements.
This work has started while on internship at IBM Zurich Research Lab. See official DAC scheme source code on IBM's github.
Motivation: The complexity of protein-protein interactions (PPIs) is further compounded by the fact that an average protein consists of two or more domains, structurally and evolutionary independent subunits. Experimental studies have demonstrated that an interaction between a pair of proteins is not carried out by all domains constituting each protein, but rather by a select subset. However, finding which domains from each protein mediate the corresponding PPI is a challenging task.
Results: Here, we present Domain Interaction Statistical POTential (DISPOT), a simple knowledge-based statistical potential that estimates the propensity of an interaction between a pair of protein domains, given their SCOP family annotations. The statistical potential is derived based on the analysis of more than 352,000 structurally resolved protein-protein interactions obtained from DOMMINO, a comprehensive database on structurally resolved macromolecular interactions.
Availability and implementation: DISPOT is implemented in Python 2.7 and packaged as an open-source tool. DISPOT is implemented in two modes, basic and auto-extraction. The source code for both modes is available on GitHub: github.com/korkinlab/dispot and standalone docker images on DockerHub: hub.docker.com/r/korkinlab/dispot. The web-server is freely available at dispot.korkinlab.org.
Database query evaluation over encrypted data has received a lot of attention recently. Order Preserving Encryption (OPE) and Order Revealing Encryption (ORE) are two important encryption schemes that have been proposed in this area. These schemes can provide very efficient query execution but at the same time may leak some information to adversaries. In this paper, we present the first comprehensive comparison among a number of important OPE and ORE schemes using a framework that we developed. We evaluate protocols that are based on these schemes as well. We analyze and compare them both theoretically and experimentally and measure their performance over database indexing and query evaluation techniques using not only execution time but also IO performance and usage of cryptographic primitive operations. Our comparison reveals some interesting insights concerning the relative security and performance of these approaches in database settings. Furthermore, we propose a number of improvements for some of these scheme and protocols. Finally, we provide a number of suggestions and recommendations that can be valuable to database researchers and users.
This work has received the "Most Reproducible Paper" award for exemplary work on experiments, in particular, making them easily reproducible. See award page and the certificate.
I present a dynamic, voluntary contribution mechanism, public good game and derive its potential outcomes. In each period, players endogenously determine contribution productivity by engaging in costly investment. The level of contribution productivity carries from period to period, creating a dynamic link between periods. The investment mimics investing in the stock of technology for producing public goods such as national defense or a clean environment. After investing, players decide how much of their remaining money to contribute to provision of the public good, as in traditional public good games. I analyze three kinds of outcomes of the game: the lowest payoff outcome, the Nash Equilibria, and socially optimal behavior. In the lowest payoff outcome, all players receive payoffs of zero. Nash Equilibrium occurs when players invest any amount and contribute all or nothing depending on the contribution productivity. Therefore, there are infinitely many Nash Equilibria strategies. Finally, the socially optimal result occurs when players invest everything in early periods, then at some point switch to contributing everything. My goal is to discover and explain this point. I use mathematical analysis and computer simulation to derive the results.
This project expands the functionality of the Massachusetts Technology, Talent, and Economic Reporting System (MATTERS) for the Massachusetts High Technology Council (MHTC), a protechnology advocacy and lobbyist organization, through the addition of two new features, namely, an Application Program Interface (API) and the Metric Builder. This API defines a communication protocol between MATTERS and other computational-based systems. Extensive API documentation was developed. The Metric Builder is a tool that allows users to create their own indices with custom rules out of existing MATTERS metrics. This empowers them to track individual states' performance using their own custom models.
The purpose of this IQP project is to scientifically develop profitable systems and indicators for trading in the markets. The project consists of nine individually developed strategies, which were quantitatively analyzed for profitability and then combined into a system of systems. Each individual system or indicator was given defined rules and then allocated simulated money to trade. Two types of systems were mainly developed, predictive and confirmative, leading to a system of systems that incorporated a predictive layer and a confirmative layer in the decision to take positions.