Sarah Scheffler

About me

I am a Ph.D. student in the BUSec group working with Prof. Mayank Varia.

I study applied cryptography, including secure messaging, key derivation, multi-party computation, and hash combiners.

I am the current organizer of the Multi Party Computation Reading Group for BUSec and the Hariri Institute, and am an active member and blogger for the Cyber Security, Law, and Society Alliance.

I am interested in improving the law's understanding of technology, and technology's understanding of the law and society. I believe that the current lack of mutual understanding between these fields is harming society and individuals caught in the crossfire. I hope to help by gaining and spreading an understanding of both fields. I also have a lot of opinions about machine learning: you shouldn't use it for decisions that bear high costs to individuals when the predictions are wrong, and at the very least you should be using algorithmic fairness techniques.



Publications

From Soft Classifiers to Hard Decisions: How Fair Can We Be?
Ran Canetti, Aloni Cohen, Nishanth Dikkala, Govind Ramnarayan, Sarah Scheffler, Adam Smith
ACM FAT* 2019 / arxiv version

The Unintended Consequences of Email Spam Prevention
Sarah Scheffler, Sean Smith, Yossi Gilad, Sharon Goldberg
PAM 2018



Current Research

Blacklisting Encrypted Messages

Is it possible to have a secure messaging system that maintains all the usual guarantees of end-to-end encrypted messaging, including confidentiality against a malicious server, while also allowing the server to keep a large blacklist of messages that, if sent, can be flagged by the receiver as containing malicious content? Yes, it is possible, but can it be done efficiently enough to use in practice? Can we use it to stop the spread of fake news on end-to-end encrypted platforms? Stay tuned!



Privacy Against Inference-Based Device Fingerprinting

Websites employ device fingerprinting methods to identify unique devices. The older, "traditional" device fingerprinting methods involve direct requests for information from a client's machine: measuring information in the HTTP request header, sending a cookie, or embedding additional web requests (e.g., for invisible pixels). All of this information can be identified and blocked from being sent in order to preserve user privacy. However, the newer wave of device fingerprinting methods uses indirect approaches: the client is asked to perform a seemingly irrelevant computation, and information is then gleaned from the result. These methods are more insidious, more difficult to block, and somewhat sneaky in the sense that even an expert client may not be able to tell that fingerprinting is happening. Privacy policies describe these methods only in general terms. More research is necessary to determine how to address them in a way that respects consumer privacy.



Older Projects

Resilient Password-Based Key Derivation Functions

paper / poster

Older password-based key derivation functions (PBKDFs) like PBKDF2 rely on repeated iteration of a single hash function to force the attacker to spend more resources. But thanks to Bitcoin, the cost of specialized hardware for computing small, repeated functions has gone down dramatically. Newer PBKDFs like scrypt add memory as a resource that attackers must spend in order to compute efficiently. We extend this resource consumption model to a PBKDF that consumes many resources, such as CPU, storage, cache, or chip access, in order to correctly derive the key from the password. Paper is forthcoming.
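As a rough illustration of the two existing resource models mentioned above (not the multi-resource construction from this project), Python's standard hashlib module exposes both PBKDF2 and scrypt. The function names and cost parameters below are placeholder assumptions chosen so the example runs quickly, not recommended settings.

```python
import hashlib


def derive_pbkdf2(password: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    """Iteration-hardened: cost grows with the number of HMAC-SHA256 calls."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)


def derive_scrypt(password: bytes, salt: bytes) -> bytes:
    """Memory-hardened: needs roughly 128 * n * r bytes (about 4 MiB here)."""
    # n, r, p are illustrative assumptions, kept small for a quick demo.
    return hashlib.scrypt(password, salt=salt, n=2**12, r=8, p=1, dklen=32)
```

Both functions return a 32-byte key. The difference is which resource an attacker has to pay for on each password guess: raw hash iterations in the first case, and memory as well in the second.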



Proposing Safeguards for Government Risk-Assessment Systems

paper

This paper analyzes governmentally-regulated risk assessment systems by evaluating them on three axes: we examine the costs of the systems to individuals, the system holders, and society; we analyze the inputs to the system; and we describe the transparency (or lack thereof) within the systems. Using three case studies (the Unified Passenger system (UPAX), the COMPAS Risk & Need Assessment System, and the FICO score), we develop a standardized set of potential technical requirements to mitigate abuse and ensure individuals are treated fairly while remaining within the constraints levied by the system’s purpose.



Dismantling the False Distinction between Traditional Programming and Machine Learning in Lethal Autonomous Weapons

paper

Contrary to my expectations at the start of the project, the type of programming used to create lethal autonomous weapons does not inherently affect their ability to comply with International Humanitarian Law. Traditional programming, machine learning, and artificial intelligence are distinct but overlapping techniques for programming autonomous weapons, and the use of one technique over another should not affect the standard used to determine whether a given lethal autonomous weapon complies with the Law of Armed Conflict. Rather, the same (strict) standards should apply to all lethal autonomous weapons, and their outward performance in accordance with the law should be the sole determinant of legality.



From Soft Classifiers to Hard Decisions: How Fair Can We Be?

paper / poster

Question: When your machine learning algorithm is "calibrated" for different protected groups, can this calibrated score be post-processed in a "fair" way? Answer: In general, no. But you can achieve some partial fairness properties (such as equalizing the positive predictive value across groups), or you can defer on some inputs and guarantee good expected fairness properties for the non-deferred outputs. Paper and poster forthcoming.
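To make "equalizing the positive predictive value across groups" concrete, here is a minimal sketch that thresholds scores per group and reports each group's PPV. It is not the post-processing algorithm from the paper; the function name, the record format, and the toy data are all invented for illustration.

```python
from collections import defaultdict


def positive_predictive_value(records, thresholds):
    """Compute per-group PPV after thresholding soft scores.

    records: iterable of (group, score, outcome), with outcome in {0, 1}.
    thresholds: dict mapping each group to its score cutoff.
    Returns a dict mapping group -> PPV, or None if nobody was accepted.
    """
    accepted = defaultdict(int)
    true_positives = defaultdict(int)
    for group, score, outcome in records:
        if score >= thresholds[group]:
            accepted[group] += 1
            true_positives[group] += outcome
    return {
        g: (true_positives[g] / accepted[g] if accepted[g] else None)
        for g in thresholds
    }


# Toy data (hypothetical): (group, score, true outcome)
toy_records = [
    ("A", 0.9, 1), ("A", 0.8, 1), ("A", 0.6, 0), ("A", 0.3, 0),
    ("B", 0.7, 1), ("B", 0.6, 0), ("B", 0.4, 1), ("B", 0.2, 0),
]

shared = positive_predictive_value(toy_records, {"A": 0.5, "B": 0.5})
per_group = positive_predictive_value(toy_records, {"A": 0.5, "B": 0.4})
```

On this toy data, a shared cutoff of 0.5 gives group A a PPV of 2/3 and group B a PPV of 1/2, while lowering group B's cutoff to 0.4 brings both groups to 2/3. Equalizing one metric this way says nothing about the other fairness properties discussed in the paper.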



The Unintended Consequences of Email Spam Prevention

paper / website / talk / poster

To combat DNS cache poisoning attacks and exploitation of the DNS as an amplifier in DoS attacks, many recursive DNS resolvers are configured as "closed" and refuse to answer queries made by hosts outside their organization. This work presents a technique to induce DNS queries within an organization, using the organization's email service and the Sender Policy Framework (SPF) email spam-checking mechanism. We use this technique to study closed resolvers, verifying that most closed DNS resolvers have deployed common DNS poisoning defense techniques, but showing that SPF is often deployed in a way that allows an external attacker to cause the organization's resolver to issue numerous DNS queries to a victim IP address by sending a single email to any address within the organization's domain.
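For readers unfamiliar with why SPF checking causes DNS queries at all: under RFC 7208, the include, a, mx, ptr, and exists mechanisms and the redirect modifier each require the checking side to perform a DNS lookup, and the RFC caps these at 10 per check. The sketch below counts such terms in a single SPF record string. It is only an illustration of the mechanism, not the measurement technique from the paper; the function name and the example.com records are hypothetical.

```python
# Mechanisms that require a DNS lookup when an SPF record is evaluated (RFC 7208).
LOOKUP_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}


def count_dns_lookup_terms(spf_record: str) -> int:
    """Count the terms in one SPF record that trigger a DNS lookup.

    Only the given record is inspected; records pulled in through
    include or redirect would add further lookups of their own.
    """
    count = 0
    for term in spf_record.split()[1:]:  # skip the "v=spf1" version tag
        term = term.lstrip("+-~?")  # drop the optional qualifier
        if term.lower().startswith("redirect="):
            count += 1
            continue
        # Strip any ":domain" argument and "/cidr" suffix to get the mechanism name.
        name = term.split(":", 1)[0].split("/", 1)[0].lower()
        if name in LOOKUP_MECHANISMS:
            count += 1
    return count


example = "v=spf1 ip4:192.0.2.0/24 include:_spf.example.com a mx/24 -all"
lookups = count_dns_lookup_terms(example)
```

In the example record, include, a, and mx each cost a lookup, while ip4 and all do not, so a single incoming email makes the receiving organization's resolver issue several queries on the sender's behalf.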



Proactively-secure Accumulo with Cryptographic Enforcement

As assistant research staff at the MIT Lincoln Laboratory, I worked in the Secure and Resilient Systems and Technology group within the Cybersecurity and Information Sciences division. I assisted in the implementation, testing, and release of a library that adds confidentiality and integrity guarantees to the Accumulo database, protecting it against a malicious server or sysadmin. Earlier in the project, I also implemented Oblivious RAM (Path ORAM) for Accumulo.



Quantifying Latent Fingerprint Quality

talk

As a capstone project at Harvey Mudd College, I worked with a team of four students for the MITRE Corporation on a project to design, implement, and test a system that uses image processing and machine learning techniques to evaluate the suitability of crime scene fingerprint images for identification by Automated Fingerprint Identification Systems.



Statistical Testing of Cryptographic Entropy Sources

talk

As a summer undergraduate research fellow at the National Institute of Standards and Technology (NIST), I worked with Dr. Allen Roginsky in the Computer Security Division to improve NIST's statistical tests for entropy sources for use in cryptographic random number generators. I also made adjustments to the process for generating large primes used in cryptography.