Protecting Patient Privacy in Genomic Analysis

Patient genomes are interpretable only in the context of other genomes. However, privacy concerns over genetic data oftentimes deter individuals from contributing their genomes to scientific studies and prevent researchers from sharing their data with the scientific community. In this talk, I will describe how we can leverage secure multiparty computation techniques from modern cryptography to perform useful scientific computations on genomic data while protecting the privacy of the participants' genomes. In multiple real scenarios, our methods successfully identified the disease-causing genes and even discovered previously unrecognized disease genes, all while keeping nearly all of the participants' most sensitive genomic information private. We believe that our techniques will help make currently restricted data more readily available to the scientific community and enable individuals to contribute their genomes to a study without compromising their personal privacy. 
The material in this talk is based on joint work with Gill Bejerano, Bonnie Berger, Johannes A. Birgmeier, Dan Boneh, Hyunghoon Cho, and Karthik A. Jagadeesh.

Rare Disease Diagnosis

Patients with Kabuki Syndrome
• Each patient has a list of 200-400 rare variants over ≈20,000 genes
• Each patient has a vector x where x_i = 1 if the patient has a rare variant in gene i

Works well for Mendelian (monogenic) diseases (estimated to affect ≈10% of individuals)

General techniques apply to many different scenarios for diagnosing Mendelian diseases
• Identify causal gene for a rare disease given a small patient cohort
• Identify patients with the same rare functional mutation at two different hospitals
• Identify rare functional variants that are present in the child but in neither of the parents
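To make the per-patient vectors concrete, here is a minimal plaintext sketch of a frequency-based ranking over a small cohort. It only illustrates the kind of function that a secure protocol would evaluate without revealing the individual vectors. The function name `top_genes`, the toy cohort, and the ranking rule are assumptions for illustration, not the exact algorithm used in the work described.

```python
def top_genes(patient_vectors, k=1):
    """Rank genes by how many patients carry a rare variant in them.

    patient_vectors: list of equal-length 0/1 lists, one per patient,
    where entry i is 1 if that patient has a rare variant in gene i.
    Returns the k highest-count genes as (gene_index, count) pairs.
    """
    n_genes = len(patient_vectors[0])
    # Component-wise sum of the patient vectors gives one count per gene.
    counts = [sum(v[i] for v in patient_vectors) for i in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda i: counts[i], reverse=True)
    return [(i, counts[i]) for i in ranked[:k]]


# Hypothetical cohort: 3 patients, 5 genes (real vectors span about 20,000 genes).
cohort = [
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
]
print(top_genes(cohort))
```

In the secure setting, only the final ranking (or just the top gene) would be revealed to the parties, while the individual vectors stay hidden.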
Simple frequency-based algorithms, but the techniques enabled us to discover a previously unidentified pathogenic variant
Birgmeier-Boneh-Bejerano [Science 2017]

Experimental benchmarks for identifying causal gene in small disease cohort
• Simulated two non-colluding entities with 1 server on East Coast and 1 on West Coast
• End-to-

Yao's Protocol for Two-Party Computation [Yao82]

Step 2: Garbler "encrypts" the circuit (i.e., "garbles" the circuit)
• Garbler chooses two different encryption keys for every wire in the circuit
• Each key is associated with a possible wire value
• Encrypt the output key (for the output wire) with the two input keys (for the input wires)
• Symmetric key: does not reveal what the output bit is
• Garbled truth table is randomly permuted
• Invariant: Given just a single key for each input wire, evaluator can learn a single key for the output wire
• Garbler can send garbled truth tables and keys for its inputs

Question: how does evaluator obtain keys for its input?

Step 3: Evaluator uses "oblivious transfer" to obtain keys for its input
• For each wire corresponding to evaluator's input, the garbler has two keys
• For each input wire, evaluator wants to obtain key corresponding to its input value
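The garbling step and its invariant can be illustrated for a single AND gate. The sketch below is a toy under stated assumptions: the "encryption" is an XOR with a SHA-256 pad derived from the two input keys, and an all-zero tag marks the row that decrypts correctly. Both choices are illustrative stand-ins rather than the construction used in the talk, and the names `garble_and_gate` and `evaluate` are hypothetical.

```python
import hashlib
import os
import random

KEY_LEN = 16
TAG = b"\x00" * 16  # all-zero tag lets the evaluator recognise the correct row


def pad(k_a, k_b):
    """Derive a 32-byte one-time pad from the two input-wire keys."""
    return hashlib.sha256(k_a + k_b).digest()


def xor(x, y):
    return bytes(p ^ q for p, q in zip(x, y))


def garble_and_gate():
    """Garbler: choose two keys per wire and encrypt the AND truth table."""
    # wires[name][bit] is the key standing for value `bit` on that wire.
    wires = {name: (os.urandom(KEY_LEN), os.urandom(KEY_LEN)) for name in "abc"}
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_key = wires["c"][bit_a & bit_b]
            row_pad = pad(wires["a"][bit_a], wires["b"][bit_b])
            table.append(xor(row_pad, out_key + TAG))
    # Randomly permute the rows so a row's position says nothing about the inputs.
    random.SystemRandom().shuffle(table)
    return wires, table


def evaluate(table, k_a, k_b):
    """Evaluator: given one key per input wire, recover one output-wire key."""
    for row in table:
        plain = xor(pad(k_a, k_b), row)
        if plain[KEY_LEN:] == TAG:
            return plain[:KEY_LEN]
    raise ValueError("no row decrypted correctly")


wires, table = garble_and_gate()
for bit_a in (0, 1):
    for bit_b in (0, 1):
        out = evaluate(table, wires["a"][bit_a], wires["b"][bit_b])
        assert out == wires["c"][bit_a & bit_b]
```

The evaluator ends up holding exactly one output key and, because keys are random strings, cannot tell which bit it stands for. Practical implementations use optimized schemes rather than trial decryption of every row.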

At the end of the oblivious transfer protocol, the garbler learns nothing about which key the evaluator obtains, and the evaluator learns exactly one of the two keys. Keys are communicated using OT (the garbler does not know which keys are transmitted).

Protocol is very efficient; communication is the bottleneck
• Two-round protocol for secure two-party computation
• Many improvements are possible to achieve better performance
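The talk does not say which oblivious transfer protocol is used, so the following is only a sketch of one well-known option: a Diffie-Hellman-style 1-out-of-2 OT along the lines of the Chou-Orlandi construction. Both parties are simulated inside one hypothetical function, `oblivious_transfer`, and the group parameters `P` and `G` are toy assumptions that are not suitable for real use.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (assumption: illustration only, not secure)
G = 3           # toy base (assumption)


def kdf(x):
    """Hash a group element down to a 16-byte pad."""
    return hashlib.sha256(x.to_bytes(16, "big")).digest()[:16]


def xor(x, y):
    return bytes(p ^ q for p, q in zip(x, y))


def oblivious_transfer(key0, key1, choice):
    """Simulate both parties; return the 16-byte key the evaluator obtains."""
    # Garbler (sender): sample a secret exponent and publish A = G^a.
    a = secrets.randbelow(P - 2) + 1
    A = pow(G, a, P)

    # Evaluator (receiver): blind the choice bit inside B.
    b = secrets.randbelow(P - 2) + 1
    B = pow(G, b, P) if choice == 0 else (A * pow(G, b, P)) % P

    # Garbler: derive one pad per key and send both ciphertexts.
    # B looks like a random group element in both cases, so it hides `choice`.
    pad0 = kdf(pow(B, a, P))
    pad1 = kdf(pow(B * pow(A, -1, P) % P, a, P))
    c0, c1 = xor(key0, pad0), xor(key1, pad1)

    # Evaluator: A^b equals the pad input for the chosen key only.
    my_pad = kdf(pow(A, b, P))
    return xor((c0, c1)[choice], my_pad)


k0, k1 = os_keys = (bytes(range(16)), bytes(range(16, 32)))
assert oblivious_transfer(k0, k1, 0) == k0
assert oblivious_transfer(k0, k1, 1) == k1
```

In Yao's protocol, one such transfer would run for each of the evaluator's input wires, with `key0` and `key1` being the garbler's two keys for that wire.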