How Are Linkage Results Using Privacy-Preserving Record Linkage Different?

Michael Jarrett
Brent Hills
Yinshan Zhao
Adrian Brown
Sean Randall
James Boyd
Anna Ferrante
Kimberlyn McGrail

Abstract

Introduction
Privacy-Preserving Record Linkage (PPRL) presents opportunities to improve privacy protection when performing record linkage on the most sensitive data. Our linkage agency currently performs all linkages in clear text, but our expanding data sources now include extremely sensitive data, such as justice data. Recognizing that specific circumstances may demand different approaches to linkage, we evaluated a PPRL algorithm implemented through the LinXmart software. This is the first real-world evaluation of PPRL in British Columbia and among the first in Canada.


Objectives and Approach
Our standard linkage method is probabilistic and relies on rules established by analysts to determine accepted links. Datasets are linked to a population spine (N=8,440,442) containing all current and past residents of the province. LinXmart was configured to link to the top weighted candidate above a predetermined confidence threshold. We evaluated performance by comparing the standard method to PPRL for three increasingly complex (messy) datasets. Initial results on the simplest/cleanest dataset informed an iterative process to improve implementation of PPRL.
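
The selection rule described above (link each record to its top-weighted candidate, provided that candidate clears a predetermined threshold) can be illustrated with a minimal sketch. This is not LinXmart's implementation or API: the function names, the weight scale, the threshold value, and the spine IDs are all hypothetical, and treating a weight exactly equal to the threshold as a link is an assumption.

```python
from typing import Dict, List, Optional


def select_link(candidates: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the spine ID of the highest-weighted candidate, or None.

    `candidates` maps hypothetical spine IDs to match weights.
    Assumption: a weight equal to the threshold counts as a link.
    """
    if not candidates:
        return None  # no candidate pairs were generated for this record
    best_id = max(candidates, key=lambda spine_id: candidates[spine_id])
    return best_id if candidates[best_id] >= threshold else None


def linkage_rate(records: List[Dict[str, float]], threshold: float) -> float:
    """Fraction of records that link to some spine entry at this threshold."""
    if not records:
        return 0.0
    linked = sum(select_link(c, threshold) is not None for c in records)
    return linked / len(records)


# Illustrative (made-up) candidate weights for four incoming records.
records = [
    {"spine_001": 18.2, "spine_002": 9.5},   # clear best candidate
    {"spine_003": 7.1},                      # best candidate below threshold
    {},                                      # no candidates at all
    {"spine_004": 12.0, "spine_005": 11.9},  # best candidate exactly at threshold
]

rate = linkage_rate(records, threshold=12.0)
```

Raising or lowering `threshold` in a sketch like this changes how many records link, which is the kind of parameter tuning the evaluation iterated on.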


Results
Overall linkage rates were lower for standard linkage (81%) than for PPRL (90%). Records with a unique ID linked at similarly high rates in clear-text linkage and PPRL, while PPRL performance for records without the unique ID varied with the parameters chosen for the match threshold and field comparisons.


Conclusion / Implications
This work suggests that for datasets that include a well-populated unique identifier, PPRL can be implemented in real-world linkages without a substantial drop-off in linkage quality. Messier data require careful tuning of linkage parameters to match the performance of clear-text linkage. PPRL may best be used in cases where clear-text identifiers cannot be shared, and where some degradation in linkage rates is acceptable.

Article Details

How to Cite
Jarrett, M., Hills, B., Zhao, Y., Brown, A., Randall, S., Boyd, J., Ferrante, A. and McGrail, K. (2020) “How Are Linkage Results Using Privacy-Preserving Record Linkage Different?”, International Journal of Population Data Science, 5(5). doi: 10.23889/ijpds.v5i5.1541.
