Linking Sensitive Data – Applications, Techniques, and Challenges

Peter Christen
Thilina Ranbaduge
Rainer Schnell


The linking of sensitive databases containing personal identifying information across organisations is an increasingly important task in application domains ranging from health and social science research to national censuses. Various techniques have been developed to facilitate the linking of sensitive databases while at the same time preserving the privacy of individuals represented in these databases.

Objectives and approach
We present several case studies where the privacy-preserving linking of sensitive databases is crucial, and then discuss the advantages and limitations of existing algorithms and techniques to link sensitive databases. We discuss privacy techniques such as Bloom filter encoding, hashing, and secure multi-party computation, from the point of view of a linkage practitioner. We highlight those aspects that are important when selecting or implementing a privacy-preserving linkage technique within practical applications.
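To make the Bloom filter encoding mentioned above more concrete, the following is a minimal illustrative sketch and not the implementation of any particular linkage tool. It assumes that string values are split into padded character bigrams, that each bigram is mapped to several bit positions using keyed hashing (HMAC) with a double-hashing scheme, and that two encodings are compared with the Dice coefficient. The function names, the secret key, and the parameters (1000 bits, 10 hash functions) are hypothetical choices made for illustration only.

```python
import hashlib
import hmac


def qgrams(value, q=2):
    """Split a string into a set of padded character q-grams."""
    padded = "_" * (q - 1) + value.lower() + "_" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(value, secret_key, num_bits=1000, num_hashes=10, q=2):
    """Return the set of bit positions set to 1 for the given string.

    Assumption: double hashing with two keyed hash functions
    (HMAC-SHA256 and HMAC-SHA512); other schemes are possible.
    """
    positions = set()
    for gram in qgrams(value, q):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(secret_key, data, hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(secret_key, data, hashlib.sha512).digest(), "big")
        for i in range(num_hashes):
            positions.add((h1 + i * h2) % num_bits)
    return positions


def dice_similarity(bits_a, bits_b):
    """Dice coefficient of two Bloom filters given as sets of 1-bit positions."""
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


# Hypothetical key shared by the database owners but not by the linkage unit.
key = b"shared-secret-key"
smith = bloom_encode("smith", key)
smyth = bloom_encode("smyth", key)
jones = bloom_encode("jones", key)

print(dice_similarity(smith, smyth))  # similar names share many bit positions
print(dice_similarity(smith, jones))  # dissimilar names share few bit positions
```

In this sketch only the encoded bit positions would be exchanged for comparison, so approximate matching remains possible without revealing the plain-text values. Such a basic encoding is known to be open to cryptanalysis attacks, which is one of the limitations a practitioner needs to weigh.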

Conceptually, linkage techniques can be evaluated across three main dimensions: linkage quality, scalability to linking large or multiple databases, and the privacy protection a technique provides. From a practical perspective, however, several other dimensions are crucial, including the availability of software or ease of implementation, the technical knowledge available in an organisation, and the suitability of a technique for a given linkage scenario. Our analysis of a diverse range of linkage techniques has shown that currently no technique provides an adequate solution along all conceptual and all practical dimensions.

More research is required to develop novel techniques that facilitate the privacy-preserving linkage of large sensitive databases across organisations, including new encoding methods, new cryptanalysis attacks (most attacks to date have neglected the attack vectors likely to occur in practice), and novel evaluation measures to assess the privacy provided by linkage techniques. We encourage practitioners to be aware of the identified limitations – as well as the opportunities – of existing privacy-preserving linkage techniques and to carefully assess the technical and organisational requirements of such techniques within their institution.

How to Cite
Christen, P., Ranbaduge, T. and Schnell, R. (2020) “Linking Sensitive Data – Applications, Techniques, and Challenges”, International Journal of Population Data Science, 5(5). doi: 10.23889/ijpds.v5i5.1475.
