New data linkage software demonstrates that research databases can be linked feasibly and accurately without revealing any personal health information. The ‘Brain-CODE Link’ software was used to combine a large dataset of routinely collected information with a smaller, project-specific dataset of primary data. This privacy-preserving record linkage is the largest data linkage performed with the Brain-CODE Link software to date.

There are vast amounts of routinely collected data in clinical practice. These data are incredibly useful for understanding how health systems work and giving researchers high-level insight into health care metrics. On the other hand, these data can miss the nuances seen in smaller datasets that focus on specific health conditions. Ideally, researchers would be able to combine the breadth of routine data with the depth of project-specific data for a richer dataset. However, not all parties who collect data are necessarily allowed to store personal health information.

This study focussed on two datasets: (1) a routine dataset stored at ICES, and (2) a project-specific dataset stored at the Ontario Brain Institute (OBI). OBI is not permitted to store personal health information. Instead, it stored encrypted health information, which limited the data linkage methods that could be used. The Brain-CODE Link software was designed for this exact scenario, and it was tested to confirm whether a record linkage could be performed without decrypting health information.
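To make the idea of linking without decryption more concrete, here is a minimal sketch in Python. It is not the Brain-CODE Link implementation, which this summary does not describe. It assumes a common approach, keyed hashing (HMAC-SHA256), in which both data holders turn the same identifiers into the same opaque token using a shared secret key, and records are then matched exactly on that token. The function names, field names, identifier choices, and sample values are all hypothetical.

```python
import hashlib
import hmac


def make_token(health_card_number: str, birth_date: str, secret_key: bytes) -> str:
    """Turn direct identifiers into an opaque token (hypothetical scheme).

    Assumption: both data holders agree on the identifiers, the
    normalization rules, and the secret key ahead of time.
    """
    # Normalize so that formatting differences do not break exact matching.
    card = "".join(ch for ch in health_card_number if ch.isalnum()).upper()
    normalized = f"{card}|{birth_date.strip()}"
    # A keyed hash cannot be reversed or recomputed without the secret key.
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(routine_rows, project_rows):
    """Deterministic linkage: join rows whose tokens match exactly."""
    index = {row["token"]: row for row in routine_rows}
    linked = []
    for row in project_rows:
        match = index.get(row["token"])
        if match is not None:
            # Merge the two rows; neither side ever sees raw identifiers.
            linked.append({**match, **row})
    return linked


if __name__ == "__main__":
    key = b"shared-secret-agreed-out-of-band"  # made-up key for illustration
    # Made-up records: each holder tokenizes its own identifiers locally.
    routine = [{"token": make_token("1234-567-890", "1950-01-31", key), "admissions": 2}]
    project = [{"token": make_token("1234567890", "1950-01-31", key), "score": 7}]
    print(link_records(routine, project))
```

In this sketch only the tokens and the non-identifying fields are ever compared, so the party performing the join never handles a health card number or birth date. The trade-off of exact matching is that a typo in an identifier produces a different token and the pair is missed, which is why linkage accuracy needs to be validated against a reference, as the study did.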

The Brain-CODE Link software succeeded in linking the databases with 99.8% accuracy. Researchers expect that future applications of this software will be more iterative, involving real-time comparisons via ongoing linkages, rather than only at the end of research projects.

Alisia Southwell, lead author, contextualizes the use case for this method: “In an ideal world, all researchers who want to combine datasets should have the privacy authorizations needed to simply share direct identifiers. In the real world, independent researchers may not have the institutional support required to store direct identifiers. We hope that this software can help make data linkage even more accessible.”


Alisia Southwell, MPH Student, Sunnybrook Health Sciences Centre

Southwell, A., Bronskill, S., Gee, T., Behan, B., Evans, S., Mikkelsen, T., Theriault, E., Nylen, K., Lefaivre, S., Chong, N., Azimaee, M., Tusevljak, N., Lee, D. and Swartz, R. (2022) “Validating a novel deterministic privacy-preserving record linkage between administrative & clinical data: applications in stroke research”, International Journal of Population Data Science, 7(4). doi: 10.23889/ijpds.v7i4.1755.