Modernization of Record Linkage At ICES


Mahmoud Azimaee
Nelson Chong
Charlotte Ma
Gordon Fehringer
Gangamma Kalappa
Nan Wang
Cheng Qian
Marian Vermeulen

Abstract

Introduction
Probabilistic record linkage of large databases requires substantial time and resources, which makes it costly. The process is also error-prone, particularly during manual grey area resolution, in which uncertain candidate pairs must be reviewed to decide whether they are true matches.


Objectives and Approach
The objective of this semi-experimental study was to compare the accuracy and efficiency of different record linkage approaches. Four record linkage software packages were selected: AutoMatch, G-Link, SAS Data Quality (DataFlux) and LinxMart. A large data set containing all required linkage variables (e.g., first and last name, date of birth and gender), together with a unique identifier shared with the ICES linkage spine (registry), was chosen to serve as the ground truth. Four non-overlapping cohorts were randomly selected from this data source, ranging in size from small (n=10,000) through medium (n=250,000) to large (n=5,000,000). Simulated errors were inserted into each cohort to mimic a realistic linkage scenario.
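
The abstract does not say which kinds of errors were simulated. As an illustration only, the sketch below assumes three common ones: a character typo in a name, a day/month transposition in the date of birth, and a missing gender value. The `Record` layout and the functions `typo`, `swap_day_month` and `corrupt` are hypothetical; this is not the study's procedure.

```python
import random
from dataclasses import dataclass, replace

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout built from the linkage variables named in the abstract.
    record_id: int   # stands in for the common unique identifier (ground truth)
    first_name: str
    last_name: str
    birth_date: str  # assumed ISO format, YYYY-MM-DD
    gender: str


def typo(value: str, rng: random.Random) -> str:
    """Overwrite one randomly chosen character with a random lowercase letter.

    The replacement can occasionally equal the original character, in which
    case the value is left unchanged.
    """
    if not value:
        return value
    i = rng.randrange(len(value))
    return value[:i] + rng.choice(ALPHABET) + value[i + 1:]


def swap_day_month(birth_date: str) -> str:
    """Transpose day and month, a common date-entry error.

    The date is returned unchanged when the day is greater than 12, because
    the swapped value would not be a valid month.
    """
    year, month, day = birth_date.split("-")
    if int(day) <= 12:
        return f"{year}-{day}-{month}"
    return birth_date


def corrupt(record: Record, rate: float, rng: random.Random) -> Record:
    """Independently inject one simulated error into each field with probability `rate`.

    `record_id` is never modified, so the corrupted record can still be
    traced back to its true match.
    """
    changes = {}
    if rng.random() < rate:
        changes["first_name"] = typo(record.first_name, rng)
    if rng.random() < rate:
        changes["last_name"] = typo(record.last_name, rng)
    if rng.random() < rate:
        changes["birth_date"] = swap_day_month(record.birth_date)
    if rng.random() < rate:
        changes["gender"] = ""  # simulate a missing value
    return replace(record, **changes)


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the noisy cohort is reproducible
    cohort = [Record(1, "mary", "smith", "1980-03-07", "F")]
    print([corrupt(r, rate=0.1, rng=rng) for r in cohort])
```

Leaving `record_id` untouched is what allows the corrupted cohort to be scored against the ground truth afterwards, and a fixed seed means every software package can be given exactly the same noisy cohort.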


The smallest cohort was used to run a complete record linkage with each software package. Where the software supported manual grey area resolution, the linkage was replicated by two linkage analysts who were blinded to the simulated errors in the data set. The time each analyst spent on processing, programming and manual grey area resolution was recorded. The larger cohorts were used to measure the accuracy and processing time of each software package. To analyse errors, detailed output was generated from each package so that accepted and rejected pairs could be compared with the ground truth.
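
The abstract does not name the accuracy measures used. One common choice, assumed here, is pairwise precision, recall and F1, obtained by comparing each accepted pair with the true pair implied by the common identifier. The names `evaluate` and `LinkageMetrics` are hypothetical; this is a sketch, not the study's evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkageMetrics:
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        # Share of accepted pairs that are true matches.
        denom = self.true_positives + self.false_positives
        return self.true_positives / denom if denom else 0.0

    @property
    def recall(self) -> float:
        # Share of true matches that ended up among the accepted pairs.
        denom = self.true_positives + self.false_negatives
        return self.true_positives / denom if denom else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def evaluate(accepted: dict[str, str], truth: dict[str, str]) -> LinkageMetrics:
    """Score accepted pairs against the ground truth.

    accepted: record_id -> spine_id for every pair accepted by the software or analyst.
    truth:    record_id -> true spine_id, taken from the common unique identifier.
    A record accepted against the wrong spine_id counts as both a false
    positive and a false negative; a true match that was rejected counts
    as a false negative only.
    """
    tp = sum(1 for rid, sid in accepted.items() if truth.get(rid) == sid)
    fp = len(accepted) - tp
    fn = sum(1 for rid, sid in truth.items() if accepted.get(rid) != sid)
    return LinkageMetrics(tp, fp, fn)


if __name__ == "__main__":
    truth = {"a": "S1", "b": "S2", "c": "S3", "d": "S4"}
    accepted = {"a": "S1", "b": "S9", "c": "S3"}  # "b" is mislinked, "d" was missed
    m = evaluate(accepted, truth)
    print(m.precision, m.recall, m.f1)
```

Because the same function can be applied to output produced before and after manual grey area resolution, it would also show how much the analysts' review changed each measure.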


Results
This project is ongoing. Evaluation of AutoMatch, G-Link and SAS Data Quality is largely complete, and the remaining analyses will be completed by August 2020.


Conclusion / Implications
The outcome of this project can inform the record linkage strategy at organizations and data centres such as ICES and help identify more efficient methods that preserve an acceptable level of accuracy for their needs.

Article Details

How to Cite
Azimaee, M., Chong, N., Ma, C., Fehringer, G., Kalappa, G., Wang, N., Qian, C. and Vermeulen, M. (2020) “Modernization of Record Linkage At ICES”, International Journal of Population Data Science, 5(5). Available at: https://ijpds.org/article/view/1620 (Accessed: 17 January 2021).
