Linking medical records across multiple healthcare settings is critical to delivering high-quality medical care and conducting valid research. Linkage is very challenging in countries without national identifiers, and remains an issue elsewhere because identifiers are imperfectly recorded and transferred into databases. Researchers therefore need to understand how linkage software behaves.
Objectives and Approach
Our objective is to compare four record linkage software packages and various algorithms using real data: independent inpatient (n=69,523) and outpatient (n=176,154) datasets from a major medical system without a universal identifier. We conducted 30 trials, varying the software package (LinkPlus, LinXmart/CUPLE [Curtin University], Merge ToolBox, and the R RecordLinkage package), the algorithm (deterministic or probabilistic; exact or inexact string matching), and the matching variables (first name, last name, and gender, or these three plus full date of birth), using year of birth as the blocking variable. We evaluated performance using the weights assigned to each of the 132M record pairs.
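To make the setup concrete, the following is a minimal Python sketch of blocking on year of birth and assigning a probabilistic weight to each compared pair under exact string matching. It is not the procedure used by any of the four packages: the m- and u-probabilities, the field names, and the toy records are all hypothetical, and the weight is a generic Fellegi-Sunter style sum of log-ratios.

```python
import math
from collections import defaultdict

# Hypothetical (m, u) probabilities per field; illustrative values only,
# not estimated from the study data or taken from any package.
M_U = {
    "first": (0.95, 0.01),
    "last": (0.95, 0.005),
    "gender": (0.98, 0.5),
}

def pair_weight(a, b):
    """Sum log2 agreement/disagreement weights using exact string matching."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a[field] == b[field]:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total

def blocked_pairs(left, right, key="yob"):
    """Yield only the pairs whose records share the blocking key."""
    index = defaultdict(list)
    for rec in right:
        index[rec[key]].append(rec)
    for a in left:
        for b in index.get(a[key], []):
            yield a, b

# Toy records (hypothetical), standing in for the inpatient and outpatient files.
inpatient = [
    {"id": "I1", "first": "ANN", "last": "LEE", "gender": "F", "yob": 1970},
    {"id": "I2", "first": "BOB", "last": "RAY", "gender": "M", "yob": 1982},
]
outpatient = [
    {"id": "O1", "first": "ANN", "last": "LEE", "gender": "F", "yob": 1970},
    {"id": "O2", "first": "ANN", "last": "LEA", "gender": "F", "yob": 1970},
    {"id": "O3", "first": "BOB", "last": "RAY", "gender": "M", "yob": 1990},
]

# One weight per compared pair; pairs in different blocks are never compared.
weights = {
    (a["id"], b["id"]): pair_weight(a, b)
    for a, b in blocked_pairs(inpatient, outpatient)
}
```

In this toy example only the two pairs sharing a year of birth are compared. The pair I2/O3 agrees on every matching variable but is never scored because the blocking variable differs, which shows how the choice of blocking variable limits the set of possible match decisions.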
Despite substantial similarity, the packages and algorithms did not behave identically. The number of distinct weights assigned to the compared pairs ranged across trials from 4 to 7,925,493, leading to different match decisions when pairs whose weights exceed a threshold are declared matches. In all trials with exact string matching and three matching variables, 30,805 pairs received the maximum weight; with four matching variables, 30,536 pairs did. However, software and algorithms varied in how they assigned the second-, third-, and fourth-ranked weights. For example, some algorithms assigned a higher weight to pairs matching on first name and gender but not on last name, and others to pairs matching on first name and last name but not on gender. The ordering of the weights also reflected differences in the treatment of missing values.
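The rank disagreement described above can be illustrated with a small Python sketch. It assumes two hypothetical sets of integer field weights (chosen for illustration, not taken from any of the packages), scores all eight agreement patterns on first name, last name, and gender, and converts the distinct weights to ranks so the two schemes can be compared.

```python
from itertools import product

FIELDS = ("first", "last", "gender")

def rank_patterns(agree_w, disagree_w):
    """Map each agreement pattern to the rank of its total weight (1 = highest).

    Patterns with equal totals share a rank, so the largest rank equals
    the number of distinct weights.
    """
    totals = {}
    for pattern in product((True, False), repeat=len(FIELDS)):
        totals[pattern] = sum(
            agree_w[f] if hit else disagree_w[f]
            for f, hit in zip(FIELDS, pattern)
        )
    distinct = sorted(set(totals.values()), reverse=True)
    rank_of = {w: r for r, w in enumerate(distinct, start=1)}
    return {p: rank_of[w] for p, w in totals.items()}

# Hypothetical scheme A: disagreement on gender is penalized only mildly.
scheme_a = rank_patterns(
    agree_w={"first": 6, "last": 7, "gender": 1},
    disagree_w={"first": -4, "last": -4, "gender": -1},
)

# Hypothetical scheme B: disagreement on gender is penalized heavily.
scheme_b = rank_patterns(
    agree_w={"first": 6, "last": 7, "gender": 1},
    disagree_w={"first": -4, "last": -2, "gender": -12},
)

# Patterns are ordered (first, last, gender).
full_agreement = (True, True, True)
names_only = (True, True, False)        # first and last agree, gender does not
first_and_gender = (True, False, True)  # first and gender agree, last does not
```

Both schemes give full agreement the top rank, but scheme A places `names_only` second while scheme B places `first_and_gender` second. A threshold set just below the top weight would therefore admit different pairs under each scheme, even though the raw weights themselves are not directly comparable.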
Unlike previous linkage work, we analyzed the weights assigned to all possible record pairs in each trial, which allows us to describe exactly which match decisions are possible. Because weights are not comparable across trials, focusing on ranks is crucial. Software documentation does not yield ready insight into these differences.