The grouping of record-pairs to determine which administrative records belong to the same individual is an important process in record linkage. A variety of grouping methods are used but the relative benefits of each are unknown. We evaluate a number of grouping methods against the traditional merge based clustering approach using large scale administrative data.
The research aimed to both describe current grouping techniques used for record linkage, and to evaluate the most appropriate grouping method for specific circumstances. A range of grouping strategies were applied to three datasets with known truth sets. Conditions were simulated to appropriately investigate one-to-one, many-to-one and ongoing linkage scenarios.
Results suggest alternate grouping methods will yield large benefits in linkage quality, especially when the quality of the underlying repository is high. Stepwise grouping methods were clearly superior for one-to-one linkage. There appeared little difference in linkage quality between many-to-one grouping approaches. The most appropriate techniques for ongoing linkage depended on the quality of the population spine and the underlying dataset.
These results demonstrate the large effect that the choice of grouping strategy can have on overall linkage quality. Ongoing linkages to high quality population spines provide large improvements in linkage quality compared to merge based linkages. Procuring or developing such a population spine will provide high linkage quality at far lower cost than current methods for improving linkage quality. By improving linkage quality at low cost, this resource can be further utilised by health researchers.