Developing an Evaluation Toolkit for Data Linkage Pipelines


Leah Quinn

Abstract

Objectives
This work aims to develop a linkage evaluation toolkit: a set of criteria for evaluating data linkage pipelines and methods. The toolkit will allow assessment of a single pipeline and will also facilitate comparison between multiple pipelines, enabling analysts to identify their strengths and weaknesses.


Methods
There is currently little information available on standardised metrics for evaluating linkage pipelines, beyond the quality of the resulting linked dataset. Relevant teams across the organisation were consulted in a series of workshops to identify existing methods of evaluating pipelines across eight key areas: accuracy, bias in linkage error, flexibility in input datasets, scalability, efficiency, platform suitability, ease of use, and the ease and transparency of quality assurance. Existing methods, tools, and criteria for evaluating each area were discussed and, where none existed, the discussion aimed to identify potential solutions.
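As one illustration of how the "bias in linkage error" area might be quantified, the following minimal sketch assumes a gold standard of known true matches is available and compares the missed-match rate (the share of true matches left unlinked) across subgroups of records. The function name, the group labels and the toy data are hypothetical and are not taken from the toolkit itself; a large gap between subgroups would suggest that linkage error is unevenly distributed.

```python
from collections import defaultdict


def missed_match_rate_by_group(true_pairs, predicted_pairs, group_of):
    """Hypothetical helper: share of gold-standard matches the pipeline failed to link, per group."""
    predicted = set(predicted_pairs)
    totals = defaultdict(int)  # true matches per group
    missed = defaultdict(int)  # true matches not linked, per group
    for a_id, b_id in true_pairs:
        group = group_of[a_id]
        totals[group] += 1
        if (a_id, b_id) not in predicted:
            missed[group] += 1
    return {group: missed[group] / totals[group] for group in totals}


# Toy data for illustration only: pairs are (ID in dataset A, ID in dataset B).
true_pairs = [(1, "x"), (2, "y"), (3, "z"), (4, "w")]
predicted_pairs = [(1, "x"), (2, "y"), (3, "z")]
group_of = {1: "A", 2: "A", 3: "B", 4: "B"}

rates = missed_match_rate_by_group(true_pairs, predicted_pairs, group_of)
gap = max(rates.values()) - min(rates.values())  # one simple summary of disparity
print(rates, gap)  # {'A': 0.0, 'B': 0.5} 0.5
```

The max-minus-min gap is only one possible summary; ratios between groups or comparisons of linked and unlinked record characteristics are alternatives when no gold standard exists.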


Results
Evaluation criteria were identified for each area. Whilst some areas had quantitative metrics, such as precision and recall, others had more qualitative criteria, such as the clarity of the methods documentation and the ease of auditing. Methods to quantify these qualitative criteria are being explored, to allow the evaluation of a single linkage pipeline or the comparison of multiple pipelines. Once finalised, the toolkit will initially be used to compare two linkage pipelines that link the same datasets, with the aim of identifying which pipeline performs best on each criterion. We then aim to combine the strengths of each pipeline to create a new, higher-performing method.
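Precision and recall can be computed pairwise against a gold standard of known true matches: precision is the share of links made that are true matches, and recall is the share of true matches that were linked. The minimal sketch below assumes record pairs are represented as (ID in dataset A, ID in dataset B) tuples and that such a gold standard exists; it also reports F1 (the harmonic mean of precision and recall) and shows how a per-criterion comparison of two pipelines might be tabulated. The function names, pipeline names and toy data are hypothetical.

```python
def precision_recall_f1(predicted_pairs, true_pairs):
    """Hypothetical helper: pairwise precision, recall and F1 against a gold standard."""
    predicted, truth = set(predicted_pairs), set(true_pairs)
    tp = len(predicted & truth)  # links that are true matches
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def best_per_metric(results):
    """Hypothetical helper: map each metric to the pipeline scoring highest on it."""
    metrics = next(iter(results.values())).keys()
    return {m: max(results, key=lambda name: results[name][m]) for m in metrics}


# Toy data for illustration only.
truth = [(1, "x"), (2, "y"), (3, "z"), (4, "w")]
results = {
    "pipeline_a": precision_recall_f1([(1, "x"), (2, "y"), (3, "z"), (5, "v")], truth),
    "pipeline_b": precision_recall_f1([(1, "x"), (2, "y")], truth),
}
print(best_per_metric(results))
# {'precision': 'pipeline_b', 'recall': 'pipeline_a', 'f1': 'pipeline_a'}
```

In this toy example each pipeline performs best on a different metric, which is the kind of complementary strength a per-criterion comparison is intended to surface.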


Conclusion
The evaluation toolkit will allow analysts to evaluate the strengths and weaknesses of linkage pipelines, both in their use and in their impact on linked outputs. The methods can be used to improve existing pipelines and to support the development and evaluation of new methodologies in future.


How to Cite
Quinn, L. (2025) “Developing an Evaluation Toolkit for Data Linkage Pipelines”, International Journal of Population Data Science, 10(4). doi: 10.23889/ijpds.v10i4.3165.
