Preparing Pathology Data for Linkage


Nadine Wiggins
Tim Albion
Brian Stokes
Matthew Jose


The Tasmanian Data Linkage Unit (TDLU) undertook a complex data linkage project in 2019, linking public and private pathology data to five disparate health datasets. Having linked pathology data previously, the unit was aware of the challenges involved in linking a large dataset covering a fourteen-year span. The aim of this study was to use data linkage to develop a Tasmanian dataset to quantify the burden and distribution of chronic kidney disease, including identifying barriers to dialysis treatment services.

Objectives and Approach
A cohort was selected from public and private providers of pathology services in Tasmania for the period 2004 to 2017 to support the establishment of a comprehensive, researchable dataset. A linkage plan was developed that included detailed processes for cleaning and de-duplicating the pathology data prior to linkage. Data cleaning strategies were implemented for the larger private pathology dataset, which comprised 3.9 million records. De-duplication generated an extensive clerical review workload, so methods to reduce this work were devised and implemented as part of the linkage process.
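The cleaning and exact-match de-duplication step described above can be illustrated with a minimal Python sketch. The abstract does not specify which fields or rules the TDLU used, so the field names (surname, given_name, dob, sex) and the normalisation rules here are assumptions made purely for illustration.

```python
def normalise(record):
    """Build a cleaned comparison key from one record.

    The fields and cleaning rules are hypothetical; they are not taken
    from the TDLU linkage plan.
    """
    return (
        " ".join(record["surname"].upper().split()),     # trim, collapse spaces, upper-case
        " ".join(record["given_name"].upper().split()),
        record["dob"].strip(),
        record["sex"].strip().upper(),
    )


def deduplicate_exact(records):
    """Keep the first record seen for each distinct cleaned key."""
    unique = {}
    for record in records:
        unique.setdefault(normalise(record), record)
    return list(unique.values())


# Illustrative records only; not real data.
records = [
    {"surname": "Smith ", "given_name": "jane", "dob": "1970-01-02", "sex": "f"},
    {"surname": "SMITH", "given_name": "Jane", "dob": "1970-01-02", "sex": "F"},
    {"surname": "Smith", "given_name": "Jan", "dob": "1970-01-02", "sex": "F"},
]
individuals = deduplicate_exact(records)
print(len(individuals))
```

In this sketch the first two records collapse to one individual once case and whitespace are cleaned, while the third ("Jan" rather than "Jane") is not an exact match and would be left for probabilistic linkage and, potentially, clerical review.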

De-duplication based on exact matches reduced the size of the dataset from 3.9 million records to just over 520,000 individuals. Internal linkage of the dataset then produced approximately 47,000 ‘groups’ eligible for clerical review. Structured Query Language (SQL) queries were constructed to reduce this workload, and the number of groups eligible for review decreased by 42%. Further analysis identified an appropriate ‘cut-off’ threshold for clerical review, and an estimate of the false positive links remaining was calculated.
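To make the figures above concrete, the following Python sketch works through the arithmetic of the 42% reduction and shows one possible way a cut-off threshold and a false positive estimate could be applied. The abstract does not describe how the threshold or the estimate were derived, so the group scores, the cut-off value and the sampling approach below are all assumptions for illustration, not the TDLU's method.

```python
def remaining_groups(initial_groups, reduction_fraction):
    """Number of groups still needing review after a fractional reduction."""
    return round(initial_groups * (1 - reduction_fraction))


def split_by_cutoff(group_scores, cutoff):
    """Accept groups scoring at or above the cut-off; send the rest to clerical review."""
    accepted = {g: s for g, s in group_scores.items() if s >= cutoff}
    review = {g: s for g, s in group_scores.items() if s < cutoff}
    return accepted, review


def estimate_false_positives(accepted_count, sample_size, sample_errors):
    """Scale the error rate found in a manually checked sample to all accepted groups."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return accepted_count * sample_errors / sample_size


# Figures from the abstract: about 47,000 groups, reduced by 42%.
print(remaining_groups(47_000, 0.42))  # 27260

# Hypothetical group scores and cut-off, for illustration only.
scores = {"g1": 0.98, "g2": 0.91, "g3": 0.72, "g4": 0.65}
accepted, review = split_by_cutoff(scores, cutoff=0.9)
print(sorted(accepted), sorted(review))

# Hypothetical sample: 3 incorrect links found among 200 checked accepted groups.
print(estimate_false_positives(1_000, 200, 3))  # 15.0
```

On the abstract's approximate figures, a 42% reduction from about 47,000 groups leaves roughly 27,000 groups eligible for review before any cut-off threshold is applied.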

Conclusion / Implications
Methods of reducing the amount of manual clerical review can be incorporated into a linkage design when there is a thorough understanding of the characteristics and content of the dataset to be linked. The methods used for this linkage project will be utilised for future projects using pathology data.


How to Cite
Wiggins, N., Albion, T., Stokes, B. and Jose, M. (2020) “Preparing Pathology Data for Linkage”, International Journal of Population Data Science, 5(5). doi: 10.23889/ijpds.v5i5.1480.
