COVID-19 transmission and infection: linkage of COVID-19 Infection Survey, Test and Trace, and Patient Demographics Survey.

Leah Maizey
Josie Plachta
Holly Clarke
Sarah Collyer
Sarah Cummins
Gabriela De La Serna
Shelley Gammon
Ben Harries
Elizabeth Pereira
Caroline Youell


Data linkage was conducted between the Office for National Statistics’ COVID-19 Infection Survey (CIS), the Department of Health and Social Care’s Test and Trace (T&T) and the NHS Personal Demographics Service (PDS) datasets. Linked data was required to provide reliable estimates of COVID-19 transmission and infection rates, which were used to inform policy on the ongoing pandemic.

The CIS was created to track infection rates in the UK population. Linking CIS participants to positive tests in T&T helped improve these estimates. Linkage to PDS was required to attach NHS numbers to these datasets, facilitating further linkages that could also be used to inform Government about the spread of the virus. Multiple approaches were used to link the data. Initially, T&T was linked to itself via a series of strict matchkeys to cluster records belonging to the same individual and so create a person-level identifier. Subsequent linkage of CIS-PDS, T&T-PDS and CIS-T&T involved deterministic linkages, with matchkeys designed and applied independently. A probabilistic method (Fellegi-Sunter scoring) was also used to link CIS-PDS and CIS-T&T. Additionally, associative links were created between CIS and T&T records that had matched to the same PDS record but had not matched to each other.
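
As an illustration only, the deterministic, probabilistic and associative steps described above could be sketched in Python as follows. The matchkeys, field names, m- and u-probabilities and function names are all hypothetical; the abstract does not specify the ones used in the study, and this sketch ignores missing values and blocking.

```python
import math

# Hypothetical matchkeys, strictest first. Each is a set of fields that must
# all agree exactly. The study's actual matchkeys are not given in the abstract.
MATCHKEYS = [
    ("forename", "surname", "dob", "postcode"),
    ("forename_initial", "surname", "dob", "postcode"),
    ("forename", "surname", "dob"),
]

# Hypothetical Fellegi-Sunter parameters per field:
# m = P(field agrees | same person), u = P(field agrees | different people).
M_U = {
    "forename": (0.95, 0.01),
    "surname": (0.95, 0.005),
    "dob": (0.98, 0.001),
    "postcode": (0.90, 0.0005),
}


def add_derived(rec):
    """Return a copy of the record with an illustrative derived field."""
    out = dict(rec)
    out["forename_initial"] = rec["forename"][:1]
    return out


def deterministic_link(left, right, matchkeys=MATCHKEYS):
    """Apply matchkeys in order; accept a link only when the hit is unique."""
    links = {}
    left = [add_derived(r) for r in left]
    right = [add_derived(r) for r in right]
    for key in matchkeys:
        index = {}
        for r in right:
            index.setdefault(tuple(r[f] for f in key), []).append(r["id"])
        for l in left:
            if l["id"] in links:
                continue  # already linked on a stricter matchkey
            hits = index.get(tuple(l[f] for f in key), [])
            if len(hits) == 1:
                links[l["id"]] = hits[0]
    return links


def fellegi_sunter_score(a, b, m_u=M_U):
    """Sum of log2 agreement/disagreement weights across the compared fields."""
    score = 0.0
    for field, (m, u) in m_u.items():
        if a[field] == b[field]:
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


def associative_links(cis_to_pds, tt_to_pds, cis_to_tt):
    """Pair CIS and T&T ids that share a PDS id but were not linked directly."""
    by_pds = {}
    for tt_id, pds_id in tt_to_pds.items():
        by_pds.setdefault(pds_id, []).append(tt_id)
    extra = set()
    for cis_id, pds_id in cis_to_pds.items():
        for tt_id in by_pds.get(pds_id, []):
            if (cis_id, tt_id) not in cis_to_tt:
                extra.add((cis_id, tt_id))
    return extra
```

In a probabilistic pass, candidate pairs scoring above a chosen threshold would be accepted as links; the threshold, like the parameters above, would need to be tuned and clerically reviewed against the real data.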

The accuracy of the CIS-PDS and CIS-T&T linkages was high (recall and precision >98%; all lower bounds of the 95% confidence intervals >93%). A quality assessment of T&T-PDS is underway, as are relevant bias analyses.
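
For readers unfamiliar with these measures, the following minimal sketch shows how precision and recall, with a 95% confidence interval, might be computed from a clerically reviewed sample of links. The Wilson score interval is an assumption chosen for illustration, since the abstract does not state which interval method was used, and the function names are hypothetical.

```python
import math


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half
```

With small review samples the lower bound can sit well below the point estimate, which is one reason a lower bound above 93% can accompany a point estimate above 98%.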

As a result of this linkage, COVID-19 analysts have access to enriched, linked datasets in which previously separate variables can be compared, with confidence that the linkage method met the required quality standards. The linked data has been used to provide crucial evidence to Government on infection and re-infection rates. Subsequent linkages have enabled analysts to explore risk factors associated with different variants of the virus, vaccination status and hospital episodes. Improvements continue to be made.

Article Details

How to Cite
Maizey, L., Plachta, J., Clarke, H., Collyer, S., Cummins, S., De La Serna, G., Gammon, S., Harries, B., Pereira, E. and Youell, C. (2022) “COVID-19 transmission and infection: linkage of COVID-19 Infection Survey, Test and Trace, and Patient Demographics Survey”, International Journal of Population Data Science, 7(3). doi: 10.23889/ijpds.v7i3.2094.
