“Hiding in the Crowd”: defining conceptual models for privacy-preserving linkage of place-based and personal data for public benefit research.
Abstract
Objective
To address the challenge of linking identifiable but low-sensitivity place-based data with high-sensitivity, de-identified health and socio-economic records, while also minimising the financial and carbon costs of computationally intensive environmental modelling.
Approach
Domain and legal experts, Trusted Research Environment (TRE) leads and public contributors co-developed a trusted, system-wide governance model for linking these data.
Results
We identified the UK’s ‘Unique Property Reference Numbers’ (UPRNs), a geo-coordinated national property ID number, as ideally suited for linking individuals to households and households to place-based data. Under Data Protection laws, UPRNs are considered inherently identifiable, yet public-domain research use is permitted because the data have low sensitivity and there is broad societal acceptance. Once UPRNs are linked with individuals’ records, additional controls are needed to preserve privacy. We developed a workflow in which a single agency models place-based data at a national level and trusted third parties then de-identify the outputs and share minimised derived extracts into TREs for linkage and analysis. To maintain confidentiality, the UPRNs of the population of interest are masked by matched ‘control’ UPRNs drawn from the wider population. We will discuss permutations of the model, proof-of-concept implementations in UK TREs and public views on the models proposed.
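The masking step described above could be sketched as follows. This is an illustration only, not the authors' implementation: the function name `mask_with_controls`, the matching variable (`stratum`, here a coarse area type), the number of controls per case and the made-up UPRN values are all assumptions introduced for the example. The idea shown is that the list sent to the modelling agency mixes study UPRNs with matched controls, so the agency cannot tell which properties belong to the population of interest.

```python
import random
from collections import defaultdict


def mask_with_controls(study_uprns, population, controls_per_case=2, seed=None):
    """Return a shuffled list of study UPRNs plus matched control UPRNs.

    population: dict mapping every candidate UPRN to a matching stratum
    (hypothetical matching variable; the abstract does not specify one).
    """
    rng = random.Random(seed)
    study = set(study_uprns)

    # Group non-study UPRNs by stratum so controls can be matched.
    pool = defaultdict(list)
    for uprn, stratum in sorted(population.items()):
        if uprn not in study:
            pool[stratum].append(uprn)

    masked = set(study)
    for uprn in sorted(study):
        stratum = population[uprn]  # KeyError if a study UPRN is unknown
        candidates = [u for u in pool[stratum] if u not in masked]
        if len(candidates) < controls_per_case:
            raise ValueError(f"not enough controls in stratum {stratum!r}")
        masked.update(rng.sample(candidates, controls_per_case))

    # Shuffle so list order does not reveal which UPRNs are study cases.
    result = sorted(masked)
    rng.shuffle(result)
    return result


# Toy data: invented UPRNs, 8 "urban" and 4 "rural" properties.
population = {100000000000 + i: "urban" for i in range(8)}
population.update({200000000000 + i: "rural" for i in range(4)})
study = [100000000003, 200000000001]

masked = mask_with_controls(study, population, controls_per_case=2, seed=1)
```

In this sketch the list `masked` would be what leaves the TRE side. The key identifying which entries are true study UPRNs stays with the trusted third party, who would later drop the control rows before the derived extract is linked.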
Conclusions
The challenge can be addressed by using trusted third parties to minimise the data flowing between national generators of place-based data and TREs, and by masking data to maintain confidentiality.
Implications
Relatively small adjustments to existing TRE workflow principles could enable new research possibilities without requiring re-evaluation of current governance norms or high-cost infrastructure changes.