Knowing where a person lives and who they live with can provide important insights into the effects of the local environment and household composition on their health.

When a patient registers with a GP practice, their home address is recorded and held in the NHS Electronic Health Records database. Approved researchers can use these addresses to link the residence of every patient in a population to environmental information, and to link people who share a household. But addresses are currently entered into GP records in an unstandardised way, which makes linkage to other datasets difficult. To standardise them, patient addresses can be matched to Unique Property Reference Numbers (UPRNs) in a process known as address matching.

A UPRN is the unique identifier for every addressable location in Great Britain. UPRNs are allocated by local authorities and made available at the national level by GeoPlace and Ordnance Survey. The UPRN database provides a standardised version of each address and high-resolution geo-coordinates for the location of the property.

Representing an address by its UPRN means it can be linked to other datasets that also contain UPRNs. This can be achieved whilst maintaining complete privacy for patients by using encrypted or pseudonymised identifiers. Researchers can then deduce that patients with the same UPRN live together, and identify a patient’s property type and location more accurately than from their postcode alone.
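As an illustration of the household inference described above, the following is a minimal Python sketch that groups pseudonymised patient identifiers by UPRN. The function name, the record layout and the example values are all invented for illustration; this is not how ASSIGN or any NHS system stores or processes data.

```python
from collections import defaultdict


def households_by_uprn(records):
    """Group pseudonymised patient IDs by the UPRN assigned to their address.

    `records` is an iterable of (patient_id, uprn) pairs, where `uprn` is None
    if no match was found. This layout is an assumption made for the example.
    """
    groups = defaultdict(list)
    for patient_id, uprn in records:
        if uprn is not None:  # unmatched addresses cannot be placed in a household
            groups[uprn].append(patient_id)
    return dict(groups)


# Invented example data: pseudonymised IDs and made-up UPRN values.
example_records = [
    ("p1", 100000000001),
    ("p2", 100000000001),
    ("p3", 100000000002),
    ("p4", None),
]
households = households_by_uprn(example_records)
```

In this sketch, patients "p1" and "p2" share a UPRN and would be treated as one household, while "p4" has no UPRN and is left out. The same key could be used to join a table of property types or coordinates.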

UPRNs are now mandated for UK public sector datasets that include addresses, so there is an increasing requirement to include them in routinely collected data. However, this has not been accompanied by transparency and evaluation of the methods currently used to assign UPRNs.

If an address-matching algorithm is not accurate, an incorrect UPRN can attribute the wrong residential location to a patient, leading to miscalculated estimates of, for example, exposure to air pollution. It can also assign patients to the wrong household, which introduces errors in studies where household occupancy is the factor of interest.

The authors addressed these challenges by developing a new open-source address-matching algorithm called ASSIGN, with the aim of describing it transparently, quality assuring it, and examining potential biases in match results.

ASSIGN compares addresses in the GP patient record with the Ordnance Survey UPRN database, ‘AddressBase Premium’, one element at a time, and determines whether there is a match. The algorithm mirrors human pattern recognition, so it allows for certain character swaps, spelling mistakes and abbreviations, which are common in addresses that are not entered into GP records in a standardised way.
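To give a feel for what element-by-element matching with some tolerance can look like, here is a small Python sketch. It is not ASSIGN's actual rule set: the abbreviation table, the one-edit tolerance, the four-character minimum and all function names are assumptions made for this example, and whitespace-separated tokens stand in for address elements.

```python
# Hypothetical abbreviation table for illustration; not ASSIGN's actual rules.
ABBREVIATIONS = {"rd": "road", "st": "street", "ave": "avenue",
                 "flt": "flat", "apt": "flat"}


def normalise(token: str) -> str:
    """Lower-case a token, strip surrounding punctuation, expand abbreviations."""
    token = token.lower().strip(".,")
    return ABBREVIATIONS.get(token, token)


def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and
    swaps of two adjacent characters (optimal string alignment)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


def elements_match(a: str, b: str) -> bool:
    """Compare one address element, tolerating a single typo in longer words."""
    a, b = normalise(a), normalise(b)
    if a == b:
        return True
    # Assumption: anything containing a digit (house or flat number) must be exact.
    if any(ch.isdigit() for ch in a + b):
        return False
    # Assumption: very short words are too ambiguous to match fuzzily.
    if min(len(a), len(b)) < 4:
        return False
    return osa_distance(a, b) <= 1


def addresses_match(candidate: str, reference: str) -> bool:
    """Match two addresses one element at a time, in order."""
    cand = candidate.replace(",", " ").split()
    ref = reference.replace(",", " ").split()
    return len(cand) == len(ref) and all(
        elements_match(c, r) for c, r in zip(cand, ref))
```

With these assumed rules, "Flt 2, 10 Hgih St" matches "Flat 2 10 High Street", because the abbreviations are expanded and the swapped letters in "Hgih" count as a single edit. "Flat 3 10 High Street" does not match, because numbers must agree exactly.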

Dr Gill Harper, lead author and UKRI Innovation Fellow at Queen Mary University of London, commented: “Linking places to people is a core element of the UK government’s geospatial strategy. We’ve shown that ASSIGN is a highly accurate technique which enables linking health records to geospatial information. The algorithm will enable researchers to confidently conduct studies into the effects of a person’s household, location and surroundings on their health.”

The authors evaluated ASSIGN using London and Welsh addresses and found an encouragingly high rate of accurate matches for both.

ASSIGN is already enabling research to address important public health questions about health and place, including questions relevant to the COVID-19 pandemic. As ASSIGN is open-source, it is available to other researchers to link health data to information about places and households to improve population health.


Dr Gill Harper, Barts and the London School of Medicine and Dentistry, Queen Mary University of London

Harper, G., Stables, D., Simon, P., Ahmed, Z., Smith, K., Robson, J. and Dezateux, C. (2021) “Evaluation of the ASSIGN open-source deterministic address-matching algorithm for allocating Unique Property Reference Numbers to general practitioner-recorded patient addresses”, International Journal of Population Data Science, 6(1). doi: 10.23889/ijpds.v6i1.1674.