Cleaning and validating longitudinal maternal and child postcode histories from a national healthcare registry for environmental health research in London, UK

Pia Hardelid [Joint First Author]
Glory Atilola [Joint First Author]
Amal Rammah
Bianca De Stavola
Tom Clemens
Steve Cunningham
Chris Dibben
Samantha Hajna
Alison Macfarlane
Ai Milojevic
Jonathon Taylor
Linda Wijlaars

Abstract

Aim
To create longitudinal postcode history datasets that allocate mothers to one postcode for each week of pregnancy and children to one postcode for each week of infancy for a study of air pollution and respiratory infections in infants.


Datasets
We used linked birth registrations and NHS birth notifications for all children born in London between 2010 and 2014, which constituted the spine for the Air Pollution, housing and respiratory tract Infections in Children: National Birth Cohort (PICNIC) study. NHS England linked the birth data to the Personal Demographics Service (PDS), from which we derived maternal and child postcode histories for each week of pregnancy and infancy.


Challenges
While the research team had extensive experience working with administrative data, including birth registrations and notifications, the postcode history data were a new resource, with no metadata, papers or reports from previous users. A substantial number of records were missing a move-in date, or both a move-in date and a postcode, which complicated ascertaining an address history for study participants. We also encountered incorrectly recorded postcodes and implausible numbers of postcodes recorded within a single week.


Lessons learned
One half of children in this London-based cohort moved during infancy, and one third of their mothers moved during pregnancy. This highlights the importance of taking changes in residential address into account in studies examining the association between environmental exposures and health outcomes. Cleaned and validated longitudinal national address records are crucial for environmental health studies. However, cleaning and validating them is resource intensive, with implications for researchers and research funders.

Introduction

National and regional administrative datasets are highly valuable resources for the study of the impacts of environmental exposures on health outcomes [1]. Administrative datasets have many advantages over recruited cohort studies or surveys: they include everyone who is in contact with a particular service, thereby minimising selection bias; through linkage to other administrative databases, they allow individuals to be followed up over time, and thus lagged exposure-outcome effects to be examined; and, given their large size, they allow the estimation of the small but important effect sizes usually of interest in environmental epidemiology and related research [2, 3]. To assign environmental exposures that reflect individuals' residential movements, researchers need to link the relevant administrative data to national longitudinal address registers that capture residential mobility. In this article, we describe how we cleaned and validated longitudinal postcode histories for children born in London between 2010 and 2014 and their mothers. The histories were derived by linkage to the Personal Demographics Service (PDS), the National Health Service (NHS) address register for England, as part of a study of the impact of air pollution on respiratory infections in children.

Background

There is increasing research interest in the impact of the physical environment on health, particularly given the changing profile of physical and built environments globally (e.g. air pollution, climate change, urban sprawl) [4, 5]. Exposure to adverse environments or pollutants during pregnancy and early childhood is of particular concern, as these can affect health outcomes throughout the life course [6–10]. Administrative health data, collected via contact with primary, secondary or community care services, have been successfully used to examine the health impacts of environmental exposures [1]. Many of these studies have linked environmental data to administrative data on specific events, such as births, hospital admissions or deaths, and analysed the linked data using case-control, case-crossover or time series designs [11–15]. In such studies, environmental exposure data are linked to the health event data via the residential address at the time of the event.

Administrative data can also be used to create cohort studies that examine the health impact of long-term exposure to environmental threats. Such cohort designs are used to calculate the relative incidence of events across different exposure levels and over long periods, including to assess the time-varying impact of environmental exposures across the life course [16, 17], an approach which, similar to case-crossover study designs [18], offers the advantage of controlling for individual-level confounding by design [19]. To define early-life and long-term environmental exposures, environmental data spanning several decades are linked to a geo-identifier in the administrative data, usually the residential address or postcode (or zip code). Residential mobility needs to be taken into account so that environmental exposures can be accurately assigned to individuals at different periods of their life course [20, 21], particularly since individuals who move tend to have different exposure and health risk profiles from those who do not [22, 23]. Not allowing for residential mobility may therefore lead to systematic bias.

This is challenging in studies using administrative data, since researchers need national, universal address registries that can be linked to environmental data in order to assign exposures at the residential (or other relevant) addresses where exposures are deemed to take place. Further, any national address registry needs to be cleaned and validated for research purposes. This differs from traditional cohorts of recruited participants, where residential addresses are updated by the research team when sweeps of data are collected, or through other direct contact with cohort members.

In the UK, this type of national address assignment has been possible in Wales at the national level [24], and has been used in studies of the impact of greenspace access on mental health outcomes and of housing improvements on hospital admissions [25, 26]. In Scotland, address histories are available from GP registration data for the Scottish Longitudinal Study [27] (linked Census, vital statistics, health and education data for a 5% population subsample), as well as nationally (used in the PICNIC Scotland substudy [28]). In England, algorithms to clean residential address histories and link these to environmental data have been developed for recruited cohorts, including the Avon Longitudinal Study of Parents and Children (ALSPAC) [29]. Further, residential addresses from primary care data have been linked to a range of environmental datasets for the Connected Bradford data resource, which currently covers 300,000 people registered with primary care within the City of Bradford in the North of England [30].

The Personal Demographics Service (PDS) is the national register of all individuals, living or deceased, who have been registered with the National Health Service (NHS) in England. All individuals living in England are entitled to register with an NHS general practice without charge [31]. Children have been registered on PDS from birth since 2006 [32]. Since the late 2000s, the PDS has replaced previous NHS address registers, including the NHS Central Register, the National Health Applications and Infrastructure Services and NHS Numbers for Babies. Individuals can update their address through multiple means, including by informing NHS providers of their new address during any contact with NHS services. In this paper, we present a workflow for cleaning and validating maternal and child postcode histories from PDS for the Air Pollution, housing and respiratory tract Infections in Children: National Birth Cohort (PICNIC) study [28]. The goal of this procedure was to create longitudinal postcode history datasets that allocated mothers to one (cleaned) postcode for each week of pregnancy and children to one (cleaned) postcode for each week of infancy. In the UK, a postcode covers on average 15 addresses and a population of 40 individuals [33]. We also assessed the socio-demographic representativeness of the linked cohort relative to the original PICNIC cohort, the quality of its postcode data, and the characteristics of those who did not change postcode during pregnancy or infancy compared with those who did.

Data sources

The PICNIC study: London birth cohort

The spine for the PICNIC study in England consisted of linked Office for National Statistics (ONS) birth registrations and NHS birth notifications for all children born in England between 2005 and 2014. These birth data have also been linked to Hospital Episode Statistics Admitted Patient Care Data (HES APC) [34] for mothers and children via delivery and birth records. The process to establish this birth cohort was carried out by researchers at City, University of London, and has been described elsewhere [35–37]. For this particular study, we used a subset of children born in Greater London (determined via the residential postcode recorded on birth registrations) between 1st January 2010 and 31st December 2014, since this is the time period and area for which highly spatio-temporally resolved air pollution data were available. Here, Greater London was defined as the area covered by the air pollution data used for the PICNIC study for London, which roughly equates to the area within the M25 motorway (the London Ring Road).

PDS data

The PICNIC birth cohort had been linked by NHS England to PDS postcode history data for mothers and children using a deterministic algorithm based on date of birth, postcode at birth/delivery and sex, and, for children, also their NHS number, a unique identifier within the English NHS. Each record in the PDS data referred to one recorded postcode for an individual on the PDS, so individuals could have multiple records. Each record included the postcode and the date on which the move was recorded on PDS (the move-in date).

ONS postcode directory

We used the ONS Postcode Directory to check whether postcodes within the PDS were correctly recorded. The ONS Postcode Directory contains a list of all current and terminated UK postcodes [38]; all postcodes, including terminated postcodes, have been geocoded since November 2000 [39] (see the ONS Postcode Directory User Guide, available on the ONS website). We used the May 2019 version of the Directory for this study.

Methods

Inclusion criteria

We included all singleton children born in Greater London (determined via the residential postcode recorded on birth registrations) between 1st January 2010 and 31st December 2014. Stillborn children, children born before 24 weeks' gestation, and children with missing gestational age or no linked HES APC data were excluded. Our aim was to create longitudinal weekly postcode records during pregnancy and infancy (until age 51 weeks and 6 days inclusive) for each child. The date of conception is not recorded on birth records in the UK. We therefore estimated it by subtracting the gestational age from the date of birth and adding two weeks [40], since gestational age is conventionally counted from the first day of the last menstrual period, around two weeks before conception. We used International Organization for Standardization (ISO) weeks throughout so that we could later link in air pollution data, which had to be aggregated from daily to weekly averages.
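The date arithmetic above can be illustrated with a minimal Python sketch. It is not the study's code (the analyses were run in Stata); it assumes gestational age is recorded in completed weeks, and the helper names are hypothetical. ISO years and week numbers come from the standard library's `date.isocalendar()`.

```python
from __future__ import annotations

from datetime import date, timedelta


def estimate_conception(dob: date, gest_weeks: int) -> date:
    """Estimated conception date: birth date minus gestational age, plus two weeks.

    Gestational age is counted from the last menstrual period, roughly two
    weeks before conception, hence the two weeks added back.
    """
    return dob - timedelta(weeks=gest_weeks) + timedelta(weeks=2)


def infancy_end(dob: date) -> date:
    """Last day of infancy: age 51 weeks and 6 days, inclusive."""
    return dob + timedelta(weeks=51, days=6)


def iso_week(d: date) -> tuple[int, int]:
    """(ISO year, ISO week number) for a date."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


def iso_weeks_between(start: date, end: date) -> list[tuple[int, int]]:
    """Every ISO week that overlaps the inclusive period [start, end], in order."""
    monday = start - timedelta(days=start.weekday())  # Monday of start's ISO week
    weeks = []
    while monday <= end:
        weeks.append(iso_week(monday))
        monday += timedelta(weeks=1)
    return weeks


# Example: a child born on Friday 15 June 2012 at 40 weeks' gestation
dob = date(2012, 6, 15)
conception = estimate_conception(dob, 40)
pregnancy_weeks = iso_weeks_between(conception, dob)
infancy_weeks = iso_weeks_between(dob, infancy_end(dob))
```

In this sketch the ISO week of birth appears in both the pregnancy and the infancy lists; the text does not say how that shared week was split between the maternal and child histories.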

Sociodemographic characteristics for checking representativeness

Using ONS birth registration data, we derived the following covariates from the PICNIC London birth cohort: child's year of birth; mother's age at delivery (categorised as <20, 20-24, 25-29, 30-34, 35-39 and 40+ years); mother's country of birth (UK, Europe (excluding the UK), Middle East & Asia, Africa, Antarctica & Oceania, The Americas, The Caribbean); and quintiles of the Index of Multiple Deprivation (IMD; 1: most deprived, 5: least deprived) [41], assigned from the postcode recorded at birth registration. The IMD is a small-area indicator of deprivation in England, derived primarily from Census, tax and benefit records and measured at the lower super output area level (~1,500 people).
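The maternal age bands can be expressed as a small, hypothetical Python helper, assuming age at delivery is available in completed years:

```python
def maternal_age_band(age: int) -> str:
    """Map maternal age at delivery (completed years) to the bands listed above."""
    if age < 20:
        return "<20"
    if age >= 40:
        return "40+"
    lower = 20 + 5 * ((age - 20) // 5)  # 20, 25, 30 or 35
    return f"{lower}-{lower + 4}"
```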

Joining the birth cohort to postcode histories

We merged the PICNIC birth cohort with the PDS postcode histories for mothers and babies using the unique study ID.

Validating and cleaning postcode histories

We used the following four steps to clean and validate the maternal and child postcode histories:

Step 1: We excluded PDS postcode records for both mothers and children with missing move-in dates where the postcode was also missing, as these records did not provide any useful information.
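Step 1 amounts to a simple filter. In this Python sketch, each PDS record is assumed to be a dict with hypothetical keys `postcode` (a string or `None`) and `move_in` (a `datetime.date` or `None`); the actual field names in the PDS extract are not given in the text, and treating blank strings as missing is also an assumption.

```python
from __future__ import annotations

from datetime import date


def is_missing(value) -> bool:
    """Treat None and blank strings as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, str) and not value.strip())


def step1_drop_empty(records: list[dict]) -> list[dict]:
    """Drop records that have neither a postcode nor a move-in date.

    Records missing only one of the two fields are kept for later steps.
    """
    return [
        r for r in records
        if not (is_missing(r.get("postcode")) and is_missing(r.get("move_in")))
    ]


records = [
    {"postcode": None, "move_in": None},              # dropped
    {"postcode": " ", "move_in": None},               # dropped (blank postcode)
    {"postcode": "E1 6AN", "move_in": None},          # kept: postcode present
    {"postcode": None, "move_in": date(2012, 1, 2)},  # kept: move-in date present
]
kept = step1_drop_empty(records)
```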

Step 2: We excluded any PDS postcode records with move-in dates that fell after the first birthday (for children) or after the delivery date (for mothers). For mothers, we also excluded all PDS postcode records which occurred before the estimated conception date, except for the last postcode record before the start of pregnancy (i.e., indicating the postcode where the mother lived at conception; Figure 1).
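A minimal sketch of Step 2 in Python, assuming records are dicts with hypothetical `postcode` and `move_in` keys and that the estimated conception date has already been computed. The cut-off is passed in: the delivery date for mothers or the first birthday for children. Records without a move-in date are passed through unchanged, since the text handles them in Step 3.

```python
from __future__ import annotations

from datetime import date


def step2_restrict(records: list[dict], end: date, start: date | None = None) -> list[dict]:
    """Keep records whose move-in date is on or before `end`.

    If `start` is given (mothers: estimated conception date), records dated
    before it are dropped except the latest one, which gives the postcode at
    conception. Records without a move-in date are returned untouched.
    """
    undated = [r for r in records if r["move_in"] is None]
    dated = sorted((r for r in records if r["move_in"] is not None),
                   key=lambda r: r["move_in"])
    kept = [r for r in dated if r["move_in"] <= end]
    if start is not None:
        before = [r for r in kept if r["move_in"] < start]
        during = [r for r in kept if r["move_in"] >= start]
        kept = before[-1:] + during  # latest pre-conception record, if any
    return kept + undated
```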

Step 3: We checked whether PDS postcodes for mothers and children were correctly recorded by linking them to the ONS Postcode Directory; PDS records with invalid or mis-recorded postcodes were flagged and excluded at this step. We also flagged and excluded postcodes recorded as outside England, as no fixed abode or as unknown, as well as valid postcodes with missing move-in dates. We termed postcodes that were correctly recorded and not missing 'good quality postcodes'.
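Step 3 can be sketched as a function that returns a quality flag per record. This is an illustration under stated assumptions: `directory` is a hypothetical dict from normalised postcode to country name, standing in for a lookup built from the ONS Postcode Directory; the unknown codes are the examples listed in Table 1; and the no-fixed-abode codes are supplied by the caller because the text does not list them. Unlike Table 1, whose categories overlap, the function reports only the first problem it finds.

```python
from __future__ import annotations

# Pseudo-postcodes coded as unknown: the three examples given in Table 1.
UNKNOWN_CODES = {"X99 9XX", "ZZ99 3CZ", "ZZ99 3WZ"}


def normalise_postcode(raw: str | None) -> str | None:
    """Upper-case, remove spaces, then put one space before the 3-character inward code."""
    if raw is None or not raw.strip():
        return None
    compact = "".join(raw.split()).upper()
    if len(compact) < 5:  # too short to be a full postcode; will fail the lookup
        return compact
    return f"{compact[:-3]} {compact[-3:]}"


def step3_flag(record: dict, directory: dict[str, str], no_fixed_abode: set[str]) -> str:
    """Return 'good' or the first data quality problem found for one record."""
    pc = normalise_postcode(record.get("postcode"))
    if pc is None:
        return "missing postcode"
    if pc in UNKNOWN_CODES:
        return "unknown"
    if pc in no_fixed_abode:
        return "no fixed abode"
    if pc not in directory:
        return "invalid"  # not found in the (stand-in) postcode directory
    if directory[pc] != "England":
        return "outside England"
    if record.get("move_in") is None:
        return "missing move-in date"
    return "good"
```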

Step 4: We selected one postcode record per week of pregnancy and infancy. Where multiple records were available in the same week, we selected the last recorded good quality postcode record for that week (i.e., a valid, non-missing postcode in London, if available). We would expect a child's first move-in date on PDS to equal their birth date, but for some children it did not. If a child's first recorded PDS postcode matched the birth registration postcode or their mother's postcode at delivery, and its move-in date was no more than 12 weeks after birth, we set the move-in date to the birth date (i.e., we assumed that the birth postcode had been recorded late on the child's PDS record). Where the first postcode on the child's record did not match the birth registration postcode or the mother's postcode at delivery, the child was assumed to have moved to that postcode from their birth postcode after birth.
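The two rules in Step 4 might be sketched as follows, assuming the records have already passed Step 3 and are dicts with hypothetical `postcode` and `move_in` keys; `in_london` is a hypothetical caller-supplied predicate. Cases the text does not cover, such as a matching first postcode recorded more than 12 weeks after birth, are left unchanged in this sketch.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date, timedelta
from typing import Callable


def align_first_move_in(records: list[dict], dob: date, birth_reg_pc: str,
                        maternal_delivery_pc: str, max_delay_weeks: int = 12) -> list[dict]:
    """Apply the birth-date rule to a child's good quality records."""
    recs = sorted(records, key=lambda r: r["move_in"])
    if not recs:
        return recs
    first = recs[0]
    delay = first["move_in"] - dob
    if first["postcode"] in (birth_reg_pc, maternal_delivery_pc):
        if timedelta(0) <= delay <= timedelta(weeks=max_delay_weeks):
            # Birth postcode assumed to have been recorded late: move it back to birth.
            recs[0] = {**first, "move_in": dob}
    elif first["move_in"] >= dob:
        # Child assumed to have moved here after birth: start at the birth postcode.
        recs.insert(0, {"postcode": birth_reg_pc, "move_in": dob})
    return recs


def one_record_per_week(records: list[dict], in_london: Callable[[str], bool]) -> list[dict]:
    """Keep one record per ISO week: the last London record if any, else the last record."""
    by_week: dict[tuple[int, int], list[dict]] = defaultdict(list)
    for r in sorted(records, key=lambda r: r["move_in"]):  # stable: ties keep input order
        iso = r["move_in"].isocalendar()
        by_week[(iso[0], iso[1])].append(r)
    chosen = []
    for week in sorted(by_week):
        week_recs = by_week[week]
        london = [r for r in week_recs if in_london(r["postcode"])]
        chosen.append((london or week_recs)[-1])
    return chosen
```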

Figure 1: Maternal and child residential postcode records that were retained or excluded during pregnancy and infancy for the PICNIC London birth cohort, 2010–2014. Light pink area indicates the last postcode recorded before pregnancy; dark pink area includes postcodes recorded during pregnancy; green area indicates postcodes recorded during infancy.

Representativeness and quality

We compared the socio-demographic characteristics (see above) of children who were linked to at least one postcode record with those of children who were not. Furthermore, we examined the agreement between the postcode recorded on the ONS birth registration and the cleaned weekly Greater London postcode records for the week of delivery (from the mother's postcode record) and for the week of birth (from the child's postcode record).
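The agreement check is a simple proportion. The hypothetical Python helper below compares postcodes ignoring case and spacing, treats a missing weekly postcode as disagreement, and uses all children as the denominator; that denominator is an assumption, though it is consistent with the percentages reported in the Results.

```python
from __future__ import annotations


def compact(pc: str | None) -> str | None:
    """Postcode with spaces removed and letters upper-cased, for comparison."""
    return None if pc is None else "".join(pc.split()).upper()


def agreement_rate(pairs: list[tuple[str | None, str | None]]) -> float:
    """Share of pairs (cleaned weekly postcode, birth registration postcode) that agree."""
    if not pairs:
        raise ValueError("no children to compare")
    matched = sum(1 for weekly, registered in pairs
                  if weekly is not None and compact(weekly) == compact(registered))
    return matched / len(pairs)
```

For example, 400,764 agreeing children out of 565,734 gives 400,764 / 565,734 ≈ 0.708, i.e. 70.8%.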

Comparing mothers and children with postcode changes vs those without

Finally, we derived the proportion of mothers and children who did not have a recorded change in postcodes and those who had at least one recorded postcode change during pregnancy and infancy, respectively. We examined the distributions of sociodemographic characteristics among children whose mothers moved during pregnancy and those who moved during infancy. We also examined these distributions among children with maternal and child postcode records both in and outside of Greater London. All analyses were carried out in Stata 17 (College Station, TX: StataCorp LLC) [42].
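One way to derive a "moved" indicator and the groupings used in Table 3 (1, 2, 3 and 4+ postcodes) from a weekly postcode history is sketched below in Python. It counts spells of consecutive weeks at the same postcode, skipping weeks with no postcode; this is an assumption, and it matches a count of cleaned records only when consecutive records carry different postcodes. All names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def count_spells(weekly_postcodes: list[str | None]) -> int:
    """Number of runs of consecutive weeks at the same postcode, skipping missing weeks."""
    spells, previous = 0, None
    for pc in weekly_postcodes:
        if pc is None:
            continue
        if pc != previous:
            spells += 1
        previous = pc
    return spells


def band(n: int) -> str:
    """Group counts as 1, 2, 3 and 4+."""
    return "4+" if n >= 4 else str(n)


def tabulate(histories: list[list[str | None]]) -> tuple[Counter, int]:
    """Counts per band, plus the number of people with at least one postcode change."""
    counts = [count_spells(h) for h in histories]
    return Counter(band(n) for n in counts), sum(1 for n in counts if n > 1)
```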

Ethical considerations

This project was approved by the NHS London Queen Square Ethics Committee (reference: 18/LO/1514) and the Confidentiality Advisory Group (18/CAG/0159), the National Statistician’s Data Ethics Committee (18(07)), and the ONS Research Accreditation Panel (2019/020). NHS England data access was approved by the Independent Group Advising on Release of Data (DARS-NIC-234656).

Results

Joining the birth cohort to postcode histories

We retained 565,734 (8.5%) of the 6,676,911 children in the birth cohort for England; the vast majority of excluded children had a birth registration postcode outside the London area or a date of birth outside the study period (Figure 2).

Figure 2: Flow chart detailing the steps taken to derive the PICNIC London birth cohort, 2010–2014.

Of the 565,734 children included in the London birth cohort of the PICNIC study, 499,100 (88.2%) were matched to at least one maternal postcode record, whereas all children in the cohort were matched to at least one child postcode record (Figures 3 and 4). Children whose birth record did not link to at least one maternal postcode record were more likely to live in deprived areas, to be born earlier in the study period and to have younger mothers (Supplementary Table 1).

Figure 3: Flow chart detailing the steps taken to clean the maternal residential postcode history dataset for the PICNIC London birth cohort, 2010–2014. ᵃN records indicates the number of postcode records (a child may have multiple maternal postcode records). ᵇChildren excluded during cleaning of the maternal residential postcode history dataset were included in the final dataset using their birth registration postcode.

Figure 4: Flow chart detailing the steps taken to clean the child residential postcode history dataset for the PICNIC London birth cohort, 2010–2014.

Validating and cleaning postcode histories

We present the outcome of each step in terms of the number and proportion of postcode records excluded.

Step 1: 12,637 children (2.2%) had missing maternal postcodes and/or move-in dates in all postcode records. We further excluded 9,288 infancy postcode records with missing move-in dates.

Step 2: We excluded 1,546,083 maternal postcode records because they fell outside the pregnancy period, and 742,765 child postcode records because their move-in dates were after infancy. A small number of children had only maternal postcode records from before pregnancy (n = 332) or only child postcode records from after infancy (n = 436).

Step 3: Following linkage to the ONS Postcode Directory, we identified only a small number of postcode records (2,953 for mothers and 3,476 for children) where the postcode was not correctly recorded (invalid postcodes; Table 1). We also identified 34,251 maternal and 45,554 child postcode records that were recorded as unknown, while no more than 0.1% of records had no fixed abode recorded. Most postcode records excluded at this step were excluded because of missing move-in dates: 21.8% of maternal postcode records (177,935 of 817,174) and 6.3% of child postcode records (55,178 of 882,514). After excluding these records, we assumed that the mother or child remained at the same postcode until the next non-missing move-in date.
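The carry-forward assumption can be sketched in Python as expanding the one-record-per-week output into a complete weekly history. The helper names and record layout (dicts with hypothetical `postcode` and `move_in` keys, sorted by move-in date) are assumptions. For mothers, including the last pre-conception record means the first weeks of pregnancy inherit the postcode at conception.

```python
from __future__ import annotations

from datetime import date, timedelta


def iso_week(d: date) -> tuple[int, int]:
    """(ISO year, ISO week number) for a date."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


def weekly_history(records: list[dict], start: date, end: date) -> dict[tuple[int, int], str | None]:
    """Assign one postcode to every ISO week overlapping [start, end].

    Each postcode is carried forward until the week of the next move-in date;
    weeks before the first record get None.
    """
    history: dict[tuple[int, int], str | None] = {}
    current = None
    i = 0
    monday = start - timedelta(days=start.weekday())  # Monday of start's ISO week
    while monday <= end:
        week = iso_week(monday)
        # Consume every record whose move-in week is this week or earlier.
        while i < len(records) and iso_week(records[i]["move_in"]) <= week:
            current = records[i]["postcode"]
            i += 1
        history[week] = current
        monday += timedelta(weeks=1)
    return history
```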

Reason for exclusionᵃ | Mothers, n (%) | Children, n (%)
Invalid postcode | 2,953 (0.6) | 3,476 (0.6)
No fixed abode | 272 (<0.1) | 683 (0.1)
Unknown postcode (e.g., coded as X99 9XX, ZZ99 3CZ or ZZ99 3WZ) | 34,251 (7.0) | 45,554 (7.8)
Any data quality issue | 37,212 (7.7) | 48,903 (8.6)
Table 1: Distribution of excluded postcode records (following linkage to the ONS Postcode Directory) for mothers and children in the PICNIC London birth cohort, 2010–2014. ᵃGroups are not mutually exclusive.

Overall, 549,698 children had mothers who only had postcodes in Greater London recorded during pregnancy (97.2%) and 541,900 children (95.8%) only had postcodes in Greater London recorded during infancy (Table 2).

Pregnancy (rows) by infancy (columns) | Infancy: only Greater London postcodes, n (%) | Infancy: at least one non-Greater London postcode, n (%) | Overall, n (%)
Pregnancy: only Greater London postcodes | 529,117 (93.5) | 20,581 (3.6) | 549,698 (97.2)
Pregnancy: at least one non-Greater London postcode | 12,783 (2.3) | 3,253 (0.6) | 16,036 (2.8)
Overall | 541,900 (95.8) | 23,834 (4.2) | 565,734 (100)
Table 2: Distribution of maternal and child postcode records by postcode location in the PICNIC London birth cohort, 2010–2014.

Step 4: We excluded 13,232 maternal and 59,069 child postcode records (2.1% and 7.1%, respectively, of records retained at Step 4) whose move-in dates fell in the same week as a retained postcode record (i.e., where a mother or child had multiple postcode records within one week, we kept one). The final dataset of cleaned, weekly postcodes in Greater London during pregnancy and infancy included 626,007 maternal postcode records during pregnancy for 486,054 children in the cohort (85.9% of children therefore had at least one cleaned maternal postcode). It also included 768,267 child postcode records for all 565,734 children in the cohort (every child had at least one cleaned postcode record).

Data quality

The cleaned maternal postcode record for the week of delivery matched the ONS birth registration postcode for 400,764 children (70.8%), and the cleaned child postcode record for the week of birth matched it for 537,677 children (95.0%). The mothers of one third of children had at least two postcodes recorded during pregnancy, and one half of children had two or more postcodes recorded during infancy (Table 3).

Number of postcode records | Pregnancy, n (%) | Infancy, n (%)
1 | 379,994 (67.2) | 291,456 (51.5)
2 | 159,851 (28.3) | 197,764 (35.0)
3 | 23,583 (4.2) | 59,253 (10.5)
4+ | 2,306 (0.4) | 17,261 (3.1)
Table 3: Number of postcode records by residential history period (pregnancy or infancy) in the PICNIC London birth cohort, 2010–2014.

Comparing mothers and children with postcode changes vs those without

Mothers who had more than one postcode recorded during pregnancy were more likely to have been born outside the UK and to be aged less than 25 years at delivery. Similarly, children who had more than one postcode recorded during infancy were more likely to have mothers who were born outside the UK or aged less than 25 years at delivery (Supplementary Tables S2 and S3).

Recommendations and lessons learned

We describe a procedure for cleaning maternal and child postcode history data from PDS for children born in London, for a study of air pollution exposure and child health using linked vital statistics and administrative health data. This is the first time national, longitudinal postcode data have been used to derive residential postcode histories for an environmental epidemiology study based on administrative health data in England. Following our cleaning procedure, the maternal PDS postcode record for the week of delivery agreed with the birth registration postcode for 71% of children, whereas the child's PDS postcode record for the week of birth agreed with it for 95% of children.

We showed that one half of children born in London had more than one postcode recorded during infancy, and one third of their mothers had more than one postcode recorded during pregnancy. These high proportions show that using the address recorded at birth registration alone to allocate environmental exposures during pregnancy and infancy may lead to inaccurate attribution of exposures, which may bias the exposure-outcome relationship. Further, since younger mothers and mothers born abroad were more likely to change postcodes, this misattribution of exposures is likely to be systematic, and could therefore affect both the magnitude and the direction of the bias.

London had one of the highest rates of internal migration in England and Wales during the study period [43]; however, similarly high rates of residential mobility during pregnancy and infancy have been reported in urban areas of the United States [44, 45]. This highlights the importance of linking to longitudinal address records to account for residential mobility when assigning environmental exposures over time in cohort studies. Our work has highlighted the complexity of cleaning national address records from PDS and deriving a longitudinal postcode history dataset suitable for linkage to environmental exposure data for research purposes.

We discovered data quality issues with missing move-in dates, missing postcodes, incorrectly entered postcodes, and multiple postcodes recorded in a week. Our cleaning procedure was developed to correct or exclude such records, yet we estimate that developing the procedure took an experienced analyst 4–6 months full time. We were not able to assess whether the mother and child actually resided at the postcode recorded in the PDS during each week of pregnancy and infancy, nor whether multiple postcode records indicated residential moves or corrections of incorrectly recorded addresses. Note that where a new postcode record was in fact a correction of a previously incorrectly recorded postcode, this would result in a more accurate assessment of exposure to air pollution. Validating postcode histories from a national registry such as PDS would require a recruited birth cohort study in which self-reported addresses can be compared against national address records such as the PDS.

Only postcode histories were available for this study; these may not reflect exposures at the household level or capture variation due to daily mobility and time spent in other locations. Further, as the United Kingdom Government now mandates the recording of Unique Property Reference Numbers (UPRNs), unique identifiers for each property address, in administrative datasets including NHS address records, it is increasingly possible to link environmental data at dwelling level [24, 30, 46] rather than postcode level. This provides opportunities for linking indoor and outdoor environments at fine spatial scales to administrative datasets for studies in environmental epidemiology. However, our work shows that cleaning national address records will require substantial staff resource.

Conclusions

The use of national address records such as the PDS to derive postcode or address histories offers a way of assigning longitudinal environmental exposures in cohort studies based on administrative datasets. However, substantial staff time is required to clean and validate these address records.

Author contributions

The paper was conceived by LW and PH. GA carried out data processing and statistical analysis. LW produced the final code, statistical analyses and outputs. PH drafted the manuscript with AR and LW. All authors commented on the manuscript draft.

Funding

This project was funded by the UK Research and Innovation Medical Research Council (MR/T016558/1). AR was funded by Health Data Research UK (HDRUK2023.0029), an initiative funded by UK Research and Innovation, Department of Health and Social Care (England) and the devolved administrations, and leading medical research charities. Research at the UCL Great Ormond Street Institute of Child Health benefits from funding from the National Institute for Health Research Great Ormond Street Hospital Biomedical Research Centre.

Acknowledgements

This work used data provided by the public and patients and collected by the NHS as part of their care and support. NHS data was provided within the terms of a data sharing agreement (DARS-NIC-234656-C3J1D-v3.7) to the researchers by NHS England.

Ethics Statement

This project was approved by the NHS London Queen Square Ethics Committee (reference: 18/LO/1514) and the Confidentiality Advisory Group (18/CAG/0159), the National Statistician’s Data Ethics Committee (18(07)), and the ONS Research Accreditation Panel (2019/020). NHS England data access was approved by the Independent Group Advising on Release of Data (DARS-NIC-234656).

This work was undertaken in the Office for National Statistics Secure Research Service using data from ONS and other owners and does not imply the endorsement of the ONS or other data owners.

Conflict of Interests Statement

The authors declare that they have no conflicts of interest with regard to the content of this report.

Data Availability Statement

The data and code used in this study were accessed through the Office for National Statistics (ONS) Secure Research Service (SRS). They are not publicly available and cannot be shared by the authors due to legal and ethical restrictions. Accredited researchers may apply for access to some of the data used through the ONS.

AI Statement

The authors declare that no generative AI tools were used in the preparation of this manuscript.

References

  1. Schinasi, L.H., A.H. Auchincloss, C.B. Forrest, and A.V. Diez Roux, Using electronic health record data for environmental and place based population health research: a systematic review. Annals of Epidemiology, 2018. 28(7): p. 493-502. 10.1016/j.annepidem.2018.03.008

  2. Boland, M.R., L.M. Davidson, S.P. Canelón, et al., Harnessing electronic health records to study emerging environmental disasters: a proof of concept with perfluoroalkyl substances (PFAS). npj Digital Medicine, 2021. 4(1): p. 122. 10.1038/s41746-021-00494-5

  3. Pedersen, M., L. Giorgis-Allemand, C. Bernard, et al., Ambient air pollution and low birthweight: a European cohort study (ESCAPE). The Lancet Respiratory Medicine, 2013. 1(9): p. 695-704. 10.1016/S2213-2600(13)70192-9

  4. Fuller, R., P.J. Landrigan, K. Balakrishnan, et al., Pollution and health: a progress update. The Lancet Planetary Health, 2022. 6(6): p. e535-e547. 10.1016/S2542-5196(22)00090-0

  5. Romanello, M., C. Di Napoli, P. Drummond, et al., The 2022 report of the Lancet Countdown on health and climate change: health at the mercy of fossil fuels. The Lancet, 2022. 400(10363): p. 1619-1654. 10.1016/S0140-6736(22)01540-9

  6. Korten, I., K. Ramsey, and P. Latzin, Air pollution during pregnancy and lung development in the child. Paediatr Respir Rev, 2017. 21: p. 38-46. 10.1016/j.prrv.2016.08.008

  7. Chersich, M.F., M.D. Pham, A. Areal, et al., Associations between high temperatures in pregnancy and risk of preterm birth, low birth weight, and stillbirths: systematic review and meta-analysis. BMJ, 2020. 371: p. m3811. 10.1136/bmj.m3811

  8. Goldizen, F.C., P.D. Sly, and L.D. Knibbs, Respiratory effects of air pollution on children. Pediatr Pulmonol, 2016. 51(1): p. 94-108. 10.1002/ppul.23262

  9. Newbury, J.B., J. Heron, J.B. Kirkbride, et al., Air and Noise Pollution Exposure in Early Life and Mental Health From Adolescence to Young Adulthood. JAMA Network Open, 2024. 7(5): p. e2412169-e2412169. 10.1001/jamanetworkopen.2024.12169

  10. Engemann, K., C.B. Pedersen, L. Arge, et al., Residential green space in childhood is associated with lower risk of psychiatric disorders from adolescence into adulthood. Proceedings of the National Academy of Sciences, 2019. 116(11): p. 5188-5193. 10.1073/pnas.1807504116

  11. Bind, M.A., Causal Modeling in Environmental Health. Annu Rev Public Health, 2019. 40: p. 23-43. 10.1146/annurev-publhealth-040218-044048

  12. Health Effects Institute. Revised Analyses of Time-Series Studies of Air Pollution and Health. 2003; Available from: https://www.healtheffects.org/system/files/TimeSeries.pdf. Accessed: 25/10/2024

  13. Janes, H., L. Sheppard, and T. Lumley, Case-crossover analyses of air pollution exposure data: referent selection strategies and their implications for bias. Epidemiology, 2005. 16(6): p. 717-26. 10.1097/01.ede.0000181315.18836.9d

  14. Ponjoan, A., J. Blanch, L. Alves-Cabratosa, et al., Effects of extreme temperatures on cardiovascular emergency hospitalizations in a Mediterranean region: a self-controlled case series study. Environ Health, 2017. 16(1): p. 32. 10.1186/s12940-017-0238-0

  15. Smith, R.B., D. Fecht, J. Gulliver, et al., Impact of London’s road traffic air and noise pollution on birth weight: retrospective population based cohort study. BMJ, 2017. 359: p. j5299. 10.1136/bmj.j5299

  16. Baranyi, G., L. Williamson, Z. Feng, et al., Early life PM(2.5) exposure, childhood cognitive ability and mortality between age 11 and 86: A record-linkage life-course study from Scotland. Environ Res, 2023. 238(Pt 1): p. 117021. 10.1016/j.envres.2023.117021

  17. Hansell, A., R.E. Ghosh, M. Blangiardo, et al., Historic air pollution exposure and long-term mortality risks in England and Wales: prospective longitudinal cohort study. Thorax, 2016. 71(4): p. 330-338. 10.1136/thoraxjnl-2015-207111

  18. Maclure, M., The case-crossover design: a method for studying transient effects on the risk of acute events. Am J Epidemiol, 1991. 133(2): p. 144-53. https://doi.org/10.1093/oxfordjournals.aje.a115853

  19. Carracedo-Martínez, E., M. Taracido, A. Tobias, et al., Case-Crossover Analysis of Air Pollution Health Effects: A Systematic Review of Methodology and Application. Environmental Health Perspectives, 2010. 118(8): p. 1173-1182. 10.1289/ehp.0901485

  20. Saucy, A., U. Gehring, S. Olmos, et al., Effect of residential relocation on environmental exposures in European cohorts: An exposome-wide approach. Environment International, 2023. 173: p. 107849. 10.1016/j.envint.2023.107849

  21. Brokamp, C., G.K. LeMasters, and P.H. Ryan, Residential mobility impacts exposure assessment and community socioeconomic characteristics in longitudinal epidemiology studies. J Expo Sci Environ Epidemiol, 2016. 26(4): p. 428-34. 10.1038/jes.2016.10

  22. Gambaro, L., H. Joshi, and R. Lupton, Moving to a better place? Residential mobility among families with young children in the Millennium Cohort Study. Population, Space and Place, 2017. 23(8): p. e2072. 10.1002/psp.2072

  23. Xu, W., M. Agnew, C. Kamis, et al., Constructing Residential Histories in a General Population-Based Representative Sample. American Journal of Epidemiology, 2023. 193(2): p. 348-359. 10.1093/aje/kwad188

  24. Rodgers, S.E., R.A. Lyons, R. Dsilva, et al., Residential Anonymous Linking Fields (RALFs): a novel information infrastructure to study the interaction between the environment and individuals’ health. Journal of Public Health, 2009. 31(4): p. 582-588. 10.1093/pubmed/fdp041

  25. Geary, R.S., D. Thompson, A. Mizen, et al., Ambient greenness, access to local green spaces, and subsequent mental health: a 10-year longitudinal dynamic panel study of 2.3 million adults in Wales. The Lancet Planetary Health, 2023. 7(10): p. e809-e818. 10.1016/S2542-5196(23)00212-7

  26. Rodgers, S.E., R. Bailey, R. Johnson, et al., Emergency hospital admissions associated with a non-randomised housing intervention meeting national housing quality standards: a longitudinal data linkage study. J Epidemiol Community Health, 2018. 72(10): p. 896-903. 10.1136/jech-2017-210370

  27. SLS-DSU. Geography & ecology. 2025; Available from: https://sls.lscs.ac.uk/guides-resources/what-data-are-included/geographical-ecological-data/. Accessed: 28/03/2025

  28. Favarato, G., T. Clemens, S. Cunningham, et al., Air Pollution, housing and respiratory tract Infections in Children: NatIonal birth Cohort study (PICNIC): study protocol. BMJ Open, 2021. 11(5): p. e048038. 10.1136/bmjopen-2020-048038

  29. Fecht, D., K. Garwood, O. Butters, et al., Automation of cleaning and reconstructing residential address histories to assign environmental exposures in longitudinal studies. International Journal of Epidemiology, 2020. 49(Supplement_1): p. i49-i56. 10.1093/ije/dyz180

  30. Krenz, K., A. Dhanani, R.R.C. McEachan, et al., Linking the Urban Environment and Health: An Innovative Methodology for Measuring Individual-Level Environmental Exposures. Int J Environ Res Public Health, 2023. 20(3).

  31. UK Health Security Agency. NHS entitlements: migrant health guide. 2023; Available from: https://www.gov.uk/guidance/nhs-entitlements-migrant-health-guide. Accessed: 14/01/2025

  32. NHS England. Personal Demographics Service. 2024; Available from: https://digital.nhs.uk/services/personal-demographics-service. Accessed: 25/10/2024

  33. Office for National Statistics. Postal Geographies. 2023; Available from: https://www.ons.gov.uk/methodology/geography/ukgeographies/postalgeography. Accessed: 14/01/2025

  34. Herbert, A., L. Wijlaars, A. Zylbersztejn, et al., Data Resource Profile: Hospital Episode Statistics Admitted Patient Care (HES APC). International Journal of Epidemiology, 2017. 46(4): p. 1093-1093i. 10.1093/ije/dyx015

  35. Harper, G., Linkage of Maternity Hospital Episode Statistics data to birth registration and notification records for births in England 2005-2014: Quality assurance of linkage of routine data for singleton and multiple births. BMJ Open, 2018. 8(3): p. e017898. 10.1136/bmjopen-2017-017898

  36. Dattani, N. and A. Macfarlane, Linkage of Maternity Hospital Episode Statistics data to birth registration and notification records for births in England 2005-2014: methods. A population-based birth cohort study. BMJ Open, 2018. 8(2): p. e017897. 10.1136/bmjopen-2017-017897

  37. Macfarlane, A., N. Dattani, R. Gibson, et al., Births and their outcomes by time, day and year: a retrospective birth cohort data linkage study. Health Services and Delivery Research, 2019. 7(18). 10.3310/hsdr07180

  38. Office for National Statistics. A Guide to ONS Geography Postcode Products. 2016; Available from: https://geoportal.statistics.gov.uk/documents/8093d03408f04240a2f11a9d8913a45e/explore. Accessed: 25/10/2024

  39. Office for National Statistics. ONS Postcode Directory. 2019; Available from: https://geoportal.statistics.gov.uk/search?tags=ons%2520postcode%2520directory. Accessed: 14/04/2025

  40. Stock, S.J., J. Carruthers, C. Calvert, et al., SARS-CoV-2 infection and COVID-19 vaccination rates in pregnant women in Scotland. Nature Medicine, 2022. 28(3): p. 504-512. 10.1038/s41591-021-01666-2

  41. Ministry of Housing, Communities and Local Government. English indices of deprivation 2019. 2019; Available from: https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019. Accessed: 25/10/2024

  42. StataCorp, Stata Statistical Software: Release 17. 2022: College Station, TX.

  43. Office for National Statistics. Internal migration, England and Wales: Year Ending June 2015. 2016; Available from: https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/migrationwithintheuk/bulletins/internalmigrationbylocalauthoritiesinenglandandwales/yearendingjune2015. Accessed: 25/10/2024

  44. Heo, S., Y. Afanasyeva, L. Trasande, et al., Residential mobility in pregnancy and potential exposure misclassification of air pollution, temperature, and greenness. Environ Epidemiol, 2023. 7(6): p. e273. 10.1097/EE9.0000000000000273

  45. Canfield, M.A., T.A. Ramadhani, P.H. Langlois, and D.K. Waller, Residential mobility patterns and exposure misclassification in epidemiologic studies of birth defects. J Expo Sci Environ Epidemiol, 2006. 16(6): p. 538-43. 10.1038/sj.jes.7500501

  46. Clark, D.D., C;. A guide to CHI-UPRN Residential Linkage (CURL) File. 2020; Available from: https://www.scadr.ac.uk/sites/default/files/CURLreport2311%20-%20A%20guide%20to%20CHI-UPRN%20Residential%20Linkage.pdf. Accessed: 28/03/2025

Article Details

How to Cite
Hardelid, P., Atilola, G., Rammah, A., De Stavola, B., Clemens, T., Cunningham, S., Dibben, C., Hajna, S., Macfarlane, A., Milojevic, A., Taylor, J. and Wijlaars, L. (2026) “Cleaning and validating longitudinal maternal and child postcode histories from a national healthcare registry for environmental health research in London, UK”, International Journal of Population Data Science, 11(1). doi: 10.23889/ijpds.v11i1.2990.