Cognitive development Respiratory Tract Illness and Effects of eXposure (CORTEX) project: Combining high spatial resolution pollution measurements with individual level data, a methodological approach.

Sarah Rodgers
Jane Lyons
Amy Mizen
Damon Berridge
Ashley Akbari
David Carruthers
Gwyneth Davies
Lorraine Dearden
Ruth Doherty
Iain Lake
Anna Mavrogianni
Ai Milojevic
Sarah Strickland
Paul Wilkinson

Abstract

Introduction
The Secure Anonymised Information Linkage (SAIL) databank facilitated linkage of routinely collected health and education data, high spatial resolution pollution modelling and daily pollen measurements for 18,241 pupils in 7 cross-sectional cohorts across the city of Cardiff, UK, to investigate the effects of air quality and respiratory health conditions on educational attainment.


Objectives and Approach
An urban atmospheric dispersion and chemistry modelling system (ADMS-Urban) simulated hourly concentrations of four air pollutants: PM2.5, PM10, NO2 and ozone. These were summarised into minimum, average and maximum daily readings for 4 time periods (e.g. school hours, 9am-3pm) for all home and school locations across Cardiff between 2009 and 2015. The combination of different pollutants, measurements and time periods created a comprehensive multi-row dataset per location. We transformed the dimensionality of this high-resolution data to create one row of summarised data per pupil per cohort, in preparation for statistical analysis.
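The summarisation step can be illustrated with a minimal Python sketch. It is not the project's code: the function name `summarise_day`, the `PERIODS` table and the input values are hypothetical, and only the school-hours window (9am-3pm) is taken from the text; the second window is an assumption added for illustration.

```python
from statistics import mean

# Illustrative time periods as [start_hour, end_hour). Only school hours
# (9am-3pm) is named in the text; "after_school" is an assumed window.
PERIODS = {"school_hours": (9, 15), "after_school": (15, 17)}


def summarise_day(hourly):
    """Reduce 24 hourly concentrations to min/mean/max per time period."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    summary = {}
    for name, (start, end) in PERIODS.items():
        window = hourly[start:end]
        summary[name] = {
            "min": min(window),
            "mean": mean(window),
            "max": max(window),
        }
    return summary


# Made-up hourly values for one pollutant at one location on one day.
hourly_no2 = [float(hour) for hour in range(24)]
daily = summarise_day(hourly_no2)
```

Repeating this for each pollutant, location and day would give the multi-row dataset per location described above.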


Results
157,361 school and home locations across Cardiff were anonymised, and household linkage fields were appended to combine pollution estimates at the household/school with individual health data. The pollution dataset contained 369 columns and 472,083 rows per year, with one column per location, pollutant type, pollutant measurement, daily time period, and day of year. Transforming the dataset to use a single date column reduced algorithm computation and produced a five-column, 3,446,205,900-row dataset per year. The algorithm adjusted for weekends and school/bank holidays, and allowed location to vary between 3pm and 5pm on school days, when pupil location was uncertain. The algorithm calculated tailored pollution exposures per pupil for revision and examination periods, creating one row per pupil and reducing 7 years of data and 24 billion rows to 18,241.
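As a rough illustration of the reshaping and per-pupil aggregation, the following Python sketch converts one-column-per-day rows into rows with a single date column, then averages a pupil's exposure over a date window. All names (`wide_to_long`, `pupil_exposure`, the location labels and the `no2_mean` series) are hypothetical, the values are made up, and the weekend rule (school location on weekdays, home location at weekends) is an assumption; holiday handling and the 3pm-5pm location rule are not modelled.

```python
from datetime import date, timedelta
from statistics import mean


def wide_to_long(wide_rows, year):
    """Turn one-column-per-day rows into one row per day with a date column."""
    start = date(year, 1, 1)
    long_rows = []
    for location, series, daily_values in wide_rows:
        for offset, value in enumerate(daily_values):
            day = start + timedelta(days=offset)
            long_rows.append((location, series, day, value))
    return long_rows


def pupil_exposure(long_rows, home, school, series, first_day, last_day):
    """Mean exposure over a window: school on weekdays, home at weekends."""
    lookup = {(loc, s, d): v for loc, s, d, v in long_rows}
    values = []
    day = first_day
    while day <= last_day:
        # Assumed rule: Saturday (5) and Sunday (6) are spent at home.
        location = home if day.weekday() >= 5 else school
        values.append(lookup[(location, series, day)])
        day += timedelta(days=1)
    return mean(values)


# Made-up values for 1-4 January 2015 (Thursday to Sunday).
wide = [
    ("home_A", "no2_mean", [1.0, 1.0, 1.0, 1.0]),
    ("school_B", "no2_mean", [3.0, 3.0, 3.0, 3.0]),
]
long_rows = wide_to_long(wide, 2015)
exposure = pupil_exposure(
    long_rows, "home_A", "school_B", "no2_mean",
    date(2015, 1, 1), date(2015, 1, 4),
)
```

With a single date column, selecting a revision or examination window becomes a simple date-range filter instead of a lookup across hundreds of day columns.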


Conclusion/Implications
We successfully linked 95% of the cohorts’ household/school pollution data to their corresponding health and education data. This demonstrates that retrospective exposures can be linked for total populations using multiple daily locations, and extends our analysis platform for natural experiments to include daily exposure. Future work includes adding modelled route exposures.

Article Details

How to Cite
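Rodgers, S., Lyons, J., Mizen, A., Berridge, D., Akbari, A., Carruthers, D., Davies, G., Dearden, L., Doherty, R., Lake, I., Mavrogianni, A., Milojevic, A., Strickland, S. and Wilkinson, P. (2018) “Cognitive development Respiratory Tract Illness and Effects of eXposure (CORTEX) project: Combining high spatial resolution pollution measurements with individual level data, a methodological approach”, International Journal of Population Data Science, 3(4). doi: 10.23889/ijpds.v3i4.802.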
