Predicting Childhood Overweight in Linked Administrative Data: Are There Lessons for Targeting Early Years Interventions?


Louis Chislett
Paul Henery
Linsay Gray
Alastair Leyland
Ruth Dundas
Rachael Wood
Anna Pearce


Many early years interventions, e.g. the UK’s Family Nurse Partnership, aim to provide additional support to higher-need families to prevent poor health and development. Most prediction studies using cohort surveys have had limited success in identifying which families to target for such interventions.

Objectives and Approach
To examine whether the breadth and volume of information available in linked administrative data can predict childhood overweight (defined using International Obesity TaskForce cut-offs) at age 5 years. Data for all children born in Scotland in 2011-2012 (n~120,000) were sourced from birth registration records, maternity hospital records, health visitor and school health checks, immunisation records, and prescription data. Predictors spanned socio-economic (e.g. neighbourhood deprivation, occupational status), demographic (e.g. mother’s age, number of children), birth (e.g. birthweight, APGAR score), service planning (allocation to core/additional services), and health (e.g. smoking in pregnancy, overweight at age 3y) domains. Sensitivity, specificity and positive predictive value (PPV) were considered for each predictor. The collective predictive power of >20 variables (selected a priori; complete case sample, n~22,000) was examined using conventional area under the curve (AUC) and three machine learning methods: decision tree classification, random forest, and gradient boosted trees.
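The three per-predictor measures named above can be illustrated with a minimal sketch, assuming each predictor has been reduced to a yes/no flag and the outcome to overweight yes/no at age 5. The function name `screening_metrics`, the helper `_ratio`, and the toy data are hypothetical and for illustration only; they are not the authors' code or data.

```python
from typing import Iterable, NamedTuple


class ScreeningMetrics(NamedTuple):
    sensitivity: float  # share of overweight children the predictor flags
    specificity: float  # share of non-overweight children it leaves unflagged
    ppv: float          # share of flagged children who are overweight


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a denominator is empty.
    return numerator / denominator if denominator else float("nan")


def screening_metrics(
    flagged: Iterable[bool], overweight: Iterable[bool]
) -> ScreeningMetrics:
    """Cross-tabulate a binary predictor against a binary outcome."""
    tp = fp = tn = fn = 0
    for is_flagged, is_overweight in zip(flagged, overweight):
        if is_flagged and is_overweight:
            tp += 1  # flagged and overweight at 5y
        elif is_flagged:
            fp += 1  # flagged but not overweight at 5y
        elif is_overweight:
            fn += 1  # not flagged but overweight at 5y
        else:
            tn += 1  # not flagged and not overweight at 5y
    return ScreeningMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
    )


# Made-up toy data for 20 children; these are NOT the study data.
toy_flagged = [True] * 4 + [False] * 4 + [True] * 2 + [False] * 10
toy_overweight = [True] * 8 + [False] * 12
toy = screening_metrics(toy_flagged, toy_overweight)
print(toy)
```

Unlike sensitivity and specificity, PPV depends on how common overweight is in the sample, so the same predictor can yield a different PPV in a population with a different prevalence.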

Overweight at 3y was the strongest predictor of overweight at 5y (sensitivity: 0.53; specificity: 0.89; PPV: 0.51). AUC and the machine learning approaches prioritised different sets of predictors, but all included overweight at 3y. All approaches produced ‘Moderate-Good’ predictive power, falling to ‘Poor-Moderate’ when overweight at 3y was excluded. Multiple imputation will be carried out to address item missingness.
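For readers less familiar with AUC, the sketch below shows one standard way to read it: the probability that a randomly chosen child who is overweight at 5y receives a higher model score than a randomly chosen child who is not, with ties counted as half. The function name `auc` and the example scores are hypothetical; the abstract does not state how the authors computed AUC, and the pairwise loop here is chosen for clarity rather than speed.

```python
from typing import Sequence


def auc(scores: Sequence[float], overweight: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, overweight) if y]
    neg = [s for s, y in zip(scores, overweight) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case with and without the outcome")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # overweight child ranked above non-overweight child
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up scores for five children; these are NOT the study data.
example_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
example_overweight = [True, False, True, False, False]
example_auc = auc(example_scores, example_overweight)
print(round(example_auc, 3))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking. The descriptive bands quoted above (‘Poor-Moderate’, ‘Moderate-Good’) are the authors' labels, and the abstract does not give their numeric cut-offs.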

Overweight at 3y was the most informative predictor, yet 49% of children who were overweight at 3y were no longer overweight at 5y. Models excluding overweight at 3y had poor predictive power, indicating how difficult it is to target interventions to prevent overweight. Universal interventions addressing the upstream determinants of childhood overweight are therefore likely to have the greatest success in supporting child health.
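The 49% figure is the complement of the PPV of 0.51 reported for overweight at 3y, as the short worked step below shows. The event notation ($O_3$, $O_5$ for overweight at 3y and 5y) is introduced here for illustration and does not appear in the original abstract.

```latex
\[
\Pr(\text{not } O_5 \mid O_3) \;=\; 1 - \Pr(O_5 \mid O_3) \;=\; 1 - \mathrm{PPV} \;=\; 1 - 0.51 \;=\; 0.49
\]
```

In other words, targeting on overweight at 3y alone would direct roughly half of the targeted support to children who would not have been overweight at 5y.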

How to Cite
Chislett, L., Henery, P., Gray, L., Leyland, A., Dundas, R., Wood, R. and Pearce, A. (2020) “Predicting Childhood Overweight in Linked Administrative Data: Are There Lessons for Targeting Early Years Interventions?”, International Journal of Population Data Science, 5(5). Available at: (Accessed: 18 April 2024).