A team at University College London have launched the ECHILD Phenotype Code List Repository. ECHILD (Education and Child Health Insights from Linked Data) brings together a range of national health, education and social care datasets for all of England, including the core Hospital Episode Statistics and National Pupil Database. It includes deidentified data on over 20 million individuals born since September 1984 and enables research into child and family health, education and social care at a scale not previously possible.

A key component of working with administrative (and especially health) data is phenotyping: using codes in the data to define groups of children with particular conditions or diseases. This is usually done by compiling lists of diagnostic and procedure codes and searching for those codes in the health data. The process is complex, and it is made harder because existing code lists tend to be scattered across the research literature, often in formats that are difficult to reuse in new studies.
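To make the idea concrete, here is a minimal Python sketch of code-list phenotyping. It assumes each hospital record carries an individual ID and a list of ICD-10 diagnosis codes. The function name `flag_phenotype`, the record layout and the two-code asthma list are hypothetical illustrations. They are not the Repository's format or its scripts, which are provided in R and Stata.

```python
# Hypothetical sketch: flag individuals whose records contain any code from a code list.
# Record layout and code list are illustrative assumptions, not the ECHILD format.

def flag_phenotype(records, code_list):
    """Return the set of IDs with at least one diagnosis matching a listed code.

    Matching is by prefix, so a three-character code such as "J45" also
    captures its four-character subcodes (e.g. "J459"). Dots are ignored,
    because codes may be stored with or without them (e.g. "J45.9" vs "J459").
    """
    prefixes = tuple(code.replace(".", "").upper() for code in code_list)
    flagged = set()
    for rec in records:
        for code in rec["diagnoses"]:
            if code.replace(".", "").upper().startswith(prefixes):
                flagged.add(rec["id"])
                break  # one matching code is enough for this record
    return flagged


# Illustrative data: hypothetical records and an illustrative asthma code list.
records = [
    {"id": 1, "diagnoses": ["J459", "R05"]},
    {"id": 2, "diagnoses": ["S52.5"]},
    {"id": 3, "diagnoses": ["J46X"]},
]
asthma_codes = ["J45", "J46"]

print(sorted(flag_phenotype(records, asthma_codes)))
```

Prefix matching is one design choice among several. Some published code lists specify exact four-character codes, or restrict matches to the primary diagnosis field, and a reusable code list needs to state which rule applies.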

The ECHILD Phenotype Code List Repository addresses this by making available code lists relevant to ECHILD’s datasets, formatted consistently so that they are easy to implement. Each code list comes with example R and Stata scripts. A new paper published in the International Journal of Population Data Science (IJPDS) introduces the Repository, its rationale and its potential benefits for the research community.

The ECHILD Phenotype Code List Repository sits alongside a range of other resources produced by the ECHILD team, including How To Guides that take users through all the core data management and processing tasks needed to produce a research-ready dataset in ECHILD, a user guide, and a discussion forum. Users can access these resources, and more, on the ECHILD website. Because ECHILD links pre-existing datasets, many of these resources, including the Phenotype Code List Repository and the How To Guides, also apply to those datasets in standalone projects that do not use ECHILD data.

Lead author Matthew Jay added: “We hope that these resources will serve as a model for future enhancements to administrative data projects in addition to helping both novice and veteran administrative data users. This should make data easier to use and result in better science for better policy.”



Matthew Jay, Senior Research Fellow in Epidemiology, UCL Great Ormond Street Institute of Child Health, London, UK

Jay, M. A., Lewis, K., Shi, D., Langella, R., Stone, T., Ní Chobhthaigh, S., Zylbersztejn, A., Blackburn, R. and Harron, K. (2025) “Open science and phenotyping in UK administrative health, education and social care data: the ECHILD phenotype code list repository”, International Journal of Population Data Science, 10(2). doi: 10.23889/ijpds.v10i2.2943.