<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "JATS-journalpublishing1.dtd"[]>
<article xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" dtd-version="1.2" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">IJPDS</journal-id>
<journal-title-group>
<journal-title>International Journal of Population Data Science</journal-title>
<abbrev-journal-title>IJPDS</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2399-4908</issn>
<publisher>
<publisher-name>Swansea University</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.23889/ijpds.v6i1.1671</article-id>
<article-id pub-id-type="publisher-id">6:1:1671</article-id>
<article-id pub-id-type="pii">S2399490821016712</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Population Data Science</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Linking education and hospital data in England: linkage process and quality</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Libuy</surname><given-names initials="N">Nicol&#x00E1;s</given-names></name><xref ref-type="aff" rid="affil-1">1</xref><xref ref-type="corresp" rid="correspondingAurthor">*</xref></contrib>
<contrib contrib-type="author"><name><surname>Harron</surname><given-names initials="K">Katie</given-names></name><xref ref-type="aff" rid="affil-1">1</xref><xref ref-type="aff" rid="affil-2">2</xref></contrib>
<contrib contrib-type="author"><name><surname>Gilbert</surname><given-names initials="R">Ruth</given-names></name><xref ref-type="aff" rid="affil-1">1</xref><xref ref-type="aff" rid="affil-2">2</xref></contrib>
<contrib contrib-type="author"><name><surname>Caulton</surname><given-names initials="R">Richard</given-names></name><xref ref-type="aff" rid="affil-3">3</xref></contrib>
<contrib contrib-type="author"><name><surname>Cameron</surname><given-names initials="E">Ellen</given-names></name><xref ref-type="aff" rid="affil-3">3</xref></contrib>
<contrib contrib-type="author"><name><surname>Blackburn</surname><given-names initials="R">Ruth</given-names></name><xref ref-type="aff" rid="affil-1">1</xref></contrib>
<aff id="affil-1"><label>1</label><institution>Institute of Health Informatics, University College London, London, NW1 2DA, UK</institution></aff>
<aff id="affil-2"><label>2</label><institution>UCL Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, UK</institution></aff>
<aff id="affil-3"><label>3</label><institution>NHS Digital, Leeds, LS1 6AE, UK</institution></aff>
</contrib-group>
<author-notes>
<corresp id="correspondingAurthor"><label>*</label>Corresponding author: Nicol&#x00E1;s Libuy <email>nicolas.libuy.16@ucl.ac.uk</email>
</corresp>
<fn fn-type="conflict">
<label>Conflict of interest statement</label>
<p>None declared.</p>
</fn>
</author-notes>
<pub-date date-type="pub" publication-format="electronic"><day>16</day><month>09</month><year>2021</year></pub-date>
<pub-date date-type="collection" publication-format="electronic"><year></year></pub-date>
<volume>6</volume>
<issue>1</issue>
<elocation-id>1671</elocation-id>
<permissions>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by-nc-nd/4.0/">
<license-p>This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://ijpds.org/article/view/1671">This article is available from the IJPDS website at: https://ijpds.org/article/view/1671</self-uri>
<abstract>
<title>Abstract</title>
<sec>
<title>Introduction</title>
<p>Linkage of administrative data for universal state education and National Health Service (NHS) hospital care would enable research into the inter-relationships between education and health for all children in England.</p>
</sec>
<sec>
<title>Objectives</title>
<p>We aim to describe the linkage process and evaluate the quality of linkage of four one-year birth cohorts within the National Pupil Database (NPD) and Hospital Episode Statistics (HES).</p>
</sec>
<sec>
<title>Methods</title>
<p>We used multi-step deterministic linkage algorithms to link longitudinal records from state schools to the chronology of records in the NHS Personal Demographics Service (PDS; linkage stage 1), and HES (linkage stage 2). We calculated linkage rates and compared pupil characteristics in linked and unlinked samples for each stage of linkage and each cohort (1990/91, 1996/97, 1999/00, and 2004/05).</p>
</sec>
<sec>
<title>Results</title>
<p>Of the 2,287,671 pupil records, 2,174,601 (95%) linked to HES. Linkage rates improved over time (92% in 1990/91 to 99% in 2004/05). Ethnic minority pupils and those living in more deprived areas were less likely to be matched to hospital records, but differences in pupil characteristics between linked and unlinked samples were moderate to small.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We linked nearly all pupils to at least one hospital record. The high coverage of the linkage represents a unique opportunity for wide-scale analyses across the domains of health and education. However, missed links disproportionately affected ethnic minorities or those living in the poorest neighbourhoods: selection bias could be mitigated by increasing the quality and completeness of identifiers recorded in administrative data or the application of statistical methods that account for missed links.</p>
</sec>
<sec>
<title>Highlights</title>
<list list-type="bullet">
<list-item><p>Longitudinal administrative records for all children attending state school and acute hospital services in England have been used for research for more than two decades, but lack of a shared unique identifier has limited scope for linkage between these databases.</p></list-item>
<list-item><p>We applied multi-step deterministic linkage algorithms to four one-year cohorts of children born between 1 September and 31 August in 1990/91, 1996/97, 1999/00 and 2004/05. In stage 1, full names, date of birth, and postcode histories from education data in the National Pupil Database were linked to the NHS Personal Demographics Service. In stage 2, NHS number, postcode, date of birth and sex were used to link to hospital records in Hospital Episode Statistics.</p></list-item>
<list-item><p>Between 92% and 99% of school pupils linked to at least one hospital record. Ethnic minority pupils and pupils who were living in the most deprived areas were least likely to link. Ethnic minority pupils were less likely than white children to link at the first step in both algorithms.</p></list-item>
<list-item><p>Bias due to linkage errors could lead to an underestimate of the health needs in disadvantaged groups. Improved data quality, more sensitive linkage algorithms, and/or statistical methods that account for missed links in analyses, should be considered to reduce linkage bias.</p></list-item>
</list>
</sec>
</abstract>
<kwd-group>
<kwd>record linkage</kwd>
<kwd>linkage error</kwd>
<kwd>bias</kwd>
<kwd>hospital records</kwd>
<kwd>educational records</kwd>
<kwd>data linkage</kwd>
<kwd>administrative data</kwd>
</kwd-group>
<funding-group>
<funding-statement>This work was supported by ESRC via the Administrative Data Research UK through the Strategic Hub [grant number ES/V000977/1]; the Administrative Data Research Centre for England; the NIHR Great Ormond Street Hospital Biomedical Research Centre and the Health Data Research UK [grant number LOND1], which is funded by the UK Medical Research Council and eight other funders; Wellcome Trust [grant number 212953/Z/18/Z to KH]; and UKRI Innovation Fellowship funded by the Medical Research Council [grant number MR/S003797/1 to RB].</funding-statement>
</funding-group>
</article-meta>
</front>
<body>
<sec>
<title>Introduction</title>
<p>Administrative data have been routinely collected for more than two decades in England from schools and hospitals by the Department for Education (DfE) and National Health Service (NHS) Digital respectively [<xref ref-type="bibr" rid="ref-1">1</xref>, <xref ref-type="bibr" rid="ref-2">2</xref>]. These data collections have been used to monitor service provision and costs, and longitudinal linkage has made them powerful resources for national research [<xref ref-type="bibr" rid="ref-3">3</xref>&#x2013;<xref ref-type="bibr" rid="ref-7">7</xref>]. Despite evidence from other countries of the value of linking education and health data to inform policy and practice [<xref ref-type="bibr" rid="ref-8">8</xref>&#x2013;<xref ref-type="bibr" rid="ref-14">14</xref>], these databases have not previously been linked for children in England because they do not share a unique identifier. Linkage between these datasets can only be done using confidential, personal identifiers such as full names, postcodes, date of birth and sex, thereby creating technical and governance challenges.</p>
<p>Linkage error could significantly undermine the real-world benefits for policy if certain groups, such as those with a foreign name structure, are less likely to link than others [<xref ref-type="bibr" rid="ref-15">15</xref>]. For example, missed links could lead to undercounting of adverse health or education outcomes for these groups, and in turn, under-provision of services. Evidence on linkage error can help data providers to improve the quality of identifiers or to develop more effective linkage algorithms. Evidence on differences in the characteristics between groups who link or not can be used by researchers to account for linkage bias in analyses [<xref ref-type="bibr" rid="ref-16">16</xref>].</p>
<p>We describe the methods used to link education data from the National Pupil Database (NPD) to hospital data for children in England (Hospital Episode Statistics; HES) [<xref ref-type="bibr" rid="ref-1">1</xref>, <xref ref-type="bibr" rid="ref-2">2</xref>]. Our goal was to create de-identified, linked cohorts of pupils&#x2019; longitudinal records of education and hospital events over the childhood years. We also evaluated associations between child characteristics and linkage error in order to understand the implications of these errors for analysis. Our evaluation is based on 2.2 million children in England born in four one-year cohorts in 1990/91, 1996/97, 1999/00 and 2004/05. These cohorts reflect age and time periods when identifier quality, and hence linkage quality, is likely to differ due to data collection and system changes. This paper is relevant to users of The Education and Child Health Insights from Linked Data (ECHILD) database, which will be available from Spring 2022 and combines education, social care and hospital data for all children in England born from 1995 [<xref ref-type="bibr" rid="ref-1">1</xref>, <xref ref-type="bibr" rid="ref-2">2</xref>, <xref ref-type="bibr" rid="ref-17">17</xref>]. The findings are also relevant more generally to data linkages that lack a unique, high-quality identifier.</p>
</sec>
<sec>
<title>Methods</title>
<sec>
<title>Study design and population</title>
<p>Governance permissions and data flows for the linkage followed the separation principle [<xref ref-type="bibr" rid="ref-16">16</xref>], whereby identifiers such as names and postcodes were kept separate at all times from attribute data (school or hospital records). <xref ref-type="fig" rid="fig-1">Figure 1</xref> shows the flow of identifiers and a pseudo-identifier (the anonymised Pupil Matching Reference, aPMR) from the Department for Education to NHS Digital. Separately, education attribute data flowed from the Department for Education to the Office for National Statistics Secure Research Service (ONS SRS). A two-stage linkage process was used to link NPD to HES. Stage 1 linked NPD to the Personal Demographics Service (PDS), which contains all individuals with an NHS number, and stage 2 linked the NPD-PDS linked data to HES. At the first stage of linkage (step C in <xref ref-type="fig" rid="fig-1">Figure 1</xref>), NHS Digital linkers had access only to the identifiers (date of birth, sex, and histories of forenames, surnames and postcodes) and not to attribute data. At the second stage of linkage (step D), NHS Digital used the NHS number, date of birth, sex and postcode to link to HES data. The linkage step, pseudonymised HESID and anonymised PMR were transferred (step E) and merged with a University College London (UCL) held extract of HES within the UCL Data Safe Haven (DSH) (step F). Linked HES-PMR records were ultimately transferred to the ONS SRS (step G).</p>
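<p>The separation principle described above can be illustrated with a minimal sketch. The field names (<monospace>apmr</monospace>, <monospace>forename</monospace>, <monospace>absence_sessions</monospace> and so on) and the function <monospace>separate</monospace> are hypothetical and chosen for illustration only; they are not the actual NPD extract layout or the process used by the data providers.</p>
<code language="python">
```python
from typing import Iterable

# Hypothetical identifier field names; the real extract layout is not given in the text.
IDENTIFIER_FIELDS = ("forename", "surname", "date_of_birth", "sex", "postcode")


def separate(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split each record into an identifier row and an attribute row.

    The two rows share only the pseudo-identifier (apmr), so the party that
    links on identifiers never sees attribute data, and vice versa.
    """
    identifier_file, attribute_file = [], []
    for record in records:
        ids = {"apmr": record["apmr"]}
        attrs = {"apmr": record["apmr"]}
        for field, value in record.items():
            if field == "apmr":
                continue
            target = ids if field in IDENTIFIER_FIELDS else attrs
            target[field] = value
        identifier_file.append(ids)
        attribute_file.append(attrs)
    return identifier_file, attribute_file


# Illustrative, made-up pupil record.
pupils = [{"apmr": "P001", "forename": "Ann", "surname": "Lee",
           "date_of_birth": "1996-10-02", "sex": "F", "postcode": "LS0 0AA",
           "absence_sessions": 4}]
to_linker, to_research_env = separate(pupils)
```
</code>
<p>In this sketch, <monospace>to_linker</monospace> corresponds to the identifier flow to the linking organisation and <monospace>to_research_env</monospace> to the attribute flow to the secure research environment; the two can only be rejoined through the pseudo-identifier.</p>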
<fig id="fig-1"><label>Figure 1: Data flow and linkage process for linkage between the national pupil database, the personal demographics service and hospital episode statistics</label>
<graphic xlink:href="ijpds-06-1671-g001.tif"/>
<attrib>Notes: **NHS Digital sent two linkage bridging files to UCL DSH. Details are described in <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 1</xref>. Dark shading indicates de-identified and light shading identified data. NPD = National Pupil Database; PDS = Personal Demographics Service; HES = Hospital Episode Statistics; NHS = National Health Service; ONS SRS = Office for National Statistics Secure Research Service; UCL = University College London.</attrib>
</fig>
<p>The study population consisted of four cohorts of children born between 1 September and 31 August in the academic years of 1990/91, 1996/97, 1999/00, and 2004/05 (<xref ref-type="fig" rid="fig-2">Figure 2</xref>). These cohorts were defined separately in NPD and HES, so that linkage created three comparison groups for each of the four cohorts: linked NPD-HES, unlinked NPD, and unlinked HES records. We compared pupil characteristics in the linked and unlinked NPD cohorts at each stage of the linkage process. We used NPD as the inception cohort, as state school is a universal service attended at some point in the school years by at least 95% of all children [<xref ref-type="bibr" rid="ref-2">2</xref>, <xref ref-type="bibr" rid="ref-18">18</xref>]. By contrast, not all children attend hospital; only those born from 1997 onwards were young enough for their birth to be recorded in HES.</p>
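<p>As an illustration of the cohort definition above, the following sketch assigns a date of birth to one of the four academic-year cohorts (1 September to 31 August). The function names and the label format are assumptions made for this example.</p>
<code language="python">
```python
from datetime import date
from typing import Optional

# Start years of the four study cohorts (1990/91, 1996/97, 1999/00, 2004/05).
COHORT_START_YEARS = (1990, 1996, 1999, 2004)


def academic_year_start(dob: date) -> int:
    """Return the calendar year in which the academic year of birth began."""
    # Academic years run from 1 September to 31 August.
    return dob.year if dob.month >= 9 else dob.year - 1


def cohort_label(dob: date) -> Optional[str]:
    """Return a label such as '1999/00', or None if outside the four cohorts."""
    start = academic_year_start(dob)
    if start not in COHORT_START_YEARS:
        return None
    return f"{start}/{(start + 1) % 100:02d}"
```
</code>
<p>For example, a child born on 31 August 2000 falls in the 1999/00 cohort, whereas a child born on 1 September 1991 falls outside all four cohorts.</p>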
<fig id="fig-2"><label>Figure 2: Lexis diagram to show year of age of each cohort (y axis) and start year of each dataset (x axis)</label>
<graphic xlink:href="ijpds-06-1671-g002.tif"/>
<attrib>Notes: See details in <xref ref-type="supplementary-material" rid="sup-a">Supplementary Figure 1</xref> and <xref ref-type="supplementary-material" rid="sup-a">Supplementary Table 1</xref> in the <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendices 2</xref> and <xref ref-type="supplementary-material" rid="sup-a">3</xref>.</attrib>
</fig>
<p><xref ref-type="fig" rid="fig-2">Figure 2</xref> shows that whether a pupil is expected to link to a HES record depends on the start dates of the PDS, the NPD and the subsets of HES data. Pupils born in 1990/91 were expected to have the lowest linkage rate (i.e. the proportion of records in NPD that linked to HES). These children first appeared in NPD at the first school census collection in 2001/02, at age 10. Their names and postcodes, captured each year in NPD from 2001/02 until they left state school in 2009/10 or earlier, would be linked to names and postcode details recorded prospectively from General Practitioner (GP) registrations and hospital contacts on the PDS from 2004 onwards. These children could link to HES admission records from 1997 onwards (age 6 years), outpatient records from age 12, or accident and emergency department records from age 16.</p>
<p>Whilst we expected most children to have contact with hospital at some point during childhood or adolescence, we did not anticipate complete overlap between the two datasets. We expected children born in 2004/05 to have the best linkage rates of the four cohorts (and for linkage quality to remain constant or improve for subsequent cohorts). Firstly, 97% of children born in England would be expected to have their birth recorded in HES and in PDS [<xref ref-type="bibr" rid="ref-19">19</xref>]. Secondly, their linkage to subsequent health records should be more accurate than for earlier cohorts because midwives allocate NHS numbers to babies immediately at birth, a process introduced at the end of 2002 [<xref ref-type="bibr" rid="ref-20">20</xref>].</p>
</sec>
<sec>
<title>Data sources</title>
<p>The data sources are described in detail in the <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendices 2</xref> and <xref ref-type="supplementary-material" rid="sup-a">3</xref>.</p>
<sec>
<title>National pupil database (NPD)</title>
<p>NPD contains pupil-level information on all children and adolescents attending state-funded schools in England, capturing information on attainment tests, absences, exclusions and alternative provision (details in <xref ref-type="supplementary-material" rid="sup-a">Supplementary Figure 1</xref> of <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 2</xref>) [<xref ref-type="bibr" rid="ref-2">2</xref>]. The school census collects information each term on enrolled pupils, including updates to each pupil&#x2019;s name, address, and postcode. We used identifiers recorded in the Spring census (submitted in February) for linkage as this is the definitive entry for the year (e.g. for school year 2001/2). Pupil records are linked across years and between NPD modules using a pseudo-identifier called the anonymised Pupil Matching Reference (aPMR).</p>
</sec>
<sec>
<title>Hospital episode statistics (HES)</title>
<p>HES is an episode-level administrative database that covers all admissions (day case and overnight) to National Health Service (NHS) hospitals in England [<xref ref-type="bibr" rid="ref-1">1</xref>], as well as all accident and emergency attendances (from 2007/8) and outpatient appointments (from 2003/4). From January 1998 onwards, HES has been routinely linked to ONS death registration records [<xref ref-type="bibr" rid="ref-21">21</xref>]. <xref ref-type="supplementary-material" rid="sup-a">Supplementary Table 1</xref> in <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 3</xref> describes data availability in HES. For researchers using de-identified attribute data from HES, episodes of care relating to a patient can be linked over time or between datasets using a study-specific pseudonymised patient identifier generated by NHS Digital &#x2013; HESID [<xref ref-type="bibr" rid="ref-22">22</xref>].</p>
</sec>
<sec>
<title>Personal demographics service (PDS)</title>
<p>PDS is a national electronic database that contains the chronology of demographic information, including sex, name and address, for all individuals in England with an NHS number. Introduced in June 2004 as part of the National Programme for IT, the PDS was developed to integrate management of patient demographic information across NHS services in England. PDS replaced the NHS Central Register (CHRIS); the demographic functions of the National Health Applications and Infrastructure Services (NHAIS); the NHS Strategic Tracing Service (NSTS); and the NHS Number for Babies (NN4B) [<xref ref-type="bibr" rid="ref-23">23</xref>]. Current identifiers from these databases were transferred into PDS in 2004. Patient demographic details on the PDS can be updated by NHS care providers when a person uses an NHS service, including GP surgeries and inpatient or outpatient appointments [<xref ref-type="bibr" rid="ref-24">24</xref>, <xref ref-type="bibr" rid="ref-25">25</xref>]. The accuracy and quality of PDS data are assured by staff at the PDS National Back Office (NBO) in NHS Digital [<xref ref-type="bibr" rid="ref-26">26</xref>].</p>
</sec>
</sec>
<sec>
<title>Linkage</title>
<sec>
<title>Linkage process</title>
<p><xref ref-type="fig" rid="fig-1">Figure 1</xref> shows two stages of linkage. Stage 1 involved transfer of a linkage file containing full name and postcode histories and other identifiers (<xref ref-type="table" rid="table-1">Table 1</xref>) from the Department for Education to NHS Digital for linkage to the PDS. Extracts from NPD and PDS listed multiple identifiers for each individual together with the date interval when the identifier was recorded (details in <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 4</xref>). To link the NPD linkage file and PDS, we relied on a deterministic linkage algorithm comprising 8 steps, shown in <xref ref-type="table" rid="table-2">Table 2</xref>. These steps were designed to identify records that have high levels of agreement across names, date of birth, sex and postcode, and to resolve inconsistencies between records belonging to the same pupil.</p>
<table-wrap id="table-1">
<label>Table 1: Availability of personal identifiers in the national pupil database, personal demographics service and hospital episode statistics</label>
<table>
<thead>
<tr>
<th rowspan="3" valign="middle" align="left" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Linkage identifiers</bold></th>
<th colspan="3" valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Data sources</bold></th>
</tr>
<tr>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>DfE</bold></th>
<th colspan="2" valign="middle" align="center" style="border-bottom: solid 1pt"><bold>NHSD</bold></th>
</tr>
<tr>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>NPD</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>PDS</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>HES</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="middle" align="left">First name(s)</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left">Surname(s)</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left">Date of birth (e.g. 23/02/1988)</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="center">&#x2713;</td>
</tr>
<tr>
<td valign="middle" align="left">Sex</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="center">&#x2713;</td>
</tr>
<tr>
<td valign="middle" align="left">NHS number</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="center">&#x2713;</td>
</tr>
<tr>
<td valign="middle" align="left">Residence postcodes*</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="center">&#x2713;</td>
</tr>
<tr>
<td valign="middle" align="left">Residence postcodes dates**</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="center">&#x2713;</td>
</tr>
<tr>
<td valign="middle" align="left">Anonymised Pupil Matching Reference (aPMR)</td>
<td valign="middle" align="center">&#x2713;</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left">UCL HESID</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">&#x2713;</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Notes: * Full postcodes (e.g. LS0 0AA) were available in NPD and PDS. For records in NPD, a list of postcodes was available over the academic years. For a specific patient&#x2019;s NHS number in PDS, a list of postcodes was available over time. ** Dates referring to changes in a patient&#x2019;s postcodes over time were available in PDS. Similarly, dates referring to postcodes in academic years were available in NPD. UCL HESID is a unique and pseudonymised patient-level identifier that can be used to link patient-level information over time and across different modules of the UCL HES extracts. aPMR (anonymised Pupil Matching Reference) is a nationally unique and anonymised child-level identifier that can be used to link pupil-level information over time and across different modules of NPD.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap id="table-2">
<label>Table 2: Linkage stage 1: 8-step deterministic algorithm for linking the national pupil database to the personal demographics service</label>
<table>
<thead>
<tr>
<th valign="middle" align="left" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Step</bold></th>
<th valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>First name</bold></th>
<th valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Surname</bold></th>
<th valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Date of birth</bold></th>
<th valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Sex</bold></th>
<th valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Postcode*</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="middle" align="left">1**</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
</tr>
<tr>
<td valign="middle" align="left">2</td>
<td valign="middle" align="center">Soundex</td>
<td valign="middle" align="center">Soundex</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
</tr>
<tr>
<td valign="middle" align="left">3</td>
<td valign="middle" align="center">1st character</td>
<td valign="middle" align="center">Characters 1&#x2013;3</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
</tr>
<tr>
<td valign="middle" align="left">4</td>
<td valign="middle" align="center">1st character</td>
<td valign="middle" align="center">Characters 1&#x2013;3</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Exact</td>
</tr>
<tr>
<td valign="middle" align="left">5</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
</tr>
<tr>
<td valign="middle" align="left">6</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Partial</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
</tr>
<tr>
<td valign="middle" align="left">7</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left">8</td>
<td valign="middle" align="center">1st character</td>
<td valign="middle" align="center">Characters 1&#x2013;3</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="left"></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Notes: * Full postcode (e.g. LS0 0AA). ** Step 1 was repeated by NHS Digital, this time allowing an NPD record to link to many PDS records. The objective of repeating this modified step 1 was to remove potential duplicate HESIDs for the same pupil. See details in <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 4</xref>. Exact refers to exact linking; Partial refers to exact linking on month and year of birth only; Soundex refers to the Structured Query Language (SQL) algorithm that converts an alphanumeric string to a four-character code that is based on how the string sounds when spoken. NPD = National Pupil Database; PDS = Personal Demographics Service.</p>
</table-wrap-foot>
</table-wrap>
<p>In addition to the 8 steps in <xref ref-type="table" rid="table-2">Table 2</xref>, a linked pair of records was required to have identifiers recorded within the same academic year in PDS and in NPD (details in <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 4</xref>). All eight steps of the algorithm were run for all pupils for each school year (September to August), ordered from 2004/05 to 2016/17. To allow for multiple links with the highest level of agreement between NPD and PDS, step 1 was repeated (details in <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 4</xref>). For all other steps, a pupil was removed from the linking pool (i.e. all records for that pupil were excluded from subsequent linking steps) once a linkage was identified.</p>
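<p>The stepwise logic of Table 2 can be sketched as follows. This is a simplified illustration, not the NHS Digital implementation. It assumes names are already upper-cased, uses the standard American Soundex rules as a stand-in for the SQL function, accepts a link only when a step yields exactly one PDS candidate, and omits both the academic-year restriction and the repeated step 1. All record fields, keys and function names are hypothetical.</p>
<code language="python">
```python
from datetime import date

# Standard American Soundex letter groups (assumed to approximate the SQL function).
CODES = {letter: digit
         for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                "4": "L", "5": "MN", "6": "R"}.items()
         for letter in letters}


def soundex(name):
    """Return a four-character Soundex code, e.g. 'R163' for 'Robert'."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code, previous = letters[0], CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = CODES.get(c, "")
        if digit and digit != previous:
            code += digit
        if c not in "HW":  # H and W do not separate repeated codes
            previous = digit
    return (code + "000")[:4]


# How each comparison rule turns a field value into a comparable key.
KEYS = {
    "exact": lambda v: v,
    "soundex": soundex,
    "first1": lambda v: v[:1],
    "first3": lambda v: v[:3],
    "partial": lambda v: (v.year, v.month),  # month and year of birth only
}

# Steps 1 to 8 of Table 2; a field absent from a step is not compared.
STEPS = [
    {"forename": "exact", "surname": "exact", "dob": "exact", "sex": "exact", "postcode": "exact"},
    {"forename": "soundex", "surname": "soundex", "dob": "exact", "sex": "exact", "postcode": "exact"},
    {"forename": "first1", "surname": "first3", "dob": "exact", "sex": "exact", "postcode": "exact"},
    {"forename": "first1", "surname": "first3", "dob": "exact", "postcode": "exact"},
    {"dob": "exact", "sex": "exact", "postcode": "exact"},
    {"dob": "partial", "sex": "exact", "postcode": "exact"},
    {"forename": "exact", "surname": "exact", "dob": "exact", "sex": "exact"},
    {"forename": "first1", "surname": "first3", "dob": "exact", "sex": "exact"},
]


def match_key(record, step):
    """Build the tuple of values a record must share with another to agree at this step."""
    return tuple(KEYS[rule](record[field]) for field, rule in step.items())


def link_stage1(pupils, patients):
    """Return {pupil_id: (patient_id, step)}; linked pupils leave the pool."""
    links, pool = {}, dict(pupils)
    for step_number, step in enumerate(STEPS, start=1):
        index = {}
        for patient_id, patient in patients.items():
            index.setdefault(match_key(patient, step), []).append(patient_id)
        for pupil_id, pupil in list(pool.items()):
            candidates = index.get(match_key(pupil, step), [])
            if len(candidates) == 1:  # assumption: require a unique candidate
                links[pupil_id] = (candidates[0], step_number)
                del pool[pupil_id]
    return links


# Made-up records for illustration only.
pupils = {
    "P1": {"forename": "ANN", "surname": "LEE", "dob": date(1996, 10, 2), "sex": "F", "postcode": "LS0 0AA"},
    "P2": {"forename": "JON", "surname": "SMYTH", "dob": date(1997, 3, 5), "sex": "M", "postcode": "LS0 0AB"},
    "P3": {"forename": "SAM", "surname": "KHAN", "dob": date(1997, 1, 9), "sex": "M", "postcode": "LS0 0AC"},
}
patients = {
    "N1": {"forename": "ANN", "surname": "LEE", "dob": date(1996, 10, 2), "sex": "F", "postcode": "LS0 0AA"},
    "N2": {"forename": "JOHN", "surname": "SMITH", "dob": date(1997, 3, 5), "sex": "M", "postcode": "LS0 0AB"},
    "N3": {"forename": "SAMIR", "surname": "KHAN", "dob": date(1997, 1, 9), "sex": "M", "postcode": "LS9 9ZZ"},
}
links = link_stage1(pupils, patients)
```
</code>
<p>With these made-up records, the first pupil links at step 1 (full agreement), the second at step 2 (names agree on Soundex only), and the third at step 8 (name initials agree but the postcode differs). Recording the step at which each link was made allows linkage quality to be compared across groups.</p>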
<p>Stage 2 linked the PDS table of identifiers for children linked to NPD to HES, using the NHS Digital internal 7-step algorithm (<xref ref-type="table" rid="table-3">Table 3</xref>). The bridging files resulting from this linkage did not contain any identifiable data (such as name or date of birth) and contained all possible linkage pairs (linked and unlinked) resulting from linkage stages 1 and 2. For each of the four cohorts, the files contained the pseudonymised HESID of every individual in HES with a birth date in the relevant cohort and, for those who linked to NPD, the anonymised PMR, two record-level indicators identifying the step at which the link was made in linkage stages 1 and 2, and a variable indicating the specific cohort.</p>
<table-wrap id="table-3">
<label>Table 3: Linkage stage 2: 7-step deterministic algorithm for linking the personal demographics service to hospital episode statistics</label>
<table>
<thead>
<tr>
<th valign="middle" align="left" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Step</bold></th>
<th valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>NHS number</bold></th>
<th valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Date of birth</bold></th>
<th valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Sex</bold></th>
<th valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Postcode*</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="middle" align="left">1</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
</tr>
<tr>
<td valign="middle" align="left">2</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left">3</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Partial</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
</tr>
<tr>
<td valign="middle" align="left">4</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Partial</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left">5</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Exact</td>
</tr>
<tr>
<td valign="middle" align="left">6</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
<td valign="middle" align="center">Exact</td>
</tr>
<tr>
<td valign="middle" align="left"></td>
<td colspan="4" valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt">Where NHS number does not contradict the match and date of birth is not 1 January</td>
</tr>
<tr>
<td valign="middle" align="left">7</td>
<td valign="middle" align="left" style="border-bottom: solid 1pt"></td>
<td valign="middle" align="center" style="border-bottom: solid 1pt">Exact</td>
<td valign="middle" align="center" style="border-bottom: solid 1pt">Exact</td>
<td valign="middle" align="center" style="border-bottom: solid 1pt">Exact</td>
</tr>
<tr>
<td valign="middle" align="left"></td>
<td colspan="4" valign="middle" align="center" style="border-bottom: solid 1pt">Where date of birth is not 1 January</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Notes: * Full postcode (e.g. LS0 0AA). Exact refers to exact linking; Partial refers to exact linking using month and year of birth only.</p>
</table-wrap-foot>
</table-wrap>
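<p>The step order in Table 3 can be read as a cascade in which each candidate pair is assigned the first step whose conditions it meets. The Python sketch below illustrates that reading only; it is not the NHS Digital implementation. The record layout (dictionaries with the keys nhs_number, dob, sex and postcode), the function names, and the reading of "does not contradict" as "missing on either side, or equal" are assumptions made for illustration.</p>
<preformat>
```python
from datetime import date


def _same(a, b):
    # Exact agreement; a missing value (None) never counts as agreement.
    return a is not None and a == b


def _partial_dob(a, b):
    # "Partial" in Table 3: month and year of birth agree.
    return a is not None and b is not None and (a.year, a.month) == (b.year, b.month)


def _not_first_january(d):
    # Steps 6 and 7 exclude dates of birth recorded as 1 January.
    return d is not None and (d.month, d.day) != (1, 1)


def _nhs_not_contradictory(a, b):
    # Assumption: no contradiction means missing on either side, or equal.
    return a is None or b is None or a == b


def linkage_step(pds, hes):
    """Return the first step of Table 3 (1 to 7) that a pair satisfies, else None."""
    nhs = _same(pds["nhs_number"], hes["nhs_number"])
    dob = _same(pds["dob"], hes["dob"])
    dob_partial = _partial_dob(pds["dob"], hes["dob"])
    sex = _same(pds["sex"], hes["sex"])
    postcode = _same(pds["postcode"], hes["postcode"])
    not_jan1 = _not_first_january(pds["dob"])
    compatible = _nhs_not_contradictory(pds["nhs_number"], hes["nhs_number"])

    steps = [
        nhs and dob and sex and postcode,                       # step 1
        nhs and dob and sex,                                    # step 2
        nhs and dob_partial and sex and postcode,               # step 3
        nhs and dob_partial and sex,                            # step 4
        nhs and postcode,                                       # step 5
        dob and sex and postcode and not_jan1 and compatible,   # step 6
        dob and sex and postcode and not_jan1,                  # step 7
    ]
    for number, passed in enumerate(steps, start=1):
        if passed:
            return number
    return None
```
</preformat>
<p>Under these assumptions, a pair that agrees on date of birth, sex and postcode but has no NHS number on one side is assigned step 6, whereas the same pair with two different NHS numbers falls through to step 7.</p>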
<p><xref ref-type="fig" rid="fig-1">Figure 1</xref> shows the transfer of pseudonymised HES attribute data (admitted patient care, accident and emergency, outpatient), together with the linkage bridging file of all possible linkage pairs, to the ONS SRS. Similarly, the Department for Education transferred NPD attribute data extracts containing the anonymised PMR to the ONS SRS.</p>
<p>The final phase of the process was to merge NPD and HES attribute data, using the bridging file obtained from stage 2 of the linkage. This was done by an Accredited Researcher (NL) in the ONS SRS. There were minor differences between the HESIDs transferred by NHS Digital and those held by UCL, because the NHS Digital HES data are continually updated, whereas UCL holds a static subset of the NHS Digital HES data (e.g. one limited by age).</p>
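<p>The merge described above amounts to a two-sided join through the bridging pairs. The following Python sketch shows one way to express it, under stated assumptions: the field names apmr and hesid, the function name merge_via_bridge, and the in-memory list-of-dictionaries layout are hypothetical and do not describe the files held in the ONS SRS.</p>
<preformat>
```python
def merge_via_bridge(npd_rows, hes_rows, bridge):
    """Join NPD and HES attribute rows through (aPMR, HESID) bridging pairs.

    Returns the merged rows and the bridging pairs that could not be merged
    because one side was absent from the attribute extracts.
    """
    npd_by_apmr = {row["apmr"]: row for row in npd_rows}

    # One HESID can carry several episodes, so group HES rows by identifier.
    hes_by_id = {}
    for row in hes_rows:
        hes_by_id.setdefault(row["hesid"], []).append(row)

    merged, unmerged = [], []
    for apmr, hesid in bridge:
        if apmr is None:
            # HES individual with no link to NPD: nothing to merge.
            continue
        pupil = npd_by_apmr.get(apmr)
        episodes = hes_by_id.get(hesid, [])
        if pupil is None or not episodes:
            # For example, a HESID missing from a static HES extract.
            unmerged.append((apmr, hesid))
            continue
        merged.extend({**pupil, **episode} for episode in episodes)
    return merged, unmerged
```
</preformat>
<p>Keeping the unmerged pairs separate makes it possible to count links that exist in the bridging file but cannot be attached to attribute data.</p>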
</sec>
<sec>
<title>Evaluation of linkage quality</title>
<p>Among pupils who linked to a HES record, we calculated the distribution of records linked at each step of linkage stages 1 and 2, according to region, ethnic group, decile of deprivation, measured by the income deprivation affecting children index (IDACI), and cohort year. We calculated the overall linkage rate as the percentage of pupils in the NPD who linked to any HES record for each of the four cohorts [<xref ref-type="bibr" rid="ref-27">27</xref>].</p>
<p>To evaluate potential bias resulting from missed matches, we compared characteristics of pupils in NPD who were linked to HES records with pupils in NPD who were not linked to HES [<xref ref-type="bibr" rid="ref-15">15</xref>, <xref ref-type="bibr" rid="ref-28">28</xref>]. Unlinked pupils could include pupils who never attended hospital or missed matches of pupils who did attend hospital. We used standardized differences (mean difference in standard deviation units) as these are thought to be more informative than P-values for detecting potential biases in large samples [<xref ref-type="bibr" rid="ref-28">28</xref>, <xref ref-type="bibr" rid="ref-29">29</xref>]. Standardized differences were calculated using the &#x2018;stddiff&#x2019; command in Stata for the following variables: sex; ethnic group; region of pupil&#x2019;s residence; IDACI Deciles; age at start of the first academic year; whether a child received Special Educational Needs (SEN) provision (recorded in NPD as receiving Action, Action Plus or Support (AAP/S) and having a statement of SEN or an Education, Health &#x0026; Care Plan (S/EHCP) [<xref ref-type="bibr" rid="ref-30">30</xref>]); and persistent authorized annual absence rate for all academic years available, defined as whether a child was absent in 10% or more of academic sessions (see <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 5</xref> for recording of variables) [<xref ref-type="bibr" rid="ref-31">31</xref>].</p>
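<p>For a single binary characteristic, the standardized difference between two groups is the difference in proportions divided by the square root of the average of the two binomial variances. The Python sketch below implements that textbook form only, as an illustration of the quantity; the function name is hypothetical, and it does not reproduce the Stata command, which also handles variables with more than two categories through a multivariate generalisation that is not shown here.</p>
<preformat>
```python
from math import sqrt


def standardized_difference_binary(p_a, p_b):
    """Absolute standardized difference between two proportions.

    p_a and p_b are the proportions with the characteristic in each group,
    both strictly between 0 and 1.
    """
    # Average of the two binomial variances, used as the pooled variance.
    pooled_variance = (p_a * (1 - p_a) + p_b * (1 - p_b)) / 2
    return abs(p_a - p_b) / sqrt(pooled_variance)
```
</preformat>
<p>Because the denominator depends only on the proportions and not on the group sizes, the measure does not shrink or grow with sample size in the way a P-value does.</p>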
<p>Multivariable logistic regression was used to evaluate linkage from NPD to HES according to the following demographic characteristics: sex, ethnicity, region of residence and IDACI Deciles.</p>
</sec>
</sec>
</sec>
<sec>
<title>Results</title>
<p>The bridging file produced by NHS Digital included 2,289,587 records with all possible linkage results. From this file, 41 duplicates were excluded because the same aPMR-HESID pair was linked in two different academic years. The second bridging file, which included only the modified linkage step 1 of linkage stage 1 (i.e., where multiple links were allowed for each NPD record), contained 2,093,787 records, of which only 8,858 records were new linkage results. By combining both files, we linked an additional 4,059 (0.18%) aPMR-HESID pairs.</p>
<p>The final bridging file contained 2,294,369 records, corresponding to 2,287,671 pupils who were included in the linkage quality analysis (<xref ref-type="fig" rid="fig-3">Figure 3</xref>). Of the 2,287,671 pupil records in the four cohorts, 2,174,601 (95%) linked to a HES record. As expected, linkage rates increased from pupils born in academic year 1990/91 (92%) to those born in 2004/05 (99%). Results for each linkage stage show that 30,323 (1.3%) pupil records were not linked in stage 1, 61,223 (2.7%) records were not linked in stage 2, and a further 21,524 (0.9%) were not merged with the UCL extract. Linkage improved over time. For example, in the cohort born in 1990/91, 3.3% of records were not linked in stage 1, whereas only 1.1% of records were not linked in the cohort born in 2004/05.</p>
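<p>The overall linkage rate follows from the pupil total and the three sources of loss reported above. The short Python check below reproduces that arithmetic from the figures in the text; the variable names are illustrative only.</p>
<preformat>
```python
# Figures taken from the paragraph above.
total_pupils = 2_287_671
not_linked = {
    "stage 1 (NPD to PDS)": 30_323,
    "stage 2 (PDS to HES)": 61_223,
    "not merged with UCL extract": 21_524,
}

# Pupils remaining after removing every source of loss.
linked = total_pupils - sum(not_linked.values())

# Overall linkage rate and the share lost at each point, in percent.
linkage_rate = 100 * linked / total_pupils
loss_shares = {name: 100 * n / total_pupils for name, n in not_linked.items()}
```
</preformat>
<p>Here linked evaluates to 2,174,601 and linkage_rate to about 95.06%, in line with the 95% reported; the three loss shares round to 1.3%, 2.7% and 0.9%.</p>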
<fig id="fig-3"><label>Figure 3: Results of linkage at stage 1 (NPD and PDS) and stage 2 (PDS and HES) and final linkage rates</label>
<graphic xlink:href="ijpds-06-1671-g003.tif"/>
<attrib>Notes: NPD = national pupil dataset; PDS = personal demographics service; HES = hospital episode statistics; NHS = national health service; NHSD = NHS digital; ONS SRS = office for national statistics secure research service; UCL = university college London; aPMR = anonymised pupil matching reference.</attrib>
</fig>
<sec>
<title>Distribution of pupil characteristics in linked records</title>
<p>At stage 1, between 91% and 95% of pupils linked at the first step of the 8-step algorithm, i.e. exact linkage by first name, surname, date of birth, sex and postcode (<xref ref-type="table" rid="table-2">Table 2</xref>; <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 6</xref>). However, evaluation by ethnic group showed that the additional steps in this algorithm, i.e. steps 2-8, captured a greater percentage of ethnic minority groups (11.8% of minority ethnic groups versus 4.2% of white ethnic group).</p>
<p>A considerable percentage of records were linked in years after the first available Spring census (<xref ref-type="fig" rid="fig-4">Figure 4</xref>). For example, 12% and 21% of records of pupils born in academic years 1990/91 and 1996/97, respectively, were matched after 2004/05 &#x2013; their first available Spring census when it was possible to link to PDS. Similarly, in academic years 1999/00 and 2004/05, 16% and 9% of pupils were matched after their academic Year 1 &#x2013; their second available Spring census. For pupils born in academic year 1999/00 or after, the majority of records were linked in the first two academic years. In particular, 50% of records in cohort 1999/00 and 51% in 2004/05 were linked in Year 1, while 34% and 40% were linked in reception year (<xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 6</xref>).</p>
<fig id="fig-4"><label>Figure 4: Cumulative percentage of records linked in stage 1 (NPD to PDS; y axis) by academic year in spring census (x axis)</label>
<graphic xlink:href="ijpds-06-1671-g004.tif"/>
<attrib>Notes: NPD = national pupil dataset; PDS = personal demographics service; HES = hospital episode statistics; NHS = national health service. The registration online system (RON) is a web-based system registering life events (births and deaths) that was first piloted in November 2006 and fully implemented in July 2009. Since the implementation of RON, validation checks of addresses and postcodes have become possible at the point of registration [<xref ref-type="bibr" rid="ref-32">32</xref>]. Prior to the 2013/14 financial year, birth admissions were missing due to an extraction error by NHS Digital, resulting in postcodes missing in recorded birth episodes [<xref ref-type="bibr" rid="ref-33">33</xref>].</attrib>
</fig>
<p>Linkage at stage 2, from PDS to HES using the NHS Digital internal 7-step algorithm (<xref ref-type="table" rid="table-3">Table 3</xref>), showed a similar pattern to linkage at stage 1. Of the 2,202,823 pairs in NPD linked at stage 2, 81% (n=1,791,480) were linked at step 1 and 18% at step 2 (n=386,579) (<xref ref-type="supplementary-material" rid="sup-a">Supplementary Table 7.1</xref> in <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 6</xref>). Pupils from ethnic minorities were disproportionately linked at steps 2-7. For example, around 20% of pupils categorized in Black and Chinese ethnic groups were linked at step 2, compared with 17% of white pupils who linked at this step. Of steps 3-7 of the algorithm, step 6 was particularly important for the linkage of ethnic minority groups, linking between 0.7% and 1.7% of ethnic minority records (see <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 6</xref> for more details).</p>
</sec>
<sec>
<title>Linkage rates by demographic characteristics of pupils</title>
<p>Pupils who linked to HES after both linkage stages and who were merged with HES attribute data comprise the matched dataset used for all subsequent analyses. Linkage rates by region, ethnic group, sex and IDACI deciles are shown in <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 7</xref>. We found that linkage rates improved over time for all these variables. However, ethnic minorities and pupils living in more deprived areas were less likely to match to HES. The linkage rate for white pupils improved from 94.6% in the 1990/91 cohort to 98.9% in the 2004/05 cohort. In contrast, for ethnic minority pupils in the same cohorts the linkage rate rose from 92.4% to 97.7%. We found a similar pattern by IDACI deciles. Linkage rates by region show that London had consistently lower linkage rates than the rest of the country.</p>
</sec>
<sec>
<title>Comparing characteristics of linked and unlinked pupils</title>
<p>Differences in the distribution of sociodemographic and educational characteristics between pupils recorded in NPD who did and did not link to HES are shown in <xref ref-type="table" rid="table-4">Table 4</xref> (and <xref ref-type="supplementary-material" rid="sup-a">Supplementary Table 9.1</xref>&#x2013;<xref ref-type="supplementary-material" rid="sup-a">9.4</xref> in <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 8</xref>). Overall, standardized differences were relatively low across all variables, providing evidence of small or moderate differences between linked and unlinked groups. We considered standardized differences of 0.2, 0.5 and 0.8 as small, moderate and large, respectively [<xref ref-type="bibr" rid="ref-28">28</xref>, <xref ref-type="bibr" rid="ref-34">34</xref>]. The largest differences were for AAP/S and persistent authorized absence rate in cohort 1996/97, with values of 0.44 and 0.42, respectively. The mean standardized difference across cohorts was 0.25 for region and 0.24 for ethnic group, whereas for sex and IDACI deciles it was 0.13 and 0.17, respectively (<xref ref-type="table" rid="table-4">Table 4</xref>).</p>
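<p>The cross-cohort means quoted above for region and ethnic group can be checked directly against the cohort-specific standardized differences in Table 4. The Python snippet below is a minimal arithmetic check; the variable names are illustrative, and the values are copied from the Stand. Diff. columns of Table 4 in cohort order (1990/91, 1996/97, 1999/00, 2004/05).</p>
<preformat>
```python
from statistics import mean

# Cohort-specific standardized differences from Table 4.
region = [0.191, 0.247, 0.31, 0.237]
ethnic_group = [0.160, 0.159, 0.281, 0.358]

# Simple unweighted means across the four cohorts, to two decimal places.
region_mean = round(mean(region), 2)
ethnic_group_mean = round(mean(ethnic_group), 2)
```
</preformat>
<p>Both results match the text (0.25 and 0.24). The means are unweighted, so each cohort contributes equally regardless of its size.</p>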
<table-wrap id="table-4">
<label>Table 4: Sociodemographic characteristics of the pupil sample from the national pupil database linked and non-linked to hospital episode statistics (N = 2,294,369 pairs).</label>
<table>
<thead>
<tr>
<th rowspan="2" valign="middle" align="left" style="border-top: solid 1pt; border-bottom: solid 1pt"></th>
<th colspan="3" valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Cohort 1990/91</bold></th>
<th colspan="3" valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Cohort 1996/97</bold></th>
</tr>
<tr>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Non-linked (n = 47,934) (%)</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Linked (n = 565,798) (%)</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Stand. Diff.</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Non-linked (n = 35,299) (%)</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Linked (n = 536,619) (%)</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Stand. Diff.</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="7" valign="middle" align="left">Region</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">London</td>
<td valign="middle" align="center">7,729 (16.1)</td>
<td valign="middle" align="center">68,073 (12.0)</td>
<td valign="middle" align="center">0.191</td>
<td valign="middle" align="center">6,243 (17.7)</td>
<td valign="middle" align="center">71,652 (13.4)</td>
<td valign="middle" align="center">0.247</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">South East</td>
<td valign="middle" align="center">8,000 (16.7)</td>
<td valign="middle" align="center">81,806 (14.5)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">5,961 (16.9)</td>
<td valign="middle" align="center">75,452 (14.1)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">South West</td>
<td valign="middle" align="center">4,217 (8.8)</td>
<td valign="middle" align="center">52,018 (9.2)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">3,021 (8.6)</td>
<td valign="middle" align="center">50,302 (9.4)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">West Midlands</td>
<td valign="middle" align="center">4,915 (10.3)</td>
<td valign="middle" align="center">63,013 (11.1)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">3,392 (9.6)</td>
<td valign="middle" align="center">60,027 (11.2)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">North West</td>
<td valign="middle" align="center">6,200 (12.9)</td>
<td valign="middle" align="center">83,376 (14.7)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">3,630 (10.3)</td>
<td valign="middle" align="center">77,805 (14.5)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">North East</td>
<td valign="middle" align="center">1,567 (3.3)</td>
<td valign="middle" align="center">29,318 (5.2)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">1,025 (2.9)</td>
<td valign="middle" align="center">27,374 (5.1)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Yorkshire and The Humber</td>
<td valign="middle" align="center">3,885 (8.1)</td>
<td valign="middle" align="center">57,539 (10.2)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">2,908 (8.2)</td>
<td valign="middle" align="center">54,564 (10.2)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">East Midlands</td>
<td valign="middle" align="center">3,535 (7.4)</td>
<td valign="middle" align="center">47,096 (8.3)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">2,769 (7.8)</td>
<td valign="middle" align="center">42,187 (7.9)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">East of England</td>
<td valign="middle" align="center">5,541 (11.6)</td>
<td valign="middle" align="center">59,686 (10.5)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">4,525 (12.8)</td>
<td valign="middle" align="center">54,424 (10.1)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Wales</td>
<td valign="middle" align="center">28 (0.1)</td>
<td valign="middle" align="center">38 (0.0)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">*</td>
<td valign="middle" align="center">*</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">2,317 (4.8)</td>
<td valign="middle" align="center">23,835 (4.2)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">1,818 (5.2)</td>
<td valign="middle" align="center">22,794 (4.2)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td colspan="7" valign="middle" align="left">Ethnic group</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">White</td>
<td valign="middle" align="center">27,692 (57.8)</td>
<td valign="middle" align="center">488,330 (86.3)</td>
<td valign="middle" align="center">0.160</td>
<td valign="middle" align="center">24,452 (69.3)</td>
<td valign="middle" align="center">453,764 (84.6)</td>
<td valign="middle" align="center">0.159</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Asian</td>
<td valign="middle" align="center">2,541 (5.3)</td>
<td valign="middle" align="center">33,024 (5.8)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">2,584 (7.3)</td>
<td valign="middle" align="center">37,654 (7.0)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Black</td>
<td valign="middle" align="center">1,507 (3.1)</td>
<td valign="middle" align="center">17,047 (3.0)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">1,429 (4.0)</td>
<td valign="middle" align="center">19,228 (3.6)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Chinese</td>
<td valign="middle" align="center">278 (0.6)</td>
<td valign="middle" align="center">1,384 (0.2)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">213 (0.6)</td>
<td valign="middle" align="center">1,439 (0.3)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Other ethnic group</td>
<td valign="middle" align="center">498 (1.0)</td>
<td valign="middle" align="center">3,627 (0.6)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">626 (1.8)</td>
<td valign="middle" align="center">3,951 (0.7)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Mixed</td>
<td valign="middle" align="center">834 (1.7)</td>
<td valign="middle" align="center">13,808 (2.4)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">1,278 (3.6)</td>
<td valign="middle" align="center">19,286 (3.6)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">14,584 (30.4)</td>
<td valign="middle" align="center">8,578 (1.5)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">4,717 (13.4)</td>
<td valign="middle" align="center">1,297 (0.2)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td colspan="7" valign="middle" align="left">Sex</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Male</td>
<td valign="middle" align="center">27,334 (57.0)</td>
<td valign="middle" align="center">285,716 (50.5)</td>
<td valign="middle" align="center">0.131</td>
<td valign="middle" align="center">17,014 (48.2)</td>
<td valign="middle" align="center">275,479 (51.3)</td>
<td valign="middle" align="center">0.062</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Female</td>
<td valign="middle" align="center">20,543 (42.9)</td>
<td valign="middle" align="center">279,520 (49.4)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">18,268 (51.8)</td>
<td valign="middle" align="center">261,094 (48.7)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">57 (0.1)</td>
<td valign="middle" align="center">562 (0.1)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">17 (0.0)</td>
<td valign="middle" align="center">46 (0.0)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td colspan="7" valign="middle" align="left">IDACI Deciles</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">1 (deprived)</td>
<td valign="middle" align="center">7,306 (15.2)</td>
<td valign="middle" align="center">54,336 (9.6)</td>
<td valign="middle" align="center">0.242</td>
<td valign="middle" align="center">4,866 (13.8)</td>
<td valign="middle" align="center">50,540 (9.4)</td>
<td valign="middle" align="center">0.218</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">2</td>
<td valign="middle" align="center">6,001 (12.5)</td>
<td valign="middle" align="center">55,606 (9.8)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">4,247 (12.0)</td>
<td valign="middle" align="center">51,132 (9.5)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">3</td>
<td valign="middle" align="center">5,414 (11.3)</td>
<td valign="middle" align="center">56,149 (9.9)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">3,811 (10.8)</td>
<td valign="middle" align="center">51,662 (9.6)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">4</td>
<td valign="middle" align="center">4,941 (10.3)</td>
<td valign="middle" align="center">56,600 (10.0)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">3,738 (10.6)</td>
<td valign="middle" align="center">51,725 (9.6)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">5</td>
<td valign="middle" align="center">4,611 (9.6)</td>
<td valign="middle" align="center">56,620 (10.0)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">3,444 (9.8)</td>
<td valign="middle" align="center">52,336 (9.8)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">6</td>
<td valign="middle" align="center">4,255 (8.9)</td>
<td valign="middle" align="center">56,927 (10.1)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">3,310 (9.4)</td>
<td valign="middle" align="center">52,503 (9.8)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">7</td>
<td valign="middle" align="center">3,854 (8.0)</td>
<td valign="middle" align="center">56,891 (10.1)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">2,936 (8.3)</td>
<td valign="middle" align="center">53,336 (9.9)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">8</td>
<td valign="middle" align="center">3,685 (7.7)</td>
<td valign="middle" align="center">56,122 (9.9)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">2,914 (8.3)</td>
<td valign="middle" align="center">54,281 (10.1)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">9</td>
<td valign="middle" align="center">3,514 (7.3)</td>
<td valign="middle" align="center">54,875 (9.7)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">2,851 (8.1)</td>
<td valign="middle" align="center">55,791 (10.4)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">10 (affluent)</td>
<td valign="middle" align="center">3,630 (7.6)</td>
<td valign="middle" align="center">54,286 (9.6)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">2,701 (7.7)</td>
<td valign="middle" align="center">56,355 (10.5)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">723 (1.5)</td>
<td valign="middle" align="center">7,386 (1.3)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">481 (1.4)</td>
<td valign="middle" align="center">6,958 (1.3)</td>
<td valign="middle" align="left"></td>
</tr>
</tbody>
</table>
<table>
<thead>
<tr>
<th rowspan="2" valign="middle" align="left" style="border-top: solid 1pt; border-bottom: solid 1pt"></th>
<th colspan="3" valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Cohort 1999/00</bold></th>
<th colspan="3" valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Cohort 2004/05</bold></th>
</tr>
<tr>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Non-linked (n = 22,185) (%)</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Linked (n = 507,725) (%)</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Stand. Diff.</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Non-linked (n = 8,477) (%)</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Linked (n = 570,332) (%)</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Stand. Diff.</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="7" valign="middle" align="left">Region</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">London</td>
<td valign="middle" align="center">4,303 (19.4)</td>
<td valign="middle" align="center">71,001 (14.0)</td>
<td valign="middle" align="center">0.31</td>
<td valign="middle" align="center">1,590 (18.8)</td>
<td valign="middle" align="center">83,817 (14.7)</td>
<td valign="middle" align="center">0.237</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">South East</td>
<td valign="middle" align="center">3,881 (17.5)</td>
<td valign="middle" align="center">74,189 (14.6)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">1,353 (16.0)</td>
<td valign="middle" align="center">83,748 (14.7)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">South West</td>
<td valign="middle" align="center">1,364 (6.1)</td>
<td valign="middle" align="center">45,672 (9.0)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">504 (5.9)</td>
<td valign="middle" align="center">49,993 (8.8)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">West Midlands</td>
<td valign="middle" align="center">2,274 (10.3)</td>
<td valign="middle" align="center">55,174 (10.9)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">759 (9.0)</td>
<td valign="middle" align="center">60,358 (10.6)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">North West</td>
<td valign="middle" align="center">2,036 (9.2)</td>
<td valign="middle" align="center">70,533 (13.9)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">986 (11.6)</td>
<td valign="middle" align="center">76,373 (13.4)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">North East</td>
<td valign="middle" align="center">585 (2.6)</td>
<td valign="middle" align="center">24,497 (4.8)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">197 (2.3)</td>
<td valign="middle" align="center">26,007 (4.6)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Yorkshire and The Humber</td>
<td valign="middle" align="center">1,502 (6.8)</td>
<td valign="middle" align="center">49,701 (9.8)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">671 (7.9)</td>
<td valign="middle" align="center">56,330 (9.9)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">East Midlands</td>
<td valign="middle" align="center">1,786 (8.1)</td>
<td valign="middle" align="center">40,944 (8.1)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">689 (8.1)</td>
<td valign="middle" align="center">45,255 (7.9)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">East of England</td>
<td valign="middle" align="center">3,119 (14.1)</td>
<td valign="middle" align="center">52,238 (10.3)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">1,040 (12.3)</td>
<td valign="middle" align="center">57,545 (10.1)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Wales</td>
<td valign="middle" align="center">*</td>
<td valign="middle" align="center">*</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">*</td>
<td valign="middle" align="center">*</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">1,327 (6.0)</td>
<td valign="middle" align="center">23,720 (4.7)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">685 (8.1)</td>
<td valign="middle" align="center">30,840 (5.4)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td colspan="7" valign="middle" align="left">Ethnic group</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">White</td>
<td valign="middle" align="center">15,692 (70.7)</td>
<td valign="middle" align="center">415,660 (81.9)</td>
<td valign="middle" align="center">0.281</td>
<td valign="middle" align="center">5,255 (62.0)</td>
<td valign="middle" align="center">439,397 (77.0)</td>
<td valign="middle" align="center">0.358</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Asian</td>
<td valign="middle" align="center">2,581 (11.6)</td>
<td valign="middle" align="center">43,061 (8.5)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">1,207 (14.2)</td>
<td valign="middle" align="center">57,790 (10.1)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Black</td>
<td valign="middle" align="center">1,735 (7.8)</td>
<td valign="middle" align="center">21,528 (4.2)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">696 (8.2)</td>
<td valign="middle" align="center">31,656 (5.6)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Chinese</td>
<td valign="middle" align="center">172 (0.8)</td>
<td valign="middle" align="center">1,530 (0.3)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">89 (1.0)</td>
<td valign="middle" align="center">2,038 (0.4)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Other ethnic group</td>
<td valign="middle" align="center">700 (3.2)</td>
<td valign="middle" align="center">5,146 (1.0)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">486 (5.7)</td>
<td valign="middle" align="center">8,375 (1.5)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Mixed</td>
<td valign="middle" align="center">1,178 (5.3)</td>
<td valign="middle" align="center">20,177 (4.0)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">575 (6.8)</td>
<td valign="middle" align="center">29,871 (5.2)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">127 (0.6)</td>
<td valign="middle" align="center">623 (0.1)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">169 (2.0)</td>
<td valign="middle" align="center">1,205 (0.2)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td colspan="7" valign="middle" align="left">Sex</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Male</td>
<td valign="middle" align="center">9,717 (43.8)</td>
<td valign="middle" align="center">261,398 (51.5)</td>
<td valign="middle" align="center">0.153</td>
<td valign="middle" align="center">3,660 (43.2)</td>
<td valign="middle" align="center">292,784 (51.3)</td>
<td valign="middle" align="center">0.166</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Female</td>
<td valign="middle" align="center">12,445 (56.1)</td>
<td valign="middle" align="center">246,116 (48.5)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">4,814 (56.8)</td>
<td valign="middle" align="center">277,508 (48.7)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">23 (0.1)</td>
<td valign="middle" align="center">211 (0.0)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">0 (0.0)</td>
<td valign="middle" align="center">43 (0.0)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td colspan="7" valign="middle" align="left">IDACI Deciles</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">1 (deprived)</td>
<td valign="middle" align="center">2,863 (12.9)</td>
<td valign="middle" align="center">49,733 (9.8)</td>
<td valign="middle" align="center">0.142</td>
<td valign="middle" align="center">909 (10.7)</td>
<td valign="middle" align="center">53,590 (9.4)</td>
<td valign="middle" align="center">0.07</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">2</td>
<td valign="middle" align="center">2,487 (11.2)</td>
<td valign="middle" align="center">49,457 (9.7)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">855 (10.1)</td>
<td valign="middle" align="center">53,748 (9.4)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">3</td>
<td valign="middle" align="center">2,257 (10.2)</td>
<td valign="middle" align="center">49,130 (9.7)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">849 (10.0)</td>
<td valign="middle" align="center">54,246 (9.5)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">4</td>
<td valign="middle" align="center">2,263 (10.2)</td>
<td valign="middle" align="center">49,153 (9.7)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">750 (8.8)</td>
<td valign="middle" align="center">54,250 (9.5)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">5</td>
<td valign="middle" align="center">2,139 (9.6)</td>
<td valign="middle" align="center">49,450 (9.7)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">812 (9.6)</td>
<td valign="middle" align="center">55,571 (9.7)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">6</td>
<td valign="middle" align="center">2,056 (9.3)</td>
<td valign="middle" align="center">49,965 (9.8)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">840 (9.9)</td>
<td valign="middle" align="center">56,601 (9.9)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">7</td>
<td valign="middle" align="center">1,980 (8.9)</td>
<td valign="middle" align="center">50,467 (9.9)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">844 (10.0)</td>
<td valign="middle" align="center">57,776 (10.1)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">8</td>
<td valign="middle" align="center">2,077 (9.4)</td>
<td valign="middle" align="center">51,321 (10.1)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">885 (10.4)</td>
<td valign="middle" align="center">58,854 (10.3)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">9</td>
<td valign="middle" align="center">1,972 (8.9)</td>
<td valign="middle" align="center">52,884 (10.4)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">858 (10.1)</td>
<td valign="middle" align="center">61,048 (10.7)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">10 (affluent)</td>
<td valign="middle" align="center">1,953 (8.8)</td>
<td valign="middle" align="center">53,904 (10.6)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">821 (9.7)</td>
<td valign="middle" align="center">62,514 (11.0)</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">138 (0.6)</td>
<td valign="middle" align="center">2,261 (0.4)</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">54 (0.6)</td>
<td valign="middle" align="center">2,134 (0.4)</td>
<td valign="middle" align="left"></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Notes: IDACI = Income deprivation affecting children index. Stand. Diff. = Standardized Difference. * Value omitted to avoid risk of disclosure due to small cell count.</p>
</table-wrap-foot>
</table-wrap>
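<p>The standardized differences in the table above compare how a characteristic is distributed between two groups of pupils. The following is a minimal sketch only, assuming the common two-group formula for a single binary category (the difference in proportions divided by the pooled standard deviation). The function name is hypothetical and this is not the authors' code. The tabulated values are reported once per characteristic and may have been computed with a multi-category formulation, so the sketch need not reproduce them exactly.</p>
<code language="python">
```python
import math


def standardized_difference(count_a, total_a, count_b, total_b):
    """Absolute standardized difference between two proportions.

    Assumes at least one proportion lies strictly between 0 and 1,
    otherwise the pooled standard deviation is zero.
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return abs(p_a - p_b) / pooled_sd


# Male pupils in the first pair of count columns of the Sex rows above.
# Each total is the sum of the Male, Female and Missing counts.
total_a = 9717 + 12445 + 23
total_b = 261398 + 246116 + 211
d = standardized_difference(9717, total_a, 261398, total_b)
print(round(d, 3))
```
</code>
<p>For these counts the sketch gives a value of about 0.154.</p>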
</sec>
<sec>
<title>Evaluation of linkage from NPD to HES</title>
<p><xref ref-type="table" rid="table-5">Table 5</xref> shows adjusted odds ratios (ORs) for linkage to HES, estimated from multivariable logistic models. Unadjusted models are shown in <xref ref-type="supplementary-material" rid="sup-a">Supplementary Appendix 9</xref>. An OR below 1 indicates lower odds of linkage to HES compared with the reference category. Consistent with linkage rate estimates, we found differences across ethnic groups, deprivation and region. Across all cohorts, pupils from ethnic minority groups (Asian, Black, Chinese, Mixed and Any other ethnic group) were less likely to be matched than pupils of White ethnicity. For example, the odds of linkage to HES for Asian pupils were lower than for White pupils (1990/91: Adjusted OR 0.69, 95% CI 0.66 to 0.72, p &#x003C; 0.01; 2004/05: Adjusted OR 0.51, 95% CI 0.47 to 0.54, p &#x003C; 0.01). With the exception of pupils born in academic year 1990/91, female pupils were less likely to be matched than male pupils (e.g. 2004/05: Adjusted OR 0.72, 95% CI 0.69 to 0.75, p &#x003C; 0.01). Compared to pupils in the fifth IDACI decile, pupils living in the most deprived areas were less likely to be matched, whereas pupils living in the most affluent areas were more likely to be matched. Similarly, linkage success differed by region of pupil residence.</p>
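<p>As an illustration of how an adjusted OR and its 95% confidence interval relate to a logistic regression coefficient, the following minimal sketch exponentiates a coefficient and its Wald interval. The function name is hypothetical, and the standard error is an assumed value chosen only to give numbers of similar magnitude to those reported for Asian pupils in the 1990/91 cohort. It is not the authors' estimation code.</p>
<code language="python">
```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a log-odds coefficient and its standard error.

    The Wald interval is built on the log-odds scale and then exponentiated,
    so it is asymmetric around the OR.
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Hypothetical inputs: a coefficient equal to log(0.69) and an assumed
# standard error of 0.022 (neither value is taken from the fitted models).
beta = math.log(0.69)
odds_ratio, lower, upper = odds_ratio_with_ci(beta, 0.022)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```
</code>
<p>An interval that lies entirely below 1, as in this example, corresponds to significantly lower odds of linkage than in the reference category.</p>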
<table-wrap id="table-5">
<label>Table 5: Adjusted odds ratios for a link between NPD and HES records according to sociodemographic characteristics in the NPD</label>
<table>
<thead>
<tr>
<th rowspan="2" valign="middle" align="left" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Characteristics from NPD</bold></th>
<th colspan="2" valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Cohort 1990/91</bold></th>
<th colspan="2" valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Cohort 1996/97</bold></th>
</tr>
<tr>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>aOR</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Conf. Int.</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>aOR</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Conf. Int.</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="5" valign="middle" align="left">Ethnic group</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">White</td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Asian</td>
<td valign="middle" align="center">0.69</td>
<td valign="middle" align="center">[0.66,0.72]**</td>
<td valign="middle" align="center">0.69</td>
<td valign="middle" align="center">[0.66,0.73]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Black</td>
<td valign="middle" align="center">0.62</td>
<td valign="middle" align="center">[0.59,0.66]**</td>
<td valign="middle" align="center">0.67</td>
<td valign="middle" align="center">[0.63,0.71]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Chinese</td>
<td valign="middle" align="center">0.29</td>
<td valign="middle" align="center">[0.26,0.33]**</td>
<td valign="middle" align="center">0.38</td>
<td valign="middle" align="center">[0.33,0.44]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Any other ethnic group</td>
<td valign="middle" align="center">0.42</td>
<td valign="middle" align="center">[0.38,0.46]**</td>
<td valign="middle" align="center">0.32</td>
<td valign="middle" align="center">[0.30,0.35]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Mixed</td>
<td valign="middle" align="center">0.92</td>
<td valign="middle" align="center">[0.85,0.98]*</td>
<td valign="middle" align="center">0.80</td>
<td valign="middle" align="center">[0.75,0.85]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">0.03</td>
<td valign="middle" align="center">[0.03,0.03]**</td>
<td valign="middle" align="center">0.01</td>
<td valign="middle" align="center">[0.01,0.02]**</td>
</tr>
<tr>
<td colspan="5" valign="middle" align="left">Sex</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Male</td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Female</td>
<td valign="middle" align="center">1.35</td>
<td valign="middle" align="center">[1.32,1.37]**</td>
<td valign="middle" align="center">0.87</td>
<td valign="middle" align="center">[0.85,0.89]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">22.77</td>
<td valign="middle" align="center">[17.02,30.47]**</td>
<td valign="middle" align="center">10.21</td>
<td valign="middle" align="center">[5.77,18.07]**</td>
</tr>
<tr>
<td colspan="5" valign="middle" align="left">Region</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">London</td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">South East</td>
<td valign="middle" align="center">1.31</td>
<td valign="middle" align="center">[1.26,1.36]**</td>
<td valign="middle" align="center">1.12</td>
<td valign="middle" align="center">[1.08,1.17]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">South West</td>
<td valign="middle" align="center">1.34</td>
<td valign="middle" align="center">[1.28,1.40]**</td>
<td valign="middle" align="center">1.38</td>
<td valign="middle" align="center">[1.31,1.45]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">West Midlands</td>
<td valign="middle" align="center">1.27</td>
<td valign="middle" align="center">[1.22,1.33]**</td>
<td valign="middle" align="center">1.37</td>
<td valign="middle" align="center">[1.30,1.43]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">North West</td>
<td valign="middle" align="center">1.36</td>
<td valign="middle" align="center">[1.30,1.41]**</td>
<td valign="middle" align="center">1.64</td>
<td valign="middle" align="center">[1.57,1.72]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">North East</td>
<td valign="middle" align="center">1.91</td>
<td valign="middle" align="center">[1.80,2.04]**</td>
<td valign="middle" align="center">1.99</td>
<td valign="middle" align="center">[1.85,2.14]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Yorkshire and The Humber</td>
<td valign="middle" align="center">1.34</td>
<td valign="middle" align="center">[1.28,1.40]**</td>
<td valign="middle" align="center">1.42</td>
<td valign="middle" align="center">[1.35,1.49]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">East Midlands</td>
<td valign="middle" align="center">1.28</td>
<td valign="middle" align="center">[1.23,1.35]**</td>
<td valign="middle" align="center">1.22</td>
<td valign="middle" align="center">[1.16,1.28]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">East of England</td>
<td valign="middle" align="center">1.14</td>
<td valign="middle" align="center">[1.09,1.19]**</td>
<td valign="middle" align="center">1.00</td>
<td valign="middle" align="center">[0.95,1.04]</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Wales</td>
<td valign="middle" align="center">0.31</td>
<td valign="middle" align="center">[0.16,0.59]**</td>
<td valign="middle" align="center">0.40</td>
<td valign="middle" align="center">[0.17,0.93]*</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">1.16</td>
<td valign="middle" align="center">[1.10,1.23]**</td>
<td valign="middle" align="center">1.08</td>
<td valign="middle" align="center">[1.01,1.14]*</td>
</tr>
<tr>
<td colspan="5" valign="middle" align="left">IDACI Deciles</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">1 (deprived)</td>
<td valign="middle" align="center">0.67</td>
<td valign="middle" align="center">[0.64,0.70]**</td>
<td valign="middle" align="center">0.71</td>
<td valign="middle" align="center">[0.67,0.74]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">2</td>
<td valign="middle" align="center">0.78</td>
<td valign="middle" align="center">[0.74,0.81]**</td>
<td valign="middle" align="center">0.77</td>
<td valign="middle" align="center">[0.73,0.81]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">3</td>
<td valign="middle" align="center">0.86</td>
<td valign="middle" align="center">[0.82,0.90]**</td>
<td valign="middle" align="center">0.87</td>
<td valign="middle" align="center">[0.83,0.92]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">4</td>
<td valign="middle" align="center">0.95</td>
<td valign="middle" align="center">[0.90,0.99]*</td>
<td valign="middle" align="center">0.90</td>
<td valign="middle" align="center">[0.85,0.94]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">5</td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">6</td>
<td valign="middle" align="center">1.11</td>
<td valign="middle" align="center">[1.05,1.16]**</td>
<td valign="middle" align="center">1.05</td>
<td valign="middle" align="center">[1.00,1.11]</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">7</td>
<td valign="middle" align="center">1.26</td>
<td valign="middle" align="center">[1.20,1.32]**</td>
<td valign="middle" align="center">1.23</td>
<td valign="middle" align="center">[1.16,1.29]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">8</td>
<td valign="middle" align="center">1.31</td>
<td valign="middle" align="center">[1.25,1.38]**</td>
<td valign="middle" align="center">1.27</td>
<td valign="middle" align="center">[1.20,1.34]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">9</td>
<td valign="middle" align="center">1.31</td>
<td valign="middle" align="center">[1.25,1.38]**</td>
<td valign="middle" align="center">1.37</td>
<td valign="middle" align="center">[1.29,1.44]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">10 (affluent)</td>
<td valign="middle" align="center">1.27</td>
<td valign="middle" align="center">[1.21,1.34]**</td>
<td valign="middle" align="center">1.52</td>
<td valign="middle" align="center">[1.44,1.61]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">0.95</td>
<td valign="middle" align="center">[0.86,1.04]</td>
<td valign="middle" align="center">1.06</td>
<td valign="middle" align="center">[0.95,1.18]</td>
</tr>
<tr>
<td valign="middle" align="left">Observations</td>
<td valign="middle" align="center">613,732</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">571,918</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left">Pseudo R-squared</td>
<td valign="middle" align="center">0.162</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">0.093</td>
<td valign="middle" align="left"></td>
</tr>
</tbody>
</table>
<table>
<thead>
<tr>
<th rowspan="2" valign="middle" align="left" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Characteristics from NPD</bold></th>
<th colspan="2" valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Cohort 1999/00</bold></th>
<th colspan="2" valign="middle" align="center" style="border-top: solid 1pt; border-bottom: solid 1pt"><bold>Cohort 2004/05</bold></th>
</tr>
<tr>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>aOR</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Conf. Int.</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>aOR</bold></th>
<th valign="middle" align="center" style="border-bottom: solid 1pt"><bold>Conf. Int.</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="5" valign="middle" align="left">Ethnic group</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">White</td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Asian</td>
<td valign="middle" align="center">0.56</td>
<td valign="middle" align="center">[0.54,0.59]**</td>
<td valign="middle" align="center">0.51</td>
<td valign="middle" align="center">[0.47,0.54]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Black</td>
<td valign="middle" align="center">0.43</td>
<td valign="middle" align="center">[0.40,0.45]**</td>
<td valign="middle" align="center">0.47</td>
<td valign="middle" align="center">[0.43,0.51]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Chinese</td>
<td valign="middle" align="center">0.35</td>
<td valign="middle" align="center">[0.30,0.41]**</td>
<td valign="middle" align="center">0.27</td>
<td valign="middle" align="center">[0.22,0.34]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Any other ethnic group</td>
<td valign="middle" align="center">0.26</td>
<td valign="middle" align="center">[0.24,0.28]**</td>
<td valign="middle" align="center">0.18</td>
<td valign="middle" align="center">[0.17,0.20]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Mixed</td>
<td valign="middle" align="center">0.64</td>
<td valign="middle" align="center">[0.60,0.68]**</td>
<td valign="middle" align="center">0.60</td>
<td valign="middle" align="center">[0.55,0.66]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">0.21</td>
<td valign="middle" align="center">[0.17,0.25]**</td>
<td valign="middle" align="center">0.09</td>
<td valign="middle" align="center">[0.07,0.10]**</td>
</tr>
<tr>
<td colspan="5" valign="middle" align="left">Sex</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Male</td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Female</td>
<td valign="middle" align="center">0.73</td>
<td valign="middle" align="center">[0.71,0.75]**</td>
<td valign="middle" align="center">0.72</td>
<td valign="middle" align="center">[0.69,0.75]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">0.61</td>
<td valign="middle" align="center">[0.39,0.96]*</td>
<td valign="middle" align="center">1.00</td>
<td valign="middle" align="center">[0.29,3.50]</td>
</tr>
<tr>
<td colspan="5" valign="middle" align="left">Region</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">London</td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">South East</td>
<td valign="middle" align="center">1.00</td>
<td valign="middle" align="center">[0.95,1.04]</td>
<td valign="middle" align="center">0.94</td>
<td valign="middle" align="center">[0.87,1.02]</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">South West</td>
<td valign="middle" align="center">1.62</td>
<td valign="middle" align="center">[1.51,1.73]**</td>
<td valign="middle" align="center">1.38</td>
<td valign="middle" align="center">[1.24,1.54]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">West Midlands</td>
<td valign="middle" align="center">1.23</td>
<td valign="middle" align="center">[1.16,1.30]**</td>
<td valign="middle" align="center">1.21</td>
<td valign="middle" align="center">[1.11,1.33]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">North West</td>
<td valign="middle" align="center">1.64</td>
<td valign="middle" align="center">[1.55,1.74]**</td>
<td valign="middle" align="center">1.09</td>
<td valign="middle" align="center">[1.00,1.19]*</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">North East</td>
<td valign="middle" align="center">1.82</td>
<td valign="middle" align="center">[1.67,2.00]**</td>
<td valign="middle" align="center">1.71</td>
<td valign="middle" align="center">[1.47,1.99]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Yorkshire and The Humber</td>
<td valign="middle" align="center">1.61</td>
<td valign="middle" align="center">[1.51,1.72]**</td>
<td valign="middle" align="center">1.23</td>
<td valign="middle" align="center">[1.12,1.35]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">East Midlands</td>
<td valign="middle" align="center">1.14</td>
<td valign="middle" align="center">[1.08,1.21]**</td>
<td valign="middle" align="center">0.96</td>
<td valign="middle" align="center">[0.87,1.06]</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">East of England</td>
<td valign="middle" align="center">0.86</td>
<td valign="middle" align="center">[0.81,0.90]**</td>
<td valign="middle" align="center">0.83</td>
<td valign="middle" align="center">[0.76,0.90]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Wales</td>
<td valign="middle" align="center">0.37</td>
<td valign="middle" align="center">[0.17,0.80]*</td>
<td valign="middle" align="center">0.36</td>
<td valign="middle" align="center">[0.11,1.19]</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">0.97</td>
<td valign="middle" align="center">[0.91,1.03]</td>
<td valign="middle" align="center">0.74</td>
<td valign="middle" align="center">[0.68,0.82]**</td>
</tr>
<tr>
<td colspan="5" valign="middle" align="left">IDACI Deciles</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">1 (deprived)</td>
<td valign="middle" align="center">0.73</td>
<td valign="middle" align="center">[0.68,0.77]**</td>
<td valign="middle" align="center">0.82</td>
<td valign="middle" align="center">[0.75,0.90]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">2</td>
<td valign="middle" align="center">0.82</td>
<td valign="middle" align="center">[0.77,0.87]**</td>
<td valign="middle" align="center">0.88</td>
<td valign="middle" align="center">[0.80,0.97]*</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">3</td>
<td valign="middle" align="center">0.89</td>
<td valign="middle" align="center">[0.83,0.94]**</td>
<td valign="middle" align="center">0.90</td>
<td valign="middle" align="center">[0.82,1.00]*</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">4</td>
<td valign="middle" align="center">0.92</td>
<td valign="middle" align="center">[0.86,0.97]**</td>
<td valign="middle" align="center">1.03</td>
<td valign="middle" align="center">[0.93,1.14]</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">5</td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">Ref</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">6</td>
<td valign="middle" align="center">1.10</td>
<td valign="middle" align="center">[1.03,1.17]**</td>
<td valign="middle" align="center">1.03</td>
<td valign="middle" align="center">[0.94,1.14]</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">7</td>
<td valign="middle" align="center">1.18</td>
<td valign="middle" align="center">[1.11,1.26]**</td>
<td valign="middle" align="center">1.11</td>
<td valign="middle" align="center">[1.00,1.22]*</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">8</td>
<td valign="middle" align="center">1.20</td>
<td valign="middle" align="center">[1.13,1.28]**</td>
<td valign="middle" align="center">1.14</td>
<td valign="middle" align="center">[1.03,1.25]*</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">9</td>
<td valign="middle" align="center">1.36</td>
<td valign="middle" align="center">[1.27,1.45]**</td>
<td valign="middle" align="center">1.31</td>
<td valign="middle" align="center">[1.18,1.45]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">10 (affluent)</td>
<td valign="middle" align="center">1.48</td>
<td valign="middle" align="center">[1.39,1.58]**</td>
<td valign="middle" align="center">1.57</td>
<td valign="middle" align="center">[1.42,1.74]**</td>
</tr>
<tr>
<td valign="middle" align="left" style="padding-left:1em">Missing</td>
<td valign="middle" align="center">0.87</td>
<td valign="middle" align="center">[0.73,1.05]</td>
<td valign="middle" align="center">0.80</td>
<td valign="middle" align="center">[0.59,1.07]</td>
</tr>
<tr>
<td valign="middle" align="left">Observations</td>
<td valign="middle" align="center">529,910</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">578,809</td>
<td valign="middle" align="left"></td>
</tr>
<tr>
<td valign="middle" align="left">Pseudo R-squared</td>
<td valign="middle" align="center">0.026</td>
<td valign="middle" align="left"></td>
<td valign="middle" align="center">0.027</td>
<td valign="middle" align="left"></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Notes: Adjusted for all other covariates listed in the table. *p &#x003C; 0.05, **p &#x003C; 0.01. aOR = adjusted odds ratio. Conf. Int. = confidence interval. NPD = national pupil dataset. HES = hospital episode statistics. NHS = national health service. IDACI = income deprivation affecting children index.</p>
</table-wrap-foot>
</table-wrap>
</sec>
</sec>
<sec>
<title>Discussion</title>
<p>This study is the first to link administrative records from schools and hospitals for all children and adolescents attending state-funded schools in England for four 1-year birth cohorts (~2.2 million children). It builds upon previous studies that have demonstrated the public benefit and challenges for data sharing across educational and health services for specific subgroups [<xref ref-type="bibr" rid="ref-8">8</xref>, <xref ref-type="bibr" rid="ref-13">13</xref>, <xref ref-type="bibr" rid="ref-35">35</xref>, <xref ref-type="bibr" rid="ref-36">36</xref>], and in other countries [<xref ref-type="bibr" rid="ref-9">9</xref>&#x2013;<xref ref-type="bibr" rid="ref-14">14</xref>]. We evaluated two deterministic algorithms implemented by NHS Digital and found that although linkage rates were high and improved over time, pupils from ethnic minority groups or living in areas of high deprivation were disproportionately less likely to match to HES.</p>
<sec>
<title>Key findings</title>
<p>Our finding that the linkage rate was 99% for the youngest cohort is encouraging for future studies using multi-step deterministic algorithms in England. This linkage rate is similar to those reported in studies in Scotland, Wales and Australia that used probabilistic linkage methods [<xref ref-type="bibr" rid="ref-11">11</xref>, <xref ref-type="bibr" rid="ref-13">13</xref>, <xref ref-type="bibr" rid="ref-14">14</xref>, <xref ref-type="bibr" rid="ref-37">37</xref>&#x2013;<xref ref-type="bibr" rid="ref-39">39</xref>]. For instance, linkage rates for the annual Scottish Government&#x2019;s pupil census linked to the community health index database ranged between 86.3% and 95% [<xref ref-type="bibr" rid="ref-14">14</xref>], while two other Scottish studies found linkage rates of 99.7% [<xref ref-type="bibr" rid="ref-13">13</xref>] and 81.8% [<xref ref-type="bibr" rid="ref-11">11</xref>].</p>
<p>We found that between 2.3% and 7.6% of ethnic minority pupils were not linked to health records. Ethnic differences in linkage success reported in previous studies reflect differences in the quality of registration of Chinese, Asian and Hispanic names [<xref ref-type="bibr" rid="ref-8">8</xref>, <xref ref-type="bibr" rid="ref-27">27</xref>, <xref ref-type="bibr" rid="ref-28">28</xref>]. The differences in linkage rates by ethnic group in linkage steps that relaxed the requirement to agree on exact full name suggest that inconsistencies in forenames and surnames explain the lower linkage rates for ethnic minority pupils. Residential instability may also be relevant: lower rates of linkage for pupils from ethnic minorities at steps 1 and 2 between PDS and HES (i.e. stage 2) could be due to poor recording of postcode, as reported in other studies [<xref ref-type="bibr" rid="ref-40">40</xref>, <xref ref-type="bibr" rid="ref-41">41</xref>]. It is also estimated that 20% of children aged 0 to 15 years are born outside the UK, which may have a differential impact on linkage success [<xref ref-type="bibr" rid="ref-42">42</xref>]. Additional steps in the deterministic algorithm that incorporate phonetic coding systems for other languages [<xref ref-type="bibr" rid="ref-43">43</xref>, <xref ref-type="bibr" rid="ref-44">44</xref>], methods that discriminate partial agreements in string comparisons [<xref ref-type="bibr" rid="ref-45">45</xref>&#x2013;<xref ref-type="bibr" rid="ref-48">48</xref>], or probabilistic linkage methods could be used to further improve linkage rates for ethnic minorities [<xref ref-type="bibr" rid="ref-40">40</xref>, <xref ref-type="bibr" rid="ref-48">48</xref>].</p>
<p>We found that pupils living in more deprived neighbourhoods were less likely to link to health records than pupils living in more affluent areas. Previous studies have suggested that families from more affluent areas are more likely to comply with the administrative process [<xref ref-type="bibr" rid="ref-8">8</xref>]. However, pupils living in London were less likely to link to HES records than in other regions, even after accounting for sociodemographic characteristics. This difference may reflect higher rates of international emigration from London, less use of health services, differential use of private health services, or poorer quality of identifiers in London.</p>
<p>Improvements in the quality of recording of identifiers in schools and health data systems likely account for improved linkage rates over time. Changes in health systems governing collection of patient identifiers, such as the implementation of the NHS Numbers for Babies (NN4B) service on 29th October 2002, the introduction of the Registration ONline (RON) system on 1st July 2009, and the correction of a postcode extraction error by NHS Digital on 1st April 2013, have been shown to improve the completeness of identifiers used in the linkage [<xref ref-type="bibr" rid="ref-20">20</xref>]. Retrospective correction of this extraction error and re-linkage by NHS Digital of birth episodes to subsequent HES records would be expected to improve linkage to NPD in earlier years.</p>
</sec>
<sec>
<title>Strengths and limitations</title>
<p>Our study demonstrates very high linkage rates between educational and HES records for pupils attending state schools in England. The governance for this project addressed the challenges of cross-sectoral linkage between health and educational institutions in England whilst avoiding disclosure during the linkage process [<xref ref-type="bibr" rid="ref-16">16</xref>]. Use of multiple steps at each stage of linkage, and of identifiers recorded over multiple years for each child, was critical to achieving high linkage rates. Preliminary findings indicate that two-thirds of the linked HES records related to at least one admission, excluding the birth admission (to be reported elsewhere). The linkage algorithms used for this project are currently being used to link educational and health records for all pupils in England born in academic years 1995/96 onwards and will be relevant for other studies linking data to HES or NPD (or both) [<xref ref-type="bibr" rid="ref-17">17</xref>].</p>
<p>Linking educational data with hospital and death records creates new possibilities for studying a wide spectrum of policy-relevant questions. For example, the availability of data across the child life course could enable studies into the impact of health on education and education on health. Linked data for all children will be made available for applications for research from government and academia in 2021 [<xref ref-type="bibr" rid="ref-49">49</xref>, <xref ref-type="bibr" rid="ref-50">50</xref>].</p>
<p>Record-level indicators of the linkage process (i.e. variables indicating the step in our rule-based linkage algorithms at which a pair of records linked) were shared by NHS Digital to enable us to evaluate linkage biases. We used this information to demonstrate the value of later steps in the algorithm for linking pupils from ethnic minority groups and deprived areas. However, we did not have information on country of birth, and so could not assess whether linkage rates were lower for children who were born outside England. Future studies should consider sharing information about the completeness or quality of the identifiers to identify whether changes in data entry systems could address missed links in these more vulnerable groups [<xref ref-type="bibr" rid="ref-16">16</xref>].</p>
<p>System changes in administrative data were both an advantage and a limitation. They brought improvements in identifier and linkage quality and additional data collections from both services, but they can also introduce variation in linkage error over time. For instance, patients with fewer contacts with health services, or more mobile populations, could have out-of-date residential information in PDS, disproportionately affecting linkage quality. Analysts need to consider this variation when investigating trends.</p>
<p>A further limitation is that, since no gold-standard dataset defining true match status was available, we could not derive standard measures of linkage quality (sensitivity/recall, false match rate and positive predictive value/precision). Approaches for estimating rates of false matches in further linkage between HES and NPD could be applied, for example by applying the linkage algorithms to a set of &#x2018;negative controls&#x2019; (i.e. NPD records for which we are certain there should be no link in HES or vice versa) and counting how many records were erroneously linked [<xref ref-type="bibr" rid="ref-51">51</xref>, <xref ref-type="bibr" rid="ref-52">52</xref>]. This would allow an estimation of false match rates, but would not allow identification of which records were falsely matched. Existing &#x2018;gold-standard&#x2019; data for health records in England for specific subpopulations also have the potential to be used in future evaluations of linkage quality [<xref ref-type="bibr" rid="ref-53">53</xref>]. Future studies could develop representative gold-standard data using known links from UK cohort studies, such as the Millennium Cohort Study or Next Steps, to allow linkage error to be fully measured [<xref ref-type="bibr" rid="ref-54">54</xref>, <xref ref-type="bibr" rid="ref-55">55</xref>].</p>
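<p>The negative-control calculation can be sketched as follows. This is a minimal illustration in notation introduced here, not a procedure applied in this study. Suppose the linkage algorithm is run on <italic>N</italic><sub>nc</sub> records known to have no true counterpart in the other dataset, and <italic>L</italic><sub>nc</sub> of them are nevertheless linked. Each of these links is false by construction, so the proportion linked estimates the probability that a record without a true match is erroneously linked, with a simple binomial standard error: <disp-formula><tex-math><![CDATA[\hat{q} = \frac{L_{\mathrm{nc}}}{N_{\mathrm{nc}}}, \qquad \widehat{\mathrm{SE}}(\hat{q}) = \sqrt{\frac{\hat{q}\,(1-\hat{q})}{N_{\mathrm{nc}}}}]]></tex-math></disp-formula> With purely hypothetical numbers, <italic>N</italic><sub>nc</sub> = 10,000 and <italic>L</italic><sub>nc</sub> = 25 would give an estimate of 0.25% with a standard error of about 0.05 percentage points. The estimate is only informative to the extent that the identifiers of the negative controls resemble those of the records being linked, which is an assumption that would need to be checked.</p>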
</sec>
<sec>
<title>Implications</title>
<p>We created a de-identified linked database that brings together data from the Department for Education (education and social care) and hospitalisation data for all children &#x2013; the ECHILD Database. This resource will be made available for approved researchers later in 2021 for purposes that benefit health, wellbeing, education and the provision of health or social care. The ECHILD dataset will enable a step change in the scale and depth of research into the inter-relationships between health, education and social care across the life course, and how services across England vary in their responses.</p>
<p>Our linkage created a de-identified bridging file that combines pseudo-identifiers from education (anonymised Pupil Matching Reference) and HES. This bridging file can be used by the data providers to link to further datasets for approved studies, without the need to link real-world identifiers such as names and postcodes. As the data systems for capturing identifiers change, as is currently happening at NHS Digital [<xref ref-type="bibr" rid="ref-56">56</xref>], our evaluation of linkage success will need to be repeated and linkage metrics provided to researchers.</p>
<p>Researchers addressing questions relating to ethnic minority or deprived groups need to consider whether to adjust for missing data among these groups due to missed links. Statistical techniques include weighting or imputation, depending on the research objectives [<xref ref-type="bibr" rid="ref-57">57</xref>].</p>
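<p>As an illustrative sketch of the weighting option (one possible approach under stated assumptions, not an analysis carried out here), let <italic>p<sub>i</sub></italic> denote the probability that pupil <italic>i</italic> is linked, estimated from a logistic model of linkage on covariates <italic>x<sub>i</sub></italic> such as ethnic group, area deprivation and region. Linked pupils can then be weighted by the inverse of their estimated linkage probability: <disp-formula><tex-math><![CDATA[\operatorname{logit}(\hat{p}_i) = x_i^{\top}\hat{\beta}, \qquad w_i = \frac{1}{\hat{p}_i}]]></tex-math></disp-formula> For example, a linked pupil with an estimated linkage probability of 0.95 would receive a weight of about 1.05, so that groups with more missed links count slightly more in the analysis. The symbols are introduced here for illustration only, and the approach assumes that, conditional on <italic>x<sub>i</sub></italic>, missed links are unrelated to the outcome under study.</p>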
</sec>
</sec>
<sec>
<title>Conclusion</title>
<p>We found high linkage rates between administrative education and hospital data for pupils in four cohorts born between academic years 1990/91 and 2004/05 in England. Linkage rates improved over time, but ethnic minorities and pupils living in deprived neighbourhoods were disproportionately affected by linkage error. Evidence from comparing linked and unlinked populations provides measures that can be used to take into account potential biases due to linkage error.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Files</title>
<supplementary-material id="sup-a">
<label>Supplementary Appendices</label> 
<media mimetype="application" mime-subtype="pdf" xlink:href="ijpds-06-1671-s001.pdf"/>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>This research benefits from and contributes to the NIHR Children and Families Policy Research Unit, but was not commissioned by the National Institute for Health Research (NIHR) Policy Research Programme. We are grateful to Gary Connell (Department for Education), Garry Coleman (NHS Digital) and their teams for supporting this work. We thank the ECHILD team: Dr David Etoori, Dr Louise Mc Grath-Lone, Matthew Lilliman and Dr Erin Walker.</p>
</ack>
<sec>
<title>Data availability</title>
<p>The data underlying this article cannot be shared publicly due to data sharing agreements with NHS Digital and Department for Education.</p>
</sec>
<sec>
<title>Ethics statement</title>
<p>Research ethics approval was granted (project ID 232547, REC reference 17/LO/1494) and data sharing agreements are in place with NHS Digital (NIC-27404) and the Department for Education (DR150701.02). The Confidentiality Advisory Group confirmed that this research is exempt from review (reference 15/CAG/0004) because it only uses pseudonymised NHS data.</p>
</sec>
<fn-group>
<fn fn-type="other">
<label>Supplementary appendices</label>
<p>Supplementary Appendix 1: Description of linkage bridging files transferred to UCL Data Safe Haven and to the Office for National Statistics Secure Research Service</p>
<p>Supplementary Appendix 2: Timelines of the four cohorts alongside availability of data from Hospital Episode Statistics, National Pupil Dataset data and Personal Demographics Service</p>
<p>Supplementary Appendix 3: Description of data resources used in the linkage.</p>
<p>Supplementary Appendix 4: Description of linkage between Personal Demographics Service and National Pupil Dataset</p>
<p>Supplementary Appendix 5: Description of demographic variables in National Pupil Dataset</p>
<p>Supplementary Appendix 6: Performance of linkage stages</p>
<p>Supplementary Appendix 7: Linking rates</p>
<p>Supplementary Appendix 8: Standardized differences and P-values</p>
<p>Supplementary Appendix 9: Linkage evaluation logit models</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="ref-1"><label>1</label><mixed-citation publication-type="journal"><string-name><surname>Herbert</surname> <given-names>A</given-names></string-name>, <string-name><surname>Wijlaars</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zylbersztejn</surname> <given-names>A</given-names></string-name>, <string-name><surname>Cromwell</surname> <given-names>D</given-names></string-name>, <string-name><surname>Hardelid</surname> <given-names>P</given-names></string-name>. <article-title>Data Resource Profile: Hospital Episode Statistics Admitted Patient Care (HES APC)</article-title>. <source>Int J Epidemiol</source>. <year>2017</year>;<volume>46</volume>(<issue>4</issue>):<fpage>1093</fpage>-<lpage>i</lpage>. <pub-id pub-id-type="doi">10.1093/ije/dyx015</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>2</label><mixed-citation publication-type="journal"><string-name><surname>Jay</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Grath-Lone</surname> <given-names>LM</given-names></string-name>, <string-name><surname>Gilbert</surname> <given-names>R</given-names></string-name>. <article-title>Data Resource: the National Pupil Database (NPD)</article-title>. <source>International Journal of Population Data Science</source>. <year>2019</year>;<volume>4</volume>(<issue>1</issue>). <pub-id pub-id-type="doi">10.23889/ijpds.v4i1.1101</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>3</label><mixed-citation publication-type="journal"><string-name><surname>Crawford</surname> <given-names>C</given-names></string-name>, <string-name><surname>Dearden</surname> <given-names>L</given-names></string-name>, <string-name><surname>Greaves</surname> <given-names>E</given-names></string-name>. <article-title>The drivers of month-of-birth differences in children&#x2019;s cognitive and non-cognitive skills</article-title>. <source>J R Stat Soc Ser A Stat Soc</source>. <year>2014</year>;<volume>177</volume>(<issue>4</issue>):<fpage>829</fpage>&#x2013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1111/rssa.12071</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>4</label><mixed-citation publication-type="journal"><string-name><surname>Zylbersztejn</surname> <given-names>A</given-names></string-name>, <string-name><surname>Gilbert</surname> <given-names>R</given-names></string-name>, <string-name><surname>Hjern</surname> <given-names>A</given-names></string-name>, <string-name><surname>Wijlaars</surname> <given-names>L</given-names></string-name>, <string-name><surname>Hardelid</surname> <given-names>P</given-names></string-name>. <article-title>Child mortality in England compared with Sweden: a birth cohort study</article-title>. <source>Lancet</source>. <year>2018</year>;<volume>391</volume>(<issue>10134</issue>):<fpage>2008</fpage>-<lpage>18</lpage>. <pub-id pub-id-type="doi">10.1016/S0140-6736(18)30670-6</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>5</label><mixed-citation publication-type="journal"><string-name><surname>Herbert</surname> <given-names>A</given-names></string-name>, <string-name><surname>Gilbert</surname> <given-names>R</given-names></string-name>, <string-name><surname>Gonzalez-Izquierdo</surname> <given-names>A</given-names></string-name>, <string-name><surname>Li</surname> <given-names>L</given-names></string-name>. <article-title>Violence, self-harm and drug or alcohol misuse in adolescents admitted to hospitals in England for injury: a retrospective cohort study</article-title>. <source>BMJ Open</source>. <year>2015</year>;<volume>5</volume>(<issue>2</issue>):<fpage>e006079</fpage>. <pub-id pub-id-type="doi">10.1136/bmjopen-2014-006079</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>6</label><mixed-citation publication-type="journal"><string-name><surname>Coathup</surname> <given-names>V</given-names></string-name>, <string-name><surname>Boyle</surname> <given-names>E</given-names></string-name>, <string-name><surname>Carson</surname> <given-names>C</given-names></string-name>, <string-name><surname>Johnson</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kurinczuk</surname> <given-names>JJ</given-names></string-name>, <string-name><surname>Macfarlane</surname> <given-names>A</given-names></string-name>, <etal>et al</etal>. <article-title>Gestational age and hospital admissions during childhood: population based, record linkage study in England (TIGAR study)</article-title>. <source>BMJ</source>. <year>2020</year>;<volume>371</volume>:<fpage>m4075</fpage>. <pub-id pub-id-type="doi">10.1136/bmj.m4075</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>7</label><mixed-citation publication-type="journal"><string-name><surname>Harron</surname> <given-names>K</given-names></string-name>, <string-name><surname>Gilbert</surname> <given-names>R</given-names></string-name>, <string-name><surname>Fagg</surname> <given-names>J</given-names></string-name>, <string-name><surname>Guttmann</surname> <given-names>A</given-names></string-name>, <string-name><surname>van der Meulen</surname> <given-names>J</given-names></string-name>. <article-title>Associations between pre-pregnancy psychosocial risk factors and infant outcomes: a population-based cohort study in England</article-title>. <source>Lancet Public Health</source>. <year>2021</year>;<volume>6</volume>(<issue>2</issue>):<fpage>e97</fpage>&#x2013;<lpage>e105</lpage>. <pub-id pub-id-type="doi">10.1016/S2468-2667(20)30210-3</pub-id></mixed-citation></ref>
<ref id="ref-8"><label>8</label><mixed-citation publication-type="journal"><string-name><surname>Downs</surname> <given-names>JM</given-names></string-name>, <string-name><surname>Ford</surname> <given-names>T</given-names></string-name>, <string-name><surname>Stewart</surname> <given-names>R</given-names></string-name>, <string-name><surname>Epstein</surname> <given-names>S</given-names></string-name>, <string-name><surname>Shetty</surname> <given-names>H</given-names></string-name>, <string-name><surname>Little</surname> <given-names>R</given-names></string-name>, <etal>et al</etal>. <article-title>An approach to linking education, social care and electronic health records for children and young people in South London: a linkage study of child and adolescent mental health service data</article-title>. <source>BMJ Open</source>. <year>2019</year>;<volume>9</volume>(<issue>1</issue>):<fpage>e024355</fpage>. <pub-id pub-id-type="doi">10.1136/bmjopen-2018-024355</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>9</label><mixed-citation publication-type="journal"><string-name><surname>Jones</surname> <given-names>KH</given-names></string-name>, <string-name><surname>Ford</surname> <given-names>DV</given-names></string-name>, <string-name><surname>Thompson</surname> <given-names>S</given-names></string-name>. <article-title>A Profile of the SAIL Databank on the UK Secure Research Platform</article-title>. <source>International journal of population data science</source>. <year>2019</year>;<volume>4</volume>(<issue>2</issue>). <pub-id pub-id-type="doi">10.23889/ijpds.v4i2.1134</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>10</label><mixed-citation publication-type="book"><string-name><surname>Lynch</surname> <given-names>J</given-names></string-name>. <chapter-title>The South Australian Early Childhood Data Project</chapter-title>. <publisher-name>School of Public Health</publisher-name>: <publisher-loc>University of Adelaide</publisher-loc>; <year>2016</year>. Report No.: 4.</mixed-citation></ref>
<ref id="ref-11"><label>11</label><mixed-citation publication-type="journal"><string-name><surname>MacKay</surname> <given-names>DF</given-names></string-name>, <string-name><surname>Smith</surname> <given-names>GCS</given-names></string-name>, <string-name><surname>Dobbie</surname> <given-names>R</given-names></string-name>, <string-name><surname>Pell</surname> <given-names>JP</given-names></string-name>. <article-title>Gestational Age at Delivery and Special Educational Need: Retrospective Cohort Study of 407,503 Schoolchildren</article-title>. <source>PLOS Medicine</source>. <year>2010</year>;<volume>7</volume>(<issue>6</issue>):<fpage>e1000289</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pmed.1000289</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>12</label><mixed-citation publication-type="journal"><string-name><surname>Maret-Ouda</surname> <given-names>J</given-names></string-name>, <string-name><surname>Tao</surname> <given-names>W</given-names></string-name>, <string-name><surname>Wahlin</surname> <given-names>K</given-names></string-name>, <string-name><surname>Lagergren</surname> <given-names>J</given-names></string-name>. <article-title>Nordic registry-based cohort studies: Possibilities and pitfalls when combining Nordic registry data</article-title>. <source>Scand J Public Health</source>. <year>2017</year>;<volume>45</volume>(<supplement>17</supplement>_suppl):<fpage>14</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1177/1403494817702336</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>13</label><mixed-citation publication-type="journal"><string-name><surname>Stewart</surname> <given-names>CH</given-names></string-name>, <string-name><surname>Dundas</surname> <given-names>R</given-names></string-name>, <string-name><surname>Leyland</surname> <given-names>AH</given-names></string-name>. <article-title>The Scottish school leavers cohort: linkage of education data to routinely collected records for mortality, hospital discharge and offspring birth characteristics</article-title>. <source>BMJ Open</source>. <year>2017</year>;<volume>7</volume>(<issue>7</issue>). <pub-id pub-id-type="doi">10.1136/bmjopen-2016-015027</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>14</label><mixed-citation publication-type="journal"><string-name><surname>Wood</surname> <given-names>R</given-names></string-name>, <string-name><surname>Clark</surname> <given-names>D</given-names></string-name>, <string-name><surname>King</surname> <given-names>A</given-names></string-name>, <string-name><surname>Mackay</surname> <given-names>D</given-names></string-name>, <string-name><surname>Pell</surname> <given-names>J</given-names></string-name>. <article-title>Novel cross-sectoral linkage of routine health and education data at an all-Scotland level: a feasibility study</article-title>. <source>The Lancet</source>. <year>2013</year>;<volume>382</volume>:<fpage>S10</fpage>. <pub-id pub-id-type="doi">10.1016/S0140-6736(13)62435-6</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>15</label><mixed-citation publication-type="journal"><string-name><surname>Doidge</surname> <given-names>JC</given-names></string-name>, <string-name><surname>Harron</surname> <given-names>KL</given-names></string-name>. <article-title>Reflections on modern methods: linkage error bias</article-title>. <source>Int J Epidemiol</source>. <year>2019</year>;<volume>48</volume>(<issue>6</issue>):<fpage>2050</fpage>&#x2013;<lpage>60</lpage>. <pub-id pub-id-type="doi">10.1093/ije/dyz203</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>16</label><mixed-citation publication-type="journal"><string-name><surname>Gilbert</surname> <given-names>R</given-names></string-name>, <string-name><surname>Lafferty</surname> <given-names>R</given-names></string-name>, <string-name><surname>Hagger-Johnson</surname> <given-names>G</given-names></string-name>, <string-name><surname>Harron</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>L-C</given-names></string-name>, <string-name><surname>Smith</surname> <given-names>P</given-names></string-name>, <etal>et al</etal>. <article-title>GUILD: GUidance for Information about Linking Data sets</article-title>. <source>J Public Health (Oxf)</source>. <year>2018</year>;<volume>40</volume>(<issue>1</issue>):<fpage>191</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1093/pubmed/fdx037</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>17</label><mixed-citation publication-type="website"><collab>ECHILD</collab>. <article-title>The Education and Child Health Insights from Linked Data 2021</article-title> [Available from: <uri>https://www.ucl.ac.uk/child-health/research/population-policy-and-practice-research-and-teaching-department/cenb-clinical-20</uri>.</mixed-citation></ref>
<ref id="ref-18"><label>18</label><mixed-citation publication-type="book"><string-name><surname>Green</surname> <given-names>F</given-names></string-name>, <string-name><surname>Anders</surname> <given-names>J</given-names></string-name>, <string-name><surname>Henderson</surname> <given-names>M</given-names></string-name>, <string-name><surname>Henseke</surname> <given-names>G</given-names></string-name>. <chapter-title>Who Chooses Private Schooling in Britain and Why?</chapter-title> In: <source>Societies LCfLaLCiKEa</source>, editor. <publisher-name>Institute of education</publisher-name>, UCL2017.</mixed-citation></ref>
<ref id="ref-19"><label>19</label><mixed-citation publication-type="journal"><string-name><surname>Harron</surname> <given-names>K</given-names></string-name>, <string-name><surname>Gilbert</surname> <given-names>R</given-names></string-name>, <string-name><surname>Cromwell</surname> <given-names>D</given-names></string-name>, <string-name><surname>van der Meulen</surname> <given-names>J</given-names></string-name>. <article-title>Linking Data for Mothers and Babies in De-Identified Electronic Health Data</article-title>. <source>PLoS One</source>. <year>2016</year>;<volume>11</volume>(<issue>10</issue>):<fpage>e0164667</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0164667</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>20</label><mixed-citation publication-type="journal"><string-name><surname>Zylbersztejn</surname> <given-names>A</given-names></string-name>, <string-name><surname>Gilbert</surname> <given-names>R</given-names></string-name>, <string-name><surname>Hardelid</surname> <given-names>P</given-names></string-name>. <article-title>Impact of changes to data collection on a national birth cohort from administrative health records in England</article-title>. <source>PLoS One</source>. <year>2020</year>.</mixed-citation></ref>
<ref id="ref-21"><label>21</label><mixed-citation publication-type="website"><collab>NHS Digital</collab>. <article-title>A Guide to Linked Mortality Data from Hospital Episode Statistics and the Office for National Statistics</article-title>. <year>2015</year> [Available from: <uri>https://digital.nhs.uk/data-and-information/data-tools-and-services/data-services/linked-hes-ons-mortality-data#ons-mortality-datafrom:</uri>.</mixed-citation></ref>
<ref id="ref-22"><label>22</label><mixed-citation publication-type="website"><collab>The Health and Social Care Information Centre (HSCIC)</collab>. <article-title>Methodology for creation of the HES Patient ID (HESID)</article-title> <year>2014</year> [Available from: <uri>https://webarchive.nationalarchives.gov.uk/20180328130852tf_/http://content.digital.nhs.uk/media/1370/HES-Hospital-Episode-Statistics-Replacement-of-the-HES-patient-ID/pdf/HESID_Methodology.pdf/</uri>.</mixed-citation></ref>
<ref id="ref-23"><label>23</label><mixed-citation publication-type="book"><collab>Commons THo</collab>. <chapter-title>Department of Health: The National Programme for IT in the NHS, Twentieth Report of Session 2006&#x2013;07</chapter-title>. <publisher-loc>The Stationery Office Limited</publisher-loc>: <publisher-name>House of Commons, Committee of Public Accounts</publisher-name>; <year>2007</year> <day>11</day>/<month>04</month>/<year>2007</year>.</mixed-citation></ref>
<ref id="ref-24"><label>24</label><mixed-citation publication-type="website"><collab>Office for National Statistics</collab>. <article-title>Personal Demographics Service data</article-title> <year>2020</year> [Available from: <uri>https://www.ons.gov.uk/census/censustransformationprogramme/administrativedatacensusproject/datasourceoverviews/personaldemographicsservicedata</uri>.</mixed-citation></ref>
<ref id="ref-25"><label>25</label><mixed-citation publication-type="book"><string-name><surname>Boyd</surname> <given-names>A</given-names></string-name>, <string-name><surname>Thomas</surname> <given-names>R</given-names></string-name>, <string-name><surname>Macleod</surname> <given-names>J</given-names></string-name>. <chapter-title>NHS Number and the systems used to manage them: an overview for research users</chapter-title>. <publisher-name>Cohort &#x0026; Longitudinal Studies Enhancement Resources (CLOSER)</publisher-name>: <publisher-loc>Population Health Sciences, Bristol Medical School, University of Bristol</publisher-loc>; <year>2018</year>.</mixed-citation></ref>
<ref id="ref-26"><label>26</label><mixed-citation publication-type="website"><collab>NHS Digital</collab>. <article-title>Personal Demographics Service fair processing</article-title> <year>2020</year> [Available from: <uri>https://digital.nhs.uk/services/demographics/personal-demographics-service-fair-processing#: :text=The%20Personal%20Demographics%20Service%20(PDS,(known%20as%20demographic%20information)</uri>.</mixed-citation></ref>
<ref id="ref-27"><label>27</label><mixed-citation publication-type="book"><string-name><surname>Bohensky</surname> <given-names>M</given-names></string-name>. <chapter-title>Bias in data linkage studies</chapter-title>. In: <string-name><surname>Harron</surname> <given-names>K</given-names></string-name>, <string-name><surname>Goldstein</surname> <given-names>H</given-names></string-name>, <string-name><surname>Dibben</surname> <given-names>C</given-names></string-name>, editors. <source>Methodological Developments in Data Linkage</source>. <publisher-name>Wiley</publisher-name>; <year>2016</year>. p. <fpage>63</fpage>&#x2013;<lpage>82</lpage>.</mixed-citation></ref>
<ref id="ref-28"><label>28</label><mixed-citation publication-type="journal"><string-name><surname>Harron</surname> <given-names>KL</given-names></string-name>, <string-name><surname>Doidge</surname> <given-names>JC</given-names></string-name>, <string-name><surname>Knight</surname> <given-names>HE</given-names></string-name>, <string-name><surname>Gilbert</surname> <given-names>RE</given-names></string-name>, <string-name><surname>Goldstein</surname> <given-names>H</given-names></string-name>, <string-name><surname>Cromwell</surname> <given-names>DA</given-names></string-name>, <etal>et al</etal>. <article-title>A guide to evaluating linkage quality for the analysis of linked data</article-title>. <source>Int J Epidemiol</source>. <year>2017</year>;<volume>46</volume>(<issue>5</issue>):<fpage>1699</fpage>&#x2013;<lpage>710</lpage>. <pub-id pub-id-type="doi">10.1093/ije/dyx177</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>29</label><mixed-citation publication-type="journal"><string-name><surname>Austin</surname> <given-names>PC</given-names></string-name>. <article-title>Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples</article-title>. <source>Stat Med</source>. <year>2009</year>;<volume>28</volume>(<issue>25</issue>):<fpage>3083</fpage>&#x2013;<lpage>107</lpage>. <pub-id pub-id-type="doi">10.1002/sim.3697</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>30</label><mixed-citation publication-type="journal"><string-name><surname>Jay</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Gilbert</surname> <given-names>R</given-names></string-name>. <article-title>Special educational needs, social care and health</article-title>. <source>Arch Dis Child</source>. <year>2020</year>. <pub-id pub-id-type="doi">10.1136/archdischild-2019-317985</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>31</label><mixed-citation publication-type="book"><string-name><surname>Bayoumi</surname> <given-names>AM</given-names></string-name>. <chapter-title>STDDIFF: Stata module to compute standardized differences for continuous and categorical variables</chapter-title>. <source>Statistical Software Components</source> S458275. <publisher-name>Boston College Department of Economics</publisher-name>; <year>2016</year>.</mixed-citation></ref>
<ref id="ref-32"><label>32</label><mixed-citation publication-type="book"><collab>Office for National Statistics</collab>. <source>Mortality Statistics: Metadata</source>. <publisher-name>Office for National Statistics (ONS)</publisher-name>; <year>2015</year>. p. <fpage>35</fpage>&#x2013;<lpage>41</lpage>.</mixed-citation></ref>
<ref id="ref-33"><label>33</label><mixed-citation publication-type="journal"><string-name><surname>Zylbersztejn</surname> <given-names>A</given-names></string-name>, <string-name><surname>Gilbert</surname> <given-names>R</given-names></string-name>, <string-name><surname>Hardelid</surname> <given-names>P</given-names></string-name>. <article-title>Developing a national birth cohort for child health research using a hospital admissions database in England: The impact of changes to data collection practices</article-title>. <source>PLOS ONE</source>. <year>2020</year>;<volume>15</volume>(<issue>12</issue>):<fpage>e0243843</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0243843</pub-id>.</mixed-citation></ref>
<ref id="ref-34"><label>34</label><mixed-citation publication-type="journal"><string-name><surname>Austin</surname> <given-names>PC</given-names></string-name>. <article-title>Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples</article-title>. <source>Stat Med</source>. <year>2009</year>;<volume>28</volume>(<issue>25</issue>):<fpage>3083</fpage>&#x2013;<lpage>107</lpage>. <pub-id pub-id-type="doi">10.1002/sim.3697</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>35</label><mixed-citation publication-type="journal"><string-name><surname>Fleming</surname> <given-names>M</given-names></string-name>, <string-name><surname>Fitton</surname> <given-names>CA</given-names></string-name>, <string-name><surname>Steiner</surname> <given-names>MFC</given-names></string-name>, <string-name><surname>McLay</surname> <given-names>JS</given-names></string-name>, <string-name><surname>Clark</surname> <given-names>D</given-names></string-name>, <string-name><surname>King</surname> <given-names>A</given-names></string-name>, <etal>et al</etal>. <article-title>Educational and health outcomes of children and adolescents receiving antidepressant medication: Scotland-wide retrospective record linkage cohort study of 766 237 schoolchildren</article-title>. <source>Int J Epidemiol</source>. <year>2020</year>;<volume>49</volume>(<issue>4</issue>):<fpage>1380</fpage>&#x2013;<lpage>91</lpage>. <pub-id pub-id-type="doi">10.1093/ije/dyaa002</pub-id>.</mixed-citation></ref>
<ref id="ref-36"><label>36</label><mixed-citation publication-type="journal"><string-name><surname>Fleming</surname> <given-names>M</given-names></string-name>, <string-name><surname>Fitton</surname> <given-names>CA</given-names></string-name>, <string-name><surname>Steiner</surname> <given-names>MFC</given-names></string-name>, <string-name><surname>McLay</surname> <given-names>JS</given-names></string-name>, <string-name><surname>Clark</surname> <given-names>D</given-names></string-name>, <string-name><surname>King</surname> <given-names>A</given-names></string-name>, <etal>et al</etal>. <article-title>Educational and health outcomes of children treated for asthma: Scotland-wide record linkage study of 683 716 children</article-title>. <source>European Respiratory Journal</source>. <year>2019</year>;<volume>54</volume>(<issue>3</issue>). <pub-id pub-id-type="doi">10.1183/13993003.02309-2018</pub-id>.</mixed-citation></ref>
<ref id="ref-37"><label>37</label><mixed-citation publication-type="journal"><string-name><surname>Holman</surname> <given-names>CDAJ</given-names></string-name>, <string-name><surname>Bass</surname> <given-names>AJ</given-names></string-name>, <string-name><surname>Rouse</surname> <given-names>IL</given-names></string-name>, <string-name><surname>Hobbs</surname> <given-names>MST</given-names></string-name>. <article-title>Population-based linkage of health records in Western Australia: development of a health services research linked database</article-title>. <source>Australian and New Zealand Journal of Public Health</source>. <year>1999</year>;<volume>23</volume>(<issue>5</issue>):<fpage>453</fpage>&#x2013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-842X.1999.tb01297.x</pub-id>.</mixed-citation></ref>
<ref id="ref-38"><label>38</label><mixed-citation publication-type="journal"><string-name><surname>Jones</surname> <given-names>KH</given-names></string-name>, <string-name><surname>Ford</surname> <given-names>DV</given-names></string-name>, <string-name><surname>Jones</surname> <given-names>C</given-names></string-name>, <string-name><surname>Dsilva</surname> <given-names>R</given-names></string-name>, <string-name><surname>Thompson</surname> <given-names>S</given-names></string-name>, <string-name><surname>Brooks</surname> <given-names>CJ</given-names></string-name>, <etal>et al</etal>. <article-title>A case study of the Secure Anonymous Information Linkage (SAIL) Gateway: A privacy-protecting remote access system for health-related research and evaluation</article-title>. <source>Journal of Biomedical Informatics</source>. <year>2014</year>;<volume>50</volume>:<fpage>196</fpage>&#x2013;<lpage>204</lpage>. <pub-id pub-id-type="doi">10.1016/j.jbi.2014.01.003</pub-id>.</mixed-citation></ref>
<ref id="ref-39"><label>39</label><mixed-citation publication-type="website"><collab>Wellcome Trust</collab>. <article-title>Public Health Research Data Forum. Enabling data linkage to maximise the value of public health research data: full report</article-title>. <year>2015</year>. Available from: <uri>https://wellcome.ac.uk/sites/default/files/enabling-data-linkage-to-maximise-value-of-public-health-research-data-phrdf-mar15.pdf</uri>.</mixed-citation></ref>
<ref id="ref-40"><label>40</label><mixed-citation publication-type="journal"><string-name><surname>Hagger-Johnson</surname> <given-names>G</given-names></string-name>, <string-name><surname>Harron</surname> <given-names>K</given-names></string-name>, <string-name><surname>Goldstein</surname> <given-names>H</given-names></string-name>, <string-name><surname>Aldridge</surname> <given-names>R</given-names></string-name>, <string-name><surname>Gilbert</surname> <given-names>R</given-names></string-name>. <article-title>Probabilistic linkage to enhance deterministic algorithms and reduce data linkage errors in hospital administrative data</article-title>. <source>J Innov Health Inform</source>. <year>2017</year>;<volume>24</volume>(<issue>2</issue>):<fpage>891</fpage>. <pub-id pub-id-type="doi">10.14236/jhi.v24i2.891</pub-id>.</mixed-citation></ref>
<ref id="ref-41"><label>41</label><mixed-citation publication-type="journal"><string-name><surname>Roberts</surname> <given-names>E</given-names></string-name>, <string-name><surname>Doidge</surname> <given-names>JC</given-names></string-name>, <string-name><surname>Harron</surname> <given-names>KL</given-names></string-name>, <string-name><surname>Hotopf</surname> <given-names>M</given-names></string-name>, <string-name><surname>Knight</surname> <given-names>J</given-names></string-name>, <string-name><surname>White</surname> <given-names>M</given-names></string-name>, <etal>et al</etal>. <article-title>National administrative record linkage between specialist community drug and alcohol treatment data (the National Drug Treatment Monitoring System (NDTMS)) and inpatient hospitalisation data (Hospital Episode Statistics (HES)) in England: design, method and evaluation</article-title>. <source>BMJ Open</source>. <year>2020</year>;<volume>10</volume>(<issue>11</issue>):<fpage>e043540</fpage>. <pub-id pub-id-type="doi">10.1136/bmjopen-2020-043540</pub-id>.</mixed-citation></ref>
<ref id="ref-42"><label>42</label><mixed-citation publication-type="book"><string-name><surname>Krausova</surname> <given-names>A</given-names></string-name>, <string-name><surname>Vargas-Silva</surname> <given-names>C</given-names></string-name>. <chapter-title>Briefing. England: Census Profile</chapter-title>. <publisher-name>The Migration Observatory</publisher-name>, <publisher-loc>University of Oxford</publisher-loc>; <year>2014</year>.</mixed-citation></ref>
<ref id="ref-43"><label>43</label><mixed-citation publication-type="journal"><string-name><surname>Zahoransky</surname> <given-names>D</given-names></string-name>, <string-name><surname>Pol&#x00E1;&#x0161;ek</surname> <given-names>I</given-names></string-name>. <article-title>Text Search of Surnames in Some Slavic and Other Morphologically Rich Languages Using Rule Based Phonetic Algorithms</article-title>. <source>IEEE/ACM Transactions on Audio, Speech, and Language Processing</source>. <year>2015</year>;<volume>23</volume>(<issue>3</issue>):<fpage>553</fpage>&#x2013;<lpage>63</lpage>. <pub-id pub-id-type="doi">10.1109/TASLP.2015.2393393</pub-id>.</mixed-citation></ref>
<ref id="ref-44"><label>44</label><mixed-citation publication-type="book"><string-name><surname>Harron</surname> <given-names>K</given-names></string-name>. <chapter-title>An Introduction to Data Linkage</chapter-title>. In: <source>ADRN Publication UoE</source>, editor. <publisher-name>Better Knowledge Better Society</publisher-name>. <publisher-loc>The Administrative Data Research Network</publisher-loc>; <year>2016</year>.</mixed-citation></ref>
<ref id="ref-45"><label>45</label><mixed-citation publication-type="journal"><string-name><surname>Gong</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Oard</surname> <given-names>D</given-names></string-name>. <article-title>Matching person names through name transformation</article-title>. <source>Proceedings of the 18th ACM Conference on Information and Knowledge Management</source>. <year>2009</year>. p. <fpage>1875</fpage>&#x2013;<lpage>8</lpage>.</mixed-citation></ref>
<ref id="ref-46"><label>46</label><mixed-citation publication-type="journal"><string-name><surname>Newcombe</surname> <given-names>HB</given-names></string-name>, <string-name><surname>Fair</surname> <given-names>ME</given-names></string-name>, <string-name><surname>Lalonde</surname> <given-names>P</given-names></string-name>. <article-title>Discriminating powers of partial agreements of names for linking personal records. Part II: The empirical test</article-title>. <source>Methods Inf Med</source>. <year>1989</year>;<volume>28</volume>(<issue>2</issue>):<fpage>92</fpage>&#x2013;<lpage>6</lpage>.</mixed-citation></ref>
<ref id="ref-47"><label>47</label><mixed-citation publication-type="journal"><string-name><surname>Newcombe</surname> <given-names>HB</given-names></string-name>, <string-name><surname>Fair</surname> <given-names>ME</given-names></string-name>, <string-name><surname>Lalonde</surname> <given-names>P</given-names></string-name>. <article-title>Discriminating powers of partial agreements of names for linking personal records. Part I: The logical basis</article-title>. <source>Methods Inf Med</source>. <year>1989</year>;<volume>28</volume>(<issue>2</issue>):<fpage>86</fpage>&#x2013;<lpage>91</lpage>.</mixed-citation></ref>
<ref id="ref-48"><label>48</label><mixed-citation publication-type="book"><string-name><surname>Treeratpituk</surname> <given-names>P</given-names></string-name>, <string-name><surname>Giles</surname> <given-names>CL</given-names></string-name>. <chapter-title>Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching</chapter-title>. In: <source>Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence</source>. <year>2012</year>.</mixed-citation></ref>
<ref id="ref-49"><label>49</label><mixed-citation publication-type="website"><collab>ADR UK</collab>. <article-title>ECHILD: Linking children&#x2019;s health and education data for England</article-title>. <year>2021</year>. Available from: <uri>https://www.adruk.org/our-work/browse-all-projects/echild-linking-childrens-health-and-education-data-for-england-142/</uri>.</mixed-citation></ref>
<ref id="ref-50"><label>50</label><mixed-citation publication-type="website"><collab>University College London</collab>. <article-title>Education and Child Health Insights from Linked Data</article-title>. <year>2021</year>. Available from: <uri>https://www.ucl.ac.uk/child-health/research/population-policy-and-practice-research-and-teaching-department/cenb-clinical-20</uri>.</mixed-citation></ref>
<ref id="ref-51"><label>51</label><mixed-citation publication-type="journal"><string-name><surname>Harron</surname> <given-names>K</given-names></string-name>, <string-name><surname>Doidge</surname> <given-names>JC</given-names></string-name>, <string-name><surname>Goldstein</surname> <given-names>H</given-names></string-name>. <article-title>Assessing data linkage quality in cohort studies</article-title>. <source>Annals of Human Biology</source>. <year>2020</year>;<volume>47</volume>(<issue>2</issue>):<fpage>218</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1080/03014460.2020.1742379</pub-id>.</mixed-citation></ref>
<ref id="ref-52"><label>52</label><mixed-citation publication-type="book"><string-name><surname>Doidge</surname> <given-names>J</given-names></string-name>, <string-name><surname>Christen</surname> <given-names>P</given-names></string-name>, <string-name><surname>Harron</surname> <given-names>K</given-names></string-name>. <chapter-title>Quality Review: Quality Assessment in Data Linkage</chapter-title>. <publisher-name>Government Analysis Function &#x0026; Office for National Statistics</publisher-name>; <year>2020</year>.</mixed-citation></ref>
<ref id="ref-53"><label>53</label><mixed-citation publication-type="journal"><string-name><surname>Harron</surname> <given-names>K</given-names></string-name>, <string-name><surname>Wade</surname> <given-names>A</given-names></string-name>, <string-name><surname>Gilbert</surname> <given-names>R</given-names></string-name>, <string-name><surname>Muller-Pebody</surname> <given-names>B</given-names></string-name>, <string-name><surname>Goldstein</surname> <given-names>H</given-names></string-name>. <article-title>Evaluating bias due to data linkage error in electronic healthcare records</article-title>. <source>BMC Medical Research Methodology</source>. <year>2014</year>;<volume>14</volume>:<fpage>36</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2288-14-36</pub-id>.</mixed-citation></ref>
<ref id="ref-54"><label>54</label><mixed-citation publication-type="book"><collab>University College London, UCL Institute of Education, Centre for Longitudinal Studies; NHS Digital</collab>. <source>Next Steps: Linked Health Administrative Datasets (Hospital Episode Statistics), England, 1997&#x2013;2017</source>. <publisher-name>UK Data Service</publisher-name>; <year>2020</year>.</mixed-citation></ref>
<ref id="ref-55"><label>55</label><mixed-citation publication-type="journal"><string-name><surname>Hockley</surname> <given-names>C</given-names></string-name>, <string-name><surname>Quigley</surname> <given-names>M</given-names></string-name>, <string-name><surname>Hughes</surname> <given-names>G</given-names></string-name>, <string-name><surname>Calderwood</surname> <given-names>L</given-names></string-name>, <string-name><surname>Joshi</surname> <given-names>H</given-names></string-name>, <string-name><surname>Davidson</surname> <given-names>L</given-names></string-name>. <article-title>Linking Millennium Cohort data to birth registration and hospital episode records</article-title>. <source>Paediatric and Perinatal Epidemiology</source>. <year>2008</year>;<volume>22</volume>:<fpage>99</fpage>&#x2013;<lpage>109</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-3016.2007.00902.x</pub-id>.</mixed-citation></ref>
<ref id="ref-56"><label>56</label><mixed-citation publication-type="website"><collab>NHS Digital</collab>. <article-title>Hospital Episode Statistics data changes in 2021</article-title>. <year>2021</year>. Available from: <uri>https://digital.nhs.uk/data-and-information/data-tools-and-services/data-services/hospital-episode-statistics/hospital-episode-statistics-data-changes-in-2021</uri>.</mixed-citation></ref>
<ref id="ref-57"><label>57</label><mixed-citation publication-type="journal"><string-name><surname>Goldstein</surname> <given-names>H</given-names></string-name>, <string-name><surname>Harron</surname> <given-names>K</given-names></string-name>, <string-name><surname>Wade</surname> <given-names>A</given-names></string-name>. <article-title>The analysis of record-linked data using multiple imputation with data value priors</article-title>. <source>Stat Med</source>. <year>2012</year>;<volume>31</volume>(<issue>28</issue>):<fpage>3481</fpage>&#x2013;<lpage>93</lpage>. <pub-id pub-id-type="doi">10.1002/sim.5508</pub-id>.</mixed-citation></ref>
</ref-list>
</back>
</article>