<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "JATS-journalpublishing1.dtd" [
]>
<article xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML"
  dtd-version="1.2" article-type="abstract">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">IJPDS</journal-id>
      <journal-title-group>
        <journal-title>International Journal of Population Data Science</journal-title>
        <abbrev-journal-title>IJPDS</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="epub">2399-4908</issn>
      <publisher>
        <publisher-name>Swansea University</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.23889/ijpds.v10i3.3249</article-id>
      <article-id pub-id-type="publisher-id">10:3:214</article-id>
      <title-group>
        <article-title>Developing Generalisable Linkage Methodologies for Administrative Datasets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Quinn</surname>
            <given-names initials="L">Leah</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="affil-1"><label>1</label><institution>Office for National Statistics, Manchester,
        United Kingdom</institution></aff>
      <pub-date date-type="pub" publication-format="electronic">
        <day>01</day>
        <month>06</month>
        <year>2025</year>
      </pub-date>
      <pub-date date-type="collection" publication-format="electronic">
        <year>2025</year>
      </pub-date>
      <volume>10</volume>
      <issue>3</issue>
      <elocation-id>3249</elocation-id>
      <permissions>
        <license license-type="open-access"
          xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International
            License.</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://ijpds.org/article/view/3249">This article is available from the
        IJPDS website at: https://ijpds.org/article/view/3249</self-uri>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Objectives</title>
      <p>Bespoke data linkage methodologies are time-consuming to develop. The Generalisable Linkage
        of Administrative Demographic Index Service (GLADIS) project aims to minimise the need for
        bespoke methods by designing generalisable linkage methodologies to link a variety of
        administrative datasets to a population spine.</p>
    </sec>
    <sec>
      <title>Methods</title>
      <p>A pipeline module was created for each key stage in linkage projects. We will present novel
        methods such as automatically identifying probabilistic parameters without manual user
        involvement, and using record ‘explosion’ to efficiently account for alternative variables
        in clusters by creating additional derived rows for linking. Together, the GLADIS
        methodologies not only allow good-quality linkage but also minimise the user input required
        and the expertise needed from the linkage analyst. Gold standard datasets were used to
        assist in developing and quality assuring each module’s methods, and efficiency was
        prioritised to enable methods to scale to Big Data.</p>
    </sec>
    <sec>
      <title>Results</title>
      <p>As of spring 2025, the pre-processing, deterministic, and blocking modules have met
        minimum viable product (MVP) user requirements, with the probabilistic module in the final
        stages of development. The modules scale to population-level datasets and can be used in
        succession to clean, block, and link datasets to a person spine with minimal user input and
        with acceptable precision and recall, as measured against gold standard datasets. The
        modules are flexible, accepting a range of linkage variables and formats. Users can apply a
        default matching strategy or customise deterministic matchkeys for their own needs, and can
        choose between two probabilistic score thresholds to account for differing quality
        requirements. Further work is ongoing to improve adaptability and add functionality.</p>
    </sec>
    <sec>
      <title>Conclusion</title>
      <p>GLADIS will make linked data more accessible, increase the consistency and comparability of
        linked datasets, and improve compatibility for onward linkages. This will support the
        Integrated Data Service, a flagship platform that enables quick, easy access to data to
        facilitate analysis for the public good and inform decision making.</p>
    </sec>
  </body>
</article>