<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "JATS-journalpublishing1.dtd" [
]>
<article xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML"
  dtd-version="1.2" article-type="abstract">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">IJPDS</journal-id>
      <journal-title-group>
        <journal-title>International Journal of Population Data Science</journal-title>
        <abbrev-journal-title>IJPDS</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="epub">2399-4908</issn>
      <publisher>
        <publisher-name>Swansea University</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.23889/ijpds.v9i5.2899</article-id>
      <article-id pub-id-type="publisher-id">9:5:407</article-id>
      <title-group>
        <article-title>Enhancing integration of administrative databases in South Africa's HIV program: Validation of record linkage using non-representative gold standard</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Lauren</surname>
            <given-names initials="E">Evelyn</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
          <xref ref-type="aff" rid="affil-2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Gareta</surname>
            <given-names initials="D">Dickman</given-names>
          </name>
          <xref ref-type="aff" rid="affil-3">3</xref>
          <xref ref-type="aff" rid="affil-4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Shumba</surname>
            <given-names initials="K">Khumbo</given-names>
          </name>
          <xref ref-type="aff" rid="affil-2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Onoya</surname>
            <given-names initials="D">Dorina</given-names>
          </name>
          <xref ref-type="aff" rid="affil-2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Bor</surname>
            <given-names initials="J">Jacob</given-names>
          </name>
          <xref ref-type="aff" rid="affil-2">2</xref>
          <xref ref-type="aff" rid="affil-3">3</xref>
        </contrib>
      </contrib-group>
      <aff id="affil-1"><label>1</label><institution>Department of Biostatistics, Boston University School of Public Health</institution></aff>
      <aff id="affil-2"><label>2</label><institution>Health Economics and Epidemiology Research Office</institution></aff>
      <aff id="affil-3"><label>3</label><institution>Africa Health Research Institute</institution></aff>
      <aff id="affil-4"><label>4</label><institution>Institute of Social and Preventive Medicine, University of Bern</institution></aff>
      <pub-date date-type="pub" publication-format="electronic">
        <day>18</day>
        <month>09</month>
        <year>2024</year>
      </pub-date>
      <pub-date date-type="collection" publication-format="electronic">
        <year>2024</year>
      </pub-date>
      <volume>9</volume>
      <issue>5</issue>
      <elocation-id>2899</elocation-id>
      <permissions>
        <license license-type="open-access" xlink:href="https://creativecommons.org/licences/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License.</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://ijpds.org/article/view/2899">This article is available from the IJPDS website at: https://ijpds.org/article/view/2899</self-uri>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>Linked administrative data are widely used in epidemiology to capture patient data across multiple databases. Linkage error rates, which are critical for measuring linkage performance, are rarely reported because a representative gold standard is difficult to obtain. We propose a training and validation approach for linkage procedures that yields unbiased performance estimates even with a non-representative gold standard.</p>
    </sec>
    <sec>
      <title>Methods</title>
      <p>We linked patient records from two non-deduplicated databases for HIV monitoring in South Africa, TIER.Net and the NHLS laboratory database, using a network-based probabilistic linkage and deduplication approach. National IDs, which served as the gold standard, were available for a non-representative minority of records (10%). We calculated sensitivity (Sen, the share of true matches identified by the algorithm) and positive predictive value (PPV, the share of algorithm-identified matches that were true matches). We adjusted for bias due to informative missingness in National IDs using inverse probability weights, which break the link between missingness and match probability.</p>
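      <p>The weighted estimators described above can be sketched as follows. The notation is illustrative and assumed here, not taken from the study: for record pair <italic>i</italic>, <italic>M</italic> = 1 if the National IDs agree (a true match), <italic>A</italic> = 1 if the algorithm declares a match, <italic>R</italic> = 1 if the National ID is observed, and the weight <italic>w</italic> is the inverse of an estimated probability that the National ID is observed. The abstract does not specify the weight model, so conditioning on covariates <italic>X</italic> such as the match probability is one plausible reading.</p>
      <p><disp-formula><tex-math><![CDATA[
\mathrm{Sen}_w = \frac{\sum_{i:\,R_i=1} w_i\, M_i\, A_i}{\sum_{i:\,R_i=1} w_i\, M_i},
\qquad
\mathrm{PPV}_w = \frac{\sum_{i:\,R_i=1} w_i\, M_i\, A_i}{\sum_{i:\,R_i=1} w_i\, A_i},
\qquad
w_i = \frac{1}{\hat{\pi}_i},\quad \hat{\pi}_i = \widehat{\Pr}(R_i = 1 \mid X_i)
]]></tex-math></disp-formula></p>
      <p>Setting every weight to 1 recovers the unweighted estimates, TP / (TP + FN) for Sen and TP / (TP + FP) for PPV, computed only on record pairs with an observed National ID.</p>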
    </sec>
    <sec>
      <title>Results</title>
      <p>A total of 111,755 record pairs were considered. National IDs were not missing completely at random. Record pairs with National IDs had substantially fewer uncertain (mid-range) match probabilities than record pairs without them, which inflated Sen and PPV. Before bias correction, Sen and PPV were estimated at 97.0% and 97.8%, respectively. After bias correction for missing National IDs, Sen and PPV were estimated at 95.7% and 96.6%. Failure to address this bias understated the overlinkage rate (100% - PPV) by 35% and the underlinkage rate (100% - Sen) by 30%.</p>
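      <p>The reported understatement figures are consistent with expressing the gap between the corrected and uncorrected error rates as a share of the corrected rate. This reading is an assumption, since the abstract does not state the formula; the arithmetic below uses only the percentages reported above, with overlinkage first and underlinkage second.</p>
      <p><disp-formula><tex-math><![CDATA[
\frac{(100 - 96.6) - (100 - 97.8)}{100 - 96.6} = \frac{3.4 - 2.2}{3.4} \approx 0.35,
\qquad
\frac{(100 - 95.7) - (100 - 97.0)}{100 - 95.7} = \frac{4.3 - 3.0}{4.3} \approx 0.30
]]></tex-math></disp-formula></p>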
    </sec>
    <sec>
      <title>Conclusion</title>
      <p>Failure to adjust for informative missingness in the gold standard may bias validation metrics and lead to over- or underconfidence in linked data.</p>
    </sec>
  </body>
</article>