<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "JATS-journalpublishing1.dtd" [
]>
<article xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML"
  dtd-version="1.2" article-type="abstract">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">IJPDS</journal-id>
      <journal-title-group>
        <journal-title>International Journal of Population Data Science</journal-title>
        <abbrev-journal-title>IJPDS</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="epub">2399-4908</issn>
      <publisher>
        <publisher-name>Swansea University</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.23889/ijpds.v9i5.2763</article-id>
      <article-id pub-id-type="publisher-id">9:5:274</article-id>
      <title-group>
        <article-title>Probabilistic Record Linkage for Families (PRLF): A Discussion of the Development and Validation of this Open-Source Linkage Tool</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Prindle</surname>
            <given-names initials="J">John</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Suthar</surname>
            <given-names initials="H">Himal</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Putnam-Hornstein</surname>
            <given-names initials="E">Emily</given-names>
          </name>
          <xref ref-type="aff" rid="affil-2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Foust</surname>
            <given-names initials="R">Regan</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="affil-1"><label>1</label><institution>USC Suzanne Dworak-Peck School of Social Work</institution></aff>
      <aff id="affil-2"><label>2</label><institution>UNC School of Social Work</institution></aff>
      <pub-date date-type="pub" publication-format="electronic">
        <day>18</day>
        <month>09</month>
        <year>2024</year>
      </pub-date>
      <pub-date date-type="collection" publication-format="electronic">
        <year>2024</year>
      </pub-date>
      <volume>9</volume>
      <issue>5</issue>
      <elocation-id>2763</elocation-id>
      <permissions>
        <license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License.</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://ijpds.org/article/view/2763">This article is available from the IJPDS website at: https://ijpds.org/article/view/2763</self-uri>
    </article-meta>
  </front>
  <body>
    <p>Linking administrative records across programs can yield person-centered information, including client characteristics, public service trajectories, and outcomes, and can help answer policy-related questions. Several solutions are available for record linkage; each produces linkage keys that allow data sources to be merged for positively matched pairs of records. In this session, we will demonstrate a new application of the Python RecordLinkage package to family-based record linkage, using machine learning algorithms for probability scoring, which we call probabilistic record linkage for families (PRLF). First, we will demonstrate the utility of PRLF with a simulation of administrative records and assess linkage accuracy under varying match rates and levels of data degradation. Second, we will compare generalized linear model estimates across three record linkage solutions (PRLF, ChoiceMaker, and Link Plus). Findings from the simulation study indicate that linkage accuracy is influenced more by data degradation (e.g., missing data fields, erroneous or incomplete values) than by the proportion of simulated matches between datasets. Results from the methods comparison using real-world data indicate that all three solutions, when optimized, yield similar results for researchers. We will discuss the strengths of our process, such as the use of ensemble methods to improve match accuracy. We will then identify caveats of record linkage in the context of administrative data. The tool was developed in Python so that researchers can work with open-source software and adjust the basic workflow to fit their linkage needs. We will describe several partnerships in which this collaborative approach has worked successfully and provide attendees with access to the tool.</p>
  </body>
</article>