<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "JATS-journalpublishing1.dtd" [
]>
<article xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML"
  dtd-version="1.2" article-type="abstract">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">IJPDS</journal-id>
      <journal-title-group>
        <journal-title>International Journal of Population Data Science</journal-title>
        <abbrev-journal-title>IJPDS</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="epub">2399-4908</issn>
      <publisher>
        <publisher-name>Swansea University</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.23889/ijpds.v10i3.3123</article-id>
      <article-id pub-id-type="publisher-id">10:3:104</article-id>
      <title-group>
        <article-title>Developing a Research-Ready-Data-Asset (RRDA) for Welsh primary care data
          within the SAIL Databank: enhancing data quality and reproducible research.</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Abbasizanjani</surname>
            <given-names initials="H">Hoda</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Bedston</surname>
            <given-names initials="S">Stuart</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Akbari</surname>
            <given-names initials="A">Ashley</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="affil-1"><label>1</label><institution>Swansea University, Swansea, United Kingdom</institution></aff>
      <pub-date date-type="pub" publication-format="electronic">
        <day>01</day>
        <month>06</month>
        <year>2025</year>
      </pub-date>
      <pub-date date-type="collection" publication-format="electronic">
        <year>2025</year>
      </pub-date>
      <volume>10</volume>
      <issue>3</issue>
      <elocation-id>3123</elocation-id>
      <permissions>
        <license license-type="open-access"
          xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International
            License.</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://ijpds.org/article/view/3123">This article is available from the
        IJPDS website at: https://ijpds.org/article/view/3123</self-uri>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Objectives</title>
      <p>We aimed to develop a high-performance RRDA for the Welsh Longitudinal General Practice (WLGP) data to standardise curation, enhance reproducibility, improve query performance and add value and features for research. The RRDA provides a curated, normalised asset with a comprehensive clinical code look-up and assigned activity types.</p>
    </sec>
    <sec>
      <title>Methods</title>
      <p>WLGP data has a long-format event-list structure with potential data quality issues, including duplicates, re-inserted GP-to-GP-transferred records, and missing/invalid entries. To address these, building the RRDA involves three steps: data cleaning, data curation using patients' GP registration history from demographic data, and transformation of the data into a structured, normalised format to eliminate redundancy and support faster, more flexible large-scale queries.</p>
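      <p>The cleaning and curation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SAIL implementation: the row layout, the <monospace>registrations</monospace> structure, and the rule of keeping an event only when the recording practice matches the patient's registration on the event date are all hypothetical choices made for the example.</p>
      <code language="python" xml:space="preserve"><![CDATA[
```python
from datetime import date

# Hypothetical event rows: (person_id, practice_id, event_date, code).
events = [
    (1, "P1", date(2020, 1, 5), "X1"),
    (1, "P1", date(2020, 1, 5), "X1"),  # exact duplicate
    (1, "P2", date(2020, 1, 5), "X1"),  # re-inserted after a GP-to-GP transfer
    (1, "P2", date(2021, 3, 1), "X2"),
    (2, "P1", None, "X3"),              # missing event date
]

# Hypothetical registration spells: person_id -> [(practice_id, start, end)].
registrations = {
    1: [("P1", date(2019, 1, 1), date(2020, 6, 30)),
        ("P2", date(2020, 7, 1), date(2024, 12, 31))],
    2: [("P1", date(2018, 1, 1), date(2024, 12, 31))],
}


def is_valid(row):
    """Cleaning: reject rows with any missing or empty field."""
    return all(v is not None and v != "" for v in row)


def registered_at(person_id, practice_id, event_date):
    """Curation (assumed rule): the event falls inside a registration
    spell with the practice that recorded it."""
    return any(
        p == practice_id and start <= event_date <= end
        for p, start, end in registrations.get(person_id, [])
    )


def curate(rows):
    valid = [r for r in rows if is_valid(r)]
    # dict.fromkeys drops exact duplicates and preserves first-seen order.
    deduplicated = list(dict.fromkeys(valid))
    return [r for r in deduplicated if registered_at(r[0], r[1], r[2])]


curated = curate(events)
retention = len(curated) / len(events)  # share of input rows retained
```
]]></code>
      <p>In this toy input, the duplicate, the transferred copy and the row with a missing date are removed, so <monospace>retention</monospace> is the fraction of input rows that survive cleaning and curation.</p>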
      <p>The WLGP-RRDA includes a look-up of primary care official/local codes (Read-V2/SNOMED/EMIS/Vision). Additionally, we implemented a four-layer approach for identifying healthcare providers, patient access mode, interaction type, and details of individual codes to capture the complexity of activities, enabling patient-practice interaction analysis.</p>
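      <p>One way to picture the four-layer approach is as a look-up from a clinical code to one label per layer. The sketch below is illustrative only: the codes (<monospace>A001</monospace> and so on) are placeholders rather than real Read-V2/SNOMED/EMIS/Vision codes, and the layer values are assumptions, not the categories used in the WLGP-RRDA.</p>
      <code language="python" xml:space="preserve"><![CDATA[
```python
from collections import Counter
from typing import NamedTuple


class Activity(NamedTuple):
    provider: str     # layer 1: healthcare provider
    access_mode: str  # layer 2: patient access mode
    interaction: str  # layer 3: interaction type
    detail: str       # layer 4: detail of the individual code


# Hypothetical look-up; the codes are placeholders, not real clinical codes.
ACTIVITY_LOOKUP = {
    "A001": Activity("GP", "in person", "face-to-face", "seen in surgery"),
    "A002": Activity("GP", "telephone", "remote", "telephone encounter"),
    "A003": Activity("nurse", "in person", "face-to-face", "seen by nurse"),
}
UNKNOWN = Activity("unknown", "unknown", "unknown", "no description")


def classify(code):
    """Return the four-layer labels for a code, or UNKNOWN if unmapped."""
    return ACTIVITY_LOOKUP.get(code, UNKNOWN)


# Count interaction types (layer 3) across a hypothetical list of event codes.
event_codes = ["A001", "A002", "A002", "A003", "Z999"]
interaction_counts = Counter(classify(c).interaction for c in event_codes)
```
]]></code>
      <p>Aggregating on any single layer, as <monospace>interaction_counts</monospace> does for interaction type, is the kind of summary that patient-practice interaction analysis would build on.</p>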
    </sec>
    <sec>
      <title>Results</title>
      <p>Curating WLGP data (1990-2024, 4,565m records, 5m people) revealed significant improvements in data quality and completeness over time, with data retention rates after cleaning/curation increasing from 38% to 94%. Similarly, patient inclusion in the WLGP-RRDA improved from 43% to 98% over the same period, indicating improved data accuracy and coverage of Welsh residents.</p>
      <p>The normalisation process resulted in an efficient three-table structure with unique integer keys for clinical codes and events, optimising database performance/scalability. The extensive clinical code look-up improved coverage of events with known descriptions, showing increased local/SNOMED code use since the pandemic.</p>
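      <p>The idea of replacing repeated values with integer keys can be sketched in a few lines. The abstract does not specify which three tables are used, so the layout below (a code table, an event table of distinct code and value combinations, and a patient-event table) is an assumption made purely for illustration, as are all names and rows.</p>
      <code language="python" xml:space="preserve"><![CDATA[
```python
# Hypothetical long-format rows: (person_id, event_date, code, value).
rows = [
    (1, "2020-01-05", "X1", None),
    (1, "2021-03-01", "X2", 120.0),
    (2, "2021-03-01", "X2", 120.0),
    (2, "2022-07-09", "X1", None),
]


def key_for(table, item):
    """Return the integer key for item, assigning the next key on first sight."""
    return table.setdefault(item, len(table) + 1)


code_keys = {}       # table 1 (assumed): code -> code_id
event_keys = {}      # table 2 (assumed): (code_id, value) -> event_id
patient_events = []  # table 3 (assumed): (person_id, event_date, event_id)

for person_id, event_date, code, value in rows:
    code_id = key_for(code_keys, code)
    event_id = key_for(event_keys, (code_id, value))
    patient_events.append((person_id, event_date, event_id))


def denormalise():
    """Rebuild the long-format rows to confirm that no information was lost."""
    codes_by_id = {v: k for k, v in code_keys.items()}
    events_by_id = {v: k for k, v in event_keys.items()}
    out = []
    for person_id, event_date, event_id in patient_events:
        code_id, value = events_by_id[event_id]
        out.append((person_id, event_date, codes_by_id[code_id], value))
    return out
```
]]></code>
      <p>Each distinct code and each distinct event is stored once, while the largest table holds only compact integer references, which is the property that supports faster joins and filters at scale.</p>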
      <p>Additionally, implementing a multi-layered approach to identify interaction types (e.g., face-to-face/remote consultations) using official/local code hierarchies enabled analysis of national trends in GP activities and of the impact of the pandemic.</p>
    </sec>
    <sec>
      <title>Conclusion</title>
      <p>The WLGP-RRDA enhances data quality and streamlines research processes through reproducible, maintainable, standardised curation and a multi-layered approach to extracting activity types. The methodology and the RRDA benefit SAIL users and, more widely, other environments holding similar data, through our shared resources that promote transparency and collaboration.</p>
    </sec>
  </body>
</article>