<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "JATS-journalpublishing1.dtd" [
]>
<article xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML"
  dtd-version="1.2" article-type="abstract">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">IJPDS</journal-id>
      <journal-title-group>
        <journal-title>International Journal of Population Data Science</journal-title>
        <abbrev-journal-title>IJPDS</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="epub">2399-4908</issn>
      <publisher>
        <publisher-name>Swansea University</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.23889/ijpds.v10i3.3188</article-id>
      <article-id pub-id-type="publisher-id">10:3:156</article-id>
      <title-group>
        <article-title>Using SQL for accessing and working with large administrative data in TREs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Vilcan</surname>
            <given-names initials="T">Tudor</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="affil-1"><label>1</label><institution>Office for National Statistics, Titchfield,
        United Kingdom</institution></aff>
      <pub-date>
        <day>01</day>
        <month>06</month>
        <year>2025</year>
      </pub-date>
      <pub-date date-type="collection" publication-format="electronic">
        <year>2025</year>
      </pub-date>
      <volume>10</volume>
      <issue>3</issue>
      <elocation-id>3188</elocation-id>
      <permissions>
        <license license-type="open-access"
          xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International
            License.</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://ijpds.org/article/view/3188">This article is available from the
        IJPDS website at: https://ijpds.org/article/view/3188</self-uri>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Objectives</title>
      <p>The objective of this presentation is twofold. First, it will explain why SQL is needed
        to provision large administrative datasets and how this is operationalised in practice in
        Trusted Research Environments (TREs). Second, it will give researchers guidance and
        instructions for manipulating, cleaning and analysing data in SQL.</p>
    </sec>
    <sec>
      <title>Methods</title>
      <p>Administrative datasets such as the invaluable Longitudinal Education Outcomes (LEO)
        dataset are characterised by a large number of variables and very high row counts. This
        means they are not suited to provision as flat files or through software more familiar to
        researchers, such as Stata, SPSS or R. SQL is a valuable and sometimes indispensable tool
        for provisioning such data, as it offers adequate storage, selective access that reduces
        disclosure risk, and tools for data manipulation. The presentation will also cover the
        weaknesses of SQL and when other software is more appropriate.</p>
    </sec>
    <sec>
      <title>Results</title>
      <p>Through this presentation, we hope researchers will gain a much better understanding of
        how and why data, especially large administrative datasets, is provisioned through SQL.
        They will also learn techniques and principles of SQL usage, alongside useful pieces of
        code and syntax. In addition, the presentation will highlight when SQL should be used and
        when different software is the better choice, as SQL has distinct strengths and
        weaknesses.</p>
    </sec>
    <sec>
      <title>Conclusion</title>
      <p>By understanding how to use SQL to access, manipulate and analyse large administrative
        datasets in TREs, researchers will be able to maximise the potential of their research and
        have a better experience overall.</p>
    </sec>
  </body>
</article>