<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "JATS-journalpublishing1.dtd" [
]>
<article xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML"
  dtd-version="1.2" article-type="abstract">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">IJPDS</journal-id>
      <journal-title-group>
        <journal-title>International Journal of Population Data Science</journal-title>
        <abbrev-journal-title>IJPDS</abbrev-journal-title>
      </journal-title-group>
      <issn pub-type="epub">2399-4908</issn>
      <publisher>
        <publisher-name>Swansea University</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.23889/ijpds.v10i3.3159</article-id>
      <article-id pub-id-type="publisher-id">10:3:124</article-id>
      <title-group>
        <article-title>Perinatal mental health: the role of social inequalities and domestic abuse
          on maternal outcomes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Loane</surname>
            <given-names initials="M">Maria</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Given</surname>
            <given-names initials="J">Joanne</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Rosato</surname>
            <given-names initials="M">Michael</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Leavey</surname>
            <given-names initials="G">Gerry</given-names>
          </name>
          <xref ref-type="aff" rid="affil-1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="affil-1"><label>1</label><institution>Ulster University, Belfast, United Kingdom</institution></aff>
      <pub-date date-type="pub" publication-format="electronic">
        <day>01</day>
        <month>06</month>
        <year>2025</year>
      </pub-date>
      <pub-date date-type="collection" publication-format="electronic">
        <year>2025</year>
      </pub-date>
      <volume>10</volume>
      <issue>3</issue>
      <elocation-id>3159</elocation-id>
      <permissions>
        <license license-type="open-access"
          xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International
            License.</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://ijpds.org/article/view/3159">This article is available from the
        IJPDS website at: https://ijpds.org/article/view/3159</self-uri>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Objectives</title>
      <p>Surveys are a widely used and important research resource, whose creation and curation
        involve skilled, labour-intensive tasks. This abstract describes an initiative to improve the
        tooling available to the data community for controlling the risk of unintended disclosure,
        in line with the Anonymisation Decision Making Framework.</p>
    </sec>
    <sec>
      <title>Methods</title>
      <p>An initial step in assessing dataset disclosivity is the identification of key variables
        (KVs), that is, variables which, when combined, can identify individual units, followed by
        the computation of frequency counts for combinations of these variables. These counts are a
        prerequisite for achieving k-anonymity, and can also be used in further risk calculations.
        Their centrality to our processes prompted us to improve the performance of this
        computation. We achieved a significant improvement over the original sdcMicro R package by
        decomposing KV values into "bitmasks" (sequences of 0s and 1s) that can be manipulated
        efficiently with native CPU instructions.</p>
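      <p>The bitmask idea can be illustrated with a minimal Python sketch. It is not the C++
        implementation described in this abstract: the function names, the toy records and the use
        of Python integers as bitsets are all assumptions made for illustration. Each distinct KV
        value gets a mask with one bit per record, and the frequency count for a record is the
        number of bits left set after combining the masks of its KV values with a bitwise AND.</p>
      <code language="python"><![CDATA[
```python
from collections import defaultdict


def build_bitmasks(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    masks = defaultdict(int)
    for i, value in enumerate(column):
        masks[value] |= 1 << i
    return dict(masks)


def frequency_counts(records, key_vars):
    """For every row, count the rows sharing its values on all key variables."""
    all_rows = (1 << len(records)) - 1  # one bit per row, all set
    masks = {kv: build_bitmasks([r[kv] for r in records]) for kv in key_vars}
    counts = []
    for r in records:
        combined = all_rows
        for kv in key_vars:
            combined &= masks[kv][r[kv]]  # keep only rows matching this value
        counts.append(bin(combined).count("1"))  # population count
    return counts


# Hypothetical toy data; variable names and categories are illustrative only.
records = [
    {"age": "30-39", "sex": "F", "region": "NI"},
    {"age": "30-39", "sex": "F", "region": "NI"},
    {"age": "30-39", "sex": "M", "region": "NI"},
    {"age": "40-49", "sex": "F", "region": "Wales"},
]
counts = frequency_counts(records, ["age", "sex", "region"])

# A dataset is k-anonymous on these KVs when every count is at least k.
k = 2
is_k_anonymous = min(counts) >= k
```
]]></code>
      <p>In this toy example the last two records are unique on the chosen KVs, so the data are not
        2-anonymous. A compiled implementation could replace the Python integers with fixed-width
        machine words and a hardware population count.</p>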
    </sec>
    <sec>
      <title>Results</title>
      <p>In the sdcMicro R package these calculations use the data.table library which, while
        performant, can be improved upon by our algorithm, especially in the common case where the
        dataset contains missing values.</p>
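      <p>Missing values fit naturally into a bitmask representation. The sketch below is
        illustrative only and rests on a stated assumption: a missing value (represented here as
        None) is treated as compatible with every category of that variable. Under that assumption
        the match mask for a value is the value's own mask combined by bitwise OR with the mask of
        records where the variable is missing. The function name and toy records are
        hypothetical.</p>
      <code language="python"><![CDATA[
```python
def frequency_counts_with_missing(records, key_vars):
    """Per-row frequency counts where None is assumed to match any category."""
    all_rows = (1 << len(records)) - 1  # one bit per row, all set
    value_masks = {kv: {} for kv in key_vars}
    missing_masks = {kv: 0 for kv in key_vars}

    # One pass to build a mask per observed value and a mask of missing rows.
    for i, r in enumerate(records):
        for kv in key_vars:
            if r[kv] is None:
                missing_masks[kv] |= 1 << i
            else:
                value_masks[kv][r[kv]] = value_masks[kv].get(r[kv], 0) | (1 << i)

    counts = []
    for r in records:
        combined = all_rows
        for kv in key_vars:
            if r[kv] is not None:
                # Rows with the same value, plus rows where the value is missing.
                combined &= value_masks[kv][r[kv]] | missing_masks[kv]
            # A missing value matches every row, so it does not narrow the set.
        counts.append(bin(combined).count("1"))
    return counts


# Hypothetical toy data: the second record has a missing value for "sex".
records = [
    {"age": "30-39", "sex": "F"},
    {"age": "30-39", "sex": None},
    {"age": "40-49", "sex": "F"},
    {"age": "30-39", "sex": "M"},
]
counts = frequency_counts_with_missing(records, ["age", "sex"])
```
]]></code>
      <p>Handling missing values this way needs only one extra OR per variable, which is one
        plausible reason a bitmask approach could cope well with incomplete data.</p>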
      <p>We tested using the UK Quarterly Labour Force Survey on a Dell XPS 15 9520 laptop. Our
        implementation uses all available CPU cores and runs the combinations in parallel. For a
        single combination of 4 KVs we achieve the following timings<fn>
          <p>Averaged over 20 iterations.</p>
        </fn>.</p>
      <list list-type="bullet">
        <list-item>
          <p>Time to compute bitmasks 0.408s</p>
        </list-item>
        <list-item>
          <p>Time to compute frequencies 0.285s</p>
        </list-item>
        <list-item>
          <p>Total time 0.693s</p>
        </list-item>
      </list>
      <p>For all 4-way combinations from the 8 KVs (70 combinations) we achieve the following.</p>
      <list list-type="bullet">
        <list-item>
          <p>Time to compute bitmasks 1.489s</p>
        </list-item>
        <list-item>
          <p>Time to compute frequencies 28.586s</p>
        </list-item>
        <list-item>
          <p>Total time 30.075s</p>
        </list-item>
      </list>
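      <p>The figure of 70 combinations is the number of ways of choosing 4 KVs from 8. The sketch
        below enumerates them and scans each one concurrently. It is a structural illustration
        under stated assumptions, not the benchmarked code: the KV names and synthetic records are
        hypothetical, a plain Counter stands in for the bitmask step, and Python threads are used
        only to show the shape of the parallel loop (they would not by themselves deliver the
        multi-core speed-up of a native implementation).</p>
      <code language="python"><![CDATA[
```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import comb

# Hypothetical key-variable names; the survey's actual KVs are not listed here.
KEY_VARS = ["kv1", "kv2", "kv3", "kv4", "kv5", "kv6", "kv7", "kv8"]


def smallest_cell(records, combo):
    """Return the combination and the size of its smallest frequency cell."""
    cells = Counter(tuple(r[kv] for kv in combo) for r in records)
    return combo, min(cells.values())


def scan_all_combinations(records, key_vars, size=4):
    """Compute the smallest cell size for every size-way KV combination."""
    combos = list(combinations(key_vars, size))
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda c: smallest_cell(records, c), combos)
        return dict(results)


# Synthetic records for illustration only.
records = [{kv: (i + j) % 3 for j, kv in enumerate(KEY_VARS)} for i in range(30)]
result = scan_all_combinations(records, KEY_VARS)
n_combinations = len(result)
```
]]></code>
      <p>Each combination is independent of the others, so the work divides cleanly across
        cores.</p>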
    </sec>
    <sec>
      <title>Conclusion</title>
      <p>Our chief aim is to contribute code to the community that allows seamless integration of
        these performant computations into regular Python applications. Therefore, following
        publication, this Python-wrapped C++ code will be available via GitHub. For broader
        applicability, integrating our algorithm into sdcMicro itself could prove useful.</p>
    </sec>
  </body>
</article>