Opening up access to official data with Low Fidelity Synthetic Data (LFSD)
The latest study on Low Fidelity Synthetic Data, published in the International Journal of Population Data Science (IJPDS), presents a new framework of four essential checks that balance wider use of LFSD against the need to protect patient confidentiality. Professor Gillian Raab of the Scottish Longitudinal Study (SLS), University of Edinburgh, and co-authors Sophie McCall of the RDS Metadata Catalogue and Liam Cavin of the National Records of Scotland (NRS) have developed the following four checks:
- labelling – to make sure whoever sees the data knows that it is not the original
- structure – so that it resembles the original
- disclosure – no real or apparent disclosure of confidential information
- documentation – to understand the original records and how the LFSD was created
These four checks, which have already been embedded in the LFSD creation process at Research Data Scotland, provide a framework to meet key requirements. The study, ‘Four checks for low-fidelity synthetic data: recommendations for disclosure control and quality evaluation’, provides assurance that LFSD that satisfies these checks can be made more widely available for research.
Making government data available safely
Large numbers of confidential records about citizens are held by National Statistics Organizations and other bodies, such as the NHS. They are used to inform policy and enable vital health and other research to take place. Researchers can apply for access to these records, usually in the restricted settings known as Trusted Research Environments (TREs). In her recent report on UK Health Data, Prof Cathie Sudlow argues that we are letting patients and their families down through the difficulties researchers meet in accessing these records. However, there are also legitimate concerns about data breaches that might undermine individual privacy and lead to a loss of reputation for the data custodian.
Is High Fidelity Synthetic Data (HFSD) the answer to these problems?
Synthetic datasets are created from original confidential data but contain no records that correspond to real individuals. HFSD is intended to reproduce the results that a researcher would get if the original data had been used. Examples are HFSD on Cancer Registrations made available by Public Health England, prescription data from the MHRA and linked Census data from the Scottish Longitudinal Study (SLS). However, data custodians share a concern that certain types of High Fidelity Synthetic Data have the potential to disclose confidential information and to give the wrong answers to important questions.
Low Fidelity Synthetic Data (LFSD)
LFSD can help alleviate these concerns because it is created from the original data by taking one variable at a time and randomly reordering its values, which breaks the relationships between variables. NHS England’s pilot project has released LFSD for hospital episodes, and Research Data Scotland (RDS) allows potential researchers to request LFSD from the RDS metadata catalogue, including data from the National Records of Scotland (NRS) 2001 and 2011 Censuses of Scotland. LFSD is not intended to give the right answers to research questions, but only to let people see what the original data looks like and work out how they might use it.
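The one-variable-at-a-time approach described above can be illustrated with a minimal Python sketch. It is not the procedure used by RDS or NHS England. The function name `make_lfsd`, the column names and the four example records are all hypothetical and invented for illustration.

```python
import random


def make_lfsd(records, seed=None):
    """Return a copy of `records` in which every variable (column)
    has been shuffled independently of the others.

    Hypothetical helper for illustration only.
    """
    if not records:
        return []
    rng = random.Random(seed)
    columns = list(records[0].keys())
    shuffled = {}
    for col in columns:
        # Take one variable at a time and randomly reorder its values.
        values = [row[col] for row in records]
        rng.shuffle(values)
        shuffled[col] = values
    # Reassemble rows from the independently shuffled columns.
    return [
        {col: shuffled[col][i] for col in columns}
        for i in range(len(records))
    ]


# Made-up example records (not real data).
original = [
    {"age_band": "20-29", "sex": "F", "area": "A"},
    {"age_band": "30-39", "sex": "M", "area": "B"},
    {"age_band": "40-49", "sex": "F", "area": "C"},
    {"age_band": "50-59", "sex": "M", "area": "D"},
]

synthetic = make_lfsd(original, seed=1)
for row in synthetic:
    print(row)
```

Each column in the output holds exactly the same values as in the input, so the data has the same structure, but the link between variables within a row is lost. A shuffled row can still coincide with a real one by chance, which is one reason a separate disclosure check remains necessary.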
Professor Gillian Raab highlights the purpose of this paper as "Encouraging best practice by providing 4 simple checks that those planning to release LFSD can use before making it available."
Further links
Article in British Medical Journal
Administrative Data Research UK (ADRUK) statement on synthetic data

Gillian Raab, Professor Emerita, Edinburgh Napier University, part-time Research Fellow, Scottish Centre for Administrative Data Research (SCADR), University of Edinburgh
Raab, G., McCall, S. and Cavin, L. (2025) “Four checks for low-fidelity synthetic data: recommendations for disclosure control and quality evaluation”, International Journal of Population Data Science, 10(2). doi: 10.23889/ijpds.v10i2.2972.