New automated tracker set to improve data quality and transparency for research
Researchers at the Universities of Sheffield and Aberdeen have successfully created a new automated system that defines and tracks specific properties and relationships of the data within the Grampian Data Safe Haven, one of the four Scottish-government accredited regional safe havens in Scotland.
The research, published in the International Journal of Population Data Science (IJPDS), is an important contribution to increasing open and transparent data processing in Trusted Research Environments (TREs), as the new system will improve the quality and trustworthiness of sensitive data for research.
One of the biggest challenges for sensitive data research is the lack of transparency about how data are prepared for research. For ethical reasons, researchers do not have permission to see the individual, identifiable data, and therefore they must rely on specialist staff to extract the data from databases, link data across multiple datasets, and pseudonymise data on their behalf.
Whilst this preparation is an important step in using data for research, the lack of transparency means significant errors could be unintentionally introduced, with potentially detrimental outcomes for research, particularly if the decisions made during preparation are not captured or recorded. Even if this information is recorded, it is likely captured across a number of different systems through a manual process that is both time-consuming and inconsistently performed.
The newly created prototype provides a critical contribution to other research organisations wanting to introduce a mechanism for automatically tracking important information about how data are prepared within a TRE. It will further assist data analysts, researchers and information governance teams to authenticate and audit data workflows, and is a first step towards an interoperable, federated approach to data production.
Lead author Katherine O’Sullivan added: “It is important to acknowledge that the design and creation of this new automated system was guided through a process of public involvement and engagement to ensure high-quality functionality. To co-create with people who are not directly involved in research was important to us because we have a duty of care to the public in how we safely and securely handle their data. We also have an obligation to researchers to ensure the data we provide them is accurate and correct, as their results and recommendations are frequently implemented in clinical and care settings.”
Katherine O’Sullivan, Head of Secure Data Services, University of Sheffield
O'Sullivan, K., Markovic, M., Dymiter, J., Scheliga, B., Odo, C. and Wilde, K. (2025) “Semi-automated data provenance tracking for transparent data production and linkage to enhance auditing and quality assurance in Trusted Research Environments”, International Journal of Population Data Science, 10(2). doi: 10.23889/ijpds.v10i2.2464