Supporting the timely, easy and cost-effective access to high-quality linked data via the Custodian-Controlled Data Repository


Matthias Schneider, Chris Radbone, Andrew Stanley, Anthony Woollacott, Timothy Churches, James Farrow, Paul Basso, Tina Hardin, Martin McNamara, James Harrison
Published online: Aug 24, 2018

While linkage units perform population data linkage with high efficiency, other parts of the workflow from custodians to researchers remain largely outside the control of linkage operators. Most importantly, resource constraints at data custodians often limit quality control (QC) efforts and lead to delays in the data delivery to researchers.

Objectives and Approach
To overcome these challenges, we have created the Data Integration Unit (DIU) to undertake content data management and delivery in conjunction with the custodians, who remain the principal data curators. Data is managed in the Custodian-Controlled Data Repository (CCDR), a highly secure virtual repository for data storage, analysis and access that is established and operated by the DIU. Stringent controls on user access and data flows ensure that custodians can provide data safely. Under the oversight of the custodians, DIU staff undertake QC activities and integrate and deliver multiple datasets for approved linkage projects.

Long-term data storage in the CCDR reduces custodian workloads by limiting content data provision to periodic updates. Feedback loops built into the QC process allow custodians to improve their datasets by learning from data issues identified by the DIU.

Extensive QC of individual datasets by DIU staff, together with validation across the multiple datasets held in the CCDR, improves the quality of the data provided to researchers. Moreover, DIU staff dedicated to data integration enable faster delivery of content data to researchers. Lastly, the CCDR reduces the number of custodians that researchers need to liaise with for data provision.
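Cross-dataset validation of the kind described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the two datasets, the `link_id` field and the `admissions_after_death` check are assumptions chosen for illustration, not the DIU's actual QC rules. The check flags hospital admissions dated after a person's recorded death, an inconsistency that only becomes visible once linked datasets are held together, and that would then be reported back to the relevant custodians.

```python
from datetime import date

# Hypothetical extracts from two custodian datasets, keyed by a linkage ID
# assigned by the linkage unit. Field names are illustrative assumptions.
deaths = {
    "P001": date(2016, 3, 14),
    "P002": date(2017, 8, 2),
}

admissions = [
    {"link_id": "P001", "admitted": date(2016, 2, 20)},
    {"link_id": "P001", "admitted": date(2016, 5, 1)},  # after recorded death
    {"link_id": "P002", "admitted": date(2017, 8, 2)},  # on day of death: plausible
    {"link_id": "P003", "admitted": date(2018, 1, 9)},  # no death record
]


def admissions_after_death(admissions, deaths):
    """Return admission records dated after the person's recorded death.

    Such records indicate a data issue in one of the datasets (or a false
    link) and would be fed back to the custodians for review.
    """
    issues = []
    for rec in admissions:
        death_date = deaths.get(rec["link_id"])
        # Only people with a death record can fail this check.
        if death_date is not None and rec["admitted"] > death_date:
            issues.append(rec)
    return issues


for issue in admissions_after_death(admissions, deaths):
    print(f"{issue['link_id']}: admitted {issue['admitted']} after recorded death")
```

Run as written, this reports only the second `P001` admission. In practice such rules would be defined with, and overseen by, the custodians, who remain responsible for correcting their source data.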

Operational since February 2018, the DIU has delivered content data for several linkage projects, based on key datasets stored in the CCDR. The incorporation of additional datasets is currently being negotiated.

In light of recent developments in secure analytics infrastructure, further evolution of the CCDR towards a cloud-based model is anticipated.
