From online banking to biobanking: designing and implementing a data delivery platform for researchers of the China Kadoorie Biobank

Sam Sansome

Abstract

Introduction
The China Kadoorie Biobank is a large prospective cohort study of 512,891 participants. We routinely integrate data via linkage to population health outcomes to deliver up-to-date research datasets. Building on experience from the private sector, we have implemented a platform that supports the delivery and audit of secure datasets to internal and external researchers.


Objectives and Approach
We aimed to create a platform that delivers secure research datasets with dynamic censor dates for preliminary analyses and fieldwork, and that also provides multiple static versions of an analysis-ready database with fixed censor dates. Individual participant outcome data from population health sources (death and disease registries and health insurance agencies) can be integrated and linked regularly to other health-related data, e.g., genetic, bioinformatics, and medical device data. Drawing on data management strategies used in major financial institutions, we have produced systems that implement data warehousing, multiple concurrent environments, and secure dataset access and delivery.
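
To illustrate what a fixed censor date means for an analysis-ready database, the following is a minimal Python sketch, not the platform's actual implementation. The `Outcome` record, its field names, and the `censor` function are hypothetical; the sketch assumes that an event recorded after the censor date is treated as not yet observed and that follow-up is counted in days from study entry.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Outcome:
    """Hypothetical linked outcome record for one participant."""
    participant_id: str
    entry_date: date             # date follow-up began
    event_date: Optional[date]   # date of the outcome event, None if none recorded


def censor(outcome: Outcome, censor_date: date) -> Tuple[bool, int]:
    """Return (event_observed, follow_up_days) as seen at censor_date.

    Assumes entry_date is on or before censor_date.
    """
    if outcome.event_date is not None and outcome.event_date <= censor_date:
        # The event falls inside the follow-up window, so follow-up ends at the event.
        return True, (outcome.event_date - outcome.entry_date).days
    # No event yet, or the event happened after the censor date: censored.
    return False, (censor_date - outcome.entry_date).days


record = Outcome("P0001", date(2005, 1, 1), date(2015, 6, 1))
early_version = censor(record, date(2013, 12, 31))   # event not yet visible
later_version = censor(record, date(2016, 12, 31))   # event now visible
```

Under these assumptions the same participant contributes a censored observation to a database frozen at 31st December 2013 and an observed event to one frozen at 31st December 2016, which is why each static version must be kept separately.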


Results
As of the end of 2017, we had over 300 registered and approved researchers eligible to request datasets. Using our platform, we have recorded over 150 requests and successfully delivered over 100 de-personalised and encrypted datasets to external researchers around the world. In addition, we have supplied secure datasets to over 20 Global Health MSc and DPhil students studying at The University. The platform currently hosts 4 completed analysis-ready databases, with censor dates ranging from 31st December 2013 to 31st December 2016 (11 years of follow-up). A 5th analysis-ready database (with the most recent outcome data on participants’ deaths, diseases and hospitalisations) is already under development. During 2018, we plan to make more data sources and outcome data available to our external researchers.


Conclusion/Implications
We have developed a versatile platform that delivers secure datasets to researchers from a selection of analysis-ready databases, each with differing censor dates and available data sources. The platform is scalable and can accommodate regular integration of known follow-up data sources along with new and emerging data sources.

Introduction

In Alberta, 2,400 youth with chronic needs transition to adulthood every year, and many are not prepared for this change. The transfer of youth from pediatric to adult-oriented care is poorly managed. To improve this process, we need to know how young patients use health services during this period.

Objectives and Approach

We used the Alberta Health Services Corporate Data Repository (CDR-9), which collects records of ambulatory visits, to define a cohort of patients with chronic disease using pediatric tertiary care; data are available from 2008 to 2016. Personal health numbers allowed for deterministic data linkage to CDR-9, registry data (e.g., death dates, moves out of province), and area deprivation indices. Eligible patients: (a) were aged 12 to 15 years in 2008 (allowing \(\geq\)2 years of observation in adulthood, after age 18), (b) were involved with a Chronic Care Clinic (CCC) at Alberta Children’s Hospital, and (c) had repeated CCC visits with \(\geq\)3 months between visits.
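
Deterministic linkage on a shared identifier amounts to an exact-match join. The sketch below is illustrative only and assumes each record is a dictionary carrying a hypothetical `phn` key for the personal health number; the field names and the `link_deterministic` helper are not taken from CDR-9 or the registry.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def link_deterministic(
    visits: Iterable[Record], registry: Iterable[Record]
) -> Tuple[List[Record], List[Record]]:
    """Join visit records to registry records on an exact personal health number match.

    Returns (linked, unlinked); visits with no registry match are kept
    separately so that they can be reviewed or excluded.
    """
    # Index the registry once so each visit lookup is a constant-time exact match.
    registry_by_phn = {row["phn"]: row for row in registry}
    linked: List[Record] = []
    unlinked: List[Record] = []
    for visit in visits:
        match = registry_by_phn.get(visit["phn"])
        if match is None:
            unlinked.append(visit)
        else:
            # Merge the visit fields with the matching registry fields.
            linked.append({**visit, **match})
    return linked, unlinked


# Hypothetical example data.
visits = [
    {"phn": "123", "clinic": "CCC-A"},
    {"phn": "999", "clinic": "CCC-B"},
]
registry = [{"phn": "123", "moved_out_of_province": False}]
linked, unlinked = link_deterministic(visits, registry)
```

Because the match is exact, a mistyped or invalid identifier never links, so such records surface in the unlinked list.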

Results

With stakeholder input, we identified 26 Chronic Care Clinics (CCC) at Alberta Children’s Hospital (Calgary, Alberta). Using CDR-9, a total of 10,111 patients at the hospital were identified who were 12 to 15 years old at the start of the study window (in 2008), and who visited a CCC before age 18. Less than 1% (n=418) were excluded due to moving out of province or having an invalid personal health number. Final sample sizes were captured across 3 algorithms (A1, A2, A3), based on frequency of CCC visits within a 2-year period: (i) A1: 2 CCC visits (N=4123); (ii) A2: \(\geq\)3 CCC visits (N=2242); (iii) A3: \(\geq\)4 CCC visits (N=1344).
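
The visit-frequency criterion behind A1 to A3 can be sketched as a sliding-window count. This is a hedged illustration rather than the study's actual algorithm: it assumes that a "2-year period" means any rolling window of 730 days, and that each algorithm is a minimum visit count (A1 is treated as at least 2 visits, which the text does not state explicitly). The names `max_visits_in_window`, `meets_algorithm`, and `ALGORITHM_MIN_VISITS` are hypothetical.

```python
from datetime import date
from typing import Iterable

# Assumed minimum CCC visit counts per algorithm (A1 taken as "at least 2").
ALGORITHM_MIN_VISITS = {"A1": 2, "A2": 3, "A3": 4}


def max_visits_in_window(visit_dates: Iterable[date], window_days: int = 730) -> int:
    """Return the largest number of visits falling within any window of window_days."""
    dates = sorted(visit_dates)
    best = 0
    start = 0
    for end, current in enumerate(dates):
        # Move the window start forward until it is within window_days of the current visit.
        while (current - dates[start]).days > window_days:
            start += 1
        best = max(best, end - start + 1)
    return best


def meets_algorithm(visit_dates: Iterable[date], algorithm: str) -> bool:
    """Check whether a patient's visits satisfy the assumed threshold for an algorithm."""
    return max_visits_in_window(visit_dates) >= ALGORITHM_MIN_VISITS[algorithm]


# Hypothetical patient: three visits within two years, then one much later.
patient_visits = [date(2008, 1, 10), date(2008, 6, 15), date(2009, 11, 1), date(2012, 3, 1)]
```

Under these assumptions the hypothetical patient above qualifies for A1 and A2 but not A3, consistent with the nested sample sizes reported for the three algorithms.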

Conclusion/Implications

Our identified cohort of youth affected by chronic conditions is the first of its kind in Alberta, and can answer important questions about patterns of service utilization in other sectors of care. Our next step is to link the cohort to population-level datasets (e.g., physician claims, NACRS, CIHI-DAD).

How to Cite
Sansome, S. (2018) “From online banking to biobanking: designing and implementing a data delivery platform for researchers of the China Kadoorie Biobank”, International Journal of Population Data Science, 3(4). doi: 10.23889/ijpds.v3i4.807.