Data safe havens to combine health and genomic data: benefits and challenges

IJPDS (2017) Issue 1, Vol 1:327, Proceedings of the IPDLN Conference (August 2016)


Kerina H Jones
Arron S Lacey
Brian L Perkins
Mark I Rees




Data safe havens can bring together and combine a rich array of anonymised, person-based data for research and policy evaluation within a secure setting. To date, most of the available datasets have been structured micro-data derived from routine health-related records. Possibilities are now opening up for greater reuse of genomic data, such as data from Genome-Wide Association Studies (GWAS) and Whole Exome or Whole Genome Sequencing (WES or WGS). However, considerable challenges must be addressed if the benefits of using these data in combination with health-related data are to be realised safely.

We explore the benefits and challenges of using genomic datasets with health-related data. Using the Secure Anonymised Information Linkage (SAIL) system as a case study, we also consider the implications and the way forward for Data Safe Havens seeking to incorporate genomic data for use alongside health-related data.

The benefits of using GWAS, WES and WGS data in conjunction with health-related data include the potential to explore genetics at a population level and to open up novel research areas. These include the ability, through precision medicine, to stratify and personalise more finely how medical conditions are detected and treated, by improving our understanding of rare conditions and by adding socioeconomic and environmental context to genomic data. Among the challenges are data availability, computing capacity, technical solutions, legal and regulatory frameworks, public perceptions, individual privacy and organisational risk. Many of these challenges are common to person-based data in general, and Data Safe Havens have often been designed to address them. However, some aspects of these challenges, as well as some additional challenges, are specific to genomic data. These include issues arising from the uncertain clinical significance of genomic information, both now and in the future, with corresponding risks to privacy and potential impacts on individuals.

Genomic datasets contain vast amounts of valuable information, some of which is currently undefined but may at some point have a direct bearing on individual health. Using these data in combination with health-related data has the potential to bring great benefits, including better stratification of clinical trials, improved design of epidemiological projects and clinical improvements. It is therefore essential that such data are surrounded by a properly designed, robust governance framework, including technical and procedural access controls, that enables the data to be used safely.


To determine the acceptability of using data on medicines dispensed in primary care to inform the outpatient treatment of patients with difficult-to-treat asthma.


In 2015, a pilot gave consultant respiratory physicians access to a summary of all relevant medicines dispensed by community pharmacists to patients with difficult-to-treat asthma (a therapy review, TR). Dispensed-medicine data were collected using each patient's unique NHS identifier and aggregated monthly for the year before the patient attended their clinic appointment. Patients gave consent, and the summary data were used to assess concordance with therapy and to inform a discussion about future management.
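The monthly aggregation step can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the pilot's actual implementation: the record layout (NHS identifier, dispensing date, medicine name), the function name `monthly_summary`, the use of item counts as the summary measure and the 365-day look-back window standing in for "the year before" the appointment are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical record layout: (nhs_id, dispensed_on, medicine)
Record = tuple[str, date, str]


def monthly_summary(records: list[Record], nhs_id: str, appointment: date) -> dict:
    """Count items dispensed to one patient per (year, month, medicine).

    Assumption: "the year before the appointment" is approximated as the
    365 days preceding the appointment date; the appointment day itself
    is excluded.
    """
    window_start = appointment - timedelta(days=365)
    counts: Counter = Counter()
    for pid, dispensed_on, medicine in records:
        # Keep only this patient's dispensings that fall inside the look-back window.
        if pid == nhs_id and window_start <= dispensed_on < appointment:
            counts[(dispensed_on.year, dispensed_on.month, medicine)] += 1
    # Sort chronologically so the summary reads month by month.
    return dict(sorted(counts.items()))
```

For example, `monthly_summary(records, "A", date(2015, 6, 1))` returns one count per month and medicine for patient "A" between June 2014 and May 2015. A real summary might instead report quantities or days' supply; item counts are used here only to keep the sketch simple.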

Semi-structured interviews were conducted with eight patients who had received TR and with eight respiratory physicians, two of whom had access to the summary. The interviews aimed to explore:

  • the experiences of patients and physicians regarding the utility of therapy reviews; and
  • the views of physicians without access to summaries on the prospective use of therapy reviews.

With the participants' consent, interviews were recorded and transcribed. Thematic analysis of grouped responses was conducted using NVivo software.


All physicians agreed that poor compliance remains a significant concern when treating patients with difficult asthma, and all supported the use of TR. Physicians with experience of TR identified as advantages its greater reliability than current methods of assessing compliance, its ability to inform future treatment, and its help in discussing concordance. A disadvantage was the three-month lag in the availability of dispensed data. Physicians without experience of TR raised concerns that its use might lead to confrontation; this was reflected in the experience of one patient, who reported that TR discouraged them from improving compliance. Additional interventions are needed to improve compliance. Opinions from the other patients were positive and supported the inclusion of TR as part of a consultation.

Physicians with experience of TR found the summary accessible, provided they had access to computers with the specific software needed to view it. This limitation was considered potentially problematic, and physicians without access to TR expressed a preference for accessing it via the NHS Portal, a secure online platform that gives registered users access to patient-level information.


This study demonstrates the positive impact of using data on medicines dispensed in primary care within secondary care to assess medicine concordance and inform individual patients' ongoing treatment. These data supplement other information gathered from clinical tests and patient-physician discussion. A more efficient system for accessing the summary data needs to be developed before TR is used more widely.


How to Cite
Jones, K. H., Lacey, A. S., Perkins, B. L. and Rees, M. I. (2017) “Data safe havens to combine health and genomic data: benefits and challenges: IJPDS (2017) Issue 1, Vol 1:327 Proceedings of the IPDLN Conference (August 2016)”, International Journal of Population Data Science, 1(1). doi: 10.23889/ijpds.v1i1.348.
