Linking cohort data with administrative health data to develop a new hypertension prediction model to aid precision health approach


Mohammad Chowdhury
Tanvir Turin
Alex Leung
Maeve O’Beirne
Khokan Sikdar
Hude Quan


Hypertension is a common medical condition, affecting 1 in 5 Canadians, and is a major risk factor for heart attack, stroke, and kidney disease. Predicting the risk of developing incident hypertension may help to inform targeted preventive strategies.

Objectives and Approach
Identifying major risk factors and incorporating them into a multivariable model for risk stratification may help to identify individuals who are at highest risk of developing incident hypertension and who would potentially benefit most from intervention. The goal of the proposed research is to develop a robust hypertension prediction model for the general population using the Alberta Tomorrow Project (ATP) cohort data linked with Alberta’s administrative health data. ATP, Alberta’s largest population health cohort, contains baseline data on socio-demographic characteristics, personal and family history of disease, medication use, lifestyle and health behavior, environmental exposures, physical measures and biosamples.

Alberta’s administrative health data additionally provide information on health care utilization, enrollment, drugs, physician services, and hospital services. A prediction model for hypertension will be developed using logistic regression: information on candidate variables for the model will be gathered from ATP data, and the outcome (incident hypertension) will be ascertained from administrative health data (physician/practitioner claims data and hospital discharge abstract data). Because the current ATP data lack follow-up information, the two data sources will be linked through an anonymous unique person identifier (e.g. PHN). The linkage will eventually provide follow-up information on whether ATP participants who were free of hypertension at baseline developed the disease, as well as information on other potential variables.
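The linkage and modelling steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual pipeline: the records, the field names (`pid`, `age`, `bmi`, `htn_at_baseline`, `htn_code`), the choice of two predictors, and the rule "any linked record with a hypertension code counts as a case" are all hypothetical. The logistic regression is fitted with plain gradient descent so that the sketch needs only the standard library.

```python
import math

# Hypothetical baseline cohort records (stand-ins for ATP variables).
COHORT = [
    {"pid": "A1", "age": 38, "bmi": 23.5, "htn_at_baseline": False},
    {"pid": "A2", "age": 52, "bmi": 31.2, "htn_at_baseline": False},
    {"pid": "A3", "age": 61, "bmi": 29.8, "htn_at_baseline": False},
    {"pid": "A4", "age": 44, "bmi": 26.1, "htn_at_baseline": False},
    {"pid": "A5", "age": 57, "bmi": 33.0, "htn_at_baseline": True},
    {"pid": "A6", "age": 49, "bmi": 24.9, "htn_at_baseline": False},
    {"pid": "A7", "age": 66, "bmi": 27.4, "htn_at_baseline": False},
    {"pid": "A8", "age": 41, "bmi": 22.8, "htn_at_baseline": False},
    {"pid": "A9", "age": 58, "bmi": 34.1, "htn_at_baseline": False},
]

# Hypothetical administrative records; "htn_code" flags a hypertension diagnosis.
ADMIN = [
    {"pid": "A2", "source": "physician_claim", "htn_code": True},
    {"pid": "A3", "source": "hospital_discharge", "htn_code": True},
    {"pid": "A4", "source": "physician_claim", "htn_code": False},
    {"pid": "A5", "source": "physician_claim", "htn_code": True},
    {"pid": "A6", "source": "physician_claim", "htn_code": True},
    {"pid": "A9", "source": "hospital_discharge", "htn_code": True},
]

def link_and_label(cohort, admin):
    """Link on the person ID, drop prevalent cases, and label incident cases."""
    cases = {r["pid"] for r in admin if r["htn_code"]}
    return [
        (p["pid"], [p["age"], p["bmi"]], int(p["pid"] in cases))
        for p in cohort
        if not p["htn_at_baseline"]  # keep only those free of hypertension at baseline
    ]

def standardise(X):
    """Scale each predictor to mean 0 and standard deviation 1."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    Xz = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]
    return Xz, means, sds

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit intercept and coefficients by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict_risk(w, means, sds, x):
    """Predicted probability of incident hypertension for raw predictor values x."""
    z = w[0] + sum(wj * (v - m) / s for wj, v, m, s in zip(w[1:], x, means, sds))
    return sigmoid(z)

rows = link_and_label(COHORT, ADMIN)
X = [feats for _, feats, _ in rows]
y = [label for _, _, label in rows]
Xz, means, sds = standardise(X)
weights = fit_logistic(Xz, y)

for pid, feats, label in rows:
    print(pid, label, round(predict_risk(weights, means, sds, feats), 3))
```

In practice, outcome ascertainment from claims and discharge data would rely on a case definition applied over a defined follow-up window, and the model would be fitted and validated with an established statistical package. The sketch only shows how the linked identifier ties baseline predictors to a later outcome.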

The proposed prediction model will help to identify individuals at highest risk of developing hypertension and those who may benefit most from targeted healthy behavioral interventions and/or treatment. Identifying high-risk individuals in this way may help to prevent hypertension, as well as the continuing costly cycle of managing hypertension and its complications.


Surveys suggest that there is a dichotomy in how citizens view research for public good and research for commercial gain. As a consequence, the idea that a research initiative, such as a learning health system, for both public and commercial benefit may be controversial and reduce public trust.

Objectives and Approach

This study aims to investigate what informed citizens consider to be appropriate uses of health data in a learning health system. Two paired four-day juries were run, with different jurors but the same purpose, expert witnesses and facilitators. Overall, 694 people applied to be jurors; 36 were selected to match criteria based on national demographics and their prior privacy views. Jurors considered whether and why eight exemplar uses of depersonalised patient data were acceptable. The exemplars were data uses planned by the learning health system initiative to improve care pathways (planned uses), and possible unplanned data uses.


All planned uses were considered appropriate by most, but not all, jurors, as they had the potential to benefit the public by improving care. Positive health outcomes were more acceptable than improved efficiency of services, because jurors’ prior beliefs about how the NHS operates raised concerns that improving efficiency would lead to inequitable distribution or closure of services. The potential uses were considered appropriate where there were improvements in drugs, treatments, or lower NHS costs. Some jurors became more accepting of commercial uses as they understood them better. Commercial uses that prioritised generating profit and did not produce health benefits for the public were unacceptable, regardless of any safeguards for the data. Commercial gain that occurred secondary to achieving public benefit was generally accepted.


Juries elicit more informed and nuanced judgement from citizens than surveys. Jurors tended to be more accepting of data sharing with both the private and public sectors after the jury process. Many jurors accepted commercial gain if public benefit was achieved. Some were suspicious of data sharing for efficiency gains.


How to Cite
Chowdhury, M., Turin, T., Leung, A., O’Beirne, M., Sikdar, K. and Quan, H. (2018) “Linking cohort data with administrative health data to develop a new hypertension prediction model to aid precision health approach”, International Journal of Population Data Science, 3(4). doi: 10.23889/ijpds.v3i4.599.
