Validating the QCOVID risk prediction algorithm for risk of mortality from COVID-19 in the adult population in Wales, UK
Abstract
Introduction
COVID-19 risk prediction algorithms can be used to identify individuals at risk of short-term serious adverse COVID-19 outcomes such as hospitalisation and death. It is important to validate these algorithms in different and diverse populations to help guide risk management decisions and target vaccination and treatment programs to the most vulnerable individuals in society.
Objectives
To validate externally the QCOVID risk prediction algorithm that predicts mortality outcomes from COVID-19 in the adult population of Wales, UK.
Methods
We conducted a retrospective cohort study using routinely collected individual-level data held in the Secure Anonymised Information Linkage (SAIL) Databank. The cohort included individuals aged between 19 and 100 years, living in Wales on 24th January 2020, registered with a SAIL-providing general practice, and followed up until death or study end (28th July 2020). Demographic, primary and secondary healthcare, and dispensing data were used to derive all the predictor variables used to develop the published QCOVID algorithm. Mortality data were used to define time to confirmed or suspected COVID-19 death. Performance metrics, including R² values (explained variation), Brier scores, and measures of discrimination and calibration, were calculated for two periods (24th January–30th April 2020 and 1st May–28th July 2020) to assess algorithm performance.
Results
1,956,760 individuals were included. 1,192 (0.06%) and 610 (0.03%) COVID-19 deaths occurred in the first and second time periods, respectively. The algorithms fitted the Welsh data and population well, explaining 68.8% (95% CI: 66.9-70.4) of the variation in time to death, Harrell’s C statistic: 0.929 (95% CI: 0.921-0.937) and D statistic: 3.036 (95% CI: 2.913-3.159) for males in the first period. Similar results were found for females and in the second time period for both sexes.
Conclusions
The QCOVID algorithm developed in England can be used for public health risk management for the adult Welsh population.
Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection was first identified in Wuhan, China [1]. On the 24th January 2020, the UK recorded its first case of SARS-CoV-2 and as of 22nd August 2021, there have been 6,492,906 confirmed cases with 131,640 COVID-19-related deaths in the UK [2, 3]. Research has shown that increased age, being male, certain minority ethnic groups, and having pre-existing conditions such as diabetes, cardiovascular disease, and obesity are associated with serious adverse COVID-19 outcomes, including hospitalisation and death [4–9].
To protect the most vulnerable, and to minimise the burden on the National Health Service (NHS) and its staff, it is important to identify those at greatest risk of serious adverse COVID-19 outcomes [10, 11]. COVID-19 risk prediction algorithms can be used to identify and prioritise at-risk individuals for targeting vaccination and treatments as well as to inform risk management decisions and policy as the pandemic evolves [12].
The New and Emerging Respiratory Virus Threats Advisory Group (NERVTAG)’s effort to develop a population risk assessment framework led to the development and validation of the QCOVID tool, a population-based prediction algorithm to predict the risk of being admitted to hospital or dying from COVID-19 across an adult population [3, 13, 14]. The algorithm was initially developed and validated on a cohort of six million primary care patients from 1,205 English practices contributing to the QResearch database, which allows linkage at the individual-level to general practitioner (GP) primary care data, death records, hospital admissions data and COVID-19 test results. Predictive demographic, clinical, and pharmaceutical variables (Box 1) were based on the clinical vulnerability group criteria used to identify those advised to shield at the start of the pandemic, and risk factors associated with adverse outcomes for respiratory diseases [15, 16].
Box 1: List of predictor variables for the QCOVID risk equations
Demographic
- Age in years on 24th January 2020
- Biological sex at birth
- Townsend Deprivation Score
- Ethnicity
- What is your housing category - care home, homeless or neither?
Lifestyle
- Body Mass Index
Conditions on current shielding patient list
- Have you had chemotherapy in the last 12 months?
- Have you had radiotherapy in the last 6 months?
- Have you had a bone marrow or stem cell transplant in the last 6 months?
- Have you had a solid organ transplant (lung, liver, stomach, pancreas, spleen, heart or thymus)?
- Do you have sickle cell disease or severe combined immune deficiency syndromes?
- Do you have cystic fibrosis, bronchiectasis or alveolitis?
- Have you a cancer of the blood or bone marrow such as leukaemia, myelodysplastic syndromes, lymphoma or myeloma and are at any stage of treatment?
- Do you have lung or oral cancer?
- Do you have congenital heart disease or have you had surgery for it in the past?
Conditions moderately associated with increased risk of complications as per current NHS guidance
- Do you have a learning disability or Down’s Syndrome?
- Chronic Kidney Disease (CKD) stage
- Do you have asthma?
- Do you have diabetes?
- Do you have Parkinson’s disease?
- Do you have cerebral palsy?
- Do you have epilepsy?
- Do you have rheumatoid arthritis or Systemic lupus erythematosus?
- Do you have dementia?
- Do you have chronic obstructive pulmonary disease (COPD)?
- Do you have motor neurone disease, multiple sclerosis, myasthenia, or Huntington’s chorea?
- Do you have coronary heart disease?
- Do you have heart failure?
Other medical conditions that investigators hypothesized to confer elevated risk
- Do you have peripheral vascular disease?
- Do you have severe mental illness?
- Have you had a prior fracture of hip, wrist, spine or humerus?
- Do you have atrial fibrillation?
- Do you have cirrhosis of the liver?
- Do you have pulmonary hypertension or pulmonary fibrosis?
- Have you had a thrombosis or pulmonary embolus?
- Have you had a stroke or transient ischaemic attack?
Concurrent medications
- Have you been prescribed immunosuppressants four or more times in the previous 6 months?
- Have you been prescribed anti-leukotriene or long acting beta2-agonists (LABA) four or more times in the previous 6 months?
- Have you been prescribed oral prednisolone-containing preparations four or more times in the previous 6 months?
Replication of results in diverse populations is an important component of scientific research, and is especially important for validating prediction algorithms generated using routine data, where the results may be used to plan the clinical management of individual patients. It was decided to replicate and compare the performance of the algorithm in each of the four nations of the UK to ensure validity and contribute to the application of the algorithm in managing responses to the outbreak. A recently published study by the Office for National Statistics validated the QCOVID predictive algorithm for estimating the risk of mortality from COVID-19 in 35 million adult residents of England using linked Census 2011 data [17]. The aim of our study was to externally validate the QCOVID risk prediction algorithm for estimating mortality outcomes from COVID-19 in adults in Wales, UK. This paper replicates the English validation study and follows the RECORD and TRIPOD reporting guidelines [18, 19].
Methods
Study design and data sources
This study used routinely collected anonymised health and demographic data held in the Secure Anonymised Information Linkage (SAIL) Databank to create a retrospective population-based individual-level linked e-cohort. The SAIL Databank is a Trusted Research Environment (TRE), which hosts linkable anonymised individual and household-level health, demographic, administrative and environmental data for the population of Wales [20, 21].
Following the emergence of the SARS-CoV-2 infection and the subsequent COVID-19 pandemic, two population-level cohorts (known as C16 and C20) were created to support rapid analysis, provide evidence in understanding the evolving pandemic, and evaluate national interventions attempting to reduce the spread of infection [22]. The C20 contains all individuals alive and living in Wales from 1st January 2020 and followed up until death, emigration/break in Welsh residency, or cohort end date (currently 30th June 2021). This cohort is updated on a monthly basis to extend the available follow-up time. The C16 acts as a contextual comparative cohort and contains all individuals alive and living in Wales on 1st January 2016 and followed up until death, emigration/break in Welsh residency, or 31st December 2019.
For this study, we used the C20 to create a cohort of all individuals aged 19–100 years, living in Wales and registered with a SAIL-providing general practice on 24th January 2020. The 24th January 2020 was chosen as the cohort entry date because it is the date of the first confirmed COVID-19 case in the UK. Individuals were followed up until death or study end date (28th July 2020), with the study divided into two time periods, 24th January 2020–30th April 2020 and 1st May 2020–28th July 2020, to match the English validation study [17]. Individuals who had died prior to 1st May 2020 were excluded from the second time period analysis.
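The cohort definition and the two follow-up windows can be sketched in a few lines of Python. This is a minimal illustration rather than the study's own code: the function and constant names are hypothetical, and the treatment of a person's age as completed years on the index date is an assumption.

```python
from datetime import date

INDEX_DATE = date(2020, 1, 24)      # cohort entry (first confirmed UK case)
PERIOD_1_END = date(2020, 4, 30)
PERIOD_2_START = date(2020, 5, 1)
STUDY_END = date(2020, 7, 28)


def age_on(dob: date, when: date = INDEX_DATE) -> int:
    """Completed years of age on a given date (assumed age definition)."""
    before_birthday = (when.month, when.day) < (dob.month, dob.day)
    return when.year - dob.year - int(before_birthday)


def eligible(dob: date, in_wales: bool, sail_gp: bool) -> bool:
    """Aged 19-100, living in Wales, registered with a SAIL-providing practice."""
    return in_wales and sail_gp and 19 <= age_on(dob) <= 100


def in_second_period(death_date) -> bool:
    """People who died before 1st May 2020 are excluded from period two."""
    return death_date is None or death_date >= PERIOD_2_START


# Length of each follow-up window in days
PERIOD_1_DAYS = (PERIOD_1_END - INDEX_DATE).days
PERIOD_2_DAYS = (STUDY_END - PERIOD_2_START).days
print(PERIOD_1_DAYS, PERIOD_2_DAYS)
```

The two day counts come out as 97 and 88, which correspond to the prediction horizons used for the first and second periods.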
Predictor variables
To validate the QCOVID algorithm, the C20 cohort was linked to the Welsh Longitudinal General Practice (WLGP), Patient Episode Database for Wales (PEDW), Wales Dispensing DataSet (WDDS), and Office for National Statistics (ONS) Census 2011 (CENW) data [23] to derive the pre-existing conditions and demographic characteristics that were used to develop the QCOVID algorithm (Box 1).
The C20 cohort was used to define age, sex, and Townsend score. The Townsend score is a measure of deprivation based on the area of residence; a higher score implies a higher level of deprivation. The CENW was linked to derive ethnicity (i.e. Bangladeshi, Black African, Black Caribbean, Chinese, Indian, Pakistani, Mixed, Other, and White) [24]. The ethnicity variable also had a ‘not recorded/unknown’ category, which was used whenever the value was missing.
The majority of pre-existing conditions were identified in the WLGP primary care data source using Read codes version 2 (CTV2). Where no timeframe was stated, a lookback period from 1st January 1998 to 24th January 2020 was used. For body mass index (BMI), the latest BMI measurement within the 5 years before 24th January 2020 was used. BMI records outside this time period, as well as BMI values below 15 or above 47, were set to missing. If an individual had multiple BMI records on the latest date, the highest BMI was used. Recorded BMI depends on the condition of interest and the healthcare utilisation of the individual; it is therefore possible for individuals to have no BMI recorded in routinely collected healthcare data. Missing BMI values were imputed using predicted values from linear regression models that included all QCOVID predictor variables with age interactions. For diabetes, if the latest health record defined an individual as having both type 1 and type 2 diabetes, type 2 took precedence [3]. For the housing covariate, if the latest record defined an individual as both homeless and living in a care home, then living in a care home took precedence. For the learning disabilities covariate, if the latest record identified an individual as having both learning disabilities and Down’s syndrome, then Down’s syndrome was prioritised.
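The BMI selection and precedence rules described above can be sketched as follows. This is an illustrative reading of the rules, not the study's code: the function names and category labels are hypothetical, and two points are assumptions, namely that the 5-year window runs from 24th January 2015 and that out-of-range values are discarded before the latest record is chosen.

```python
from datetime import date

INDEX_DATE = date(2020, 1, 24)
WINDOW_START = date(2015, 1, 24)   # assumption: 5 calendar years before index
BMI_MIN, BMI_MAX = 15, 47          # values outside this range are set to missing


def select_bmi(records):
    """records: list of (measurement_date, bmi). Returns a BMI or None (missing)."""
    valid = [
        (d, bmi) for d, bmi in records
        if WINDOW_START <= d <= INDEX_DATE and BMI_MIN <= bmi <= BMI_MAX
    ]
    if not valid:
        return None                # left missing here; imputed later in the study
    latest = max(d for d, _ in valid)
    # Several records on the latest date: keep the highest value
    return max(bmi for d, bmi in valid if d == latest)


def resolve(categories, priority):
    """Return the first category in `priority` present in the latest record."""
    for category in priority:
        if category in categories:
            return category
    return None


# Precedence orders stated in the text (labels are illustrative)
DIABETES = ("type 2", "type 1")
HOUSING = ("care home", "homeless")
LEARNING = ("Down's syndrome", "learning disability")
```

For example, `resolve({"homeless", "care home"}, HOUSING)` returns `"care home"`, and a person whose only BMI record is 52 is treated as having no BMI recorded.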
Office of Population Censuses and Surveys (OPCS) Classification of Interventions and Procedures version 4 (OPCS-4) coded conditions in the inpatient (PEDW) data were used to identify chemotherapy status, Chronic Kidney Disease (CKD) stages, congenital heart disease surgery, bone marrow or stem cell transplant, radiotherapy, and solid organ transplant.
Dictionary of Medicines and Devices (DMD) coded prescriptions in the WDDS were used to identify individuals who had been dispensed immunosuppressants, anti-leukotriene or long acting beta2-agonists (LABA), or oral prednisolone four or more times within the 6 months prior to 24th January 2020.
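The medication flags reduce to counting dispensing events in a fixed window. The sketch below assumes the window is the six calendar months from 24th July 2019 up to, but not including, the index date; the source does not state whether the boundaries are inclusive, and the function name is hypothetical.

```python
from datetime import date

INDEX_DATE = date(2020, 1, 24)
WINDOW_START = date(2019, 7, 24)   # assumption: 6 calendar months before index


def dispensed_four_or_more(dispensing_dates, start=WINDOW_START,
                           end=INDEX_DATE, minimum=4):
    """True if at least `minimum` dispensing events fall in [start, end)."""
    count = sum(1 for d in dispensing_dates if start <= d < end)
    return count >= minimum
```

The same function would be applied separately to each drug group (immunosuppressants, anti-leukotriene or LABA, oral prednisolone) after filtering the dispensing records by code.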
Outcome of interest – death involving COVID-19
We utilised a combination of data held in ONS Annual District Death Extract (ADDE) and Annual District Death Daily (ADDD), Welsh Demographic Service Dataset (WDSD) and Consolidated Death Data Source (CDDS) to identify all deaths, inclusive of in-hospital and out of hospital deaths, of Welsh residents. Deaths involving COVID-19 (confirmed or suspected) were identified using the tenth revision of the International Classification of Diseases (ICD-10) codes U07.1 or U07.2, or from text fields containing the causes of death within the data sources. Time to death from COVID-19 was calculated separately in the first period (24th January 2020–30th April 2020) and the second period (1st May 2020–28th July 2020).
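A minimal sketch of the outcome definition is shown below. The ICD-10 codes are those named in the text; the free-text search pattern is a hypothetical stand-in, because the source does not list the terms searched for, and the status coding (0 censored, 1 COVID-19 death, 2 other death) is an assumption chosen to suit a competing-risks analysis.

```python
import re
from datetime import date

COVID_ICD10 = {"U071", "U072"}     # U07.1 and U07.2 with the dot removed
# Hypothetical search terms: the source does not list the text patterns used.
COVID_TEXT = re.compile(r"covid|coronavirus|sars[- ]?cov[- ]?2", re.IGNORECASE)


def is_covid_death(icd10_codes, cause_text=""):
    """True if any code is U07.1/U07.2 or the cause-of-death text mentions COVID-19."""
    codes = {c.replace(".", "").strip().upper() for c in icd10_codes}
    return bool(codes & COVID_ICD10) or bool(COVID_TEXT.search(cause_text))


def time_to_event(death_date, covid, start, end):
    """Return (days_from_start, status) for one period.

    status: 0 = censored at period end, 1 = COVID-19 death, 2 = other death.
    """
    if death_date is not None and start <= death_date <= end:
        return (death_date - start).days, 1 if covid else 2
    return (end - start).days, 0
```

Calling `time_to_event` once per period, with the period's own start and end dates, mirrors the separate calculation of time to death in the first and second periods.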
Algorithm validation
The QCOVID risk equations (version 1) reported in the original study were fitted for males and females separately [3, 14]. The original paper utilised the Fine-Gray sub-distribution hazard model, which is commonly used to estimate the incidence of outcomes where competing risks exist. It relates covariates to the cumulative incidence function (CIF) of the outcome of interest [25, 26]. The following modifications for the Welsh adult population were required due to data issues. At the time of analysis, Systemic Anti-Cancer Therapy (SACT) data were not available; therefore, anyone receiving chemotherapy within the 12 months before 24th January 2020 was assigned the chemotherapy group B (middle severity group) coefficients from the original study [27]. Due to low cohort numbers and subsequent outcome numbers for some ethnic groups, we collapsed ethnic groups to ensure ethnic minority populations or groups were not excluded from our study. Black Caribbean individuals were assigned Black African coefficients, Chinese individuals were assigned the coefficients for the Other ethnic group, and all White ethnic groups were assigned the White British coefficients.
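The coefficient reassignments described above amount to a small lookup. The sketch below is illustrative only: the category labels are hypothetical strings, not the coding used in the study data.

```python
# Groups whose coefficients are borrowed from another group (labels illustrative)
COEFFICIENT_GROUP = {
    "Black Caribbean": "Black African",
    "Chinese": "Other",
}


def ethnicity_coefficient_group(ethnicity: str) -> str:
    """Map a recorded ethnic group to the group whose coefficients are applied."""
    if ethnicity.startswith("White"):
        return "White British"     # all White groups use White British coefficients
    return COEFFICIENT_GROUP.get(ethnicity, ethnicity)


def chemotherapy_group(chemo_in_past_12_months: bool):
    """Without SACT data, any chemotherapy in the past 12 months maps to group B."""
    return "B" if chemo_in_past_12_months else None
```

Groups that are not remapped, such as Indian or Pakistani, keep their own coefficients.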
Performance metrics, including measures of discrimination and calibration, were calculated to validate the predicted risk of death from COVID-19 using the QCOVID algorithm at 97 days for the first period and 88 days for the second period [28–30]. We calculated R² values, the D statistic, Harrell’s C statistic and Brier scores with corresponding 95% confidence intervals for the total cohort, by sex and for each of the two time periods. The performance measures were also calculated by age band, ethnicity and Townsend deprivation quintile. The R² values refer to the proportion of variation in survival time explained by the model, while the Brier score measures predictive accuracy. The D statistic and Harrell’s C statistic are discrimination measures: the D statistic quantifies the separation in survival between patients with different levels of predicted risk, and Harrell’s C statistic quantifies the extent to which people with higher risk scores have earlier events. To measure calibration, we compared the mean observed and predicted risks within each twentieth of predicted risk (20 groups) for the two time periods. Observed risks were derived in each of the 20 groups using non-parametric estimates of the cumulative incidence.
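To make two of these measures concrete, the sketch below implements Harrell's C statistic and a simple Brier score in plain Python. It is a simplified illustration, not the study's implementation: it uses an O(n²) pairwise loop, ignores competing risks, and the Brier score assumes that every individual's outcome status at the horizon is known (no censoring adjustment).

```python
def harrell_c(times, events, risks):
    """Harrell's C: share of usable pairs in which the person with the
    earlier observed event also has the higher predicted risk.
    A pair is usable when the earlier time belongs to someone with an event.
    Tied predicted risks count as half-concordant.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # i must have had the event
        for j in range(n):
            if times[i] < times[j]:       # j was still event-free at i's event
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def brier_score(predicted, outcomes):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted, outcomes)) / len(predicted)
```

A value of C near 1 means that people who died earlier were almost always ranked as higher risk, while 0.5 corresponds to random ranking. Because a constant prediction of zero gives a Brier score equal to the proportion of people with the event, very small Brier scores are expected when the outcome is rare.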
Results
Overall, 1,956,760 individuals aged 19–100 years were included in the final analysis for Wales. Of these, 967,975 (49.5%) were male. The mean age of the cohort was 50.8 years (SD 18.7), and the majority of individuals were from White ethnic backgrounds (1,741,527, 89.0%) (Table 1). The distributions of demographic characteristics were similar to those in the English validation cohort and original cohort (Supplementary Table 1), except for ethnicity: Wales had a lower proportion of individuals from ethnic minority backgrounds, but also a higher proportion (6.5%) of individuals missing this information (Table 1). The Welsh cohort had a similar prevalence of pre-existing conditions to the English validation cohort and original cohort. However, compared to the English validation cohort, the proportions of people with higher BMI, CKD, respiratory cancer, venous thromboembolism (VTE), coronary heart disease (CHD) and osteoporotic fractures were slightly higher in the Welsh data, and the proportions with immunosuppressant use, dementia, or a serious mental illness were slightly lower. The proportions of people with missing BMI values, pulmonary hypertension and VTE were slightly higher in the Welsh data compared to the original cohort.
Characteristic | Overall cohort: N | Overall cohort: % | COVID-19 deaths in first period (24th Jan–30th Apr 2020): N | COVID-19 deaths in first period: % | COVID-19 deaths in second period (1st May–28th Jul 2020): N | COVID-19 deaths in second period: %
---|---|---|---|---|---|---
Overall | 1,956,760 | | 1,192 | | 610 |
Sex | | | | | |
Male | 967,975 | 49.47 | 674 | 56.54 | 299 | 49.02
Female | 988,785 | 50.53 | 518 | 43.46 | 311 | 50.98
Age, years (mean in N columns, SD in % columns) | 50.8 | 18.7 | 79.4 | 11.8 | 81.0 | 11.1
Age group, years | | | | | |
19-29 | 318,681 | 16.29 | * | | * |
30-39 | 313,802 | 16.04 | * | | * |
40-49 | 304,363 | 15.55 | 16 | 1.34 | * |
50-59 | 353,539 | 18.07 | 61 | 5.12 | 28 | 4.59
60-69 | 291,042 | 14.87 | 132 | 11.07 | 49 | 8.03
70-79 | 240,840 | 12.31 | 305 | 25.59 | 136 | 22.30
80-89 | 111,631 | 5.70 | 429 | 35.99 | 250 | 40.98
≥90 | 22,862 | 1.17 | 242 | 20.30 | 138 | 22.62
Ethnicity | | | | | |
Bangladeshi | 7,011 | 0.36 | * | | * |
Black^ | 8,312 | 0.42 | * | | * |
Indian | 8,885 | 0.45 | * | | * |
Mixed | 27,582 | 1.41 | * | | * |
Other^ | 27,786 | 1.42 | * | | * |
Pakistani | 7,688 | 0.39 | * | | 0 | 0.00
White | 1,741,527 | 89.00 | 1,113 | 93.37 | 579 | 94.92
Not recorded | 127,969 | 6.54 | 52 | 4.36 | 19 | 3.11
Townsend deprivation quintile | | | | | |
1 (most affluent) | 335,459 | 17.14 | 156 | 13.09 | 98 | 16.07
2 | 413,486 | 21.13 | 221 | 18.54 | 129 | 21.15
3 | 559,024 | 28.57 | 369 | 30.96 | 179 | 29.34
4 | 453,474 | 23.17 | 304 | 25.50 | 141 | 23.11
5 (most deprived) | 195,317 | 9.98 | 142 | 11.91 | 63 | 10.33
Accommodation | | | | | |
Neither homeless nor care home | 1,940,224 | 99.15 | 987 | 82.80 | 476 | 78.03
Care home or nursing home | 16,536 | 0.85 | 205 | 17.20 | 134 | 21.97
Body-mass index, kg/m² | | | | | |
<18.5 | 21,944 | 1.12 | 53 | 4.45 | 33 | 5.41
18.5 to <25 | 316,569 | 16.18 | 277 | 23.34 | 161 | 26.39
25 to <30 | 375,501 | 19.19 | 300 | 25.17 | 154 | 25.25
≥30 | 403,871 | 20.64 | 294 | 24.66 | 114 | 18.69
Not recorded | 838,875 | 42.87 | 268 | 22.48 | 148 | 24.26
Chronic kidney disease | | | | | |
No chronic kidney disease | 1,874,451 | 95.79 | 869 | 72.90 | 412 | 67.54
Stage 3 | 72,669 | 3.71 | 252 | 21.14 | 165 | 27.05
Stage 4 | 3,928 | 0.20 | 30 | 2.52 | 20 | 3.28
Stage 5 | 5,712 | 0.29 | 41 | 3.44 | 13 | 2.13
Learning disability | | | | | |
No learning disability | 1,928,040 | 98.53 | 1,163 | 97.57 | 587 | 96.23
Learning disability | 28,486 | 1.46 | 29 | 2.43 | 23 | 3.77
Down Syndrome | 234 | 0.01 | 0 | 0.00 | 0 | 0.00
Chemotherapy | | | | | |
No chemotherapy in past 12-months | 1,949,761 | 99.64 | 1,167 | 97.90 | 597 | 97.87
Chemotherapy in past 12-months | 6,999 | 0.36 | 25 | 2.10 | 13 | 2.13
Cancer and immunosuppression | | | | | |
Blood cancer | 10,547 | 0.54 | 38 | 3.19 | 14 | 2.30
Respiratory cancer | 5,691 | 0.29 | 20 | 1.68 | 10 | 1.64
Radiotherapy in past 6-months | 1,827 | 0.09 | * | | * |
Bone marrow transplant in past 6-months | 56 | 0.00 | 0 | 0.00 | 0 | 0.00
Solid organ transplant | 806 | 0.04 | * | | * |
Prescribed immunosuppressant medication by GP | 2,884 | 0.15 | * | | * |
Prescribed leukotriene or LABA | 38,658 | 1.98 | 59 | 4.95 | 42 | 6.89
Prescribed regular prednisolone | 15,819 | 0.81 | 61 | 5.12 | 28 | 4.59
Other comorbidities | | | | | |
Diabetes | 161,227 | 8.24 | 359 | 30.12 | 178 | 29.18
COPD | 66,937 | 3.42 | 209 | 17.53 | 100 | 16.39
Asthma | 290,490 | 14.85 | 186 | 15.60 | 109 | 17.87
Rare pulmonary diseases | 9,471 | 0.48 | 26 | 2.18 | 12 | 1.97
Pulmonary hypertension or pulmonary fibrosis | 3,741 | 0.19 | 17 | 1.43 | 14 | 2.30
Coronary heart disease | 89,686 | 4.58 | 239 | 20.05 | 137 | 22.46
Stroke | 55,336 | 2.83 | 233 | 19.55 | 121 | 19.84
Atrial fibrillation | 62,712 | 3.20 | 253 | 21.22 | 140 | 22.95
Congestive cardiac failure | 30,937 | 1.58 | 151 | 12.67 | 99 | 16.23
Venous thromboembolism | 43,708 | 2.23 | 111 | 9.31 | 54 | 8.85
Peripheral vascular disease | 18,639 | 0.95 | 77 | 6.46 | 36 | 5.90
Congenital heart disease | 17,071 | 0.87 | 30 | 2.52 | 12 | 1.97
Dementia | 18,840 | 0.96 | 304 | 25.50 | 160 | 26.23
Parkinson’s disease | 5,717 | 0.29 | 40 | 3.36 | 32 | 5.25
Epilepsy | 26,112 | 1.33 | 31 | 2.60 | 19 | 3.11
Rare neurological conditions | 5,789 | 0.30 | * | | * |
Cerebral palsy | 1,318 | 0.07 | 0 | 0.00 | 0 | 0.00
Severe mental illness | 282,709 | 14.45 | 209 | 17.53 | 109 | 17.87
Osteoporotic fracture | 73,679 | 3.77 | 154 | 12.92 | 96 | 15.74
Rheumatoid arthritis or SLE | 22,485 | 1.15 | 35 | 2.94 | 16 | 2.62
Cirrhosis of the liver | 7,210 | 0.37 | 17 | 1.43 | * |
Sickle cell disease | 1,094 | 0.06 | 0 | 0.00 | 0 | 0.00
In total, there were 1,192 (0.06%) COVID-19 deaths during the first period and 610 (0.03%) in the second period, which was similar to the English validation (0.08% and 0.04%, respectively) [17]. In general, individuals who died from COVID-19 during the first period were more likely to be male (674, 56.5%) and aged 70 years or older (976, 81.9%), with diabetes, CKD, obesity, and cardio-pulmonary diseases being the pre-existing conditions that accounted for the highest proportions of deaths (Table 1). Individuals who died from COVID-19 during the second period had similar characteristics to those who died in the first period, although the sex ratio shifted slightly: 56.5% of deaths in the first period were in males, whereas 51.0% of deaths in the second period were in females.
The performance metrics calculated to validate the predicted risk of death from COVID-19 using the QCOVID algorithm are presented in Table 2 [3, 14]. The metrics are provided for both sexes and time periods. In the first time period for males, the algorithm explained 68.8% (95% CI: 66.9–70.4) of the variation in time to death, Harrell’s C statistic was 0.929 (95% CI: 0.921–0.937), the D statistic was 3.036 (95% CI: 2.913–3.159) and the Brier score was 0.0007. Similar results were found for females and in the second time period. Results were also similar in the English validation, in which the D statistic was 3.761 (3.732–3.789), Harrell’s C statistic was 0.935 (95% CI: 0.933–0.937) and the Brier score was 0.0013 in males in the first period, with similar results found in females and in the second time period [17]. Performance metrics by age band, ethnicity and Townsend deprivation quintile can be found in the Appendices (Supplementary Tables 2–5).
Metric | First period (24th January 2020–30th April 2020): COVID-19 deaths in females | First period: COVID-19 deaths in males | Second period (1st May 2020–28th July 2020): COVID-19 deaths in females | Second period: COVID-19 deaths in males
---|---|---|---|---
R-squared statistic | 0.691 (0.671–0.710) | 0.688 (0.669–0.704) | 0.721 (0.698–0.742) | 0.711 (0.686–0.733)
D statistic | 3.062 (2.922–3.202) | 3.036 (2.913–3.159) | 3.293 (3.113–3.472) | 3.207 (3.024–3.390)
Harrell’s C statistic | 0.930 (0.920–0.940) | 0.929 (0.921–0.937) | 0.950 (0.942–0.959) | 0.933 (0.921–0.945)
Brier score | 0.0005 | 0.0007 | 0.0003 | 0.0003
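The R-squared and D statistic rows in Table 2 are linked. A common way of converting one into the other is Royston and Sauerbrei's measure of explained variation, R² = (D²/κ²) / (π²/6 + D²/κ²) with κ² = 8/π. The source does not state which formula was used, so applying it here is an assumption; the sketch below simply shows that the reported point estimates are consistent with it to within rounding.

```python
import math

KAPPA_SQ = 8 / math.pi          # scaling constant for the D statistic
SIGMA_SQ = math.pi ** 2 / 6     # variance term used in the conversion


def r2_from_d(d: float) -> float:
    """Explained variation implied by a D statistic (Royston-Sauerbrei form)."""
    scaled = d * d / KAPPA_SQ
    return scaled / (SIGMA_SQ + scaled)


# D statistics reported in Table 2
for label, d in [("females, period 1", 3.062), ("males, period 1", 3.036),
                 ("females, period 2", 3.293), ("males, period 2", 3.207)]:
    print(f"{label}: D = {d}, implied R-squared = {r2_from_d(d):.3f}")
```

The conversion is monotonic, so a larger D statistic always implies a larger share of explained variation.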
The Harrell’s C statistic varied across the age bands and time periods (Figures 1, 2), with acceptable discrimination (>0.7) in both time periods for males and females, and across age groups. The oldest group (90+ years old) yielded poorer discrimination for both males and females as well as the youngest male group in the first time period. In the second time period, it was not possible to plot the Harrell’s C statistic for the youngest age groups for females (19-39 and 40-44 years) or for 19-39 years in males due to low numbers. Whilst the Harrell’s C statistic was slightly lower in Wales compared to England across sex and age groups, the pattern of reduced discrimination for certain age groups was similar.
The calibration plots in Figure 3 showed that the predicted and observed risks of COVID-19 related death were similar for both males and females in the first time period, demonstrating the QCOVID equations were well calibrated. However, there was slight under-prediction in the highest risk category for COVID-19 death which was also demonstrated in the English validation and original cohorts [3, 17]. Predicted and observed risks of COVID-19 related death in the second time period can be found in Supplementary Figure 1.
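The grouping behind a calibration plot of this kind can be sketched as follows. This is a simplified illustration with a hypothetical function name: individuals are ranked by predicted risk and split into 20 equal-sized groups, and the observed risk in each group is taken as the simple proportion who died, which assumes complete follow-up. The study instead used non-parametric cumulative incidence estimates.

```python
def calibration_by_twentieths(predicted, died, groups=20):
    """Return (group, mean predicted risk, observed proportion) per group.

    Individuals are ordered by predicted risk and split into `groups`
    near-equal parts; group 1 is the lowest-risk twentieth.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue                      # fewer people than groups
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = sum(died[i] for i in idx) / len(idx)
        rows.append((g + 1, mean_pred, observed))
    return rows
```

Plotting the second element of each row against the third gives the calibration curve; under-prediction in the top group appears as an observed proportion above the mean predicted risk.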
Figure 4 demonstrates that the sensitivity at different absolute risk thresholds for COVID-19-related deaths was higher for females in the top 13 centiles compared to males in the first period and was higher in females than males across the second period. 60.2% and 65.4% of deaths occurred in those in the top 5% for predicted absolute risk of death from COVID-19 in the first time period for males and females respectively; 64.9% and 72.0% of deaths occurred in those in the top 5% for predicted absolute risk of death from COVID-19 in the second time period for males and females, respectively (Supplementary Table 6).
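The sensitivity figures quoted above are the share of all deaths that fall among the people with the highest predicted risk. A minimal sketch, with a hypothetical function name and ties in predicted risk broken arbitrarily by sort order, is:

```python
def sensitivity_in_top(predicted, died, top_percent=5):
    """Proportion of all deaths occurring in the top `top_percent` of predicted risk."""
    n = len(predicted)
    k = max(1, n * top_percent // 100)    # number of people in the top group
    ranked = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    total_deaths = sum(died)
    if total_deaths == 0:
        raise ValueError("no deaths: sensitivity is undefined")
    captured = sum(died[i] for i in ranked[:k])
    return captured / total_deaths
```

Evaluating the function for `top_percent` from 1 to 100 traces out a curve of the kind shown in Figure 4.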
Discussion
The results from this validation of the QCOVID risk prediction algorithm show that the models fit the Welsh population data well and yielded similar results, but with less precision (predictably, given the smaller population size) compared to the English validation and original study. This study used individual-level linked data on the adult population of Wales, registered with a SAIL providing general practice, which is independent of the original and validation study populations [22]. Use of SAIL Databank allowed linkage across primary and secondary health care data with mortality outcome data to allow replication of the original and English validation studies and inclusive of all predictor variables [3, 17].
The risk models from the original QCOVID and English validation papers were based on GP data largely from England [3, 17]. Age-standardised death rates in Wales pre-pandemic were about 6% higher than in England [31]. Some differences in prediction accuracy are therefore expected, and this is consistent with the higher observed than predicted mortality at the higher end of risk in Figure 3 [32]. The predicted and observed risks of COVID-19-related death were similar across most of the predicted risk distribution, demonstrating that the models were well calibrated, apart from the highest twentieth of risk, where the risk of death was higher in Wales, as shown in Figure 3. In addition, 60.2–72.0% of deaths occurred in those in the top 5% for predicted absolute risk of death. This is similar to the English validation study, in which 65.9–77.2% of deaths occurred in individuals in the top 5% for predicted absolute risk of death [17].
The overall Harrell’s C statistic was >0.9 for males and females for both time periods, demonstrating good overall discrimination of the models. Lower and more varied Harrell’s C statistics across the age bands are likely due to a smaller population and more deaths occurring in the first period during the first peak of the UK pandemic [33].
Despite the performance metrics indicating that the algorithm performed well on the Welsh data, there are a number of limitations. The Welsh cohort was restricted to individuals registered with a SAIL-providing general practice; results are therefore based on 80% population coverage (330/412 of all general practices in Wales). This restriction was necessary because of the number of predictor variables that required primary care GP data. Whilst we were able to calculate all predictor variables required, 42.9% of our cohort did not have a BMI recorded in the previous five years, and these missing observations were therefore imputed. Also, this study was designed to replicate the English validation study and therefore focussed on COVID-19-related deaths; COVID-19-related hospital admissions will be presented in a subsequent paper. Additionally, as highlighted in the English validation study, testing for COVID-19 was limited in the early stages of the pandemic, and therefore some of the early deaths might not have been recorded as COVID-19-related. As this study period covers the start of the pandemic, outcomes relate to the wave driven by the wild-type virus and do not include the subsequent Alpha and Delta variant waves. Finally, it was not possible to calculate performance metrics for some age groups and ethnic groups. Due to the low numbers of individuals, and consequently of deaths, in some ethnic groups, we collapsed some ethnic groups to ensure privacy protection whilst including them in our study. We combined the Black African and Black Caribbean groups, and the Chinese and Other groups. This analysis was carried out on a smaller and less ethnically diverse population compared to the original studies [3, 17].
Conclusion
This validation of the QCOVID algorithm indicates that the risk prediction models are applicable to a population independent of the original study, which has not been reported before. Our validation is based on Welsh primary care registered patients, on whom the QCOVID algorithm was not modelled, whereas the original study was based on English primary care registered patients. The Welsh validation offers evidence that the QCOVID algorithm can be used for public health risk management and could also be applied to other populations. This study covered the first wave of the pandemic in Wales/the UK; however, with the emergence of new variants of concern, subsequent new waves of infection and changes in the presentation of symptoms of SARS-CoV-2, it is important to adapt these algorithms over longer periods and assess their predictive ability in the context of the evolving pandemic. Further work will include applying an updated algorithm to assess the predictive risk of COVID-19 death and hospitalisation over a longer period of time. We will also assess the impact of the national vaccination program to see how changes in immunity level have impacted adverse COVID-19 outcomes.
Acknowledgments
This study makes use of anonymised data held in the SAIL Databank. This work uses data provided by patients and collected by the NHS as part of their care and support and the Understanding Patient Data initiative. We would also like to acknowledge all data providers who make anonymised data available for research. We wish to acknowledge the collaborative partnership that enabled acquisition and access to the de-identified data, and sharing of necessary methodological documentation and scripts which led to this output. This is a collaboration between colleagues at University of Oxford, University of Edinburgh, University of Nottingham, Office for National Statistics, London School of Hygiene and Tropical Medicine, University College London, Office of the Chief Medical Officer, Department of Health and Social Care, NHS Digital, University of Leicester, University of Cambridge, NHS England, Queen Mary University of London, University of Liverpool, Queen’s University Belfast, Association of Local Authority Medical Advisors, Imperial College London, and Swansea University Health Data Research UK. Swansea University Health Data Research UK team is under the direction of the Welsh Government Technical Advisory Cell (TAC) and includes the following groups and organizations: the SAIL Databank, Administrative Data Research (ADR) Wales, Digital Health and Care Wales (DHCW), Public Health Wales, NHS Shared Services Partnership (NWSSP) and the Welsh Ambulance Service Trust (WAST). All research conducted has been completed under the permission and approval of the SAIL independent Information Governance Review Panel (IGRP) project number 0911. KK is supported by the National Institute for Health Research (NIHR) Applied Research Collaboration East Midlands (ARC EM) and the NIHR Leicester Biomedical Research Centre (BRC).
This work was supported by the Con-COV team funded by the Medical Research Council (grant number: MR/V028367/1). This work was supported by Health Data Research UK, which receives its funding from HDR UK Ltd (HDR-9006) and the Medical Research Council (MR/S027750/1). HDR UK Ltd is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation (BHF) and the Wellcome Trust. This work was supported by the ADR Wales programme of work. The ADR Wales programme of work is aligned to the priority themes identified in the Welsh Government’s national strategy: Prosperity for All. ADR Wales brings together data science experts at Swansea University Medical School, staff from the Wales Institute of Social and Economic Research, Data and Methods (WISERD) at Cardiff University and specialist teams within the Welsh Government to develop new evidence which supports Prosperity for All by using the SAIL Databank at Swansea University to link and analyse anonymised data. ADR Wales is part of the Economic and Social Research Council (part of UK Research and Innovation) funded ADR UK (grant ES/S007393/1). This work was supported by the Wales COVID-19 Evidence Centre, funded by Health and Care Research Wales.
Conflicts of interest
AS is a member of the Scottish Government’s COVID-19 Chief Medical Officer’s Advisory Group and its Standing Committee on Pandemics; he is also a member of NERVTAG’s Risk Stratification Subgroup. KK is a member of a NERVTAG subgroup and a member of the Scientific Advisory Group for Emergencies (SAGE). JHC reports grants from the National Institute for Health Research (NIHR) Biomedical Research Centre, Oxford, grants from the John Fell Oxford University Press Research Fund, grants from Cancer Research UK (CR-UK) grant number C5255/A18085, through the Cancer Research UK Oxford Centre, and grants from the Oxford Wellcome Institutional Strategic Support Fund (204826/Z/16/Z) and other research councils, during the conduct of the study. JHC is an unpaid director of QResearch, a not-for-profit organisation which is a partnership between the University of Oxford and EMIS Health, who supply the QResearch database used for this work. JHC is a founder and shareholder of ClinRisk Ltd and was its medical director until 31st May 2019. ClinRisk Ltd produces open and closed source software to implement clinical risk algorithms (outside this work) into clinical computer systems. JHC is chair of the NERVTAG risk stratification subgroup and a member of SAGE COVID-19 groups and the NHS group advising on prioritisation of use of monoclonal antibodies in COVID-19 infection. RAL is a member of the Welsh Government COVID-19 Technical Advisory Group.
Ethics statement
The data used in this study are available in the SAIL Databank at Swansea University, Swansea, UK, but as restrictions apply they are not publicly available. All proposals to use SAIL data are subject to review by an independent Information Governance Review Panel (IGRP). Before any data can be accessed, approval must be given by the IGRP. The IGRP comprises a multidisciplinary professional group, including members of the public, and gives careful consideration to each project to ensure proper and appropriate use of SAIL data. When access has been granted, it is gained through a privacy-protecting safe haven and remote access system referred to as the SAIL Gateway. SAIL has established an application process to be followed by anyone who would like to access data via SAIL at https://www.saildatabank.com/application-process. Participant consent was not required for this study as all data are anonymised and further encrypted.
Abbreviations
SAIL | Secure Anonymised Information Linkage |
NERVTAG | New and Emerging Respiratory Virus Threats Advisory Group |
RECORD | REporting of studies Conducted using Observational Routinely-collected health Data |
WLGP | Welsh Longitudinal General Practice |
PEDW | Patient Episode Database for Wales |
WDDS | Wales Dispensing DataSet |
CENW | Census 2011 data |
BMI | Body Mass Index |
ICD-10 | International Classification of Diseases, Tenth Revision |
OPCS-4 | OPCS Classification of Interventions and Procedures version 4 |
CKD | Chronic Kidney Disease |
ONS | Office for National Statistics |
ADDE | Annual District Death Extract |
ADDD | Annual District Death Daily |
WDSD | Welsh Demographic Service Dataset |
CDDS | Consolidated Death Data Source |
DMD | Dictionary of Medicines and Devices |
SACT | Systemic Anti-Cancer Therapy |