Describing a complex primary health care population to support future decision support initiatives
Abstract
Introduction
Developing decision support tools using data from a health care organization, to support care within that organization, is a promising paradigm to improve care delivery and population health. Descriptive epidemiology may be a valuable supplement to stakeholder input towards selection of potential initiatives and to inform methodological decisions throughout tool development. We additionally propose that to properly characterize complex populations in large-scale descriptive studies, both simple statistical and machine learning techniques can be useful.
Objective
To describe sociodemographic, clinical, and health care use characteristics of primary care clients served by the Alliance for Healthier Communities, which provides team-based primary health care through Community Health Centres (CHCs) across Ontario, Canada.
Methods
We used electronic health record data from adult ongoing primary care clients served by CHCs in 2009-2019. We performed traditional table-based summaries for each characteristic and applied three unsupervised learning techniques to explore patterns of common condition co-occurrence, care provider teams, and care frequency.
Results
There were 221,047 eligible clients. Sociodemographics: We described 13 characteristics, stratified by CHC type and client multimorbidity status. Clinical characteristics: Eleven-year prevalence of 24 investigated conditions ranged from 1% (Hepatitis C) to 63% (chronic musculoskeletal problem) with non-uniform risk across the care history; multimorbidity was common (81%) with variable co-occurrence patterns. Health care use characteristics: Most care was provided by physician and nursing providers, with heterogeneous combinations of other provider types. A subset of clients had many issues addressed within single visits, and there was within- and between-client variability in care frequency. In addition to substantive findings, we discuss methodological considerations for future decision support initiatives.
Conclusions
We demonstrated the use of methods from statistics and machine learning, applied with an epidemiological lens, to provide an overview of a complex primary care population and lay a foundation for stakeholder engagement and decision support tool development.
Introduction
Increasing amounts of “everyday data”, such as that created as a by-product of healthcare delivery, combined with advancements in computing resources and analytical techniques, provide unprecedented opportunities to increase knowledge and support health [1–7]. We are focused on the use of electronic health record (EHR) data from a particular healthcare organization to develop decision support tools whereby the findings or end-product are intended to benefit the organization or the population it serves. This paradigm can support entire populations, including subgroups who have historically been excluded from medical research and clinical guideline development, such as those with complex health needs or barriers to participation [8–11]. Using care-derived data to develop tools that will support the same population(s) that gave rise to the data can happen at any scale: a single clinic, entire organization encompassing several clinics, or other health system notion that represents a defined population [1–4].
Primary care, first contact care provided in a community setting over the life course, is inherently complex [12, 13]. In Canada, primary care is covered by provincial healthcare plans, with several different reimbursement models. The Alliance for Healthier Communities employs salaried family physicians who work with other provider types and offer team-based primary health care through 72 Community Health Centres (CHCs) across Ontario. Each CHC is community-governed and motivated by geographic or social need; in general, their focus is on serving clients who face barriers to care and challenges, such as poverty and mental illness, that increase their risk for poor health [14–16]. Compared with other primary care models in Ontario, the population served by CHCs tends to include a larger proportion of people who live in low-income neighbourhoods, are new to Canada, or have serious mental illness and/or chronic conditions, while exhibiting lower than expected emergency department rates [14, 17]. Population health is a central element of the Alliance care model; the Alliance officially adopted a learning health system model in October 2020, demonstrating a formal commitment to using its data to inform and improve care [18, 19].
Multiple different types of decision support initiatives may be pursued. A first step towards any initiative is identifying needs of clients and providers, which is often driven by internal stakeholders [4]. Descriptive epidemiology is instrumental in outlining health states and needs of populations [20], and may be beneficial to add into these early stages of problem selection and project development both to identify new areas to explore and to support existing ideas. For example, describing how clients are represented in EHR data at a population level may complement clinical experience to identify potential bias or misrepresentation that analyses need to account for to obtain meaningful results [21–23]. In addition to proposed benefits for organizational initiatives, descriptive studies are needed to contribute towards closing the gap in understanding about the basic functions of primary care in general [24].
Historically, more methodological research and attention has been given to analytic epidemiology than descriptive epidemiology, despite descriptive studies being common and valuable in research and public health contexts [25]. To properly understand complex EHR data, we propose using both simple statistical techniques traditionally used in descriptive epidemiology and more complex techniques from machine learning, applied with an epidemiological lens. Simple techniques alone may provide an oversimplified or incorrect view of certain characteristics, which could lead to ineffective or harmful decisions later on. So, in pursuing our primary purpose of better understanding care provided by the Alliance, we explore the suitability of a variety of techniques for epidemiology of a separate primary care system with its own EHR.
We present the first large-scale descriptive and exploratory study of data for a large longitudinal cohort receiving primary care services (in our case, 220,000+ clients served by the Alliance) using a combination of traditional statistical and machine learning methodology. Our objective was to summarize sociodemographic, clinical, and health care use characteristics of this population. We used unsupervised learning techniques to identify patterns of multimorbidity, care provider teams, and care access frequency. Findings provide a foundation for future decision support and machine learning initiatives at the Alliance, including those related to their existing interest in using EHR data to segment populations and tailor care. In addition to substantive findings, this work more generally demonstrates the application of an epidemiological lens and use of a variety of methods from statistics and machine learning to effectively describe a complex population and contribute to early stages of an organization’s journey to harness value from their EHR data.
Methods
Study population and data source
The Alliance Business Intelligence and Reporting Tools team prepared a de-identified extract of the centralized, structured EHR database from all CHCs for our research use; clients were given unique identifiers to allow tracking of care over time. All CHCs use the same standardized consent for clients, which is on an opt-out basis for research reporting aggregate results such as ours. All providers are expected to record their client interactions into the EHR. This is the legal chart record and employees would have this in their job description and work expectations. All CHCs shifted to a common EHR and data standards in 2000. Not everyone immediately started using the EHR fully, but all standardized data was required at that time. Issues addressed during care are recorded using Electronic Nomenclature and Classification Of Disorders and Encounters for Family Medicine (ENCODE-FM) [26] and International Classification of Disease (ICD)-10 vocabularies [27].
Primary care EHRs represent an open cohort; Supplementary Appendix 1 (Supplementary Figure 1) shows the cohort size along calendar- and observation-based time definitions. Clients eligible for inclusion were over 18 years old in 2009, indicated a CHC as their primary care provider, and had at least one encounter at a CHC in 2009 to 2019. Any additional eligibility for specific analyses is described as needed below.
We followed RECORD reporting guidelines (Supplementary Appendix 2) [28]. Stakeholder engagement throughout the research included the Alliance Director of Research and Evaluation (JR) being involved in all stages of planning and conduct of this research, as well as some preliminary one-on-one conversations with care providers at CHCs. Two authors (JR and MZ) were also involved in a parallel qualitative study investigating aspects of the Alliance’s learning health system journey [29, 30].
General analysis plan
Sociodemographic, clinical, and health care use characteristics are defined in Supplementary Appendix 3 (Supplementary Table 3). Methods specific to each category are described below; we performed “table-based summaries” for all, whereby categorical variables were summarized by counts and percentages, and continuous variables by the range, median, mean, and standard deviation. Where specified, findings were stratified by client multimorbidity status (defined below) or CHC “Urban At-Risk” status, referring to CHCs located in major urban geographical areas that serve priority populations defined by homelessness and/or mental health and substance use challenges [17]. Given their unique care mandate, these CHCs are considered a distinct “peer group” within the larger organization; another CHC peer group includes those with Rural Geography. CHCs without special designation still focus on clients with barriers to care but may serve those in rural or urban settings and do not solely serve clients with the aforementioned complexities [17].
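As an illustration of the table-based summaries described above, the following minimal Python sketch computes counts and percentages for a categorical variable and the range, median, mean, and standard deviation for a continuous one. The function names and toy values are hypothetical and are not taken from the study data; using the sample (rather than population) standard deviation is an assumption.

```python
from collections import Counter
from statistics import mean, median, stdev


def summarize_categorical(values):
    """Return {category: (count, percent)}; None is reported as 'Missing'."""
    labelled = ["Missing" if v is None else v for v in values]
    counts = Counter(labelled)
    total = len(labelled)
    return {cat: (n, round(100 * n / total, 2)) for cat, n in counts.items()}


def summarize_continuous(values):
    """Return the range, median, mean, and sample standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "mean": mean(values),
        "sd": stdev(values),
    }


# Hypothetical toy data, for illustration only.
geography = ["Urban", "Urban", "Rural", None, "Urban"]
visits_per_year = [2, 4, 6, 8, 10]

print(summarize_categorical(geography))
print(summarize_continuous(visits_per_year))
```

Stratified results could be obtained by applying the same functions to each subset of clients, for example those served by Urban At-Risk CHCs.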
Sociodemographic characteristics
We conducted table-based summaries for select fields from the structured EHR client characteristic table and for certain ENCODE-FM-derived variables, treated as present if ever recorded in a client’s EHR. Missingness of the former occurred at the 1) CHC or provider level, whereby a client was not asked about the characteristic and 2) client level, whereby a client was asked and preferred to not respond. Results are presented overall and stratified by Urban At-Risk CHC and multimorbidity status.
Clinical characteristics
We investigated 20 chronic conditions that define multimorbidity in primary care research [31–33] and an additional four conditions of interest identified by Alliance stakeholders (Hepatitis C, smoking or tobacco use, substance use, lonely or isolated). For each condition, clients were assumed to receive related care upon the first record of a relevant code. We explored conditions in single, composite, and pairwise manners.
Prevalence and incidence
To provide different perspectives on clinical complexity, we calculated two measures of prevalence and one measure of incidence for each of the 24 conditions. We also calculated prevalence of multimorbidity. Our primary multimorbidity definition, including for stratification, was presence of at least three of the 20 chronic conditions [31–33]. We also looked at multimorbidity of at least two conditions, as this is another commonly used definition [32].
- Eleven-year period prevalence, based on calendar time, to assess the burden of conditions over the entire observation period (2009–2019). For each condition, we divided the number of clients who ever received a condition indication by an estimate of the average population size (technical details in Supplementary Appendix 3). Sensitivity analyses included the largest possible denominator: total number of eligible clients, and the smallest reasonable denominator: starting with the middle calendar year (2014), additional clients with at least one visit in adjacent years were added until no prevalence estimate was over 100%. Results are shown overall and Urban At-Risk CHC-stratified.
- Observation-based period prevalence, based on length of client observation, to assess the burden of conditions dependent on the number of years clients received care at a CHC. To calculate this, we separated clients into 11 sub-cohorts based on the number of years (consecutive 365.25 day intervals, rounded up) between their first and last recorded events. For each sub-cohort and condition, we divided the number of clients who ever received a condition indication by the number of clients in the sub-cohort. Results are presented as bar graphs.
- Cumulative incidence, to assess the rate of condition indications by days of observation. We plotted cumulative incidence curves using the R package survival [34]. To prioritize capture of incident condition-related care, we excluded clients with conditions recorded in 2009 from this analysis.
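The two prevalence measures above can be sketched as follows. This is a minimal illustration, not the study code: the client records and function names are hypothetical, and placing clients whose first and last events fall on the same day into the one-year sub-cohort is an assumption. The hypertension count and denominator in the final line are the overall values reported in Table 2.

```python
import math
from collections import defaultdict
from datetime import date


def period_prevalence(n_cases, denominator):
    """Percent of the (estimated) population with a condition ever recorded."""
    return round(100 * n_cases / denominator, 2)


def observation_years(first_event, last_event):
    """Years of observation as consecutive 365.25-day intervals, rounded up.

    Assumption: same-day first and last events count as one year.
    """
    days = (last_event - first_event).days
    return max(1, math.ceil(days / 365.25))


def observation_based_prevalence(clients, condition):
    """Percent of each observation-year sub-cohort with the condition recorded."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for c in clients:
        years = observation_years(c["first"], c["last"])
        totals[years] += 1
        if condition in c["conditions"]:
            cases[years] += 1
    return {y: round(100 * cases[y] / totals[y], 2) for y in sorted(totals)}


# Hypothetical clients, for illustration only.
clients = [
    {"first": date(2010, 1, 5), "last": date(2010, 6, 1), "conditions": set()},
    {"first": date(2010, 1, 5), "last": date(2010, 9, 1), "conditions": {"Diabetes"}},
    {"first": date(2012, 3, 1), "last": date(2015, 3, 2), "conditions": {"Diabetes"}},
]

print(observation_based_prevalence(clients, "Diabetes"))
print(period_prevalence(68_177, 165_125))  # hypertension, overall (Table 2)
```

Because intervals are rounded up, the third client, observed for one day more than three 365.25-day intervals, falls in the four-year sub-cohort.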
Condition co-occurrence patterns
To assess co-occurrence for each pair of conditions while adjusting for all of the other conditions, we estimated an Ising model (unsupervised machine learning technique) using R package MRFcov [35, 36] for all conditions except Hepatitis C (Alliance-suggested condition that overlaps with one of the 20 chronic conditions). We converted coefficients, representing the strength of association between each condition pair adjusted for all other conditions, to odds ratios and interpreted size using Chen et al. (2010) guidelines [37]. We also viewed the top frequency-based co-occurrences.
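To make the conversion from coefficients to odds ratios concrete, the sketch below exponentiates a coefficient on the log-odds scale and assigns a magnitude label. The cut-offs of 1.68, 3.47, and 6.71 are the values commonly attributed to Chen et al. (2010) and are stated here as an assumption, since they are not restated in this article; treating an odds ratio and its reciprocal as equally large is likewise an assumption. The two example coefficients are the extremes reported in the Results.

```python
import math

# Assumed cut-offs for small, medium, and large odds ratios.
SMALL, MEDIUM, LARGE = 1.68, 3.47, 6.71


def odds_ratio(log_odds):
    """Convert a pairwise coefficient on the log-odds scale to an odds ratio."""
    return math.exp(log_odds)


def magnitude(log_odds):
    """Label association size, treating OR and 1/OR symmetrically (assumption)."""
    size = math.exp(abs(log_odds))
    if size >= LARGE:
        return "large"
    if size >= MEDIUM:
        return "medium"
    if size >= SMALL:
        return "small"
    return "very small"


for coef in (2.93, -0.82):
    print(coef, round(odds_ratio(coef), 2), magnitude(coef))
```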
Health care use characteristics
We performed table-based summaries of provider and care access characteristics overall and stratified by Urban At-Risk CHC, Rural Geography CHC, and client multimorbidity status.
Providers involved
To identify common care provider teams that clients were exposed to across their care histories, we used non-negative matrix factorization (NMF) [38] (unsupervised machine learning technique) to identify frequently occurring: 1) “Ever-seen” teams whereby dummy variables were used to indicate whether each provider type was ever involved in care, and 2) Relative “amount-seen” teams based on volume of care whereby the number of events associated with each provider type was normalized within clients. For each version, we ran analyses allowing 2, 3, 5, 10, and 15 topics (provider teams) with the NMF class of the Python package scikit-learn (sklearn.decomposition.NMF) and the Kullback-Leibler divergence distance metric [39]. We interpreted resulting topics by visual inspection. Provider types were maintained as recorded in the EHR except “Other,” “Unknown,” and “Undefined” were combined. We also summarized the top frequency-based provider types involved in care and referrals. Eligible clients required at least one provider type indication in their EHR.
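The following sketch shows how the two input matrices and the factorization might be set up with sklearn.decomposition.NMF. It is a minimal illustration under stated assumptions: the event counts are simulated, the five provider types are a hypothetical subset, and the initialization, iteration limit, and random seed are illustrative choices rather than the settings used in the study.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Simulated event counts: rows are clients, columns are provider types.
provider_types = ["physician", "nurse", "nurse practitioner",
                  "social worker", "dietitian"]
counts = rng.poisson(lam=[6, 2, 2, 1, 1], size=(200, len(provider_types)))
counts[:, 0] += 1  # ensure every client has at least one provider type recorded

# 1) "Ever-seen" version: dummy indicators for each provider type.
ever_seen = (counts > 0).astype(float)

# 2) "Amount-seen" version: event counts normalized within clients.
amount_seen = counts / counts.sum(axis=1, keepdims=True)


def fit_teams(matrix, n_topics):
    """Fit NMF with Kullback-Leibler loss; return client and topic weights."""
    model = NMF(
        n_components=n_topics,
        init="nndsvda",
        solver="mu",  # multiplicative updates, needed for the KL loss
        beta_loss="kullback-leibler",
        max_iter=1000,
        random_state=0,
    )
    client_weights = model.fit_transform(matrix)  # clients x topics
    topic_weights = model.components_             # topics x provider types
    return client_weights, topic_weights


for name, matrix in (("ever-seen", ever_seen), ("amount-seen", amount_seen)):
    W, H = fit_teams(matrix, n_topics=3)
    for k, row in enumerate(H):
        top = [provider_types[j] for j in np.argsort(row)[::-1][:2]]
        print(name, "topic", k, "highest-weighted:", top)
```

Each row of the topic-weight matrix can be read as a candidate provider team, and each row of the client-weight matrix indicates how strongly a client's care history loads on each team.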
Care access patterns
We measured complexity of care as the number of events (distinct issues addressed or types of care received) per visit (calendar day of access) to a CHC, and care frequency as the number of calendar days at least one event was recorded per year (365.25 day intervals) and per quarter-year (90.30 day intervals). To investigate frequency of care in terms of magnitude and shape (changes in magnitude across care histories), we performed time series clustering (unsupervised machine learning technique) with the K-Medoids algorithm and dynamic time warping distance metric [40] for 1) short-term clients with 2–3 observation years and 2) long-term clients with 8–10 observation years. For each time interval and cohort, we used R package dtwclust [41] to identify 2, 3, 4, and 5 clusters. Performance was assessed using the silhouette score and visual inspection.
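The measures defined above, and the distance underlying the clustering, can be sketched in plain Python. This is an illustration only: the event dates are hypothetical, counting intervals from each client's first event is an assumption, and the textbook dynamic time warping recursion shown here stands in for the dtwclust implementation. The K-Medoids clustering step itself is not reproduced.

```python
import math
from collections import Counter
from datetime import date


def events_per_visit(event_dates):
    """Complexity of care: number of events on each calendar day of access."""
    return Counter(event_dates)


def days_per_interval(event_dates, interval_days=365.25):
    """Care frequency: distinct days with an event in each interval,
    counted from the client's first event (assumption)."""
    visit_days = sorted(set(event_dates))
    start = visit_days[0]
    n_intervals = math.floor((visit_days[-1] - start).days / interval_days) + 1
    series = [0] * n_intervals
    for d in visit_days:
        series[math.floor((d - start).days / interval_days)] += 1
    return series


def dtw_distance(a, b):
    """Dynamic time warping distance with an absolute-difference step cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


# Hypothetical event dates for one client (two events on the same day).
events = [date(2015, 1, 10), date(2015, 1, 10), date(2015, 3, 2), date(2016, 2, 1)]
print(events_per_visit(events))
print(days_per_interval(events))
print(dtw_distance([1, 3, 2], [1, 1, 3, 2]))
```

Dynamic time warping allows two care-frequency series of different lengths, or with the same shape shifted in time, to be compared, which is why it suits clients with differing observation periods.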
Results
There were 221,047 eligible clients (Supplementary Appendix 3). Note that we may only be using a subset of the CHC care history for the 64,504 (29.18%) clients who had at least one care indication in 2009, the 141,627 (64.07%) who had one in 2019, and the 40,704 (18.4%) who received care in both of these “cohort end” years.
Sociodemographic characteristics
Sociodemographic characteristics are described in Table 1, with remaining sub-strata in Supplementary Appendix 3 (Supplementary Table 2). The Urban At-Risk CHCs tended to provide care to clients who were more commonly English-speaking and had lower levels of education, household income, immigration, stable housing, and/or food security. Clients with multimorbidity tended to be older, were more commonly female, more often resided in rural locations, and had lower levels of education, immigration, stable residence, and/or food security.
Characteristic | Values | All clients n (%) | Urban At-Risk CHC n (%) | Multimorbidity n (%) |
Number of clients | | 221 047 | 35 998 | 103 172 |
Age in 2015 | 25-34 | 55 505 (25.11) | 7976 (22.16) | 9346 (9.06) |
 | 35-44 | 45 646 (20.65) | 7540 (20.95) | 15 542 (15.06) |
 | 45-54 | 44 653 (20.20) | 8186 (22.74) | 23 982 (23.24) |
 | 55-64 | 37 848 (17.12) | 6790 (18.86) | 25 578 (24.79) |
 | 65-74 | 23 162 (10.48) | 3644 (10.12) | 17 780 (17.23) |
 | 75+ | 14 233 (6.44) | 1862 (5.17) | 10 944 (10.61) |
Geography | Rural | 49 275 (22.29) | 6131 (17.03) | 26 818 (25.99) |
 | Urban | 167 728 (75.88) | 28 538 (79.28) | 75 011 (72.70) |
 | Missing | 4044 (1.83) | 1329 (3.69) | 1343 (1.30) |
Sex | Female | 127 070 (57.49) | 18 699 (51.94) | 59 946 (58.10) |
 | Male | 93 294 (42.21) | 17 151 (47.64) | 43 124 (41.80) |
 | Other | 331 (0.15) | 43 (0.12) | 19 (0.02) |
 | Missing | 352 (0.16) | 105 (0.29) | 83 (0.08) |
Gender | Female | 41 352 (18.71) | 5509 (15.30) | 21 831 (21.16) |
 | Gender diverse | 340 (0.15) | 112 (0.31) | 144 (0.14) |
 | Male | 29 366 (13.28) | 4585 (12.74) | 14 733 (14.28) |
 | Prefer not to answer | 1001 (0.45) | 51 (0.14) | 376 (0.36) |
 | Missing | 148 988 (67.40) | 25 741 (71.51) | 66 088 (64.06) |
Sexual Orientation | Bisexual | 1578 (0.71) | 285 (0.79) | 690 (0.67) |
 | Gay | 708 (0.32) | 192 (0.53) | 306 (0.30) |
 | Heterosexual | 57 065 (25.82) | 8447 (23.47) | 29 105 (28.21) |
 | Lesbian | 485 (0.22) | 70 (0.19) | 244 (0.24) |
 | Queer | 323 (0.15) | 34 (0.09) | 91 (0.09) |
 | Two-Spirit | 128 (0.06) | 80 (0.22) | 61 (0.06) |
 | Other | 246 (0.11) | 34 (0.09) | 143 (0.14) |
 | Do not know | 924 (0.42) | 201 (0.56) | 485 (0.47) |
 | Prefer not to answer | 7561 (3.42) | 877 (2.44) | 4078 (3.95) |
 | Missing | 152 029 (68.78) | 25 778 (71.61) | 67 969 (65.88) |
Highest Level of Education | Post-secondary or equivalent | 84 888 (38.40) | 12 056 (33.49) | 35 763 (34.66) |
 | Secondary or equivalent | 61 831 (27.97) | 11 783 (32.73) | 32 617 (31.61) |
 | Less than high school | 18 941 (8.57) | 3266 (9.07) | 10 618 (10.29) |
 | Other | 8507 (3.85) | 719 (2.00) | 4078 (3.95) |
 | Do not know | 4860 (2.20) | 1318 (3.66) | 2350 (2.28) |
 | Prefer not to answer | 2950 (1.33) | 422 (1.17) | 1585 (1.54) |
 | Missing | 39 070 (17.67) | 6434 (17.87) | 16 161 (15.66) |
Primary Language | English | 167 163 (75.62) | 31 658 (87.94) | 79 599 (77.15) |
 | French | 22 547 (10.20) | 944 (2.62) | 11 091 (10.75) |
 | Other | 26 847 (12.15) | 2948 (8.19) | 10 710 (10.38) |
 | Missing | 4490 (2.03) | 448 (1.24) | 1772 (1.72) |
Race and Ethnicity | Black | 8861 (4.01) | 725 (2.01) | 3757 (3.64) |
 | East/Southeast Asian | 3739 (1.69) | 484 (1.34) | 1545 (1.50) |
 | Indigenous | 2944 (1.33) | 1577 (4.38) | 1641 (1.59) |
 | Latino | 4350 (1.97) | 206 (0.57) | 1708 (1.66) |
 | Middle Eastern | 2046 (0.93) | 344 (0.96) | 838 (0.81) |
 | Other | 567 (0.26) | 148 (0.41) | 306 (0.30) |
 | South Asian | 3597 (1.63) | 323 (0.90) | 1852 (1.80) |
 | White | 38 464 (17.40) | 4531 (12.59) | 21 504 (20.84) |
 | Do not know | 838 (0.38) | 151 (0.42) | 487 (0.47) |
 | Prefer not to answer | 2649 (1.20) | 261 (0.73) | 1513 (1.47) |
 | Missing | 152 992 (69.21) | 27 248 (75.69) | 68 021 (65.93) |
Years Since Arrival in Canada | 0 to 5 years | 13 654 (6.18) | 1191 (3.31) | 3047 (2.95) |
 | 6+ years | 51 815 (23.44) | 4940 (13.72) | 22 722 (22.02) |
 | None recorded | 155 578 (70.38) | 29 867 (82.97) | 77 403 (75.02) |
Household Income | $0 to $14,999 | 40 519 (18.33) | 8729 (24.25) | 17 757 (17.21) |
 | $15,000 to $24,999 | 21 102 (9.55) | 3555 (9.88) | 11 081 (10.74) |
 | $25,000 to $39,999 | 20 877 (9.44) | 2988 (8.30) | 10 736 (10.41) |
 | $40,000 to $59,999 | 17 245 (7.80) | 2421 (6.73) | 8671 (8.40) |
 | $60,000 or more | 28 494 (12.89) | 3862 (10.73) | 12 868 (12.47) |
 | Do not know | 15 408 (6.97) | 2658 (7.38) | 6264 (6.07) |
 | Prefer not to answer | 27 621 (12.50) | 4130 (11.47) | 14 890 (14.43) |
 | Missing | 49 781 (22.52) | 7655 (21.27) | 20 905 (20.26) |
Household Composition | Couple with children | 53 398 (24.16) | 6759 (18.78) | 20 713 (20.08) |
 | Couple without child | 39 664 (17.94) | 5945 (16.51) | 22 950 (22.24) |
 | Extended family | 7632 (3.45) | 1123 (3.12) | 3581 (3.47) |
 | Grandparents with grandchild(ren) | 1746 (0.79) | 247 (0.69) | 1183 (1.15) |
 | Siblings | 1622 (0.73) | 250 (0.69) | 669 (0.65) |
 | Single parent | 14 445 (6.53) | 2527 (7.02) | 6348 (6.15) |
 | Sole member | 32 782 (14.83) | 7445 (20.68) | 18 597 (18.03) |
 | Unrelated housemates | 8622 (3.90) | 1567 (4.35) | 2849 (2.76) |
 | Other | 8913 (4.03) | 1476 (4.10) | 4202 (4.07) |
 | Do not know | 2475 (1.12) | 643 (1.79) | 1279 (1.24) |
 | Prefer not to answer | 3727 (1.69) | 491 (1.36) | 1927 (1.87) |
 | Missing | 46 021 (20.82) | 7525 (20.90) | 18 874 (18.29) |
Stable Residence | True | 199 349 (90.18) | 28 227 (78.41) | 90 479 (87.70) |
Food Insecurity | True | 10 985 (4.97) | 2947 (8.19) | 7323 (7.10) |
Clinical characteristics
Prevalence and incidence
Eleven-year period prevalence estimates ranged from 1.48% (Hepatitis C) to 80.97% (multimorbidity of at least two conditions) overall, with generally higher estimates in the Urban At-Risk CHC stratum (Table 2). The smallest reasonable denominator used in the sensitivity analysis was based on clients with at least one visit in 2012–2015 (n = 148,595).
Condition | All clients n (%) | Urban At-Risk CHC n (%) |
Denominator | 165 125 | 27 256 |
Hypertension | 68 177 (41.29) | 12 304 (45.14) |
Depression or anxiety | 23 828 (14.43) | 5533 (20.30) |
Chronic musculoskeletal | 104 304 (63.17) | 18 842 (69.13) |
Arthritis | 37 201 (22.53) | 6906 (25.34) |
Osteoporosis | 11 462 (6.94) | 1950 (7.15) |
Asthma or COPD or chronic bronchitis | 43 837 (26.55) | 9190 (33.72) |
Cardiovascular disease | 23 311 (14.12) | 4673 (17.14) |
Heart failure | 7994 (4.84) | 1564 (5.74) |
Stroke or TIA | 2967 (1.80) | 585 (2.15) |
Stomach problem | 36 175 (21.91) | 7620 (27.96) |
Colon problem | 24 949 (15.11) | 4974 (18.25) |
Chronic hepatitis | 13 288 (8.05) | 2954 (10.84) |
Diabetes | 35 704 (21.62) | 6912 (25.36) |
Thyroid disorder | 24 793 (15.01) | 4217 (15.47) |
Any cancer | 14 024 (8.49) | 2636 (9.67) |
Kidney disease or failure | 8290 (5.02) | 1555 (5.71) |
Chronic urinary problem | 59 677 (36.14) | 11 131 (40.84) |
Dementia or Alzheimer’s disease | 4776 (2.89) | 898 (3.29) |
Hyperlipidemia | 67 175 (40.68) | 11 659 (42.78) |
Obesity | 38 408 (23.26) | 6455 (23.68) |
Hepatitis C | 2436 (1.48) | 1173 (4.30) |
Smoking or tobacco use | 37 355 (22.62) | 9597 (35.21) |
Substance use | 20 853 (12.63) | 7508 (27.55) |
Lonely or isolated | 17 947 (10.87) | 5149 (18.89) |
Multimorbidity 2+ | 133 704 (80.97) | 24 129 (88.53) |
Multimorbidity 3+ | 103 172 (62.48) | 19 237 (70.58) |
Observation-based period prevalence estimates tended to increase with length of observation; however, cumulative incidence plots for the 156,543 (70.82%) clients without care recorded in 2009 showed the rate of condition indications notably decreased after the first year of observation. Sample plots are in Figure 1; all are in Supplementary Appendix 1 (Supplementary Figures 2, 3).
Condition co-occurrence patterns
Among the 103,172 (46.7%) clients with multimorbidity of at least three chronic conditions, there were 25,162 unique combinations ranging in frequency from 1 (<0.1%) to 845 (0.4%) clients. Figure 2 presents the Ising model results. Pairwise associations between conditions on the log-odds scale ranged from -0.82 (Osteoporosis–Obesity) to 2.93 (Kidney Disease or Failure–Chronic Urinary Problem). There were 1 large, 5 medium, 40 small, and 207 very small associations based on odds ratio magnitude. The five largest positive associations were 1) Kidney Disease or Failure–Chronic Urinary Problem, 2) Smoking or Tobacco Use–Substance Use, 3) Cardiovascular Disease–Heart Failure, 4) Hypertension–Hyperlipidemia, and 5) Hypertension–Kidney Disease or Failure. In contrast, the five most common co-occurring pairs based on raw frequency were 1) Hyperlipidemia–Chronic Musculoskeletal, 2) Hypertension–Chronic Musculoskeletal, 3) Hyperlipidemia–Hypertension, 4) Chronic Urinary Problem–Chronic Musculoskeletal, and 5) Asthma or COPD or Chronic Bronchitis–Chronic Musculoskeletal. These pairs directly correspond to the conditions that had the highest marginal frequencies.
Health care use characteristics
Table-based summaries of health care use characteristics are in Supplementary Appendix 3 (Supplementary Table 3). In general, Urban At-Risk CHC and multimorbidity strata had higher health care use, while Rural Geography CHCs were closer to the overall population.
Providers involved
There were 19,394 unique combinations of the 68 distinct provider types seen across the 220,806 (99.9%) clients with at least one provider type recorded. In terms of referrals, 102,088 (46.2%) clients had at least one internal and 143,922 (65.1%) had at least one external referral recorded. Note internal referrals may not have captured “hallway referrals”, whereby a nearby provider provides a quick consult that is not formally recorded.
Figure 3 shows results of the NMF analysis, listing the highest-weighted provider types in each topic down to a weight of 3. For the ever-seen provider team analysis, physician and nursing provider types emerged most prominently overall. In general, as the number of topics increased, additional provider types emerged and then split apart to dominate separate topics. Exceptions were the high-weighted pairings of nurse and physician and of registered practical nurse and nurse practitioner. Overall, 18 of the 68 possible provider types emerged prominently in at least one topic; only one (respirologist) did not also appear in the amount-seen analysis.
The amount-seen provider team analysis had greater weight distributions between provider types within topics. For example, the first topic of the three-topic analysis had an approximate 1:1:1:6 ratio of care provided by nurse practitioner:nurse:registered practical nurse:physician. In both versions, about half of clients had a non-zero weight for only one of the first two topics; in the amount-seen analysis more clients maintained a non-zero weight on only one topic as the number of topics increased, e.g., 16.6% versus 2.5% at five topics. In general, results suggest most clients received the majority of care from physician, nurse practitioner, or nurse provider types, usually in combination with other provider types at a lower volume of care and with heterogeneous co-occurrence. An example of the patterns that emerged for other provider types is the difference in timing and weight of dietitian/nutritionist and social worker providers between the two analyses. Interpreted alongside the most common provider and referral types (Supplementary Appendix 3 (Supplementary Table 4)), findings suggest referrals to dietitian/nutritionist were more common than to social worker, but frequent or longer-term care was more commonly provided by social workers.
Care access patterns
Complexity of care from a CHC perspective was primarily low, with 80.4% of client-visits associated with a single issue and under 1.0% with over five issues addressed (higher intensity); however, from a client perspective, 24,204 (11.0%) clients experienced at least one visit with over five issues while 38,533 (17.4%) experienced a maximum of one issue per visit across their care history. The mean care access frequency was 6 days per year (standard deviation = 7.4). While 29,191 (13.2%) clients experienced at least one year with over 25 days, 7,455 (3.4%) averaged over 25 days per year across their entire care history. There were 8,700 (3.94%) clients with at least one frequent care period (year with over 25 days of care accessed) and at least one complex care episode (visit with over five issues addressed).
For the time series clustering analyses, the short-term cohort included 37,920 clients and 93,625 client-years of observation; the long-term cohort included 42,855 clients and 387,035 client-years of observation. The silhouette score was always highest for two clusters (Supplementary Appendix 3 (Supplementary Table 5)). Visual inspection of plots (Figure 4) showed high variability within and between clients.
Discussion
We used statistical and machine learning techniques to summarize sociodemographic, clinical, and health care use characteristics captured in the EHRs of ongoing primary care clients served by the Alliance. Substantive findings can motivate new topics for future decision support initiatives, or help to refine existing ideas and selection of performance measures for long-term evaluation of implemented interventions. Methods-related findings may inform the approaches used in these endeavours. While our discussion focuses on decision support initiatives, as with any epidemiological study, substantive results may be immediately useful to the population of interest, e.g., to inform clinic-level case management and onboarding of new clients.
Sociodemographic characteristics
The CHC EHRs contain rich sociodemographic information, both the presence and absence of which is informative. Social determinants of health that may increase risk of poor health including lower household income, education, residence instability, and food insecurity were more prevalent in Urban At-Risk CHC and multimorbidity strata (Table 1 and Supplementary Table 2). There appears to be evidence for the healthy immigrant effect [42], assessed by viewing the proportion of people in each category of the years since arrival in Canada variable across the multimorbidity strata: a lower proportion of people with 0-5 years in Canada had multimorbidity as compared to those with 6 or more years in Canada or no arrival information recorded (missing or born in Canada). Completeness rates varied by characteristic and may be due to client, provider, or CHC level decisions. For example, of the 72,059 (32.60%) clients asked about gender only 1,001 (1.39%) preferred to not answer. In contrast, more clients, 171,266 (77.48%), were asked about household income but there was a higher tendency to not answer, 27,621 (16.13%). These findings align with a framework to assess selection bias in EHR data that suggests multiple mechanisms are usually responsible for missingness so the focus should be on “what data are observed [instead of missing] and why?” [43]. While provider-level decisions may be due to inferring certain characteristics or prioritizing information needed for them to direct care, completeness rates are important for decision support tool performance, which can improve with social determinants of health information [44, 45]. In addition to completeness, the value of these data would be further improved if time-stamped and if changes in mutable characteristics like household income could be easily traced over time. Only the sociodemographic variables constructed with ENCODE-FM codes were associated with time of recording in the data extract we used (Supplementary Table 3).
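The completeness figures quoted above follow directly from the Table 1 counts, as the short sketch below illustrates. The function name is hypothetical; the inputs are the "Missing" and "Prefer not to answer" counts for gender and household income among all 221,047 clients.

```python
def completeness(total, missing, prefer_not_to_answer):
    """Return (n asked, % asked, % of those asked who preferred not to answer)."""
    asked = total - missing
    return (
        asked,
        round(100 * asked / total, 2),
        round(100 * prefer_not_to_answer / asked, 2),
    )


TOTAL_CLIENTS = 221_047  # from Table 1

# "Missing" and "Prefer not to answer" counts taken from Table 1.
print("Gender:", completeness(TOTAL_CLIENTS, 148_988, 1_001))
print("Household income:", completeness(TOTAL_CLIENTS, 49_781, 27_621))
```

Separating the two quantities in this way distinguishes missingness arising at the CHC or provider level (not asked) from missingness arising at the client level (asked but declined).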
When assessing data quality and completeness, which is emphasized by learning health system and machine learning for EHR guidelines [2, 4, 22, 46, 47], the implications of pursuing decision support initiatives at different levels should also be considered. For example, a subset of CHCs capture self-reported measures of health, which are valuable research outcomes [48]. While these measures are not suitable for analyses with data from all CHCs (CHC population-level initiatives), they should be considered for initiatives specific to the collecting CHCs. Even for sociodemographic variables now collected by all CHCs, completeness rates vary between CHCs, partly because a subset of CHCs started collecting the information earlier than others. CHCs with higher completeness rates are better positioned to improve understanding of groups that are typically less well represented, such as those identifying as gender diverse.
Clinical characteristics
Prevalence and incidence
In operationalizing morbidity measures, the denominator must be defined with the intended end-goal in mind. The eleven-year period prevalence (Table 2) estimates relate to a CHC-based perspective and are useful for long-term system-level planning or developing tools to address amount-based condition priorities, while the observation-based period prevalence (Figure 1 and Supplementary Figure 2) estimates are more aligned with a client-based perspective and absolute measure of risk. Another consideration is that just as ICD-10 or ENCODE-FM codes do not guarantee true condition presence, the absence of care does not verify absence of conditions [49]. For example, clients may not seek primary care when they are healthy, hospitalized, or experiencing barriers to care.
The cumulative incidence plots (Figure 1 and Supplementary Figure 3) demonstrate that “risk” of condition codes is highest in the first year of observation. Clinically this makes sense, as new clients may have a build-up of unmet care needs. Nonetheless, there are important takeaways for initiatives that require cohort construction. For example, predictive models developed for decision support need to account for the almost qualitative change in risk related to being a new client: a diagnostic model is likely more useful in the first year of care and a prognostic model thereafter. Although this care pattern is fairly specific to primary care settings, methods developed for related problems may be useful, such as accounting for variable lengths of stay in intensive care unit EHRs [50] or handling cold-starts and sparse data in recommender systems [51].
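To make the front-loaded "risk" pattern concrete, the sketch below computes a cumulative incidence curve by year of observation for a toy cohort. It assumes cumulative incidence is taken as one minus a Kaplan-Meier survival estimate, which is one common approach and not necessarily the exact procedure used for Figure 1; the function name, the record layout, and the ten hypothetical clients are all invented for illustration.

```python
from collections import Counter

def cumulative_incidence(records, horizon):
    """Kaplan-Meier style cumulative incidence, 1 - S(t), by whole years of observation.

    records: list of (year, got_code) tuples. got_code is True if the condition code
    was first recorded in that year of observation, False if observation ended in
    that year without the code (censored).
    """
    events = Counter(t for t, got_code in records if got_code)
    exits = Counter(t for t, _ in records)  # each client leaves the risk set after their year
    at_risk = len(records)
    survival = 1.0
    curve = []
    for year in range(1, horizon + 1):
        if at_risk > 0:
            survival *= 1 - events[year] / at_risk
        curve.append(round(1 - survival, 4))
        at_risk -= exits[year]
    return curve

# Toy cohort of 10 hypothetical clients (not study data).
toy = [(1, True)] * 4 + [(2, True), (3, False), (3, True),
                         (5, False), (5, False), (5, False)]

curve = cumulative_incidence(toy, horizon=5)
# Increase in cumulative incidence contributed by each year of observation.
yearly_gain = [curve[0]] + [round(b - a, 4) for a, b in zip(curve, curve[1:])]
print(curve)
print(yearly_gain)
```

In this toy cohort the largest single-year increase occurs in year 1, mirroring the pattern that motivates treating the first year of care differently from later years when constructing cohorts.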
Condition co-occurrence patterns
There was a high prevalence of multimorbidity, but with thousands of different multimorbidity “compositions” it is difficult to see how the broad category of multimorbidity can be put to practical use.
The standard statistical table-based summaries (e.g., prevalence estimates in Table 2) provide information on the amount of each condition present in the population; the Ising model (Figure 3) provides information on the tendency for any two conditions to co-occur, adjusting for the presence or absence of the other conditions. The conditions with the strongest positive associations were not the same as the conditions with the highest frequency of occurrence or co-occurrence, so developers of decision support tools targeted at a subset of conditions may need to decide whether to prioritize conditions accounting for the largest burden of disease in terms of amount or in terms of tendency for co-occurrence. In both cases, multimorbidity presents a long tail problem: a few combinations are very prominent, while most occur rarely. Primary care decision support tools will face the challenge of making recommendations on many different and possibly co-occurring conditions. The majority of existing decision support tools and clinical guidelines focus on a single condition at a time; new techniques for providing evidence-based guidelines or recommendations for these vast numbers of combinations are needed [52–55].
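The distinction between amount of co-occurrence and tendency to co-occur can be illustrated with two hypothetical condition pairs. The sketch below uses an unadjusted odds ratio from a 2x2 table as a deliberately simplified stand-in; it is not the Ising model, which estimates each pairwise association while adjusting for all other conditions. The counts and names are invented for illustration.

```python
def odds_ratio(both, only_a, only_b, neither):
    """Unadjusted odds ratio from a 2x2 table of two binary condition indicators."""
    return (both * neither) / (only_a * only_b)

# Hypothetical counts per 1,000 clients (not study data).
pairs = {
    "common pair": dict(both=300, only_a=300, only_b=300, neither=100),
    "rarer pair": dict(both=40, only_a=20, only_b=20, neither=920),
}

# Rank the same pairs two ways: by how many clients have both conditions,
# and by how strongly the two conditions are associated.
ranked_by_count = sorted(pairs, key=lambda p: pairs[p]["both"], reverse=True)
ranked_by_association = sorted(pairs, key=lambda p: odds_ratio(**pairs[p]), reverse=True)

for name, table in pairs.items():
    print(name, "co-occurring:", table["both"], "odds ratio:", round(odds_ratio(**table), 2))
```

Here the common pair co-occurs in far more clients, yet the rarer pair has the much larger odds ratio, so the two prioritization criteria point to different targets.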
Health care use characteristics
Providers involved
Care for ongoing primary care clients was typically led by physicians or by nurse practitioners, with several other provider types possibly also involved in care. These findings align with the general strategy at CHCs to match clients with primary care providers partly on service need and partly on preference. The NMF analysis of the relative amount of care received from each provider type (Figure 3, bottom) suggests two prominent care models: nurse practitioners always dominate their own topic (associated clients receive all or almost all care from nurse practitioners), whereas physicians always appear with nurse and nurse practitioner provider types at lower weights (associated clients receive the most care from physicians, but usually with some amount of nurse and nurse practitioner care). An implication is that decision support tools targeted at common conditions or situations in primary care, such as chronic disease management, may have end-users of different provider types; all types of end-users should be included in the development process [56]. Other primary health care provider types emerged in their own topics, separate from the physician- and nurse practitioner-dominated topics, suggesting that both of the above prominent care models refer to or work with various combinations of other provider types, most commonly social workers, dietitians/nutritionists, chiropodists, community health workers, counselors, and physiotherapists.
The NMF analyses identified prominent patterns of commonly seen provider types and teams more easily than manually sifting through extensive count-based tables created using simple statistical techniques would have. Another use for NMF is dimensionality reduction or data pre-processing, whereby data are summarized to reduce the number of variables that need to be included in an analysis [38]. For example, NMF-derived topics could be used as inputs to a predictive model instead of separate variables to represent each provider type or specific, manually selected combinations.
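As a rough illustration of NMF used for dimensionality reduction, the sketch below factorizes a small client-by-provider-type matrix into two topics using the standard multiplicative update rules, written with the Python standard library only. The matrix values, the number of topics, and all function names are assumptions made for this example, and the procedure is not claimed to match the configuration used in our analyses; in practice a maintained implementation such as the one in scikit-learn [39] would be preferable.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5

def nmf(v, n_topics, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative V (clients x provider types) by W (clients x topics)
    times H (topics x provider types) using multiplicative updates."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in v]
    h = [[rng.random() + 0.1 for _ in v[0]] for _ in range(n_topics)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
             for i in range(n_topics)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_topics)]
             for i in range(len(w))]
    return w, h

# Hypothetical share of each client's visits by provider type (not study data).
provider_types = ["physician", "nurse", "nurse practitioner", "social worker"]
V = [
    [0.7, 0.2, 0.1, 0.0],
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.1, 0.9, 0.0],
    [0.5, 0.2, 0.1, 0.2],
]

W, H = nmf(V, n_topics=2)
error = frobenius_error(V, W, H)
# Each row of W holds two topic weights that could replace four provider-type columns.
for row in W:
    print([round(x, 2) for x in row])
```

The rows of `W` are the reduced representation: two topic weights per client in place of one variable per provider type, which is the form in which NMF output could be passed to a downstream predictive model.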
Care access patterns
Complexity of care from a CHC system-level perspective was primarily low intensity (few issues addressed per visit, recorded using ENCODE-FM codes; Supplementary Table 3), although this may be partly due to data quality, for example if only one issue was recorded in the EHR when multiple were actually addressed in an appointment. Of note, the CHC model of care and compensation permits coding of multiple issues per visit. The subset of clients who experienced higher care complexity (many issues recorded per visit) did not tend to also have high frequency of care. Sporadic visit patterns may be due to unstable living arrangements or demanding life responsibilities; when there is uncertainty about when a client will return, providers may pack together multiple types of care. The marginal distribution of care frequency was right-skewed without a distinct break; most clients experienced lower care frequency, but higher frequencies were also observed. We used time series clustering (Figure 4 and Supplementary Table 5) to try to go beyond statistical techniques or simple cut-off points and find more refined groupings of clients in terms of care frequency patterns. Contrary to expectations, we did not identify consistent, distinct client groupings through the time series clustering, e.g., to indicate a subpopulation of “frequent visitors”. This may be due to restrictions in our clustering approach and the types of similarity that dynamic time warping captures; however, it may also be that, while cognitively desirable, distinct client clusters in terms of care frequency do not exist. Future analyses could investigate further by trying a different similarity metric or including covariates to account for baseline variability.
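The kind of similarity that dynamic time warping captures can be seen in a small example. The sketch below implements the classic dynamic time warping recurrence with an absolute-difference cost and no window constraint, and compares it with a simple pointwise distance; it is a generic textbook version rather than the specific configuration used in our clustering, and the visit-count series are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance (absolute-difference cost, no window)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def pointwise_distance(a, b):
    """Sum of absolute differences at matching time points (no warping)."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical visits per quarter for three clients (not study data).
early_burst = [0, 3, 0, 0, 0, 0]
late_burst = [0, 0, 0, 3, 0, 0]
steady = [1, 1, 0, 1, 0, 0]

print(dtw_distance(early_burst, late_burst), pointwise_distance(early_burst, late_burst))
print(dtw_distance(early_burst, steady), pointwise_distance(early_burst, steady))
```

Because warping absorbs shifts in timing, the two burst patterns are treated as identical even though the visits occur in different quarters, whereas a pointwise comparison treats them as far apart. Whether that behaviour is desirable depends on whether the timing of care is itself of interest, which is one reason a different similarity metric may lead to different groupings.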
Learning health systems
The paradigm that our study is intended to support or inform—using care-derived data in decision support tool development for the same population that gave rise to those data—fits within a learning health system framework, which is defined by Menear et al. (2019) as “dynamic health ecosystems where scientific, social, technological, policy, legal and ethical dimensions are synergistically aligned to enable cycles of continuous learning and improvement to be routinised and embedded across the system, thus enhancing value through an optimised balance of impacts on patient and provider experience, population health and health system costs” [1]. Large-scale descriptive studies like ours may be useful in early stages of learning health system development to inform types of initiatives beyond decision support tool development, such as quality improvement or research [2–4, 7].
The Alliance for Healthier Communities is one of the first documented primary care learning health systems in North America [7, 30]. Our next steps will include presenting our results to stakeholders through the Alliance’s public Lunch ‘n’ Learn webinar series as well as engaging in targeted conversations with stakeholders who have self-identified interest in future research engagement. By presenting our results of how the Alliance ongoing primary care population is represented in EHR data, we hope to learn what does (not) align with expectations, and to identify next steps for decision support initiatives. Altogether this will set the stage for co-development of tools that can move forward towards pilot testing and implementation in CHCs.
Strengths and limitations
Strengths included the strong interdisciplinary approach (authors from different disciplines and methodology that connects epidemiology, computer science, and primary care) used to assess complex, longitudinal EHR data. We used chronic condition definitions recommended for primary care research [31–33], although the algorithms have not been validated for CHCs specifically. Our broad cohort definition supported a high-level overview of the population, but may not be appropriate for specific research questions. Inherent in any study using EHR data are limitations related to data quality [43, 57, 58], so it is important to remember that our results showcase how clients and care are represented in the data, which may not fully reflect experiences. For example, because we included clients with variable care histories, the sociodemographic characteristic profiles may be skewed from the average client population profile; our prevalence estimates use a single-code assumption of presence, which prioritizes sensitivity over specificity; and we did not adjust analyses for changes in policy or coding practices over time that may have impacted care access or data entry.
Conclusions
We demonstrated the use of simple statistics and machine learning techniques, applied with an epidemiological lens, to describe EHR data and inform future primary care decision support tool development initiatives. Substantive findings lay a foundation for future Alliance initiatives and may be informative for other organizations serving complex primary care populations.
Key suggestions for future initiatives include the need to carefully deliberate the level of analysis, or who a given tool should be targeted at (e.g., all or a subset of CHCs, one or many clinical presentations, all or some providers), and the associated implications for how clients will be represented in the data. Representation will depend on analytical-, system-, provider-, and client-level factors. Decision support initiatives need to consider heterogeneity in conditions and care access patterns, including non-uniform risk of condition indications across observation history.
Acknowledgements
This work was supported by a Canadian Institutes of Health Research Canada Graduate Scholarship-Doctoral awarded to JKK with supervisor DJL.
Ethics statement
This study was approved by Western University Review Ethics Board project ID 111353.
Statement on conflicts of interest
None declared.
Supplementary appendices
Appendix 1 includes extended results presented through figures.
Appendix 2 includes the RECORD reporting guideline checklist.
Appendix 3 includes extended results presented through tables and technical details.
Abbreviations
CHC | Community Health Centre |
EHR | Electronic Health Record |
ENCODE-FM | Electronic Nomenclature and Classification Of Disorders and Encounters for Family Medicine |
ICD-10 | International Classification of Diseases, 10th Revision |
NMF | Non-negative Matrix Factorization |
References
1. Menear M, Blanchette M-A, Demers-Payette O, et al. A framework for value-creating learning health systems. Health Res Policy Syst 2019; 17: 79. https://doi.org/10.1186/s12961-019-0477-3
2. Foley T, Horwitz L, Zahran R. Realising the potential of learning health systems. Newcastle University: The Learning Healthcare Project.
3. Delaney BC, Peterson KA, Speedie S, et al. Envisioning a learning health care system: The Electronic Primary Care Research Network, a case study. Ann Fam Med 2012; 10: 54–59. https://doi.org/10.1370/afm.1313
4. Lindsell CJ, Gatto CL, Dear ML, et al. Learning from what we do, and doing what we learn: A learning health care system in action. Acad Med 2021; 96: 1291–1299. https://doi.org/10.1097/ACM.0000000000004021
5. Jiang F, Jiang Y, Zhi H, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol 2017; 2: 230–243. https://doi.org/10.1136/svn-2017-000101
6. Friedman CP, Allee NJ, Delaney BC, et al. The science of Learning Health Systems: Foundations for a new journal. Learn Health Syst 2017; 1: e10020. https://doi.org/10.1002/lrh2.10020
7. Nash DM, Bhimani Z, Rayner J, et al. Learning health systems in primary care: a systematic scoping review. BMC Fam Pract 2021; 22: 126. https://doi.org/10.1186/s12875-021-01483-z
8. Robinson JM, Trochim WMK. An examination of community members’, researchers’ and health professionals’ perceptions of barriers to minority participation in medical research: an application of concept mapping. Ethn Health 2007; 12: 521–539. https://doi.org/10.1080/13557850701616987
9. George S, Duran N, Norris K. A systematic review of barriers and facilitators to minority research participation among African Americans, Latinos, Asian Americans, and Pacific Islanders. Am J Public Health 2014; 104: e16–e31. https://doi.org/10.2105/AJPH.2013.301706
10. Odierna DH, Schmidt LA. The effects of failing to include hard-to-reach respondents in longitudinal surveys. Am J Public Health 2009; 99: 1515–1521. https://doi.org/10.2105/AJPH.2007.111138
11. Bonevski B, Randell M, Paul C, et al. Reaching the hard-to-reach: a systematic review of strategies for improving health and medical research with socially disadvantaged groups. BMC Med Res Methodol 2014; 14: 42. https://doi.org/10.1186/1471-2288-14-42
12. Primary care. Balancing health needs, services, and technology. New York, NY: Oxford University Press, Inc.; 1998.
13. CIHR Primary Healthcare Summit 2010 Final Report Summary. Toronto, Ontario: Canadian Institutes of Health Research; 2010.
14. ICES Investigative Report. Toronto, Ont.: Institute for Clinical Evaluative Sciences; 2012.
15. Booth RG, Richard L, Li L, et al. Characteristics of health care related to mental health and substance use disorders among Community Health Centre clients in Ontario: a population-based cohort study. CMAJ Open 2020; 8: E391–E399. https://doi.org/10.9778/cmajo.20190089
16. Albrecht D. Community health centres in Canada. Leadersh Health Serv 1998; 11: 5–10. https://doi.org/10.1108/13660759810202596
17. Examining community health centres according to geography and priority populations served, 2011/12 to 2012/13: an ICES chartbook. Toronto, Ontario: Institute for Clinical Evaluative Sciences in Ontario; 2015.
18. Alliance for Healthier Communities. Moving Forward as a Learning Health System. Alliance for Healthier Communities, 18 November 2020, https://myemail.constantcontact.com/EPIC-News–Issue-1.html?soid=1108953382524&aid=uzy8bphr91U (18 November 2020, accessed 23 November 2020).
19. Alliance for Healthier Communities. Towards a Learning Health System: Better Care Tomorrow When We Learn from Today. 2020: Alliance for Healthier Communities. https://www.allianceon.org/sites/default/files/documents/Learning%20Health%20System%20report%202020-10-20%20-%20FINAL_JR.pdf
20. Cameron D, Jones IG. John Snow, the Broad Street Pump and Modern Epidemiology. Int J Epidemiol 1983; 12: 393–396. https://doi.org/10.1093/ije/12.4.393
21. Thuraisingam S, Chondros P, Dowsey MM, et al. Assessing the suitability of general practice electronic health records for clinical prediction model development: a data quality assessment. BMC Med Inform Decis Mak 2021; 21: 297. https://doi.org/10.1186/s12911-021-01669-6
22. Verma AA, Murray J, Greiner R, et al. Implementing machine learning in medicine. CMAJ 2021; 193: E1351–E1357. https://doi.org/10.1503/cmaj.202434
23. Lee S, Xu Y, D’Souza AG, et al. Unlocking the potential of electronic health records for health research. Int J Popul Data Sci 2020; 5: 02. https://doi.org/10.23889/ijpds.v5i1.1123
24. Westfall JM, Wittenberg HR, Liaw W. Time to invest in primary care research—commentary on findings from an independent congressionally mandated study. J Gen Intern Med 2021; 36: 2117–2120. https://doi.org/10.1007/s11606-020-06560-0
25. Fox MP, Murray EJ, Lesko CR, et al. On the need to revitalize descriptive epidemiology. Am J Epidemiol 2022; kwac056. https://doi.org/10.1093/aje/kwac056
26. ENCODE-FM. Electronic Nomenclature and Classification Of Disorders and Encounters for Family Medicine. ENCODE-FM, http://aix1.uottawa.ca/~fammed/fmcencod.htm (2020, accessed 6 April 2020).
27. World Health Organization. ICD-10 Version:2019. World Health Organization, https://icd.who.int/browse10/2019/en (2020, accessed 6 April 2020).
28. Benchimol EI, Smeeth L, Guttmann A, et al. The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) Statement. PLOS Med 2015; 12: e1001885. https://doi.org/10.1371/journal.pmed.1001885
29. Nash DM, Brown JB, Thorpe C, et al. The Alliance for Healthier Communities as a learning health system for primary care: A qualitative analysis in Ontario, Canada. J Eval Clin Pract 2022; jep.13692. https://doi.org/10.1111/jep.13692
30. Nash DM, Rayner J, Bhatti S, et al. The Alliance for Healthier Communities’ journey to a learning health system in primary care. Learn Health Syst 2021; n/a: e10321. https://doi.org/10.1002/lrh2.10321
31. Fortin M, Almirall J, Nicholson K. Development of a research tool to document self-reported chronic conditions in primary care. J Comorbidity 2017; 7: 117–123. https://doi.org/10.15256/joc.2017.7.122
32. Lee ES, Lee PSS, Xie Y, et al. The prevalence of multimorbidity in primary care: a comparison of two definitions of multimorbidity with two different lists of chronic conditions in Singapore. BMC Public Health 2021; 21: 1409. https://doi.org/10.1186/s12889-021-11464-7
33. Lee YAJ, Xie Y, Lee PSS, et al. Comparing the prevalence of multimorbidity using different operational definitions in primary care in Singapore based on a cross-sectional study using retrospective, large administrative data. BMJ Open 2020; 10: e039440. https://doi.org/10.1136/bmjopen-2020-039440
34. Therneau T. A Package for Survival Analysis in R, https://CRAN.R-project.org/package=survival (2021).
35. van Borkulo CD, Borsboom D, Epskamp S, et al. A new method for constructing networks from binary data. Sci Rep 2014; 4: 5918. https://doi.org/10.1038/srep05918
36. Clark NJ, Wells K, Lindberg O. Unravelling changing interspecific interactions across environmental gradients using Markov random fields. Ecology 2018; 99: 1277–1283. https://doi.org/10.1002/ecy.2221
37. Chen H, Cohen P, Chen S. How big is a big odds ratio? Interpreting the magnitudes of odds ratios in epidemiological studies. Commun Stat - Simul Comput 2010; 39: 860–864. https://doi.org/10.1080/03610911003650383
38. Wang Y-X, Zhang Y-J. Nonnegative Matrix Factorization: A comprehensive review. IEEE Trans Knowl Data Eng 2013; 25: 1336–1353. https://doi.org/10.1109/TKDE.2012.51
39. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res 2011; 12: 2825–2830.
40. Aghabozorgi S, Seyed Shirkhorshidi A, Ying Wah T. Time-series clustering – A decade review. Inf Syst 2015; 53: 16–38. https://doi.org/10.1016/j.is.2015.04.007
41. Montero P, Vilar JA. TSclust: An R package for time series clustering. J Stat Softw; 62. Epub ahead of print 2014. https://doi.org/10.18637/jss.v062.i01
42. McDonald JT, Kennedy S. Insights into the ‘healthy immigrant effect’: health status and health service use of immigrants to Canada. Soc Sci Med 2004; 59: 1613–1627. https://doi.org/10.1016/j.socscimed.2004.02.004
43. Haneuse S, Daniels M. A general framework for considering selection bias in EHR-based studies: What data are observed and why? eGEMs; 4. Epub ahead of print 31 August 2016. https://doi.org/10.13063/2327-9214.1203
44. Chen M, Tan X, Padman R. Social determinants of health in electronic health records and their impact on analysis and risk prediction: A systematic review. J Am Med Inform Assoc 2020; 27: 1764–1773. https://doi.org/10.1093/jamia/ocaa143
45. Zhao Y, Wood EP, Mirin N, et al. Social determinants in machine learning cardiovascular disease prediction models: A systematic review. Am J Prev Med; 0. Epub ahead of print 27 July 2021. https://doi.org/10.1016/j.amepre.2021.04.016
46. Wiens J, Saria S, Sendak M, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med 2019; 25: 1337–1340. https://doi.org/10.1038/s41591-019-0548-6
47. Arbet J, Brokamp C, Meinzen-Derr J, et al. Lessons and tips for designing a machine learning study using EHR data. J Clin Transl Sci 2020; 5: 1–10. https://doi.org/10.1017/cts.2020.513
48. CIHI. Patient-reported outcome measures (PROMs). Canadian Institute for Health Information, https://www.cihi.ca/en/patient-reported-outcome-measures-proms (2022, accessed 7 February 2022).
49. Bagley SC, Altman RB. Computing disease incidence, prevalence and comorbidity from electronic medical records. J Biomed Inform 2016; 63: 108–111. https://doi.org/10.1016/j.jbi.2016.08.005
50. Zhang L, Chen X, Chen T, et al. DynEHR: Dynamic adaptation of models with data heterogeneity in electronic health records. In: 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). 2021, pp. 1–4. https://doi.org/10.1109/BHI50953.2021.9508558
51. Alyari F, Jafari Navimipour N. Recommender systems: A systematic review of the state of the art literature and suggestions for future research. Kybernetes 2018; 47: 985–1017. https://doi.org/10.1108/K-06-2017-0196
52. Moons KGM, Altman DG, Vergouwe Y, et al. Prognosis and prognostic research: application and impact of prognostic models in clinical practice. BMJ 2009; 338: b606. https://doi.org/10.1136/bmj.b606
53. O’Caoimh R, Cornally N, Weathers E, et al. Risk prediction in the community: A systematic review of case-finding instruments that predict adverse healthcare outcomes in community-dwelling older adults. Maturitas 2015; 82: 3–21. https://doi.org/10.1016/j.maturitas.2015.03.009
54. Goldstein BA, Navar AM, Pencina MJ, et al. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc 2017; 24: 198–208. https://doi.org/10.1093/jamia/ocw042
55. Guthrie B, Boyd CM. Clinical guidelines in the context of aging and multimorbidity. Public Policy Aging Rep 2018; 28: 143–149. https://doi.org/10.1093/ppar/pry038
56. Kellogg KC, Sendak M, Balu S. AI on the front lines. MIT Sloan Manag Rev, https://sloanreview.mit.edu/article/ai-on-the-front-lines/ (2022, accessed 17 May 2022).
57. Terry AL, Stewart M, Cejic S, et al. A basic model for assessing primary health care electronic medical record data quality. BMC Med Inform Decis Mak 2019; 19: 30. https://doi.org/10.1186/s12911-019-0740-0
58. Gianfrancesco MA, Goldstein ND. A narrative review on the validity of electronic health record-based research in epidemiology. BMC Med Res Methodol 2021; 21: 234. https://doi.org/10.1186/s12874-021-01416-5