Agreement of acute serious events recorded across datasets using linked Australian general practice, hospital, emergency department and death data: implications for research and surveillance


Sarah Ahmed
Dr Allan Pollack
Alys Havard
Sallie-Anne Pearson
Kendal Chidwick


Understanding the level of recording of acute serious events in general practice electronic health records (EHRs) is critical for making decisions about the suitability of general practice datasets to address research questions and requirements for linking general practice EHRs with other datasets.

To examine data source agreement of five serious acute events (myocardial infarction, stroke, venous thromboembolism (VTE), pancreatitis and suicide) recorded in general practice EHRs compared with hospital, emergency department (ED) and mortality data.

Data from 61 general practices routinely contributing to the MedicineInsight database were linked with New South Wales administrative hospital, ED and mortality data. The study population comprised patients with at least three clinical encounters at participating general practices between 2019 and 2020 and at least one record in hospital, ED or mortality data between 2010 and 2020. Agreement was assessed between MedicineInsight diagnostic algorithms for the five events of interest and coded diagnoses in the administrative data. Dates of concordant events were compared.

The study included 274,420 general practice patients with at least one record in the administrative data between 2010 and 2020. Across the five acute events, specificity and NPV were excellent (>98%) but sensitivity (13%-51%) and PPV (30%-75%) were low. Sensitivity was highest for VTE (50.9%) and PPV was highest for acute pancreatitis (75.2%). The majority (61%-82%, depending on the event) of true positive cases were recorded in the EHR within 30 days of the administrative record.

Large proportions of events identified from administrative data were not detected by the diagnostic algorithms applied to general practice EHRs within the study period. EHR data extraction and study design only partly explain the low sensitivities and PPVs. Our findings support the use of Australian general practice EHRs linked to hospital, ED and mortality data, rather than EHRs alone, for robust research on the selected serious acute conditions.



Electronic health record (EHR) systems are widely used in general practice to support clinical management [1–3]. They are a valuable source of information for surveillance, research and evaluation, providing large sample sizes with comprehensive clinical, sociodemographic and treatment information about patients, much of which is unavailable from other data sources [4, 5]. For these reasons the use of data from general practice EHRs for research and policy decisions is growing in Australia [6, 7]. It is important to explore the utility and limitations of general practice data as collections continue to develop [8].

Linking general practice data with other routinely collected data, such as hospital admission, emergency department (ED) and death records, improves their utility for research, monitoring and surveillance. Little such linkage has occurred in Australia to date. Technical and governance issues, while not insurmountable, take time and considerable resources to overcome [9]. A good understanding of the contribution of different datasets is therefore important before linking them.

In Australia, EHRs alone may be a good data source for research involving conditions that are diagnosed, treated and managed largely within general practice [10–12]. However, patients are often managed across multiple settings, with important events and outcomes recorded across disparate sources. The extent to which Australian EHRs accurately reflect acute serious events is unclear. For example, acute cardiovascular events may require urgent hospital care or result in death, with general practice records missing or delayed. Primary care data from other countries show reasonable recording of acute events [13–16]. However, health care systems differ, and this question has not been examined specifically for Australia.

This study evaluated how five serious health events, namely acute pancreatitis, myocardial infarction (MI), stroke, suicide and venous thromboembolism (VTE), were recorded in a general practice EHR dataset (MedicineInsight) compared with administrative hospital, ED and mortality datasets. We also assessed the timing of events, which is important for some time-sensitive research questions. The selected events are acute and primarily managed in hospitals or EDs, with general practice involvement mainly for ongoing care or after discharge. They were selected as important events of interest for future research but may also serve as indicators of the quality of recording of acute conditions with similar clinical care pathways. Although useful as an external reference standard for the purposes of this study, hospital, ED and mortality data are not a “gold standard” for overall prevalence, as they are subject to errors and incompleteness, such as missing events that occur outside the hospital system.


Study design

This was an observational study comparing the level of agreement in the recording of five acute serious events between general practice EHRs and hospital, ED and mortality datasets. We compared routinely used MedicineInsight algorithms flagging these events with information recorded in linked hospital, ED and mortality data.

Data sources

MedicineInsight is a large-scale database established by NPS MedicineWise in 2011, containing de-identified EHRs from over 600 participating general practices across Australia [5]. It uses third-party data extraction tools [17, 18] that de-identify, extract and securely transmit data from the Best Practice (BP)™ or Medical Director (MD)™ clinical information systems for harmonisation, cleaning and storage [5]. Extracted data include demographic and clinical entries by healthcare professionals. Identifying data and fields that may contain them, including name, date of birth, address, progress notes and correspondence, are not extracted. Certain variables, such as condition flags, are derived [5, 19]. Monthly extractions produce an updated longitudinal database in which patients within each practice can be tracked over time. Previous research has examined the algorithms used to create medical condition flags and the recording of death in MedicineInsight [10, 20]. However, the recording of acute serious conditions in MedicineInsight has not been validated.

The New South Wales (NSW) Admitted Patient Data Collection (APDC) is a compilation of episode-level records from all admitted patient services provided by NSW public and private hospitals, public psychiatric hospitals, public multi-purpose services, and private day procedures centres [21]. The variables used in this study include the dates of admission and separation, diagnoses coded by trained clinical information managers (International Classification of Diseases Australian Modification [ICD-10-AM]), procedures, and separation mode (discharge, transfer or death).

The NSW Emergency Department Data Collection (EDDC) provides information about patient presentations to the emergency departments (EDs) of NSW public hospitals [21]. Information from private hospital EDs and some smaller public hospital EDs was not available for linkage. The variables used in this study include admission date and diagnosis codes (ICD-10-AM, ICD-10 and SNOMED-CT) recorded by medical, nursing or clerical personnel at the point of care. These personnel are not trained in clinical coding, and symptoms are often selected as diagnoses.

The NSW Registry of Births, Deaths and Marriages (RBDM) death registrations contain fact of death information, date of birth, age at death, date of death and year of death registration. The Australian Coordinating Registry Cause of Death Unit Record File (ACR CODURF) provides both fact and cause of death information. Cause of death data are coded using the ICD-10 International Version, not ICD-10-AM. The variables used in this study include dates of birth, death and death registration, as well as variables related to cause and place of death. Both RBDM and CODURF include deaths occurring in NSW and do not include deaths of NSW residents who die interstate [21].

The Centre for Health Record Linkage (CHeReL) [22] is a dedicated data linkage unit managed by the NSW Ministry of Health. CHeReL used a privacy preserving record linkage (PPRL) methodology [23] to link records dated between 2010 and 2020 from the NSW APDC, EDDC, RBDM and ACR CODURF [21] to MedicineInsight. CHeReL assigned Project Person Numbers (PPNs) which identify individuals linked across datasets and are used to merge content data. Content data from each dataset was transferred to the Secured Unified Research Environment (SURE) [24] for storage and analysis. The study population was subset from this linked dataset, with analyses limited to encounters recorded during the study period.

Study population and study period

General practices

We identified 61 eligible NSW general practice sites participating in MedicineInsight that used the INCA extraction tool (a prerequisite for enabling PPRL) in the February 2022 database build, from which eligible patients were selected for linkage by CHeReL.


The study cohort was extracted from the ‘Total MedicineInsight Linkage Population’ as detailed in Figure 1. The study cohort included regular patients with a valid age (0 to 112 years) who had at least 3 clinical encounters between 1 January 2019 and 31 December 2020 at an eligible MedicineInsight practice in NSW that met the data quality requirements in February 2022 [5], and who had at least one hospital, ED or mortality record between 1 January 2010 and 31 December 2020. Regular patients are defined as those with at least three consultations in any 2 consecutive years, in accordance with the Royal Australian College of General Practitioners’ (RACGP’s) definition of ‘active’ patients [25]. MedicineInsight is an open cohort and patients in Australia can visit multiple general practices. Regular patients are therefore often selected for analyses because they are more likely than infrequent attenders to receive most of their care at the MedicineInsight practice, providing sufficient opportunity for diagnoses, risk factors and other information to be recorded.
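The patient-level eligibility rules above can be expressed as a simple filter. The following Python sketch is illustrative only: the `Patient` record, its field names and the helper functions are hypothetical rather than MedicineInsight or CHeReL structures, and the practice-level criteria (INCA extraction tool, February 2022 data quality requirements) are assumed to have been applied beforehand.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Windows taken from the text; the record structure below is hypothetical.
STUDY_START = date(2019, 1, 1)
STUDY_END = date(2020, 12, 31)
LINKAGE_START = date(2010, 1, 1)


@dataclass
class Patient:
    # Hypothetical minimal record; field names are illustrative, not MedicineInsight's.
    age: int
    gp_encounters: List[date] = field(default_factory=list)   # clinical encounters at the practice
    linked_records: List[date] = field(default_factory=list)  # APDC, EDDC, RBDM or CODURF record dates


def between(d: date, start: date, end: date) -> bool:
    return start <= d <= end


def is_eligible(p: Patient) -> bool:
    """Patient-level cohort criteria only; practice-level criteria are not modelled."""
    valid_age = 0 <= p.age <= 112
    # "Regular" patient: at least 3 clinical encounters during 2019-2020
    regular = sum(between(d, STUDY_START, STUDY_END) for d in p.gp_encounters) >= 3
    # At least one hospital, ED or mortality record during 2010-2020
    linked = any(between(d, LINKAGE_START, STUDY_END) for d in p.linked_records)
    return valid_age and regular and linked


patient = Patient(
    age=67,
    gp_encounters=[date(2019, 3, 1), date(2019, 9, 12), date(2020, 5, 4)],
    linked_records=[date(2015, 6, 30)],
)
print(is_eligible(patient))  # True
```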

Figure 1: Selection of the MedicineInsight linkage population and the study cohort.

The study period encompassed 1 January 2019 to 31 December 2020. Defining the start and end of patient follow-up using data from Australian EHRs is challenging; we therefore assumed that regular patients attended the MedicineInsight practice for the entire study period.

Outcome definitions

Ascertaining serious acute events

As in most primary health care EHR databases, MedicineInsight contains diagnostic algorithms [1] that use information from various fields to identify whether patients have specific conditions. These algorithms were developed by NPS MedicineWise, the custodian of MedicineInsight, to create efficiencies for users of the data and to promote consistency between studies. They identify conditions using information from three diagnostic EHR fields (diagnosis, reason for visit and reason for prescription), which contain either coded terms that the user selects from a drop-down list in the EHR software or free text. A patient is identified as having the specific condition if a coded term or text string from a pre-defined list has ever been recorded for that patient in any one of the three fields. The pre-defined list is compiled by trained clinical coders and is based on available Pyefinch (used in BP) and Docle (used in MD) codes, as well as commonly accepted clinical definitions and abbreviations. Docle and Pyefinch are Australian general practice coding systems consisting of clinical terminologies for diseases, clinical findings and therapies [26]. For records identified by a free text string alone, the context in which the string is recorded is reviewed by clinical coders when the algorithm is developed and periodically thereafter, and irrelevant instances are removed. Other fields, such as prescriptions or pathology, are not searched. Data (including diagnoses) recorded in the unstructured area of the EHR, called ‘progress notes’, are not collected because they may contain identifiable information [5].
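To make the flagging logic concrete, here is a minimal Python sketch of an algorithm of this general shape. It assumes a hypothetical `EhrEntry` record and made-up term lists: the actual Pyefinch/Docle code lists and free-text strings are in Supplementary Appendix 1 and are not reproduced here, and the clinical coders' manual review of free-text context is not modelled.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical term lists for illustration only (not the MedicineInsight lists).
CODED_TERMS = {"acute pancreatitis", "pancreatitis - acute"}
TEXT_PATTERNS = [re.compile(r"\bacute pancreatitis\b", re.IGNORECASE)]

# The three diagnostic fields searched by the algorithms
SEARCHED_FIELDS = ("diagnosis", "reason_for_visit", "reason_for_prescription")


@dataclass
class EhrEntry:
    field_name: str   # which EHR field the entry came from (hypothetical names)
    value: str        # coded term label or free text
    is_coded: bool    # True if selected from the coding system's drop-down list
    recorded: date


def matches(entry: EhrEntry) -> bool:
    if entry.field_name not in SEARCHED_FIELDS:
        return False  # e.g. prescriptions or pathology are not searched
    if entry.is_coded:
        return entry.value.lower() in CODED_TERMS
    return any(p.search(entry.value) for p in TEXT_PATTERNS)


def flag_dates(entries):
    """Dates of all matching entries; an empty list means the patient is not flagged."""
    return sorted(e.recorded for e in entries if matches(e))


entries = [
    EhrEntry("diagnosis", "Acute pancreatitis", True, date(2019, 4, 2)),
    EhrEntry("reason_for_visit", "f/u acute pancreatitis post discharge", False, date(2019, 4, 20)),
    EhrEntry("pathology", "acute pancreatitis", False, date(2019, 4, 1)),  # field not searched
]
print(flag_dates(entries))
```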

A detailed description of the MedicineInsight algorithms for acute pancreatitis, MI, stroke (including transient ischaemic attack), suicide (attempted or completed) and VTE (including deep vein thrombosis and pulmonary embolism) is included in Supplementary Appendix 1. Cases in MedicineInsight were patients flagged by MedicineInsight diagnostic algorithms as having an event recorded during the two-year study time period (2019 to 2020). Multiple events during the study period were only counted once and the date of the earliest event within the study period was taken as the ‘MedicineInsight index date’.

The external reference standard was cases identified in APDC, EDDC, RBDM or CODURF data during 2019 and 2020. Primary diagnosis codes were considered in the EDDC, while both primary and additional diagnosis codes were considered in the APDC and CODURF. The ICD-10-AM or SNOMED-CT codes used to define each outcome are provided in Supplementary Appendix 2. For each serious acute outcome, multiple events within or across linked datasets during the study period were only counted once, and the earliest event within the study period was used to define the ‘reference index date’ (RID).
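The index-date rules for both data sources (earliest in-period event, repeated events counted once, events pooled across linked datasets for the RID) could be sketched as follows. The function names are hypothetical, and the sketch assumes event dates have already been extracted using the codes in Supplementary Appendices 1 and 2.

```python
from datetime import date
from typing import Iterable, Optional

STUDY_START = date(2019, 1, 1)
STUDY_END = date(2020, 12, 31)


def index_date(event_dates: Iterable[date]) -> Optional[date]:
    """Earliest event inside the study period, or None if there is none.

    Repeated events count once: only the earliest in-period date is kept.
    """
    in_period = [d for d in event_dates if STUDY_START <= d <= STUDY_END]
    return min(in_period, default=None)


def reference_index_date(*dataset_dates: Iterable[date]) -> Optional[date]:
    """RID: pool event dates from all linked datasets, then take the earliest in-period one."""
    pooled = [d for dates in dataset_dates for d in dates]
    return index_date(pooled)


apdc = [date(2018, 11, 3), date(2019, 6, 10)]  # the 2018 admission falls outside the study period
eddc = [date(2019, 6, 9)]
deaths = []
print(reference_index_date(apdc, eddc, deaths))            # 2019-06-09
print(index_date([date(2019, 7, 1), date(2020, 2, 1)]))    # MedicineInsight index date: 2019-07-01
```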

Assessment of agreement regarding fact of event

For each acute serious event we calculated the percentage of agreement (PoA), sensitivity, specificity, and negative and positive predictive values (NPV/PPV) [10, 27] of the MedicineInsight algorithms compared with an external reference standard during 2019 and 2020. The external reference standard was a composite of APDC, EDDC, RBDM and CODURF. An outcome recorded in any of these datasets was considered to have truly occurred.
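The five measures follow directly from the 2×2 cross-classification of the algorithm result against the reference standard. This Python sketch (class and method names are illustrative) computes point estimates only and does not reproduce the cluster-adjusted confidence intervals. It uses the acute pancreatitis counts from Table 2 as input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # reference+ EHR+
    fp: int  # reference- EHR+
    fn: int  # reference+ EHR-
    tn: int  # reference- EHR-

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def percent_agreement(self) -> float:
        return (self.tp + self.tn) / self.n


pancreatitis = TwoByTwo(tp=79, fp=26, fn=471, tn=273_844)
for name in ("sensitivity", "specificity", "ppv", "npv", "percent_agreement"):
    print(f"{name}: {100 * getattr(pancreatitis, name)():.1f}%")
```

With these counts the printed sensitivity, specificity, PPV, NPV and PoA round to 14.4%, 100.0%, 75.2%, 99.8% and 99.8%, matching the acute pancreatitis column of Table 2.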

Assessment of agreement regarding timing of event

For the ‘true positive’ cases, we calculated differences between the MedicineInsight index date and the reference index date. Results are presented in the following mutually exclusive categories, where the MedicineInsight index date was: 0–30 days after the reference index date, 1–30 days before the reference index date, or more than 30 days before or after the reference index date. The median, first quartile and third quartile of the differences between the MedicineInsight and reference index dates are also presented (per event, rounded to whole days). Medians and quartiles are presented because the differences were not normally distributed.
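The timing categories and summary statistics can be sketched as below, with the difference defined as the MedicineInsight index date minus the RID (as in Table 3). This is an illustrative Python sketch, not the SAS code used in the study; `statistics.quantiles` offers only two interpolation methods, and SAS's default quartile definition may give slightly different Q1 and Q3 values for the same data.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import quantiles


def timing_category(ehr_index: date, rid: date) -> str:
    diff = (ehr_index - rid).days  # MedicineInsight index date minus RID
    if 0 <= diff <= 30:
        return "0-30 days after RID"
    if -30 <= diff <= -1:
        return "1-30 days before RID"
    return ">30 days before or after RID"


def summarise(pairs):
    """pairs: (MedicineInsight index date, RID) for each true-positive patient."""
    diffs = [(ehr - rid).days for ehr, rid in pairs]
    counts = Counter(timing_category(ehr, rid) for ehr, rid in pairs)
    # Quartile cut points; the middle one is the median
    q1, med, q3 = quantiles(diffs, n=4, method="inclusive")
    return counts, med, (q1, q3)


rid = date(2019, 6, 1)
pairs = [(rid + timedelta(days=d), rid) for d in (-3, 0, 2, 5, 40)]
counts, med, (q1, q3) = summarise(pairs)
print(dict(counts), med, q1, q3)
```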

Analyses were conducted using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA) via the SURE platform. As the data are clustered within practices, variance was adjusted to account for correlation between observations within clusters, and confidence intervals adjusted accordingly, using SURVEYFREQ and SURVEYMEANS procedures in SAS. For confidentiality, small cell numbers (1 to 4) were suppressed, or results aggregated to a higher level.
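For intuition about the cluster adjustment, the sketch below computes a proportion such as sensitivity as a ratio of practice-level totals and estimates its standard error by Taylor-series linearisation, treating practices as clusters sampled with replacement. It rests on simplifying assumptions (no strata, weights or finite population correction, and a normal critical value of 1.96); the exact variance and degrees-of-freedom conventions of SAS SURVEYFREQ and SURVEYMEANS may differ, so it will not necessarily reproduce the published intervals. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def clustered_proportion(clusters: Sequence[Tuple[int, int]], z: float = 1.96):
    """Ratio estimate p = sum(y) / sum(x) with a linearisation standard error.

    clusters: one (numerator, denominator) pair per practice, e.g. (true positives,
    reference-positive patients) when the proportion is sensitivity.
    Requires at least two clusters.
    """
    k = len(clusters)
    y_tot = sum(y for y, _ in clusters)
    x_tot = sum(x for _, x in clusters)
    p = y_tot / x_tot
    # Linearised residual for each cluster: y_c - p * x_c
    ss = sum((y - p * x) ** 2 for y, x in clusters)
    var = k / (k - 1) * ss / x_tot ** 2
    se = math.sqrt(var)
    return p, se, (p - z * se, p + z * se)


p, se, (lo, hi) = clustered_proportion([(1, 2), (2, 2), (1, 2)])
print(f"p={p:.3f}, se={se:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```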


Study Population (Figure 1, Table 1)

The study cohort comprised 274,420 regular patients from 61 general practices with at least one record in a NSW linked dataset between 2010 and 2020 (Table 1). The study cohort was similar to national estimates of Australian patients attending general practice during 2019–20, although it overrepresented older patients (11.2% aged 70–79), patients from inner regional areas (25.3%) and the middle socioeconomic quintile (25.8%) [28]. The corresponding national figures were 8.3%, 12.3% and 19.7%, respectively [28].

| Characteristic | Study population, N (%) | Australian national data (MBS)ᵃ 2019–20 [28], N (%) |
|---|---|---|
| Total persons | 274,420 (100%) | 22,178,760 (100%) |
| Female | 154,622 (56.3%) | 11,595,257 (52.3%) |
| Mean age in years (SD) | 43.4 (25.6) | |
| Median age in years (Q1, Q3) | 43.0 (21.9, 64.2) | |
| Age group (in years) | | |
| 0-9 | 37,525 (13.7%) | 2,763,081 (12.5%) |
| 10-19 | 23,822 (8.7%) | 2,419,160 (10.9%) |
| 20-29 | 26,817 (9.8%) | 2,646,230 (11.9%) |
| 30-39 | 35,999 (13.1%) | 3,117,218 (14.1%) |
| 40-49 | 32,094 (11.7%) | 2,914,753 (13.1%) |
| 50-59 | 31,891 (11.6%) | 2,843,363 (12.8%) |
| 60-69 | 33,685 (12.3%) | 2,558,260 (11.5%) |
| 70-79 | 30,651 (11.2%) | 1,841,556 (8.3%) |
| 80-89 | 16,704 (6.1%) | 864,260 (3.9%) |
| 90+ | 5,232 (1.9%) | 210,879 (1.0%) |
| Remoteness area | | |
| Major city | 184,776 (67.3%) | 15,888,344 (71.6%) |
| Inner regional | 69,311 (25.3%) | 2,737,905 (12.3%) |
| Outer regional | 20,041 (7.3%) | 2,707,665 (12.2%) |
| Remote/very remote | 292 (0.1%) | 844,227 (3.8%) |
| Socioeconomic status (SEIFA IRSAD quintile) | | |
| 1 (most disadvantaged) | 30,992 (11.3%) | 3,467,086 (15.6%) |
| 2 | 58,864 (21.5%) | 3,563,822 (16.1%) |
| 3 | 70,848 (25.8%) | 4,378,392 (19.7%) |
| 4 | 54,851 (20.0%) | 4,626,996 (20.9%) |
| 5 (most advantaged) | 58,865 (21.5%) | 6,135,506 (27.7%) |

Table 1: Characteristics of the study population. Notes: ᵃ The Medicare Benefits Schedule (MBS) data collection contains information on services that qualify for a benefit under the Health Insurance Act 1973 and for which a claim has been processed. MBS data include patients with at least 1 GP visit during 2019–20; the study population includes regular patients as defined in the Methods.

During the 2-year study period (Table 2): for MI, there were 1,203 (0.4%) patients in MedicineInsight with this condition vs 3,178 (1.2%) patients in the linked reference data; for stroke, 1,394 (0.5%) vs 2,359 (0.9%); for VTE, 1,390 (0.5%) vs 954 (0.3%); for acute pancreatitis, 105 (0.04%) vs 550 (0.2%); and for attempted or completed suicide, 583 (0.2%) vs 1,366 (0.5%).

| Study population (N = 274,420) | Acute pancreatitis | Myocardial infarction | Stroke | Suicide | VTE* |
|---|---|---|---|---|---|
| Reference+ EHR+ (true positive) | 79 | 773 | 655 | 177 | 486 |
| Reference- EHR+ (false positive) | 26 | 430 | 739 | 406 | 904 |
| Reference+ EHR- (false negative) | 471 | 2,405 | 1,704 | 1,189 | 468 |
| Reference- EHR- (true negative) | 273,844 | 270,812 | 271,322 | 272,648 | 272,562 |
| Sensitivity, % (95% CI) | 14.4 (10.5, 18.2) | 24.3 (22.0, 26.7) | 27.8 (25.6, 30.0) | 13.0 (9.8, 16.2) | 50.9 (47.0, 55.0) |
| Specificity, % (95% CI) | 100.0 (100.0, 100.0) | 99.8 (99.8, 99.9) | 99.7 (99.7, 99.8) | 99.9 (99.8, 99.9) | 99.7 (99.6, 99.7) |
| PPV, % (95% CI) | 75.2 (67.3, 83.1) | 64.3 (59.8, 68.7) | 47.0 (43.2, 50.8) | 30.4 (27.3, 33.4) | 35.0 (31.4, 38.5) |
| NPV, % (95% CI) | 99.8 (99.8, 99.9) | 99.1 (90.0, 99.3) | 99.4 (99.3, 99.5) | 99.6 (99.5, 99.6) | 99.8 (99.8, 99.9) |
| PoA, % | 99.8 | 99.0 | 99.1 | 99.4 | 99.5 |

Table 2: Measures of agreement for each acute serious outcome between the EHR (MedicineInsight) algorithms and linked (Reference) data. *Venous thromboembolism. PoA: percentage of agreement.

The selection process for the ‘Total MedicineInsight Linkage Population’ and study cohort is described in Figure 1.

Agreement of acute serious events recorded in MedicineInsight compared with NSW hospital, ED and mortality datasets (Table 2)

| MedicineInsight index date vs reference index date (RID) | Acute pancreatitis, n (%) | Myocardial infarction, n (%) | Stroke, n (%) | Suicide, n (%) | VTE*, n (%) |
|---|---|---|---|---|---|
| 1–30 days before RID | 9 (11.4) | 99 (12.8) | 95 (14.5) | 19 (10.7) | 58 (11.9) |
| 0–30 days after RID | 53 (67.1) | 532 (68.8) | 357 (54.5) | 89 (50.3) | 328 (67.5) |
| >30 days before or after RID | 17 (21.5) | 142 (18.4) | 203 (31.0) | 69 (39.0) | 100 (20.6) |
| Total | 79 (100.0) | 773 (100.0) | 655 (100.0) | 177 (100.0) | 486 (100.0) |
| Median difference** in days (Q1, Q3) | 2 (-1, 11) | 0 (-1, 7) | 0 (-5, 13) | 1 (-21, 16) | 2 (-1, 10) |

Table 3: Time between recording of acute serious events in EHR (MedicineInsight) and linked (Reference) data for ‘true positive’ cases during 2019–20. *Venous thromboembolism. **Difference = MedicineInsight index date minus RID.

In general, sensitivity and PPV of the MedicineInsight algorithms were low across the selected acute serious events (Table 2). Sensitivity, the proportion of patients who truly had the specific event (according to the linked datasets) who were identified by the MedicineInsight algorithm as having it, was highest for VTE (50.9%), followed by stroke (27.8%) and MI (24.3%). PPV, the proportion of patients identified by the MedicineInsight algorithm as having the specific event who truly had it, was highest for acute pancreatitis (75.2%), followed by MI (64.3%) and stroke (47.0%).

PoA, specificity and NPV were high across events because the majority of regular general practice patients during 2019-20 did not experience these acute serious events (Table 2).
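A simple calculation illustrates why PoA is uninformative here. The hypothetical rule below flags no patient at all, so every reference case becomes a false negative and its PoA is just the proportion of the cohort without the event. The cohort size and reference case counts (true positives plus false negatives) are taken from Table 2; the function name is illustrative.

```python
N = 274_420  # study cohort size
REFERENCE_CASES = {  # reference-positive patients (TP + FN) from Table 2
    "Acute pancreatitis": 550,
    "Myocardial infarction": 3178,
    "Stroke": 2359,
    "Suicide": 1366,
    "VTE": 954,
}


def poa_flag_nobody(reference_cases: int, n: int = N) -> float:
    """PoA of a trivial rule that flags no patient: all reference cases are false negatives."""
    return (n - reference_cases) / n


for event, cases in REFERENCE_CASES.items():
    print(f"{event}: {100 * poa_flag_nobody(cases):.1f}%")
```

The trivial rule's PoA (99.8%, 98.8%, 99.1%, 99.5% and 99.7%, respectively) equals or slightly exceeds the algorithms' PoA in Table 2 for four of the five events, which is why sensitivity and PPV are the more informative measures for rare events.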

The timing of selected events between datasets (Table 3)

The majority of ‘true positive’ cases were first recorded in the EHR 0 to 30 days after they were recorded in hospital, ED or mortality datasets (that is, after the reference index date, RID). Of the 773 ‘true positive’ MI cases in MedicineInsight, 68.8% were recorded 0–30 days after the RID, 12.8% were recorded 1–30 days before the RID and 18.4% were recorded more than 30 days before or after the RID. Results were similar for acute pancreatitis and VTE; however, a higher proportion of suicide (39.0%) and stroke (31.0%) cases were recorded more than 30 days before or after the RID. The smallest median differences between MedicineInsight and reference index dates were for MI and stroke (a median difference of 0 days for each). The differences in index dates were most widely dispersed for suicide, with an interquartile range of 37 days.


To our knowledge this is the first Australian study comparing the recording of these acute health events in general practice EHRs with linked hospital and mortality datasets. Outside Australia, studies validating algorithms within primary care EHRs report varying PPVs and sensitivities, with measures of agreement being generally lower for acute compared to chronic outcomes [29, 30]. Lower PPVs and sensitivities are also reported where reliance is on free text data (as opposed to where coding systems such as ICD-10 are used) [29].

Our study found that Australian general practice EHR diagnostic algorithms for the selected acute serious events agree poorly with combined hospital, ED and death records. For example, sensitivities were low for VTE (50.9%), stroke (27.8%) and MI (24.3%), indicating that a large proportion of patients experiencing these events were classified as not having them, either because the event was not recorded in the practice EHR within the study period or because it was not ascertained by the diagnostic algorithm. Using these algorithms alone will lead to undercounting of patients with these acute conditions and is insufficient for most observational studies.

The proportion of patients identified by MedicineInsight algorithms as having the specific acute event who truly had the event (according to linked data) was highest for acute pancreatitis (PPV 75.2%), followed by MI (64.3%) and stroke (47.0%). The high specificity and good PPV for acute pancreatitis indicate that this diagnostic algorithm in particular returns relatively few false positives and may therefore be useful for identifying cohorts of patients who truly have the specific event, but linkage with hospital data is required to find most cases. General practice EHRs contain detailed patient information on potential confounding factors unavailable in other administrative datasets. Using linked general practice, hospital, ED and mortality data is therefore recommended in research on acute serious events to accurately ascertain cases, build appropriate patient cohorts and more adequately control for confounding. Our findings suggest that general practice records will complement hospital, ED and mortality data, and that both are required for robust ascertainment of acute serious events. Overall, our findings support the use of general practice EHRs for research, monitoring or surveillance of the selected acute events, only when linked to hospital, ED and mortality data.

These results were not unexpected in the context of the Australian health setting, despite international validation studies demonstrating good recording of acute serious events in primary care data [31, 32]. Presentations for these conditions occur mainly in acute care settings, with general practice involvement mainly for ongoing care or at discharge. Where patients visit multiple general practices, discharge information may be sent to a non-MedicineInsight practice, delayed, or not sent at all. Additionally, patients may present to a non-MedicineInsight practice for ongoing care or be managed entirely in secondary care. Furthermore, the information contained within discharge summaries does not automatically populate the diagnosis fields within EHRs and information may be recorded in sections of the EHR that are not extracted by MedicineInsight.

Other potential explanations for the low sensitivities and PPVs in this study relate to study methodology; however, we expect these to account for only part of the discordance observed between EHR records and the reference hospital, ED and mortality records. Events recorded in MedicineInsight but not in the reference datasets were defined as false positives in this study, but they may indicate true cases that did not require hospitalisation rather than incorrect diagnoses. Scenarios leading to the misclassification of MedicineInsight cases as false positives, and consequent underestimation of PPVs, include acute events managed in general practice without hospitalisation (such as VTE) and hospitalisations or deaths that were not captured in the linked reference data because they occurred outside NSW. Reference hospital, ED and mortality data are also subject to errors and incompleteness. In some countries, a common strategy for researchers using general practice EHRs is to confirm events and dates through questionnaires sent to general practitioners, particularly for the subset of patients for whom linkage to hospital and death records is not possible [32]. Conducting a validation study using general practice questionnaires was not feasible for this study; hence, linked hospital and mortality datasets were used as the reference standard.

Diagnoses recorded in EHRs outside the study timeframe were not included, potentially leading to underestimates of sensitivities and PPVs. For example, if a patient was diagnosed with VTE by their general practice at the end of 2018 and presented to hospital with it in 2019, this study did not count the earlier diagnosis recorded in MedicineInsight because it fell outside the study timeframe. Similarly, if a patient presented to hospital with an MI in 2020 but the general practice was only notified in 2021, this study did not count the later diagnosis recorded in MedicineInsight. In both examples the case is counted as a false negative for MedicineInsight, leading to underestimates of the true sensitivity and PPV. Our analysis was limited to events identified by diagnostic algorithms (Supplementary Appendix 1) and may undercount events where data are recorded in non-diagnostic fields, reducing the number of cases ascertained and in turn lowering sensitivities and PPVs. Combining case ascertainment using clinical coding with prescription or other clinical management data has been reported to improve measures of agreement [30].

Of the ‘true positive’ cases (those identified in both EHRs and linked data during the study period), the majority (61% to 82%, depending on the event) were recorded in the EHR within 30 days of the hospital, ED or mortality record. Cases recorded outside the 30-day window could reflect true positive cases or separate events. Studies have demonstrated increased documentation in primary care EHRs within broader timeframes after hospitalisation (e.g. 60-day windows from an event), but the proportions of records with exact date matches remain relatively low [33].

Strengths and limitations

This is the second Australian study [20], to our knowledge, to use full-scale record linkage to compare acute events between general practice EHRs and population-based data collections. Coded diagnoses from NSW hospital admission, emergency department and cause of death data were used as the reference standard against which the accuracy of the diagnostic algorithms for the acute serious conditions was benchmarked. The majority of validation studies of primary care EHRs use EHR reviews as the reference standard [29, 34]. However, as EHR reviews are time-consuming and labour-intensive, sample sizes are often small, which limits the generalisability of results [10]. Using linked external data sources as the reference standard enables a much larger sample size and improves study power and representativeness. The limitation of this approach is that diagnoses in the linked datasets may be recorded inaccurately or incompletely, leading to inaccurate estimates of event prevalence. We did not have access to private emergency department data, and patients with milder symptoms (e.g. milder strokes) may not present to hospital or may present outside NSW.

MedicineInsight patients are broadly similar to Australian patients who visited a general practice during 2019–20 in terms of age, sex and socioeconomic status [28]. However, this study was limited to NSW and to practices using the INCA extraction tool. Furthermore, to improve data completeness, this study was based on a cohort of patients who regularly seek health care and had a record in one of the linked administrative datasets between 2010 and 2020. Compared with the general patient population, the study cohort overrepresented females and older patients (70 years and over). Results may not be generalisable to people with less morbidity who use health care less frequently. As PPV and NPV depend on the incidence of the specific health condition [34], the PPV estimates in this study may be higher, and the NPV estimates lower, than those the diagnostic algorithms would yield in a population with a lower incidence of the condition. Including only practices from NSW is a further threat to the generalisability of the results, although we expect results would be similar in other jurisdictions.
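The dependence of predictive values on how common the event is follows from Bayes' theorem. In the Python sketch below (function names are illustrative), sensitivity and specificity are held fixed at the values implied by the VTE counts in Table 2, and the proportion of the cohort with the event during the study period is then halved. At the observed proportion the formulas reproduce the VTE PPV and NPV exactly (486/1,390 and 272,562/273,030); halving it lowers the PPV (from about 35% to about 21%) and raises the NPV.

```python
TP, FP, FN, TN = 486, 904, 468, 272_562  # VTE counts from Table 2
N = TP + FP + FN + TN
SENS = TP / (TP + FN)
SPEC = TN / (TN + FP)


def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value from sensitivity, specificity and event proportion."""
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)


def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value from sensitivity, specificity and event proportion."""
    tn = spec * (1 - prev)
    fn = (1 - sens) * prev
    return tn / (tn + fn)


observed_prev = (TP + FN) / N
for prev in (observed_prev, observed_prev / 2):
    print(f"event proportion {prev:.3%}: PPV {ppv(SENS, SPEC, prev):.1%}, "
          f"NPV {npv(SENS, SPEC, prev):.2%}")
```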

Data in this study should not be used to determine the true incidence of acute conditions. Firstly, the study cohort is not representative of the general patient population, as discussed above. Secondly, the validity of the information recorded in both MedicineInsight and the reference datasets is unclear; for example, patients who presented only to general practice and not to hospital were categorised as false positives. We did not evaluate the veracity of the hospital or mortality data, only its agreement with the MedicineInsight algorithms. Finally, healthcare access patterns across the world were affected by COVID-19 and related measures, potentially affecting results for 2020 [35].


Our findings provide valuable insights about the identification of acute serious events in Australian general practice, with important implications for research. Acute serious events are often managed outside primary care, and relying on Australian general practice EHRs alone to identify them will lead to undercounting of affected patients and is insufficient for most observational research. Limitations related to data extraction from general practice EHRs and study design only partly explain the observed low sensitivities and PPVs. Our findings suggest that general practice records complement hospital and mortality data by improving case ascertainment, and that both are required for robust estimation of acute serious events.


We are grateful to the general practices and general practitioners who participate in MedicineInsight and the patients whose de-identified data makes this work possible. We also acknowledge NPS MedicineWise staff, particularly Lisa Quick and Josephine Belcher who contributed to this study and CHeReL for conducting the data linkage necessary for this analysis.

Statements on conflicts of interest

SA, AP and KC are employees of NPS MedicineWise, the custodian of MedicineInsight data.

Ethics statement

In December 2017, NPS MedicineWise was granted ethics approval for the standard operations and uses of the MedicineInsight database by the Royal Australian College of General Practitioners (RACGP) National Research and Evaluation Ethics Committee (NREEC 17-017). Additional ethics approval for privacy-preserving linkage of de-identified MedicineInsight data with NSW hospital and death data held by CHeReL was obtained from the NSW Population and Health Services Research Ethics Committee (PHSREC) (application number 2021/ETH00023). The project also received approval from the independent MedicineInsight Data Governance Committee on 12 July 2020 (application number 2020–24).

Declaration of funding

The study was funded by the Australian Government Department of Health and Aged Care. The funding body had no role in the design of the study, data collection, analysis or interpretation, nor in writing the manuscript.

Abbreviations

95% CI: 95% confidence interval – a range of values that are likely to encompass the true value
ABS: Australian Bureau of Statistics
APDC: Admitted Patient Data Collection
ASGS-RA: Australian Statistical Geography Standard Remoteness Areas
BP: Best Practice – a general practice clinical information system
CHeReL: Centre for Health Record Linkage (NSW)
CIS: Clinical information system
COVID-19: Coronavirus disease 2019, caused by SARS-CoV-2
CODURF: Cause of Death Unit Record File. Also referred to as ACR CODURF (Australian Coordinating Registry Cause of Death Unit Record File)
CPRD: Clinical Practice Research Datalink
DUSC: Drug Utilisation Subcommittee (Commonwealth Department of Health and Aged Care)
EDDC: Emergency Department Data Collection
EHR: Electronic Health Record – the patient’s health record within the general practice clinical information system
GP: General practice
GRHANITE: GeneRic Health Network Information Technology for the Enterprise – a data extraction tool used with general practice clinical information systems
ICD-10 & ICD-10 AM: International Classification of Diseases version 10 (and the Australian Modification)
INCA: Integrated Care – a data extraction tool used with general practice clinical information systems
IRSAD: ABS Index of Relative Socioeconomic Advantage and Disadvantage
MBS: Medicare Benefits Schedule
MD: Medical Director – a general practice clinical information system
NHMRC: National Health and Medical Research Council
NREEC: National Research and Evaluation Ethics Committee
NPV: Negative predictive value
NSW: New South Wales
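p-value: The probability of observing a result at least as extreme as that obtained if the null hypothesis were true; used in tests of statistical significance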
PHSREC: NSW Population and Health Services Research Ethics Committee
PBAC: Pharmaceutical Benefits Advisory Committee
PBS: Pharmaceutical Benefits Scheme
PoA: Percentage of agreement
PPN: Project specific Person Number
PPRL: Privacy-preserving record linkage – a method that uses hashed keys (i.e. Bloom filters) generated at source from first name, last name, date of birth and address to link records across datasets without transferring or exposing identifying information to researchers
PPV: Positive predictive value
RACGP: Royal Australian College of General Practitioners
RID: Reference Index Date
RBDM: Register of Births Deaths and Marriages
SAS: A statistical software package
SNOMED-CT: Systematized Nomenclature of Medicine Clinical Terms
SURE: Secured Unified Research Environment
SEIFA: ABS Socio-Economic Indexes for Areas
TGA: Therapeutic Goods Administration
VTE: Venous thromboembolism

References

  1. Campanella P, Lovato E, Marone C, Fallacara L, Mancuso A, Ricciardi W, et al. The impact of electronic health records on healthcare quality: a systematic review and meta-analysis. Eur J Public Health. 2016;26(1):60–4. 10.1093/2016
  2. Nguyen L, Bellucci E, Nguyen LT. Electronic health records implementation: an evaluation of information system impact and contingency factors. International journal of medical informatics. 2014;83(11):779–96. 10.1016/2014
  3. Gentil ML, Cuggia M, Fiquet L, Hagenbourger C, Le Berre T, Banatre A, et al. Factors influencing the development of primary care data collection projects from electronic health records: a systematic review of the literature. BMC Med Inform Decis Mak. 2017;17(1):139. 10.1186/2017
  4. Youens D, Moorin R, Harrison A, Varhol R, Robinson S, Brooks C, et al. Using general practice clinical information system data for research: the case in Australia. Int J Popul Data Sci. 2020;5(1):1099. 10.23889/2020
  5. Busingye D, Gianacas C, Pollack A, Chidwick K, Merrifield A, Norman S, et al. Data Resource Profile: MedicineInsight, an Australian national primary health care database. Int J Epidemiol. 2019;48(6):1741–1741h. 10.1093/2019
  6. Canaway R, Boyle DI, Manski-Nankervis JE, Bell J, Hocking JS, Clarke K, et al. Gathering data for decisions: best practice use of primary care electronic records for research. Med J Aust. 2019;210 (Suppl 6):S12–S6. 10.5694/2019
  7. Dawda P, True A, Dickinson H, Janamian T, Johnson T. Value-based primary care in Australia: how far have we travelled? Med J Aust. 2022;216 Suppl 10(S10):S24–S7. 10.5694/2022
  8. Australian Institute of Health and Welfare. Developing a National Primary Health Care Data Asset: consultation report. Canberra: AIHW; 2019. Available from:

  9. Canaway R, Boyle D, Manski-Nankervis J, Gray K. Primary Care Data and Linkage: Australian dataset mapping and capacity building. Melbourne Academic Centre for Health for the Australian Health Research Alliance; 2020. Available from:

  10. Havard A, Manski-Nankervis JA, Thistlethwaite J, Daniels B, Myton R, Tu K, et al. Validity of algorithms for identifying five chronic conditions in MedicineInsight, an Australian national general practice database. BMC Health Serv Res. 2021;21(1):551. 10.1186/2021
  11. Kitsos A, Peterson GM, Jose MD, Khanam MA, Castelino RL, Radford JC. Variation in Documenting Diagnosable Chronic Kidney Disease in General Medical Practice: Implications for Quality Improvement and Research. Journal of primary care & community health. 2019;10:2150132719833298. 10.1177/2019
  12. Chidwick K, Kiss D, Gray R, Yoo J, Aufgang M, Zekry A. Insights into the management of chronic hepatitis C in primary care using MedicineInsight. Aust J Gen Pract. 2018;47(9):639–45. 10.31128/2018
  13. Gallagher AM, Dedman D, Padmanabhan S, Leufkens HGM, de Vries F. The accuracy of date of death recording in the Clinical Practice Research Datalink GOLD database in England compared with the Office for National Statistics death registrations. Pharmacoepidemiol Drug Saf. 2019;28(5):563–9. 10.1002/2019
  14. Vinogradova Y, Coupland C, Hippisley-Cox J. Use of combined oral contraceptives and risk of venous thromboembolism: nested case-control studies using the QResearch and CPRD databases. BMJ. 2015;350:h2135. 10.1136/2015
  15. Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, et al. Data Resource Profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015;44(3):827–36. 10.1093/2015
  16. Okoli GN, Myles P, Murray-Thomas T, Shepherd H, Wong ICK, Edwards D. Use of Primary Care Data in Research and Pharmacovigilance: Eight Scenarios Where Prescription Data are Absent. Drug Saf. 2021;44(10):1033–40. 10.1007/2021
  17. The University of Melbourne. GRHANITE Health Informatics Unit 2022 [11 April 2020]. Available from:

  18. Precedence Health Care. cdmNET 2020 [cited 2022 1/9/2022]. Available from:

  19. World Health Organisation. ATC: Structure and principles 2021 [cited 2022 1/9/2022]. Available from:

  20. Myton R, Pollack A, Havard A, Belcher J, Annear K, Chidwick K. MedicineInsight report: Validation of the MedicineInsight database: the accuracy of death recording in the MedicineInsight general practice data compared with the National Death Index in Australia. Sydney: NPS MedicineWise; 2021. Available from:

  21. Centre for Health Record Linkage (CHeReL). Data dictionaries [Data dictionaries for datasets in CHeReL master linkage map]: CHeReL; 2022 [cited 2022 18/02/2022]. Available from:

  22. Irvine K, Hall R, Taylor L. A profile of the Centre for Health Record Linkage. Int J Popul Data Sci. 2019;4(2):1142. 10.23889/2018
  23. Irvine K, Smith M, De Vos R, Brown A, Ferrante A, Boyd J, et al. Real world performance of privacy preserving record linkage. Int J Popul Data Sci. 2018;3(4). 10.23889/2018
  24. Sax Institute. Introduction to SURE: Sax Institute Sydney, Australia; 2017 [cited 2022 1/9/2022]. Available from:

  25. Royal Australian College of General Practitioners (RACGP). RACGP Standards for General Practices. 2014

  26. Health Communication Network Limited. Practice Incentives Program (PIP) eHealth Incentive: Requirement 3—Data Records and Clinical Coding. 2014

  27. Trevethan R. Sensitivity, Specificity, and Predictive Values: Foundations, Pliabilities, and Pitfalls in Research and Practice. Frontiers in Public Health. 2017;5. Available from:

  28. NPS MedicineWise. General Practice Insights Report July 2019–June 2020 including analyses related to the impact of COVID-19. Sydney: NPS MedicineWise; 2021. Available from:

  29. McBrien KA, Souri S, Symonds NE, Rouhi A, Lethebe BC, Williamson TS, et al. Identification of validated case definitions for medical conditions used in primary care electronic medical record databases: a systematic review. Journal of the American Medical Informatics Association : JAMIA. 2018;25(11):1567–78. 10.1093/2018
  30. Schultz SE, Rothwell DM, Chen Z, Tu K. Identifying cases of congestive heart failure from administrative data: a validation study using primary care patient records. Chronic diseases and injuries in Canada. 2013;33(3). Available from:

  31. Persson R, Sponholtz T, Vasilakis-Scaramozza C, Hagberg KW, Williams T, Kotecha D, et al. Quality and Completeness of Myocardial Infarction Recording in Clinical Practice Research Datalink Aurum. Clinical epidemiology. 2021;13:745–53. 10.2147/2021
  32. Arana A, Margulis AV, Varas-Lorenzo C, Bui CL, Gilsenan A, McQuay LJ, et al. Validation of cardiovascular outcomes and risk factors in the Clinical Practice Research Datalink in the United Kingdom. Pharmacoepidemiol Drug Saf. 2021;30(2):237–47. 10.1002/2021
  33. Morgan A, Sinnott SJ, Smeeth L, Minassian C, Quint J. Concordance in the recording of stroke across UK primary and secondary care datasets: a population-based cohort study. BJGP Open. 2021;5(2):BJGPO.2020.0117. 10.3399/2020
  34. Benchimol EI, Manuel DG, To T, Griffiths AM, Rabeneck L, Guttmann A. Development and use of reporting guidelines for assessing the quality of validation studies of health administrative data. J Clin Epidemiol. 2011;64(8):821–9. 10.1016/2010
  35. Sutherland K, Chessman J, Zhao J, Sara G, Shetty A, Smith S, et al. Impact of COVID-19 on healthcare activity in NSW, Australia. Public Health Res Pract. 2020;30(4). 10.17061/2020
  36. Australian Institute of Health and Welfare. Australian Burden of Disease Study: Methods and supplementary material 2018. Canberra: AIHW; 2021. Available from:

Article Details

How to Cite
Ahmed, S., Pollack, A., Havard, A., Pearson, S.-A. and Chidwick, K. (2023) “Agreement of acute serious events recorded across datasets using linked Australian general practice, hospital, emergency department and death data: implications for research and surveillance”, International Journal of Population Data Science, 8(1). doi: 10.23889/ijpds.v8i1.2118.
