Identification of International Society on Thrombosis and Haemostasis major and clinically relevant non-major bleed events from electronic health records: a novel algorithm to enhance data utilisation from real-world sources


Alexander Hartenstein
Khaled Abdelgawwad
Frank Kleinjung
Stephen Privitera
Thomas Viethen
Tatsiana Vaitsiakhovich


In randomised controlled trials (RCTs), bleeding outcomes are often assessed using definitions provided by the International Society on Thrombosis and Haemostasis (ISTH). Information relating to bleeding events in real-world evidence (RWE) sources is not identified using these definitions. To assist with accurate comparisons between clinical trials and real-world studies, algorithms are required for the identification of ISTH-defined bleeding events in RWE sources.

To present a novel algorithm to identify ISTH-defined major and clinically relevant non-major (CRNM) bleeding events in a US electronic health record (EHR) database.

The ISTH definition for major bleeding was divided into three subclauses: fatal bleeds, critical organ bleeds and symptomatic bleeds associated with haemoglobin reductions. Data elements from EHRs required to identify patients fulfilling these subclauses (algorithm components) were defined according to International Classification of Diseases, 9th and 10th Revisions, Clinical Modification disease codes that describe key bleeding events. Other data providing context to bleeding severity included in the algorithm were: ‘interaction type’ (diagnosis in the inpatient or outpatient setting), ‘position’ (primary/discharge or secondary diagnosis), haemoglobin values from laboratory tests, blood transfusion codes and mortality data.

In the final algorithm, the components were combined to align with the subclauses of ISTH definitions for major and CRNM bleeds. A matrix was proposed to guide identification of ISTH bleeding events in the EHR database. The matrix categorises bleeding events by combining data from algorithm components, including: diagnosis codes, 'interaction type', 'position', decreases in haemoglobin concentrations (≥2 g/dL over 48 hours) and mortality.

The novel algorithm proposed here identifies ISTH major and CRNM bleeding events that are commonly investigated in RCTs in a real-world EHR data source. This algorithm could facilitate comparison between the frequency of bleeding outcomes recorded in clinical trials and RWE. Validation of algorithm performance is in progress.


Antithrombotic therapies, such as anticoagulants, are recommended for the prevention of thrombotic events, for example, in patients with atrial fibrillation or after an acute myocardial infarction [1, 2]. Anticoagulants target the blood coagulation cascade to reduce the risk of blood clotting [3]. As such, all currently existing anticoagulation treatments have an associated possible risk of bleeding complications [1, 3].

The investigation of bleeding outcomes is an essential part of safety analyses in antithrombotic clinical trials [4]. In these trials, it is beneficial for such events to be described according to consistent definitions; this is important to facilitate comparisons between studies that have been run independently [5]. In the context of clinical studies, including randomised controlled trials (RCTs), bleeding events are often assessed according to the definitions recommended by the International Society on Thrombosis and Haemostasis (ISTH) guidelines. ISTH published a definition for major bleeding in 2005 [5] and added a definition for clinically relevant non-major (CRNM) bleeding in 2015 [6].

In addition to the evidence from RCTs, real-world evidence (RWE) is increasingly used as a valuable resource for informing clinical practice. Moreover, researchers may wish to extend the evidence base by considering the real-world validity of the data generated from clinical trials [7]. Large volumes of healthcare data are captured in the real-world setting, in which key information such as bleeding events may be recorded. In order to identify these bleeding events in real-world data sources, algorithms have been proposed that aim to guide the evaluation of bleeding events based on diagnostic codes and other information collected in databases [8–10].

The Cunningham algorithm is a validated scheme that may be used to identify bleeding-related hospitalisations as a surrogate for major bleeding events in RWE studies of oral anticoagulant use [8]. This algorithm guides the identification of bleeding-related hospitalisations and classification of the site of bleeding based on relevant International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes [8]. As such, it has been broadly utilised in RWE studies of oral anticoagulants [11–13]. However, there are many key differences between the conceptual definition of bleeding as defined by the Cunningham algorithm and the ISTH, which can impede comparisons between RWE studies reporting Cunningham-based bleeds and RCTs reporting ISTH-based bleeds. Firstly, the Cunningham algorithm does not distinguish between major bleeds and CRNM bleeds (i.e. it does not stratify by severity). Secondly, compared with the ISTH bleeding definition, the Cunningham algorithm places a larger emphasis on different types of bleeds; for example, there is a particular focus on gastrointestinal (GI) bleeds, with distinctions made between upper and lower GI bleeds. Moreover, the Cunningham algorithm includes a ‘trauma exclusion’ procedure whereby any bleeding events that occur in the temporal vicinity of a traumatic event are fully removed; for example, bleeds that occur shortly after a fall are not included. In contrast, trauma-related bleeds, except for surgery-related bleeds, are not explicitly excluded by the ISTH definitions. Lastly, the Cunningham algorithm was designed to identify events in health claims databases and thus is not informed by laboratory or mortality data that are available in other RWE data sources such as electronic health records (EHR).

Consequently, there remains a need for algorithms that can be used to identify bleeding events in RWE data sources that align with the definitions used for bleeding events in clinical trials. The aim of this study was to develop a novel algorithm to identify the occurrence of ISTH major and CRNM bleeding events in the Optum® de-identified Electronic Health Record (Optum® de-identified EHR) dataset to support the comparison of RWE studies with clinical trials assessing ISTH bleeds.


Source ISTH bleeding definition

Key specifications from the ISTH definitions of bleeding in studies of atrial fibrillation and non-surgical venous thromboembolism were used to construct the algorithm [5]. An ISTH major bleed is defined as bleeding with symptomatic presentation and fatal bleeding and/or bleeding in a critical area or organ (e.g. intracranial, intraspinal, intra-ocular, retroperitoneal, intra-articular or pericardial bleeds, or intramuscular bleeds with compartment syndrome) and/or bleeding causing a fall in haemoglobin levels by at least 2 g/dL (1.24 mmol/L) leading to transfusion of two or more units of whole blood or red cells [5]. An ISTH CRNM bleed is defined as any sign or symptom of haemorrhage that does not fit the definition of a major bleed but does meet one of the following criteria: requires medical intervention by a healthcare professional; leads to hospitalisation or increased level of care; or prompts a face-to-face evaluation [6].

Concept development

To inform the design of the algorithm, the ISTH major bleeding definition was first divided into logical subclauses to focus on the key components. Three subclauses were identified within the definition of ISTH major bleeding, and each subclause was named and implemented independently. If any of these three criteria were fulfilled, a bleed could be classed as an ISTH major bleed: 1) fatal bleed: a bleed resulting in death; 2) critical organ bleed: a bleed in a critical area or organ, defined as intracranial, intraspinal, intra-ocular, retroperitoneal, intra-articular or pericardial bleeds, or intramuscular bleeds with compartment syndrome; or 3) symptomatic bleed: a bleed causing a fall in haemoglobin levels by at least 2 g/dL (1.24 mmol/L) leading to transfusion of two or more units of whole blood or red cells [5]. CRNM bleeding was defined primarily as a bleeding of lesser clinical severity but still requiring physical interaction with a healthcare provider [6].

We assessed the database information required to identify patients fulfilling the subclauses of the bleeding definitions. These types of information are referred to as ‘algorithm components’.

Algorithm components

Disease codes

The algorithm was designed using individual-patient level information as collected in the Optum® de-identified EHR dataset, a US RWE data set including health records for over 100 million insured and uninsured people. This source incorporates EHR data across various therapeutic areas with data from 150,000 providers, 2000 hospitals and 7000 clinics and is considered representative of US demographics, critically, age, sex and race across a broad range of geographic locations [14]. International Classification of Diseases (ICD) codes are the global standard system for diagnostic health information and are used for recording health and statistics in primary, secondary and tertiary care [15]. Owing to the global use of the ICD system, these codes are standard entries in many RWE data sources. ICD-Clinical Modification (ICD-CM) codes are used in the Optum® de-identified EHR dataset to collect information on disease diagnosis [16].

Firstly, lists of diagnosis codes relating to events described in the ISTH bleeding subclauses were derived. The initial code lists were developed with reference to the published codes described for the Cunningham algorithm [8]. In alignment with the Cunningham algorithm, relevant disease codes were initially grouped as codes for ‘probable bleeds’ that require no verification or codes for ‘possible bleeds’ that need verification. An example of a probable bleed is ‘531.0x Acute gastric ulcer with hemorrhage’, while an example possible bleed is ‘285.9 Anemia, unspecified’.

Although the initial search for bleed codes began with the Cunningham ICD-9-CM codes, translation into International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes was necessary, as ICD-10-CM coding widely replaced the ICD-9-CM coding from 2015 (the Cunningham algorithm was published in 2011) [17]. However, there are known ICD-9-CM to ICD-10-CM mapping issues – ICD-10-CM contains medical information with more granularity than ICD-9-CM, such as additional information relating to the position of a haemorrhage on the left or right side of the body, or specification of location in the sub-branch of an artery. Forward and reverse mappings were performed by using the most recently available general equivalence mappings provided by Centers for Medicare and Medicaid Services in 2018 [18], and the final mapping was defined as the union of forward and reverse mappings.
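The union of forward and reverse mappings can be illustrated with a minimal sketch. The function name, the pair-based input format and the toy codes below are assumptions made for illustration; they are not the layout of the published general equivalence mapping files, and the authors' own implementation is not described in this article.

```python
from collections import defaultdict


def union_mapping(icd9_codes, forward_pairs, reverse_pairs):
    """Map ICD-9-CM codes to ICD-10-CM using the union of two mappings.

    forward_pairs: iterable of (icd9, icd10) pairs (ICD-9-CM -> ICD-10-CM).
    reverse_pairs: iterable of (icd10, icd9) pairs (ICD-10-CM -> ICD-9-CM).
    Both input formats are hypothetical simplifications.
    """
    wanted = set(icd9_codes)
    mapped = defaultdict(set)
    # Forward direction: targets listed for each ICD-9-CM source code.
    for icd9, icd10 in forward_pairs:
        if icd9 in wanted:
            mapped[icd9].add(icd10)
    # Reverse direction: ICD-10-CM codes that point back to the ICD-9-CM code.
    for icd10, icd9 in reverse_pairs:
        if icd9 in wanted:
            mapped[icd9].add(icd10)
    return dict(mapped)


# Toy placeholder codes, not taken from any real mapping file.
forward = [("A1", "X10")]
reverse = [("X10", "A1"), ("X11", "A1"), ("X99", "B2")]
result = union_mapping(["A1"], forward, reverse)
```

In this toy case the reverse mapping contributes a second target ("X11") that the forward mapping alone would miss, which is the motivation for taking the union.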

To check the completeness and identify potentially missing bleeding codes that were not included in the Cunningham algorithm or missed by mapping, an additional search, informed by clinical knowledge, was conducted across the ICD-9-CM and ICD-10-CM ontologies, including keyword searches for the terms ‘contusion’ and ‘hemorrhage’. Additions to bleeding code lists were proposed by a single physician, and unanimous agreement of two further physicians was required for all changes. This resulted in the addition of several bleeding codes. Firstly, intraocular bleeds were added; intraocular bleeds are identified as a ‘critical organ or area’ by ISTH but are not included in the Cunningham algorithm. Secondly, traumatic intracranial haemorrhage (ICH) codes were included; although the Cunningham algorithm includes non-traumatic ICH, trauma codes are not included. Notably, in patients receiving anticoagulation treatments, there is an associated risk of traumatic bleeds, such as a subdural haematoma (occurring when bridging veins tear in the context of a head trauma); therefore, these types of bleeds were considered of importance for inclusion [19]. Further codes added included those for ruptured aortic aneurysms, as obvious retroperitoneal bleeds, and injury to blood vessels, which can lead to visible haemorrhaging. Finally, codes for contusions, characterised by visible bleeding into soft tissues, were also included.

Following the identification of a full list of relevant diagnostic codes, the category ‘probable bleeds’ was separated into the two subsets, critical organ bleeds and overt bleeds. This is because the ISTH major bleed definition distinguishes between bleeds in a critical area or organ (which do not require verification) and those occurring elsewhere. As such, critical organ bleeds included intracranial, intraspinal, intra-ocular, retroperitoneal, intra-articular and pericardial bleeds, and intramuscular bleeds with compartment syndrome while the remaining probable bleed codes were labelled overt bleed. The third subset, named possible bleeds, contained bleed codes that could be inferred from clinical presentation but could not be clearly established based on a diagnosis code alone. Therefore, in total, three mutually exclusive categories were defined for bleeding, named critical organ bleed, overt bleed and possible bleed (Figure 1). The full lists of disease codes relevant to each category are shown in Supplementary Tables 1–3.

Figure 1: Components included in the bleeding algorithm. ED, emergency department; HCP, healthcare professional; ICD-9-CM, International Classification of Diseases, 9th Revision, Clinical Modification; ICD-10-CM, International Classification of Diseases, 10th Revision, Clinical Modification; ISTH, International Society on Thrombosis and Haemostasis.

Finally, codes in the critical organ bleed and overt bleed code sets were subdivided into ‘trauma’ or ‘not trauma’, allowing for sensitivity analyses in which bleeds due to trauma could be excluded. ‘Trauma’ codes included International Classification of Diseases categories for trauma (ICD-10 S00-T88; ICD-9-CM 958-959.99, 996-999.99) or parent codes including the words ‘trauma’ or ‘injury’. We also included ICD-10-CM codes with labels of ‘intraoperative complications due to hemorrhage’ or ‘post procedural hemorrhage’ in the ‘trauma’ category.
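As a rough sketch of how the ‘trauma’ subdivision might be expressed, the function below applies the code ranges and label keywords listed above. The function name and signature are hypothetical, and the keyword rule is applied to the code's own label rather than to parent codes in the ICD hierarchy, which is a simplifying assumption.

```python
import re


def is_trauma_code(code: str, system: str, label: str = "") -> bool:
    """Flag a diagnosis code as 'trauma' using range and keyword rules."""
    text = label.lower()
    # Whole-word match, so that 'nontraumatic' is not flagged by 'trauma'.
    if re.search(r"\b(trauma|injury)\b", text):
        return True
    if system == "ICD-10-CM":
        # Intraoperative and post-procedural haemorrhage labels.
        if "hemorrhage" in text and (
            "intraoperative" in text or "procedural" in text
        ):
            return True
        # Three-character categories sort lexicographically, so the range
        # S00 to T88 can be tested with a string comparison.
        category = code[:3].upper()
        return "S00" <= category <= "T88"
    if system == "ICD-9-CM":
        try:
            value = float(code)
        except ValueError:
            # E and V codes are not numeric and fall outside the ranges.
            return False
        return 958 <= value <= 959.99 or 996 <= value <= 999.99
    raise ValueError(f"unknown coding system: {system}")
```

Keeping the flag separate from the bleed category means a sensitivity analysis can drop trauma-related bleeds without changing the code sets themselves.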

Other information from diagnosis records

Further diagnostic details in the Optum® de-identified EHR dataset were considered relevant to understanding bleeding severity and were therefore incorporated into the algorithm (Figure 1). In addition to the data derived from ICD-CM disease codes, data were obtained from Current Procedural Terminology-4 (CPT-4), Healthcare Common Procedure Coding System (HCPCS) and International Classification of Diseases, 10th Revision, Procedure Coding System (ICD-10-PCS) codes.

Firstly, ‘interaction type’ included details of where the diagnosis occurred (in an inpatient, emergency department or outpatient setting). These data can provide context for the bleeding events to give an indication of severity; for example, a code reported during hospitalisation may be more likely to be severe than one from an outpatient setting. Interaction types that denoted physical encounters were included and were separated into inpatient and outpatient categories. Interactions over email or telephone were not included, as these represented non-physical encounters. The term ‘inpatient interaction type’ was applied for either an inpatient or emergency department patient, but all other interaction types found in the Optum® de-identified EHR dataset (excluding non-physical encounters) were considered ‘outpatient interaction types’.
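The grouping of interaction types could be sketched as follows. The string labels used here are hypothetical; the actual values stored in the dataset are not listed in this article.

```python
# Hypothetical labels; the real dataset values may differ.
INPATIENT_TYPES = {"inpatient", "emergency department"}
NON_PHYSICAL_TYPES = {"email", "telephone"}


def interaction_category(interaction_type: str):
    """Return 'IN', 'OUT', or None for non-physical encounters."""
    value = interaction_type.strip().lower()
    if value in NON_PHYSICAL_TYPES:
        return None  # excluded from the algorithm
    if value in INPATIENT_TYPES:
        return "IN"
    # Every other physical encounter is treated as outpatient.
    return "OUT"
```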

Secondly, ‘position’ provided information on the relationship between the diagnosis code and visits with healthcare professionals; for example, a diagnosis code associated with the primary position for a visit may indicate the reason for the visit or the severity of the event coded in this position. The Optum® de-identified EHR dataset provides two columns, ‘primary’ and ‘discharge’ diagnosis; the values in these columns are 0, 1 or NULL (NULL entries are treated as 0 by the algorithm). Primary diagnosis and discharge diagnosis were treated as equal indicators of severity for the purposes of this algorithm (i.e. ‘primary diagnosis position’ referred to either primary or discharge diagnosis). Diagnosis codes not in the primary or discharge diagnosis position were considered ‘secondary position’.
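A minimal sketch of the position logic is given below, assuming the two flags arrive as 0, 1 or None (standing in for NULL). The function name is illustrative only.

```python
def diagnosis_position(primary_flag, discharge_flag) -> str:
    """Collapse the primary and discharge flags into one position label."""
    # NULL (None) entries are treated as 0.
    primary = primary_flag or 0
    discharge = discharge_flag or 0
    if primary == 1 or discharge == 1:
        return "primary"
    return "secondary"
```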

The Optum® de-identified EHR dataset includes all blood laboratory test results ordered for all patients. As such, data for haemoglobin values over time were available for each patient. A windowing function was included to identify drops in haemoglobin levels of at least 2 g/dL over a maximum 48-hour time window. A time frame of 48 hours was selected to indicate a clear temporal relationship of an event and the consequence (haemoglobin drop), as is used in other bleeding algorithms [20].
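The windowing step might look like the sketch below, which checks whether any later haemoglobin measurement is at least 2 g/dL lower than an earlier one taken no more than 48 hours before. How the detected drop is anchored to the date of the bleed code is not specified in the text, so this sketch only detects the drop itself; the function and parameter names are assumptions.

```python
from datetime import datetime, timedelta


def has_haemoglobin_drop(measurements, min_drop=2.0,
                         window=timedelta(hours=48)):
    """Detect a drop of at least min_drop g/dL within the time window.

    measurements: list of (timestamp, haemoglobin in g/dL) tuples.
    """
    ordered = sorted(measurements)
    for i, (t_early, hb_early) in enumerate(ordered):
        for t_late, hb_late in ordered[i + 1:]:
            if t_late - t_early > window:
                break  # later values fall outside the 48-hour window
            if hb_early - hb_late >= min_drop:
                return True
    return False


t0 = datetime(2021, 1, 1, 8, 0)
example = [(t0, 13.5), (t0 + timedelta(hours=24), 11.2)]
drop_found = has_haemoglobin_drop(example)
```

The pairwise comparison is quadratic in the number of measurements per patient, which is acceptable for a sketch; a production implementation over a large database would more likely use a sliding-window maximum.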

Moreover, data were also included for ‘blood transfusion procedures’. Procedure codes relating to whole blood, red blood cell or frozen red blood cell transfusions were identified.

In addition, mortality data were incorporated in the algorithm. Mortality information is available as ‘month of death’ in the Optum® de-identified EHR database. These data are retrieved from various sources, such as the Death Master File, the Center for Medicare and Medicaid Services, claims records and clinical records indicating death; however, the reported cause of death is not included. For the algorithm presented here, the month of death provided in the Optum® de-identified EHR database was translated into a date by setting the date of death to the end of the month.
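The translation of a month of death into a date can be sketched with the standard library as follows; the function name is illustrative.

```python
import calendar
from datetime import date


def death_date_from_month(year: int, month: int) -> date:
    """Set the date of death to the last day of the reported month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)
```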

Building the algorithm

The algorithm components were combined with logical operators ‘or’ or ‘and’ to translate the ISTH major definitions and to identify different types of bleeds. By combining information from the algorithm components, the types of bleeds that occur in a diverse range of scenarios can be represented. Figure 2 presents three possible examples of combinations of algorithm components that could represent major bleeds and one possible combination that could represent a CRNM bleed.

Figure 2: Examples combining algorithm components to identify a bleeding event. Algorithm components, including diagnosis codes, interaction types, diagnosis positions, laboratory and blood transfusion procedures, and death, can be combined with logical ‘AND’ operators (shown as ‘+’) to construct subclauses of the ISTH major bleed and CRNM bleed definitions. Shown here are four possible examples of how algorithm components can be combined to form the different subclauses that meet the major bleed or CRNM bleed criteria. Three of the example combinations shown define ISTH major bleeds and one example combination defines a CRNM bleed.

The diagnosis code position and interaction types were used as proxies for bleeding severity. As such, by combining the data inputs obtained from the Optum® de-identified EHR dataset, an informed picture of the occurrence and context of bleeding events can be constructed, allowing for estimation of the bleeding severity. The interaction of these variables forms levels of severity; thus, a bleeding diagnosis in the inpatient/emergency department setting at the primary or discharge position is the most severe possibility for a code, whereas an outpatient diagnosis in a secondary position is the least severe. A matrix was developed to illustrate all possible combinations of the three diagnosis code sets and four possible combinations of diagnosis code information. Each possible intersection of these dimensions was systematically assigned to a subclause of ISTH major or CRNM bleeding.


The final bleeding algorithm matrix is summarised in Figure 3. In this algorithm, identification of bleeding events and classification according to ISTH major bleeding or CRNM bleeding events depends on the combinations of different items from the input list.

Figure 3: Bleeding algorithm for identification of bleeding events from real-world records. ‘Primary’ refers to a diagnosis code in the primary or discharge diagnosis position. CRNM, clinically relevant non-major; IN, inpatient interaction type; OUT, outpatient interaction type.

An ISTH major bleeding event is identified as fulfilling the conditions of any of the following subclauses:

Critical organ bleed: critical organ bleed code in the inpatient setting in any diagnosis position.

Fatal bleed: critical organ bleed code preceding death by at most 45 days in either of the following circumstances: a) inpatient setting in any diagnosis position, or b) outpatient setting as the primary or discharge diagnosis. It should be noted that any critical organ bleed in the inpatient setting is already considered an ISTH major bleed, as defined by the critical organ bleed subclause. The 45-day timeframe was selected because of the low resolution of mortality data (only the month of death is reported); for example, if a patient were to have a date of death on the first of the month, their date of death would be artificially set to the last day of the month. Therefore, this timeframe ensured EHR records prior to death were captured.

Symptomatic bleed: an overt bleed code in the inpatient setting in the primary or discharge diagnosis position that fulfils the bleeding verification criteria.

For codes that require verification, the bleed verification criteria are defined as either a procedure code for a blood transfusion or detected haemoglobin drop in the laboratory test results, as described above.
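The 45-day rule in the fatal bleed subclause can be sketched as a simple date comparison. The function name is hypothetical, and the death date is assumed to have already been set to the end of the reported month.

```python
from datetime import date, timedelta


def precedes_death_within(diagnosis_date, death_date, max_days=45):
    """True if the diagnosis precedes death by at most max_days days."""
    if death_date is None:
        return False  # no death recorded for this patient
    gap = death_date - diagnosis_date
    return timedelta(0) <= gap <= timedelta(days=max_days)


# Diagnosis on 20 January 2020, death reported in February 2020
# (set to 29 February 2020): a gap of 40 days.
fatal_candidate = precedes_death_within(date(2020, 1, 20), date(2020, 2, 29))
```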

An ISTH CRNM bleed is identified when the bleeding event does not meet the criteria for major bleeding but does meet one of the following requirements: 1) overt bleed code in an inpatient setting; 2) overt bleed code in the outpatient setting as the primary or discharge diagnosis; 3) overt bleed code in the outpatient setting that is not in the primary or discharge diagnosis position but that satisfies the bleeding verification criteria; or 4) possible bleed code in any setting and any diagnosis position that also satisfies the bleeding verification criteria.
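The major and CRNM rules described above can be collected into a single decision function, sketched below. The argument names and string labels are assumptions. Combinations that the prose does not assign (for example, a critical organ bleed code in the outpatient setting without a subsequent death) return None here; Figure 3 should be consulted for those cells.

```python
def classify_bleed(code_set, setting, position,
                   verified=False, death_within_45_days=False):
    """Classify one diagnosis record as 'major', 'CRNM' or None.

    code_set: 'critical', 'overt' or 'possible'
    setting:  'IN' (inpatient/emergency department) or 'OUT'
    position: 'primary' (primary or discharge) or 'secondary'
    verified: transfusion code or haemoglobin drop present
    """
    if code_set == "critical":
        if setting == "IN":
            return "major"  # critical organ bleed subclause
        if position == "primary" and death_within_45_days:
            return "major"  # fatal bleed subclause, outpatient case
        return None  # not specified in the prose
    if code_set == "overt":
        if setting == "IN":
            if position == "primary" and verified:
                return "major"  # symptomatic bleed subclause
            return "CRNM"
        if position == "primary" or verified:
            return "CRNM"
        return None
    if code_set == "possible":
        return "CRNM" if verified else None
    raise ValueError(f"unknown code set: {code_set}")
```

For example, `classify_bleed("overt", "IN", "primary", verified=True)` yields a major bleed, whereas the same record without verification is classed as CRNM.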

In instances of major bleeding, the criteria of a CRNM bleed are often also fulfilled; therefore, a 14-day window, centred around an ISTH major bleed event, was defined, during which a CRNM bleed cannot occur. This window was selected based on the estimated length of hospital stay for a major bleed and clinical knowledge of CRNM bleeds that progress to major bleeding events.
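One way to apply this exclusion window is sketched below. It interprets a 14-day window centred on the major bleed as 7 days either side of the event, and treats a CRNM candidate exactly 7 days away as inside the window; both choices are assumptions, as is the function name.

```python
from datetime import date, timedelta


def drop_crnm_near_major(crnm_dates, major_dates, window_days=14):
    """Remove CRNM candidates that fall inside a window around a major bleed."""
    half = timedelta(days=window_days / 2)
    return [
        d for d in crnm_dates
        if all(abs(d - m) > half for m in major_dates)
    ]


major = [date(2021, 3, 10)]
candidates = [date(2021, 2, 1), date(2021, 3, 5), date(2021, 3, 20)]
kept = drop_crnm_near_major(candidates, major)
```

Here the candidate on 5 March 2021 is removed because it lies 5 days before the major bleed, while the other two are retained.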


In this study, a novel algorithm was developed to identify ISTH bleeding events from the Optum® de-identified EHR dataset, a large database that is an exemplary RWE source. By introducing an approach that incorporates disease codes and variables such as laboratory and mortality data, this algorithm can identify events with different levels of severity to assist with accurate translation of clinical definitions of bleeding events. Future validation studies will be conducted to investigate the performance of this algorithm in EHR databases.

RWE sources, such as EHRs, insurance claims data and product/disease/patient registries, provide large volumes of healthcare data. Regulatory bodies are considering the potential relevance of these data for the generation of healthcare evidence to support regulatory decisions about effectiveness and safety of new treatments; this may be particularly relevant for cases in which it is not feasible to gather the necessary data in RCTs [21]. As a result, it is highly desirable that results from studies using secondary data sources, such as EHRs or claims databases, are comparable not only to other observational studies, but also to results from clinical trials. Standardised definitions of outcomes are essential to enable such comparisons. RWE generation is an integral part of the drug development process. RWE supports decision making from the early pre-clinical research through the life-cycle management of a new therapy. To investigate “real-world” treatment effects of a drug, for instance, patient populations and clinical events should be identifiable in RWE data sources, at least for common diseases.

Efforts to standardise definitions in clinical trials, such as the publication of bleeding definitions by the ISTH, are ongoing and it is important that RWE studies attempt to align with standardisation processes in clinical trials. As the utilisation of data from RWE studies becomes more prominent, it is hoped that learnings from the implementation of coding algorithms in RWE will influence the development of outcome definitions in clinical trial standardisation processes. One of the major improvements that could occur is the development of more rigorous definitions that leave less room for ambiguity. Outcome definitions, such as the ISTH bleeding definitions, often do not explicitly list clinical conditions and how to assess them; this is acceptable in the clinical trial setting, as patient outcomes are assessed by physicians or an independent adjudication board that can assess and define the outcome. However, upon implementation of such a definition in RWE, the problems associated with ambiguities become clear; for example, in the ISTH bleeding definition, it is not clear if traumatic events or large-scale haematomas should be classified as bleeding events. In the algorithm reported here, codes for contusions (indicated by the presence of primary or discharge diagnosis flags for a contusion code, or verification by a transfusion or drop in haemoglobin in the bleeding algorithm) were included if they were the primary reason for a patient to seek medical care; it could be advantageous to include such detailed information in clinical outcome definitions to ensure comparability and standardisation. Additionally, the definition for ISTH bleeds does not specifically address surgical bleeds; we included ‘intraoperative complications due to hemorrhage’ and ‘postprocedural hemorrhage’ in the ‘overt bleed’ and ‘critical organ bleed’ code sets with justifications that these codes either indicated bleeding beyond that expected for an operative procedure (i.e. ‘complications’ that also require ‘bleeding verification’ by transfusion or haemoglobin drops), or bleeding that occurred during procedures that were not surgeries, respectively. However, it would be best if all these cases were specifically addressed in a standardised definition. A further instance of ambiguity is for the intraocular bleeds, defined as a critical area and organ by ISTH. A vitreous haemorrhage is, by definition, an intraocular bleed and is quite alarming for patients, thus often prompting a visit to the emergency room; such an event would thus likely be classified as major bleeding by the algorithm presented here. However, it is likely that a physician adjudicator would closely assess the severity of the bleed and may not classify these bleeds as major. In the algorithm presented here, we have chosen to faithfully implement ISTH definitions as published, and have thus included vitreous haemorrhages among critical organ bleed codes. However, one can imagine a future standardised definition providing guidance with higher granularity for specific medical conditions.

Algorithms that guide the identification of bleeding outcomes from RWE data sources may be important to extend our understanding of treatment by enriching data sets. In a randomised, phase 2, dose-finding clinical trial, PACIFIC-AF (NCT04218266), the safety of a once-daily anticoagulant, asundexian 20 mg or 50 mg, was demonstrated when compared with apixaban in patients with atrial fibrillation [22]. However, this study was not powered to test the differences in rates of thrombotic events, and the short follow-up time (12 weeks) limited the potential to study bleeding and thrombotic events outside of this time window. A follow-up study has been conducted that utilised RWE to supplement the data from the PACIFIC-AF trial. This ‘hybrid’ RWE study incorporated an external control arm that utilised data from the Optum® de-identified EHR dataset to augment the control arm of the PACIFIC-AF RCT and allowed long-term ‘projection’ of safety and efficacy events [Vaitsiakhovich et al. – manuscript in preparation]. Moreover, the study demonstrated the use of the novel bleeding algorithm reported here with successful mapping of bleeding events in the PACIFIC-AF external control arm.

The aims and applications for the novel bleeding algorithm presented here are distinct from the previously validated Cunningham algorithm in several ways. The Cunningham algorithm is designed to identify a different conceptual definition from the algorithm proposed herein, namely to detect bleeding-related hospitalisations in health claims data sources. Here, we identify bleeds and attempt to faithfully implement the conceptual definition proposed by the ISTH, by distinguishing between major and CRNM bleeds in both the inpatient and outpatient settings. Additionally, the algorithm here uses EHR data sources that allow the incorporation of bleeding verification data based on laboratory test results. Of note, the algorithm applied to identify bleeding events from real-world sources can have a substantial impact on the estimated rate of bleeding outcomes. Differences in methodologies used to identify bleeding events may contribute to the heterogeneity in reported rates across studies; indeed, a recent study that compared three algorithms for identifying bleeding-related hospitalisations observed varying levels of agreement [12].

There were some limitations associated with this study. Firstly, a validation study to determine positive predictive values, negative predictive values, or sensitivity and specificity has not yet been performed (but is in progress). Secondly, this algorithm was designed for use in the Optum® de-identified EHR dataset; therefore, there are several conditions that must be fulfilled if this algorithm is to be adapted to other RWE data sources. The algorithm requires diagnosis and procedure codes to be coded in ICD-9-CM or ICD-10-CM, CPT-4, HCPCS or ICD-10-PCS; however, translation or mapping to other coding systems is possible. The diagnosis position information and interaction type information are also required because the algorithm relies on this additional information as a proxy for severity; therefore, using this algorithm with RWE data sources that do not contain this information would not be possible. Of note, interaction type and diagnosis code position information are commonly reported in RWE data sources, meaning that the structure of this algorithm, although designed around the Optum® de-identified EHR dataset, may be generalisable to other RWE data sources. An additional point to highlight is that different data sources, especially from different countries, vary in terms of scope and purpose, and the granularity of data may depend on the healthcare systems in place; this means that the varied use of primary and discharge diagnosis position flags may influence the performance of this algorithm, for example, if usage is for remuneration requirements versus simply hospital documentation. Including codes with primary or discharge diagnosis flags also introduces complexity for determining the actual date of an event; in cases in which a bleed has occurred at the beginning of a hospital stay, and the patient is discharged a week later, the bleed would be assigned to the date of discharge rather than when the bleed actually occurred. Future adaptation of the algorithm could look to resolve this challenge by using primary/discharge diagnosis as an indicator of severity with additional retrospective assessment to identify the first bleeding code that occurred in the visit. Laboratory information is also required for this algorithm, specifically the haemoglobin test results. It is possible that future versions of this algorithm could be adapted to rely solely on blood transfusion procedure codes for this purpose. Lastly, no direct input from the ISTH was sought. As such, the information used to develop the algorithm relied on published descriptions of the bleeding definitions, combined with input from clinicians with several years of clinical trial experience, in order to translate ambiguities into the precise programmatic language required to implement code algorithms on RWE.

Future work is required to assess the validity of this algorithm in EHR systems. Features of the algorithm, such as the 14-day window (centred on a major bleed) in which CRNM bleeds cannot occur, were selected using expert clinician knowledge; the impact of such decisions on algorithm performance will be assessed in a validation study.
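
As an illustration of how the exclusion window might be applied, the following minimal sketch removes candidate CRNM bleeds that fall close to a major bleed. It rests on assumptions not stated in the text: that a 14-day window centred on a major bleed spans 7 days either side of it, that the boundary days are included in the window, and that all dates belong to a single patient. The function name `filter_crnm_bleeds` is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, List

# Assumption: a 14-day window centred on a major bleed spans 7 days either side.
HALF_WINDOW = timedelta(days=7)


def filter_crnm_bleeds(crnm_dates: Iterable[date],
                       major_dates: Iterable[date]) -> List[date]:
    """Keep only candidate CRNM bleeds outside the window around every major bleed.

    Assumption: the window is inclusive, so a candidate exactly 7 days
    from a major bleed is dropped.
    """
    majors = list(major_dates)
    return [
        d for d in crnm_dates
        if all(abs(d - m) > HALF_WINDOW for m in majors)
    ]


# Illustrative dates for one patient with a single major bleed.
major = [date(2021, 1, 15)]
candidates = [date(2021, 1, 5), date(2021, 1, 10),
              date(2021, 1, 22), date(2021, 1, 23)]
kept = filter_crnm_bleeds(candidates, major)
```

Expressing the window as a single constant makes it straightforward to vary its width in a validation study and observe the effect on the number of CRNM events identified.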


In conclusion, this novel algorithm was developed to identify ISTH major and CRNM bleeding events in real-world EHR data sources. Such clinical endpoints are often utilised in cardiovascular clinical trials. Further research is required to validate this algorithm.


This study was funded by Bayer AG, Berlin, Germany. Medical writing support was provided by Kate Ward of Oxford PharmaGenesis, Oxford, UK, funded by Bayer AG.

Statement on conflicts of interest

All authors are employees of Bayer AG.

Ethics statement

This study did not require ethical approval because it did not involve human participants.


CPT-4 Current Procedural Terminology-4
CRNM Clinically relevant non-major
EHR Electronic health record
GI Gastrointestinal
HCPCS Healthcare Common Procedure Coding System
ICD-9-CM International Classification of Diseases, 9th Revision, Clinical Modification
ICD-10-CM International Classification of Diseases, 10th Revision, Clinical Modification
ICD-10-PCS International Classification of Diseases, 10th Revision, Procedure Coding System
ICH Intracranial haemorrhage
ISTH International Society on Thrombosis and Haemostasis
RCT Randomised controlled trial
RWE Real-world evidence


  1. Hindricks G, Potpara T, Dagres N, Arbelo E, Bax JJ, Blomström-Lundqvist C, et al. 2020 ESC Guidelines for the diagnosis and management of atrial fibrillation developed in collaboration with the European Association for Cardio-Thoracic Surgery (EACTS): The Task Force for the diagnosis and management of atrial fibrillation of the European Society of Cardiology (ESC). Developed with the special contribution of the European Heart Rhythm Association (EHRA) of the ESC. Eur Heart J. 2021;42(5):373–498. 10.1093/eurheartj/ehaa612

  2. Collet JP, Thiele H, Barbato E, Barthélémy O, Bauersachs J, Bhatt DL, et al. 2020 ESC Guidelines for the management of acute coronary syndromes in patients presenting without persistent ST-segment elevation. Eur Heart J. 2021;42(14):1289–367. 10.1093/eurheartj/ehaa575

  3. Chan NC, Weitz JI. Antithrombotic agents. Circ Res. 2019;124(3):426–36. 10.1161/circresaha.118.313155

  4. European Medicines Agency. Clinical investigation of medicinal products for prevention of stroke and systemic embolic events in patients with non-valvular atrial fibrillation; 2014 Jul [Cited 2022 Oct 21] [Available from:].

  5. Schulman S, Kearon C. Definition of major bleeding in clinical investigations of antihemostatic medicinal products in non-surgical patients. J Thromb Haemost. 2005;3(4):692–4. 10.1111/j.1538-7836.2005.01204.x

  6. Kaatz S, Ahmad D, Spyropoulos AC, Schulman S. Definition of clinically relevant non-major bleeding in studies of anticoagulants in atrial fibrillation and venous thromboembolic disease in non-surgical patients: communication from the SSC of the ISTH. J Thromb Haemost. 2015;13(11):2119–26. 10.1111/jth.13140

  7. Oyinlola JO, Campbell J, Kousoulis AA. Is real world evidence influencing practice? A systematic review of CPRD research in NICE guidances. BMC Health Serv Res. 2016;16(1):299. 10.1186/s12913-016-1562-8

  8. Cunningham A, Stein CM, Chung CP, Daugherty JR, Smalley WE, Ray WA. An automated database case definition for serious bleeding related to oral anticoagulant use. Pharmacoepidemiol Drug Saf. 2011;20(6):560–6. 10.1002/pds.2109

  9. Go A, Singer D, Cheetham T, Toh D, Reichman M, Graham D, et al. Mini-Sentinel medical product assessment: A protocol for assessment of dabigatran. Version 3; 2015. [cited 2022 Oct 3][Available from:].

  10. Yao X, Abraham NS, Sangaralingham LR, Bellolio MF, McBane RD, Shah ND, et al. Effectiveness and safety of dabigatran, rivaroxaban, and apixaban versus warfarin in nonvalvular atrial fibrillation. J Am Heart Assoc. 2016;5(6):e003725. 10.1161/jaha.116.003725

  11. Tamayo S, Frank Peacock W, Patel M, Sicignano N, Hopf KP, Fields LE, et al. Characterizing major bleeding in patients with nonvalvular atrial fibrillation: a pharmacovigilance study of 27 467 patients taking rivaroxaban. Clin Cardiol. 2015;38(2):63–8. 10.1002/clc.22373

  12. Coleman CI, Vaitsiakhovich T, Nguyen E, Weeda ER, Sood NA, Bunz TJ, et al. Agreement between coding schemas used to identify bleeding-related hospitalizations in claims analyses of nonvalvular atrial fibrillation patients. Clin Cardiol. 2018;41(1):119–25. 10.1002/clc.22861

  13. Costa OS, Kohn CG, Kuderer NM, Lyman GH, Bunz TJ, Coleman CI. Effectiveness and safety of rivaroxaban compared with low-molecular-weight heparin in cancer-associated thromboembolism. Blood Adv. 2020;4(17):4045–51. 10.1182/bloodadvances.2020002242

  14. Optum, Inc. Optum EHR data [cited 2022 Oct 3] [Available from:]

  15. World Health Organization. International Statistical Classification of Diseases and Related Health Problems (ICD); 2022 [cited 2022 Oct 3] [Available from]

  16. National Center for Health Statistics. International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) [cited 2023 Jul 6] [Available from].

  17. Hirsch JA, Nicola G, McGinty G, Liu RW, Barr RM, Chittle MD, et al. ICD-10: History and context. AJNR Am J Neuroradiol. 2016;37(4):596–9. 10.3174/ajnr.A4696

  18. Centers for Medicare & Medicaid Services. 2018 ICD-10 CM and GEMs [cited 2022 Oct 3] [Available from:]

  19. Gaist D, García Rodríguez LA, Hellfritzsch M, Poulsen FR, Halle B, Hallas J, et al. Association of antithrombotic drug use with subdural hematoma risk. JAMA. 2017;317(8):836–46. 10.1001/jama.2017.0639

  20. Mehran R, Rao SV, Bhatt DL, Gibson CM, Caixeta A, Eikelboom J, et al. Standardized bleeding definitions for cardiovascular clinical trials: a consensus report from the Bleeding Academic Research Consortium. Circulation. 2011;123(23):2736–47. https://10.1161/circulationaha.110.009449

  21. US Food and Drug Administration. Framework for FDA’s real-world evidence program; 2018 Dec [cited 2022 Oct 3] [Available from].

  22. Piccini JP, Caso V, Connolly SJ, Fox KAA, Oldgren J, Jones WS, et al. Safety of the oral factor XIa inhibitor asundexian compared with apixaban in patients with atrial fibrillation (PACIFIC-AF): a multicentre, randomised, double-blind, double-dummy, dose-finding phase 2 study. Lancet. 2022;399(10333):1383–90. 10.1016/S0140-6736(22)00456-1


Article Details

How to Cite
Hartenstein, A., Abdelgawwad, K., Kleinjung, F., Privitera, S., Viethen, T. and Vaitsiakhovich, T. (2023) “Identification of International Society on Thrombosis and Haemostasis major and clinically relevant non-major bleed events from electronic health records: a novel algorithm to enhance data utilisation from real-world sources”, International Journal of Population Data Science, 8(1). doi: 10.23889/ijpds.v8i1.2144.