Markup: A Web-Based Clinical Annotation Tool with Enhanced Ontology Mapping
Abstract
Introduction
Unstructured free-text clinical notes often contain valuable information about patient symptoms, prescriptions and diagnoses. Once transformed into an accessible, structured form, this information can support better patient care and novel healthcare research. Natural Language Processing (NLP) algorithms can produce such structured outputs, but they require gold standard data for training and for validating their accuracy. While existing tools such as brat and WebAnno provide interfaces for manual text annotation, they lack the capability to annotate complex clinical information efficiently.
Objectives and Approach
We present Markup, an open-source, web-based annotation tool developed for use within clinical contexts by domain experts to produce gold standard annotations for NLP development. Markup incorporates NLP and active learning technologies to enable rapid and accurate annotation of unstructured documents. It supports custom user configurations, automated annotation suggestions, and automated mapping to existing clinical ontologies such as the Unified Medical Language System (UMLS) and the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT), as well as to custom, user-defined ontologies.
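To make the idea of mapping an annotated span to a user-defined ontology concrete, the following is a minimal sketch and not Markup's actual implementation. It assumes a small in-memory dictionary as the ontology and uses approximate string matching from Python's standard library. The names `CUSTOM_ONTOLOGY` and `map_to_ontology` and the `DEMO:` codes are hypothetical placeholders, not real UMLS or SNOMED-CT identifiers.

```python
from difflib import get_close_matches

# Hypothetical user-defined ontology: preferred term -> placeholder code.
# The codes are invented for illustration; they are not real UMLS or SNOMED-CT identifiers.
CUSTOM_ONTOLOGY = {
    "focal seizure": "DEMO:0001",
    "generalised tonic-clonic seizure": "DEMO:0002",
    "lamotrigine": "DEMO:0003",
}


def map_to_ontology(span_text, ontology=CUSTOM_ONTOLOGY, cutoff=0.8):
    """Return (term, code) for the closest ontology term, or None if no term is close enough."""
    # Lowercase and collapse whitespace so trivial variations still match exactly.
    normalised = " ".join(span_text.lower().split())
    if normalised in ontology:
        return normalised, ontology[normalised]
    # Fall back to approximate matching to tolerate misspellings in clinic letters.
    matches = get_close_matches(normalised, list(ontology), n=1, cutoff=cutoff)
    if matches:
        return matches[0], ontology[matches[0]]
    return None
```

Under these assumptions, a misspelled span such as "lamotrigene" would still resolve to the "lamotrigine" entry, while an unrelated span returns `None` and could be left for the annotator to map manually.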
Results
Markup has been tested on epilepsy clinic letters, and the captured annotations were used to build and test NLP applications. Where multiple annotators were involved, Markup allowed inter-annotator statistics to be calculated. Re-annotation following revisions to the annotation definitions was supported for flexibility. UMLS codes, certainty context, and multiple components of complex phrases could all be captured and exported in a structured format.
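The abstract does not state which inter-annotator statistics Markup reports. As one common example, the sketch below computes Cohen's kappa for two annotators who each assigned one label to the same sequence of items. The function name `cohen_kappa` and the example labels are assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items, in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: based on each annotator's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four items; "O" marks an item with no annotation.
annotator_1 = ["Seizure", "Drug", "O", "O"]
annotator_2 = ["Seizure", "O", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four items (observed agreement 0.75) against a chance agreement of 7/16, giving a kappa of 5/9, or about 0.56.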
Conclusions / Implications
Markup allows gold standard annotations to be collected efficiently across unstructured text and is optimized to capture health-specific information. These annotations are important for developing and validating NLP algorithms that automate the capture of key information from clinic letters at scale.