Deep Learning and NLP for Knowledge Extraction from Laboratory Reports
Abstract
Introduction
As clinical data grow in volume and complexity, extracting information from them for secondary uses such as decision support, quality assurance, and outcome analysis has become a tedious task. Recent advances in Natural Language Processing (NLP) have made it possible to automate knowledge extraction from clinical reports, reducing costs and improving efficiency.
Objectives/Approach
Our goal is to develop an NLP tool that automatically extracts and encodes clinical information from laboratory reports. This study describes and evaluates the tool on the Ontario Laboratory Information System (OLIS), a provincial repository of laboratory tests and results. OLIS is an electronic system that covers more than 200 labs and stores patients' current and past test results as they move through different areas of the healthcare system. The tool is a modular pipeline of components, including a Named Entity Recognition (NER) module that extracts virus and test mentions and an inference module that combines the extracted entities into a meaningful outcome.
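To make the two-stage design concrete, here is a minimal Python sketch of a dictionary-based NER step feeding a rule-based inference step. It is not the authors' implementation: the lexicons (`VIRUS_LEXICON`, `TEST_LEXICON`), the greedy longest-match strategy, all function names, and the assumption that the outcome is a single (virus, test) pair are hypothetical choices made for illustration. A trained NER model could stand in for `recognize` without changing the overall pipeline shape.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicons for illustration only; not the study's vocabularies.
VIRUS_LEXICON = {
    "influenza a": "Influenza A",
    "flu a": "Influenza A",
    "influenza b": "Influenza B",
    "rsv": "RSV",
    "respiratory syncytial virus": "RSV",
    "parainfluenza": "Parainfluenza",
}
TEST_LEXICON = {
    "pcr": "PCR",
    "rt pcr": "PCR",
    "culture": "Culture",
    "antigen": "Antigen",
}


@dataclass
class Entity:
    label: str  # "VIRUS" or "TEST"
    text: str   # matched surface phrase (lowercased tokens)
    code: str   # normalized concept from the lexicon


def tokenize(report):
    """Lowercase and split on any non-alphanumeric character."""
    return re.findall(r"[a-z0-9]+", report.lower())


def recognize(tokens, lexicon, label, max_len=4):
    """NER stage: greedy longest-match dictionary lookup over token n-grams."""
    entities, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                entities.append(Entity(label, phrase, lexicon[phrase]))
                i += n
                break
        else:  # no n-gram starting at position i matched
            i += 1
    return entities


def _resolve(codes):
    """Collapse a set of codes to one value, 'Multiple', or 'Unknown'."""
    if not codes:
        return "Unknown"
    return next(iter(codes)) if len(codes) == 1 else "Multiple"


def infer(entities):
    """Inference stage: combine extracted entities into a (virus, test) outcome."""
    viruses = {e.code for e in entities if e.label == "VIRUS"}
    tests = {e.code for e in entities if e.label == "TEST"}
    return {"virus": _resolve(viruses), "test": _resolve(tests)}


def pipeline(report):
    """Run tokenization, both NER passes, and inference in sequence."""
    tokens = tokenize(report)
    entities = recognize(tokens, VIRUS_LEXICON, "VIRUS") + recognize(
        tokens, TEST_LEXICON, "TEST"
    )
    return infer(entities)
```

For example, `pipeline("Influenza A virus culture")` returns `{"virus": "Influenza A", "test": "Culture"}`, while a report naming both influenza A and RSV resolves the virus field to `"Multiple"`. Keeping NER and inference as separate functions mirrors the modular design: either stage can be swapped out independently.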
Results
Initial analyses were conducted on a segment of OLIS covering laboratory tests for respiratory viruses. These data comprised over a million observations corresponding to ~100 Logical Observation Identifiers Names and Codes (LOINC) codes, with more than 40,000 unique strings. The clinical text was cleaned, tokenized, and parsed using an in-house text algorithm that was continually refined through manual review by clinical experts. The parsed text was then encoded as virus and test types to serve as ground truth. The NLP tool was built on these ground-truth data and achieved an accuracy greater than 95%.
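The evaluation step can be sketched in the same spirit. This hypothetical Python example normalizes raw result strings, counts how many observations share each cleaned string, and scores a predictor against a ground-truth mapping from cleaned string to a (virus, test) pair. The cleaning rules, function names, and toy data are assumptions, not the in-house algorithm. Because the abstract does not state whether accuracy was measured over unique strings or over observations, the sketch supports both.

```python
import re
from collections import Counter


def clean(raw):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9]+", " ", raw.lower())
    return " ".join(text.split())


def unique_strings(observations):
    """Map each cleaned string to the number of observations that share it."""
    return Counter(clean(o) for o in observations)


def accuracy(predict, ground_truth, weights=None):
    """Share of ground-truth strings whose predicted (virus, test) pair matches exactly.

    ground_truth: cleaned string -> (virus, test)
    weights: optional cleaned string -> observation count; when given,
             each string counts once per observation (observation-level accuracy).
    """
    total = correct = 0
    for text, label in ground_truth.items():
        w = weights.get(text, 0) if weights is not None else 1
        total += w
        if predict(text) == label:
            correct += w
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: two spellings of the same RSV result collapse to one string.
    observations = [
        "RSV PCR: Not detected",
        "rsv pcr - not detected",
        "Influenza A PCR  Detected",
        "Flu B antigen negative",
    ]
    counts = unique_strings(observations)
    ground_truth = {
        "rsv pcr not detected": ("RSV", "PCR"),
        "influenza a pcr detected": ("Influenza A", "PCR"),
        "flu b antigen negative": ("Influenza B", "Antigen"),
    }

    # A deliberately weak predictor that mislabels the influenza B string.
    def toy_predict(text):
        return ("RSV", "PCR") if "rsv" in text else ("Influenza A", "PCR")

    print(accuracy(toy_predict, ground_truth))                  # string level
    print(accuracy(toy_predict, ground_truth, weights=counts))  # observation level
```

On the toy data, string-level accuracy is 2/3 while observation-level accuracy is 0.75, because the duplicated RSV string counts twice. With more than 40,000 unique strings behind over a million observations, the choice of denominator can noticeably shift a reported accuracy figure.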
Conclusion/Implications
Approaches like these can be applied to many areas of health research that use clinical reports. Once optimized and validated, our methods can be deployed in clinical systems to provide on-the-spot analysis of various laboratory reports.