Inferring sensitivity and specificity of phenotyping algorithms using positive and negative predictive value in validation study in observational health data

Mingkai Peng
Rosa Gini
Tyler Williamson

Abstract

Introduction
In observational health data, phenotyping algorithms are needed to process raw information into clinically relevant features. Validation studies traditionally estimate sensitivity and specificity by comparing the phenotyping algorithm with a reference standard on a population sample. Conducting such validation studies is challenging for conditions with low prevalence.


Objectives and Approach
We propose a novel and efficient method for conducting validation studies that estimates sensitivity and specificity indirectly from positive and negative predictive values. We simulated datasets with different levels of disease prevalence and phenotyping algorithms with different sensitivities and specificities. We applied both the traditional (direct) and the new (indirect) method to the simulated data to estimate sensitivity and specificity, and compared the performance of the two methods. We also designed a gate that excludes true negatives to improve study efficiency for conditions with low prevalence, and conducted a sensitivity analysis on the effect of an imperfect gate.
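The abstract does not spell out how sensitivity and specificity are recovered from predictive values. One standard route is Bayes' theorem: given the positive predictive value (PPV), the negative predictive value (NPV), and the fraction of records the algorithm flags as positive, the four cells of the 2x2 table can be rebuilt as proportions. The sketch below illustrates that identity only; the function name, its parameters, and the example counts are hypothetical and are not taken from the authors' implementation.

```python
def sens_spec_from_predictive_values(ppv, npv, frac_flagged):
    """Recover sensitivity and specificity from PPV, NPV and the fraction
    of records flagged positive by the algorithm (Bayes' theorem).

    Illustrative sketch; names and interface are assumptions.
    """
    if not all(0.0 <= v <= 1.0 for v in (ppv, npv, frac_flagged)):
        raise ValueError("ppv, npv and frac_flagged must lie in [0, 1]")

    # Rebuild the 2x2 table as proportions of the whole population.
    tp = ppv * frac_flagged                  # flagged, truly positive
    fp = (1.0 - ppv) * frac_flagged          # flagged, truly negative
    tn = npv * (1.0 - frac_flagged)          # not flagged, truly negative
    fn = (1.0 - npv) * (1.0 - frac_flagged)  # not flagged, truly positive

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("no true positives or no true negatives implied")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical population of 100,000 with 1% prevalence and an algorithm
# with sensitivity 0.80 and specificity 0.99.
tp_n, fn_n, fp_n, tn_n = 800, 200, 990, 98010
ppv = tp_n / (tp_n + fp_n)
npv = tn_n / (tn_n + fn_n)
frac_flagged = (tp_n + fp_n) / 100000

se, sp = sens_spec_from_predictive_values(ppv, npv, frac_flagged)
print(f"sensitivity={se:.3f}, specificity={sp:.3f}")
```

In a validation study of this kind, PPV and NPV would be estimated by reviewing samples of algorithm-positive and algorithm-negative records, while the flagged fraction is known from the full dataset, so no random population sample is required.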


Results
The new (indirect) method estimated both sensitivity and specificity with accuracy better than or comparable to that of the traditional (direct) method. Applying a gate made it possible to conduct validation studies for conditions with very low prevalence. An imperfect gate led to overestimation of sensitivity but had minimal effect on specificity.
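The direction of the gate bias can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that the gate removes only algorithm-negative records and that every record it removes is treated as a true negative; an imperfect gate then also removes some true cases, which are miscounted as true negatives. The gate mechanics, the function name, and all counts are hypothetical and are not the authors' design or results.

```python
def estimates_under_imperfect_gate(tp, fn, fp, tn, missed_cases):
    """Sensitivity and specificity estimates when `missed_cases` true cases
    (all algorithm-negative) are wrongly excluded by the gate and counted
    as true negatives. Illustrative assumption, not the authors' model.
    """
    if not 0 <= missed_cases <= fn:
        raise ValueError("missed_cases must be between 0 and fn")

    fn_seen = fn - missed_cases  # false negatives still visible to review
    tn_seen = tn + missed_cases  # missed cases are miscounted as negatives

    sensitivity = tp / (tp + fn_seen)
    specificity = tn_seen / (tn_seen + fp)
    return sensitivity, specificity


# Hypothetical counts: true sensitivity 0.80, true specificity 0.99.
tp, fn, fp, tn = 800, 200, 990, 98010

perfect = estimates_under_imperfect_gate(tp, fn, fp, tn, missed_cases=0)
imperfect = estimates_under_imperfect_gate(tp, fn, fp, tn, missed_cases=50)

print("perfect gate:   se=%.4f sp=%.4f" % perfect)
print("imperfect gate: se=%.4f sp=%.4f" % imperfect)
```

Under these assumptions the missed cases shrink the sensitivity denominator noticeably, whereas they are negligible next to the large number of true negatives, which is consistent with the pattern reported above.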


Conclusion/Implications
The new (indirect) method provides an alternative way to conduct validation studies in observational health data, with improved accuracy in estimating sensitivity and specificity.

How to Cite
Peng, M., Gini, R. and Williamson, T. (2018) “Inferring sensitivity and specificity of phenotyping algorithms using positive and negative predictive value in validation study in observational health data”, International Journal of Population Data Science, 3(4). doi: 10.23889/ijpds.v3i4.951.
