In observational health data, phenotyping algorithms are needed to process raw information into clinically relevant features. Validation studies traditionally estimate sensitivity and specificity by comparing the phenotyping algorithm with a reference standard in a population sample. Conducting such validation studies is challenging for conditions with low prevalence, because a population sample contains few true cases.
Objectives and Approach
We propose a novel and efficient method for validation studies that estimates sensitivity and specificity indirectly. We simulated datasets with different levels of disease prevalence and phenotyping algorithms with different sensitivities and specificities. We applied both the traditional (direct) and the new (indirect) method to the simulated data to estimate sensitivity and specificity, and compared the performance of the two methods. We also designed a gate that excludes true negatives to improve study efficiency for conditions with low prevalence, and conducted a sensitivity analysis on the effect of an imperfect gate.
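As a rough illustration of the simulation setup and of the traditional (direct) estimate only, the following minimal Python sketch draws a population with a given prevalence, applies a phenotyping algorithm with known sensitivity and specificity, and then recovers both quantities by comparison with the true status. The function names, parameter values, and the use of a perfect reference standard are assumptions made for this example and are not the authors' code; the indirect method and the gate are not shown because their details are not given here.

```python
import random


def simulate(n, prevalence, sensitivity, specificity, seed=0):
    """Simulate (true_status, algorithm_flag) pairs for n individuals.

    Hypothetical helper: the true status stands in for a perfect
    reference standard, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        diseased = rng.random() < prevalence
        if diseased:
            # A true case is flagged with probability = sensitivity.
            flagged = rng.random() < sensitivity
        else:
            # A non-case is flagged with probability = 1 - specificity.
            flagged = rng.random() >= specificity
        records.append((diseased, flagged))
    return records


def direct_estimates(records):
    """Direct estimates from the 2x2 table of algorithm vs. reference."""
    tp = sum(1 for d, f in records if d and f)
    fn = sum(1 for d, f in records if d and not f)
    tn = sum(1 for d, f in records if not d and not f)
    fp = sum(1 for d, f in records if not d and f)
    n_cases = tp + fn
    n_noncases = tn + fp
    sens = tp / n_cases if n_cases else float("nan")
    spec = tn / n_noncases if n_noncases else float("nan")
    return sens, spec, n_cases


# Illustrative values only: 1% prevalence, sensitivity 0.80, specificity 0.95.
records = simulate(200_000, prevalence=0.01, sensitivity=0.80, specificity=0.95)
sens_hat, spec_hat, n_cases = direct_estimates(records)
print(f"cases={n_cases} sensitivity={sens_hat:.3f} specificity={spec_hat:.3f}")
```

Lowering `prevalence` or `n` in this sketch shrinks the number of true cases available for the sensitivity estimate, which is the efficiency problem that motivates the gate.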
The new (indirect) method estimated both sensitivity and specificity with accuracy better than or comparable to that of the traditional (direct) method. Applying a gate enabled us to conduct validation studies for conditions with very low prevalence. An imperfect gate resulted in overestimation of sensitivity but had minimal effect on specificity.
The new (indirect) method provides an alternative way to conduct validation studies in observational health data, with improved accuracy in estimating sensitivity and specificity.