Efficient Identification of Patients Eligible for Clinical Studies Using Case-Based Reasoning on The Scottish Health Research Register (SHARE)

Wen Shi
Tom Kelsey
Frank Sullivan


Trials often struggle to achieve their target sample size, with only half doing so. Some researchers have turned to Electronic Health Records (EHRs) in search of a more efficient means of recruitment. The Scottish Health Research Register (SHARE) obtains patients' consent for their EHRs to be used as a search base from which researchers can find potential participants. However, because EHR data are not complete, sufficient, or accurate, a database search strategy may not generate the best case-finding result.

Objectives and Approach
A retrospective study was conducted to evaluate the performance of a case-based reasoning method in identifying participants for population-based clinical studies that had recruited through SHARE. A case-based reasoning framework was applied to nine studies with a total of 119 participants using two-fold cross-validation. Records of 30,000 random individuals were also merged with each test set to simulate the real-world recruitment setting. A prediction score for study participation was generated for each individual in the test set by comparing their diagnosis, procedure, pharmaceutical prescription, and laboratory test result attributes with those of the participants of a particular study. Evaluation was conducted by calculating the Area Under the ROC Curve (ROCAUC) and information retrieval metrics for the test set ranked by prediction score. We also compared the most likely participants as identified by searching a database with those ranked highest by our model.
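The scoring and evaluation steps described above can be illustrated with a minimal sketch. The abstract does not state which similarity measure or aggregation the authors used, so everything here is an assumption for illustration: each record is reduced to a set of coded attributes, similarity is Jaccard overlap, the prediction score is the mean similarity to known participants, and the attribute codes and function names are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Record = FrozenSet[str]


def jaccard(a: Record, b: Record) -> float:
    """Jaccard similarity between two attribute sets (assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def prediction_score(candidate: Record, participants: Sequence[Record]) -> float:
    """Mean similarity of a candidate to known participants (assumed aggregation)."""
    if not participants:
        return 0.0
    return sum(jaccard(candidate, p) for p in participants) / len(participants)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical training-fold participants of one study.
known: List[Record] = [
    frozenset({"dx:E11", "rx:metformin", "lab:hba1c_high"}),
    frozenset({"dx:E11", "rx:insulin"}),
]

# Hypothetical test fold: held-out participants (label 1) mixed with random individuals (label 0).
test_set: List[Record] = [
    frozenset({"dx:E11", "rx:metformin"}),
    frozenset({"dx:J45", "rx:salbutamol"}),
    frozenset({"dx:E11", "lab:hba1c_high", "rx:insulin"}),
    frozenset({"dx:I10"}),
]
labels = [1, 0, 1, 0]

scores = [prediction_score(r, known) for r in test_set]
auc = roc_auc(scores, labels)
```

In a two-fold setup, the roles of the two participant halves would be swapped and the two AUC values averaged; the pairwise AUC above is quadratic in the number of records, so a rank-based formula would be preferable at the scale of 30,000 background records.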

The average ROCAUC for the nine projects was 81%, indicating strong predictive ability. However, the derived ranking lists showed lower predictive performance. Of the persons ranked within the top 50 positions, 21% were the same as those identified by searching databases.
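The top-50 comparison can be read as an overlap between two candidate lists. The following is a minimal sketch, assuming the reported figure is the share of the model's top-k ranked persons who also appear in the database-search list; the abstract does not state the exact denominator, and the identifiers, scores, and function names are hypothetical.

```python
from typing import Dict, Iterable, List


def rank_by_score(scores: Dict[str, float]) -> List[str]:
    """Order person ids by descending score, breaking ties by id for reproducibility."""
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def overlap_at_k(ranked_ids: List[str], reference_ids: Iterable[str], k: int) -> float:
    """Share of the top-k ranked ids that also appear in the reference list."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_ids[:k]
    if not top:
        return 0.0
    reference = set(reference_ids)
    return sum(pid in reference for pid in top) / len(top)


# Hypothetical model scores and database-search hits.
model_scores = {"p1": 0.91, "p2": 0.40, "p3": 0.75, "p4": 0.10, "p5": 0.66}
database_hits = ["p3", "p4"]

ranking = rank_by_score(model_scores)
share = overlap_at_k(ranking, database_hits, k=3)
```

With k set to 50 and real identifiers, the same function would give the kind of overlap percentage reported above, under the stated assumption about the denominator.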

Conclusion / Implications
Case-based reasoning may be more effective than a database search strategy for participant identification. This hypothesis requires a prospective study for further validation. The lower performance of the ranking lists suggests that improvements are needed in the collection and curation of EHRs.

How to Cite
Shi, W., Kelsey, T. and Sullivan, F. (2020) “Efficient Identification of Patients Eligible for Clinical Studies Using Case-Based Reasoning on The Scottish Health Research Register (SHARE)”, International Journal of Population Data Science, 5(5). Available at: https://ijpds.org/article/view/1509 (Accessed: 17 January 2021).