New Approach Improves Pregnancy Data Accuracy for Research Using CPRD and Hospital Records
A research team led by Associate Professor Claire Carson at the NIHR Policy Research Unit in Maternal and Neonatal Health and Care has developed a practical way to improve how pregnancies are identified in electronic health records held by the UK Clinical Practice Research Datalink (CPRD). This approach, demonstrated through a study on pre-pregnancy care, combines data from several resources: the CPRD Pregnancy Register (PR), the Mother Baby Link (MBL), and Hospital Episode Statistics (HES) Maternity. By cleaning and enhancing the data, the new method makes pregnancy records more accurate and reliable, which is essential for better maternal health research and improving care for mothers and babies.
Pregnancy-related research relies on accurate health records, but one of the biggest challenges researchers face is correctly identifying pregnancies in large electronic health datasets. Some data sources, such as the CPRD, now include algorithm-based pregnancy registers. To ensure that research produces meaningful findings, researchers need to consider carefully whether the data are fit for purpose and to check their reliability and accuracy. The CPRD Pregnancy Register is a valuable resource for tracking pregnancies, but it has faced challenges in identifying pregnancies accurately due to incomplete data, conflicting records, and uncertain pregnancy outcomes. Reliable information allows researchers to evaluate healthcare services, track health trends, and develop strategies to improve outcomes for mothers and babies. With increasing reliance on electronic health records, addressing data quality issues in pregnancy registers has become a top priority.
The new study, published in the International Journal of Population Data Science (IJPDS), demonstrates that by reducing the uncertainty in identifying pregnancies from routine electronic health records, this method could lead to better evidence on pregnancy care, health outcomes, and public health policies, ultimately helping to improve the health of mothers and babies. The research team’s new algorithm adds data from hospital records, resolves conflicting information, and removes incomplete pregnancy records. It improved the dataset, identifying 7% more pregnancies and reducing the number of uncertain pregnancy records.
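For readers who work with linked health data, the three steps described above (adding hospital records, resolving conflicts, and removing incomplete records) can be illustrated with a minimal Python sketch. This is not the research team's algorithm: the record fields, the `clean_pregnancies` function, and the rule that a hospital record is preferred over an overlapping register record are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class PregnancyRecord:
    # All field names are hypothetical, not taken from CPRD or HES.
    patient_id: str
    start: Optional[date]
    end: Optional[date]
    outcome: Optional[str]  # None when the outcome is unknown
    source: str             # "register" or "hospital"


def overlaps(a: PregnancyRecord, b: PregnancyRecord) -> bool:
    """True when two complete records for the same patient overlap in time."""
    return a.patient_id == b.patient_id and a.start <= b.end and b.start <= a.end


def clean_pregnancies(
    register: List[PregnancyRecord], hospital: List[PregnancyRecord]
) -> List[PregnancyRecord]:
    # Add hospital records to the register records.
    combined = list(register) + list(hospital)

    # Remove incomplete records (missing dates or outcome).
    complete = [
        r for r in combined
        if r.start is not None and r.end is not None and r.outcome is not None
    ]

    # Resolve conflicts: when two records for one patient overlap,
    # keep a single record, preferring the hospital one (assumed rule).
    complete.sort(key=lambda r: (r.patient_id, r.start, r.source != "hospital"))
    kept: List[PregnancyRecord] = []
    for rec in complete:
        if kept and overlaps(kept[-1], rec):
            if rec.source == "hospital" and kept[-1].source != "hospital":
                kept[-1] = rec
            continue
        kept.append(rec)
    return kept
```

In this sketch, incomplete records are dropped before conflicts are resolved to keep the example short. A real pipeline would need to decide carefully on the order of the steps and on which source to trust, depending on the research question.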
The quality of data used in research directly influences the findings and conclusions of studies. It is important to have reliable ways to identify pregnancies for research and to inform healthcare policy. By providing a practical way to improve pregnancy data without needing extra expensive resources, this method makes it easier for researchers to carry out reliable studies and offers a valuable step forward in making sure health studies on mothers and babies are based on robust data.
Dr Yangmei Li, researcher at the National Perinatal Epidemiology Unit, University of Oxford, and first author, added that “Researchers should consider carefully how data quality may influence study findings and adapt the proposed approach based on their specific research questions.”
Yangmei Li, NIHR Policy Research Unit in Maternal and Neonatal Health and Care, National Perinatal Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, UK
Li, Y., Kurinczuk, J. J., Alderdice, F., Quigley, M. A., Rivero-Arias, O., Sanders, J., Kenyon, S., Siassakos, D., Parekh, N., De Almeida, S. and Carson, C. (2025) “Addressing uncertainty in identifying pregnancies in the English CPRD GOLD Pregnancy Register: a methodological study using a worked example”, International Journal of Population Data Science, 10(1). doi: 10.23889/ijpds.v10i1.2471.