By Themba Mutemaringa, Alexa Heekes, Mariette Smith, Andrew Boulle and Nicki Tiffin

 

Article as submitted

Article Authors

Submission Date: 19/06/2022


Round 1 Reviews

Reviewer A

Anonymous Reviewer

Completed 14/07/2022

https://doi.org/10.23889/ijpds.v8i1.1771.review.r1.reviewa

Thank you so much for the opportunity to review this very well-written paper focused on the process and evaluation of record linkage in the Western Cape, South Africa. Your objectives, methods, results and discussion are all clearly laid out and easy to follow.

I have only a few comments/extra questions:

  1. Can you tell us more about the processes undertaken in other locations where ongoing linkage is successful and the positive changes in health care and health care provision that this brings, as well as the context that allows for health record linkage to be successful? You mention Canada and Australia in the Discussion.

  2. I understand that you have focused mainly on duplicates, but also have examined the likelihood of errors in each of your matching variables. This is very important to understanding the overall matching accuracy. Is there a way to estimate the gaps in the data/missing information (you do talk about missing IDs); where are the main areas of missing data that may affect your “highly likely” matching rate? Are there initiatives to improve the missing data rate?

  3. Can you tell us a bit more about the back end of the matching algorithm? How was your methodology created and are there process metrics that you can share?

  4. Can you determine systematic issues affecting the matching variables either by input institution, location or other foundational factors that could potentially be remedied through education?

  5. It looks like you have reported your linkage output findings re: the above? Can you tell us more about the receptivity of these reports? Have you tracked the changes in accuracy of input over time by variable?

  6. Can you tell us a bit about the privacy aspects of this study? What authority do the authors have to collect, use and disclose the data for this study (are the data de-identified when linkage evaluations are conducted)?

Other than the above, I have no other comments. Good luck and thank you for submitting this interesting article.

Recommendation: Accept Submission


Reviewer B

Anonymous Reviewer

Completed 03/08/2022

https://doi.org/10.23889/ijpds.v8i1.1771.review.r1.reviewb

This manuscript provides a review of the record linkage system implemented in the Western Cape of South Africa to provide a unique patient identifier across health services. The manuscript is well-written and interesting throughout, and deserves publication.

Some minor questions are below:

  1. "The WCGH maintains a real-time PMI system that creates a UPI for all new patients who register at public health facilities in the province". What are these public health facilities? Is it just hospital/emergency department presentations, or other services as well? International readers will need some context as to the South African health system.

  2. It might be good to have some more information on how the Patient Master index is used (or will be used) within the Western Cape. What projects does it enable?

  3. "We have developed a machine learning model and a probabilistic record linkage scheme, based on the Fellegi-Sunter algorithm [13], which is undergoing final review before deployment". It would be good to understand the reasons for this. Are there inefficiencies/complications in the current rule-based approach? What made you consider this change to a probabilistic approach?

  4. The manuscript is quite long, and readability may improve if it were reduced in length. (Yes I know I am not helping things by asking you to add more information).

Recommendation: Revisions Required


Editor Decision

Merran Beckley Smith

Decision Date: 16/12/2022

Decision: Resubmit for Review

https://doi.org/10.23889/ijpds.v8i1.1771.review.r1.dec

Dear Themba Mutemaringa, Alexa Heekes, Mariette Smith, Andrew Boulle and Nicki Tiffin:

We have reached a decision regarding your submission to International Journal of Population Data Science, "Record linkage for Routinely Collected Health Data in an African Health Information Exchange: Record linkage in an African Health Information Exchange".

Please address the attached reviewers' comments and return to us: one clean and one tracked changes version of your revised manuscript, plus a point by point letter of response/rebuttal, by 12 October 2022.

Our decision is to: Resubmit for Review

Kind regards


Author Response

Rachel Farber

Response Date: 22/08/2022

Article as resubmitted

Response to Reviewers

We thank the reviewers for their thoughtful and insightful review of the manuscript. Please find our responses to each point raised, below:

Reviewer A:

  1. Can you tell us more about the processes undertaken in other locations where ongoing linkage is successful and the positive changes in health care and health care provision that this brings, as well as the context that allows for health record linkage to be successful? You mention Canada and Australia in the Discussion.

    Thanks for the suggestion to include examples of the successes arising from other programs’ linkage algorithms. Linkage is well developed in the UK, Canada and Australia, and it helps provide answers that would not be achievable without the enriched datasets that record linkage enables. We have referenced a report on some of the major achievements in Australia as an example on page 16, paragraph 2.

    Some of the successes from better data linkage in Australia, for example, are noted by Smith and Clark in the 50-year review of the progress made in the area of data linkage including important contributions in public health such as establishing the teratogenic effects of maternal diet. Record linkage programs are well advanced in Australian states and key strides have been made in establishing a national linkage resource via the Population Health Research Network.

  2. I understand that you have focused mainly on duplicates, but also have examined the likelihood of errors in each of your matching variables. This is very important to understanding the overall matching accuracy. Is there a way to estimate the gaps in the data/missing information (you do talk about missing IDs); where are the main areas of missing data that may affect your “highly likely” matching rate? Are there initiatives to improve the missing data rate?

    Thanks for flagging the importance of missing data. This exploratory study is a first step in understanding the dynamics of linkage in our data and understanding issues such as data missingness. Future work will assess the impact of missingness and other data imperfections on our ability to match records effectively.

    We have noted the possibility for this ongoing research by adding the following sentence to our conclusion on page 17 paragraph 2.

    This work provides an understanding of record linkage in the PHDC and will provide a basis for further explorations on the impact of data quality on our ability to successfully link and deduplicate records.

  3. Can you tell us a bit more about the back end of the matching algorithm? How was your methodology created and are there process metrics that you can share?

    Thank you for your interest in evaluating the matching process through the use of specific metrics. For this publication, we have provided the rules underlying the current matching algorithm in Appendix A. Ongoing work which we hope to publish in the near future includes generating metrics to assess efficacy of matching processes and we will certainly be sharing these metrics as soon as they are finalised.

  4. Can you determine systematic issues affecting the matching variables either by input institution, location or other foundational factors that could potentially be remedied through education?

    Thanks for raising this important point. In the discussion we describe a report that can highlight systematic errors at their source in order to target interventions to improve the quality of data that are entered. In addition, the department of health has ongoing province-wide programmes to improve the quality of data entered at public facilities. It is the remit of the WCGH information management unit to undertake this kind of training and intervention, but our analyses are vital to inform their processes.

    We have added the following text to the discussion (page 16, paragraph 3) to explain this knowledge transfer process:

    …and understanding the characteristics of data linkage successes and challenges at the PHDC can inform interventions by the WCGH to improve data collection within public health facilities.

  5. It looks like you have reported your linkage output findings re: the above? Can you tell us more about the receptivity of these reports? Have you tracked the changes in accuracy of input over time by variable?

    The reports were initiated fairly recently and, over time, will show how successful the interventions have been in improving data entry at source.

    We have added the statement below, on page 17 paragraph 2, to indicate this intention.

    Understanding the current characteristics of data linkage will provide a baseline against which to assess the future success of ongoing interventions to improve data collection at source.

  6. Can you tell us a bit about the privacy aspects of this study? What authority do the authors have to collect, use, and disclose the data for this study (are the data de-identified when linkage evaluations are conducted)?

    This study was conducted by employees of the Western Cape Government, working within the secure PHDC environment hosted by the Western Cape Department of Health. No individualised data were downloaded or used outside of the routine health data environment managed by the department. The reported linkage algorithms were run within the secure environment and only aggregated findings reported for publication and open sharing. We believe that for these reasons the analysis of linkage does not pose a privacy risk to any individuals whose data are held in the PHDC environment under protection by compliance with both the Health Act and the Protection of Personal Information Act of South Africa. We also sought Ethics approval from the University of Cape Town Faculty of Health Sciences Ethics Board for this study, as well as WCGH approval. We have highlighted this in the sections “Ethical Approval” and “Data Availability Statement”.

Reviewer B:

  1. "The WCGH maintains a real-time PMI system that creates a UPI for all new patients who register at public health facilities in the province". What are these public health facilities? Is it just hospital/emergency department presentations, or other services as well? International readers will need some context as to the South African health system.

    Thank you for highlighting the need for some more information about our health service. We have added the following sentence to page 5 paragraph 1.

    The Western Cape public health system has different levels of care, primary health care which is provided to all residents for free and is obtained at clinics that are owned at different levels of governance, that is, municipality/metro and provincial government. The other levels of care are secondary, tertiary and quaternary and these services are offered at public hospitals and patients get different levels of subsidies depending on their income levels. The PMI links to primary healthcare platforms at clinics and community health centres as well as hospital information systems.

  2. It might be good to have some more information on how the Patient Master index is used (or will be used) within the Western Cape. What projects does it enable?

    The PHDC is part of the provincial health service, with the primary purpose of healthcare delivery and health service support. The PMI thus enables interoperability and connection of disparate records in order to improve continuity of care and health service delivery. The PMI is also a key enabler in the planned implementation of the national health insurance (NHI), a national government-initiated project for achieving universal health coverage. The Western Cape PMI is well established and has been in place for over 20 years. To clarify this, we have added a sentence on page 4, paragraph 3:

    The PMI thus enables interoperability and connection of disparate records in order to improve continuity of care and health service delivery.

  3. "We have developed a machine learning model and a probabilistic record linkage scheme, based on the Fellegi-Sunter algorithm [13], which is undergoing final review before deployment". It would be good to understand the reasons for this. Are there inefficiencies/complications in the current rule-based approach? What made you consider this change to a probabilistic approach?

    Missingness in some linkage variables has influenced the decision to explore other linkage algorithms. Although we have not yet quantified the error rates among linkage variables, we have some insights into the completeness of these variables, and some studies have shown that probabilistic linkage performs considerably better (Ying Zhu 2015 – Probabilistic vs Deterministic).

    There are also requirements to link other datasets to the PMI which in most cases do not have the unique patient identifier or clean set of identifying fields.

    We have highlighted these use cases in the manuscript by describing them on page 15, paragraph 3:

    Missingness in some linkage variables as well as the need to link datasets without PMI identifiers support the need for a probabilistic approach.

  4. The manuscript is quite long, and readability may improve if it were reduced in length. (Yes I know I am not helping things by asking you to add more information).

    Thank you for this observation. Whilst we recognise we could be more succinct for specialised readership, we have tried to develop our explanations in a way that is accessible to a wide readership. We are within journal specified limits but we will respect any editorial decision to shorten the manuscript.
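
As an illustration of the Fellegi-Sunter style of probabilistic linkage discussed under point 3 of the response to Reviewer B, a minimal Python sketch follows. It is not the PHDC implementation, which is not described in this record. The field names, the m- and u-probabilities, the classification thresholds, and the choice to give missing fields zero weight are all assumptions made for illustration only.

```python
import math

# Hypothetical per-field parameters (illustrative values, not PHDC estimates).
# m = P(field agrees | the two records belong to the same person)
# u = P(field agrees | the two records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "date_of_birth": (0.97, 0.003),
    "national_id": (0.99, 0.0001),
}


def match_weight(record_a, record_b, params=FIELD_PARAMS):
    """Sum log2 agreement and disagreement weights across fields.

    Assumption: a field that is missing (None or absent) in either record
    contributes zero weight, so it neither supports nor contradicts a match.
    """
    total = 0.0
    for field, (m, u) in params.items():
        a = record_a.get(field)
        b = record_b.get(field)
        if a is None or b is None:
            continue  # missing value: no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, lower=-5.0, upper=10.0):
    """Map a total weight to a decision using hypothetical thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"
```

Under these assumptions, a pair of records that agree on name and date of birth can still be classified as a match when the national ID is missing from one of them, whereas an exact-match rule that requires the ID could not link them. This is one way in which missingness in linkage variables can favour a probabilistic scheme.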


Round 2 Reviews

Reviewer A

Anonymous Reviewer

Completed 07/11/2022

https://doi.org/10.23889/ijpds.v8i1.1771.review.r2.reviewa

Thank you for your thoughtful responses to the reviewer comments and your re-submission. Data linkage and its associated opportunities and challenges continue to be an important issue in all jurisdictions, and your work illustrates the ongoing work in this area.

I have a few copy-editing suggestions:

p. 5 The following sentence may benefit from additional punctuation: “The Western Cape public health system has different levels of care, primary health care which is provided to all residents for free and is obtained at clinics that are owned at different levels of governance, that is, municipality/metro and provincial government.”

p. 6 “A brief description of the work on trying to catch these duplicates was given previously [7].” I suggest either removing this sentence or providing a bit more explanation, perhaps that this current study is a follow up to a previous study.

p. 6 In Box 1, under PMI history, the acronym “Prehmis” is not defined until later on page 8.

p. 8 Thank you for your description on your authority to use the data for this purpose. A brief sentence to this end in the methods section of the manuscript would add completeness to the research endeavour.

p. 16 The following could be more concise: “Some of the successes from better data linkage in Australia, for example, are noted by Smith and Clark [14] in the 50-year review of the progress made in the area of data linkage including important contributions in public health such as establishing the teratogenic effects of maternal diet [15]. Record linkage programs are well advanced in Australian states and key strides have been made in establishing a national linkage resource via the Population Health Research Network [14].”

All the best and thank you for your work on this topic.

Recommendation: Accept Submission


Reviewer B

Anonymous Reviewer

Completed 16/12/2022

Recommendation: Accept Submission


Editor Decision

Merran Beckley Smith

Decision Date: 14/12/2022

Decision: Accept Submission

https://doi.org/10.23889/ijpds.v8i1.1771.review.r2.dec

Dear Themba Mutemaringa, Alexa Heekes, Mariette Smith, Andrew Boulle, Nicki Tiffin:

We have reached a decision regarding your submission to International Journal of Population Data Science, "Record linkage for Routinely Collected Health Data in an African Health Information Exchange: Record linkage in an African Health Information Exchange", and are delighted to inform you that our decision is to: Accept Submission.

We look forward to working with you through the next stages towards final publication.

Please get in touch if you have any queries going forward. Thank you.

Kind Regards