Reviews of Using a deterministic matching computer routine to identify hospital episodes in a Brazilian de-identified administrative database for the analysis of obstetrics hospitalisations
Claudia Medina Coeli, Rosa Maria Soares Madeira Domingues, Lana Meijinhos, Daniela Medina Coeli Bastos, Rejane Sobrino Pinheiro, Valeria Saraceni, Marcos Augusto Bastos Dias, Natália Santana Paiva, Kenneth Rochel de Camargo Jr
Article as submitted
Article Authors
Submission Date: 14/06/2024
Round 1 Reviews
Reviewer A
Anonymous Reviewer
Completed 03/07/2024
https://doi.org/10.23889/ijpds.v10i1.2467.review.r1.reviewa
This paper focussed on the identification of obstetric hospital episodes from Brazilian administrative records. Identification of hospital episodes is important for accurate register-based research, especially in records such as these where unique patient identifiers are not available. I found all sections generally well written and easy to understand, and the database in particular is thoroughly explained. Please find some comments below.
i) The research question is clearly stated but does not always match statements made throughout the text. This paper aims to create episode identifiers, not unique patient identifiers. However, this is occasionally unclear with the statements making it sound like all events experienced by the same patient were identified.
E.g., Row 36: “a C++ routine that identified all hospitalisations belonging to the same patient”. Rows 53-54 & 384: “identifying hospital admissions related to the same patients”. Table 1: “Number of hospital admissions” (refers to the number of admissions per episode, not the patients’ overall number of admissions in 2018-2019.)
ii) It was stated that data were available 2012-2020 yet the focus was only on years 2018-2019, why?
a) Was there any preprocessing done to the matching variables prior to the C++ matching routine? E.g., varying date formats in the date of birth field could result in falsely unmatched records.
b) Were there missing data in the matching variables? If so, how much? I would assume these records would automatically not be part of any multiple-admission treatment episode.
c) Why were these particular matching variables selected? Previous data linkage studies using the SIH-SUS have also included variables such as the patient’s full name and mother’s name, were these not available?
d) Is a deterministic matching procedure routinely used in the linkage of Brazilian health records? If so, please specify. If not, please motivate. Could the small number of matched records be due to the typically low sensitivity associated with deterministic matching?
iv) Statistical analysis
a) Why was rank-biserial correlation used for continuous variables?
b) The cut-off points for prolonged length of stay and high reimbursement are somewhat arbitrary and specific to the data and dichotomisation reduces statistical power. Could they be treated as count variables instead with RRs reported?
c) It would be informative to know what percentage of identified episodes contained admissions that did not include obstetric criteria.
d) The number of episodes with inconsistent postcodes is contrasted to the overall number of hospital episodes. However, episodes with only 1 admission cannot contain inconsistent postcodes. In reality, 6300/117 253 = ~5.4% of the multiple-admission episodes have inconsistencies in postcodes. Moreover, one should only contrast to episodes where the postcode was not used in the matching routine and thus could be inconsistent.
v) The paper would also benefit from proofreading and a formatting check. There are many double spaces, missing spaces, tables 3 & 4 extend beyond the page leaving the last column unseen. There are also two different versions used for the same word (e.g., transfer vs. transference, postal code vs. postcode), some unclear/incomplete words (e.g., Pós -abortion, admissio), and complicated sentence structures.
Recommendation: Revisions Required
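Comment iv(d) above is essentially a point about denominators. As a minimal sketch, using only the two counts quoted by the reviewer (6300 episodes with inconsistent postcodes and 117 253 multiple-admission episodes), the proportion can be recomputed as follows. The function name is hypothetical and is not taken from the manuscript.

```python
def inconsistency_rate(n_inconsistent: int, n_at_risk: int) -> float:
    """Percentage of at-risk episodes that contain inconsistent postcodes."""
    if n_at_risk <= 0:
        raise ValueError("the denominator must be positive")
    return 100.0 * n_inconsistent / n_at_risk


# Counts quoted in comment iv(d): only multiple-admission episodes can
# contain inconsistent postcodes, so they form the denominator.
rate = inconsistency_rate(6300, 117_253)
print(f"{rate:.1f}%")
```

The same function could be reused with the narrower denominator the reviewer proposes (episodes in which the postcode was not used by the matching routine); that count is not given in the review, so it is left as a parameter here.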
Reviewer B
Anonymous Reviewer
Completed 29/07/2024
https://doi.org/10.23889/ijpds.v10i1.2467.review.r1.reviewb
Dear editor,
We thank the authors for a clear explanation of the need for episode-based identification of hospital use, and a clear illustration of the data processes with suitable visualisation. With the combination of careful data curation and creative comparison, the authors demonstrate the added benefit of using episode-based case identification on more precise estimation of patient care experience & outcomes. Here are some questions and comments:
- Cohort definition:
Age 10-49 was used to define the cohort, as women of reproductive age. As per the WHO definition, women of reproductive age are defined as those between 15 and 49 years old. Describing age 10-15 as reproductive age is not appropriate, and not necessary for the current article.
- Could the authors please clarify why children between 10-15 were included in the current study?
- The study does not plan to describe age-specific patterns or breakdowns of service use. From international literature, e.g., in the United Kingdom, pregnancy at a young age is known to be associated with more severe outcomes etc… Comparing the characteristics of the included sample with all childbirths could be interesting to discuss, but most probably belongs in a different paper.
- Length of stay
- Would same day discharges be counted as 0 day or 1 day?
- Would Inter-municipality/federation hospital transfer be captured as part of inter-hospital transfer?
- I suppose this is relevant for the false positive evaluation.
- Linkage Evaluation: Any chance you could report, for the 99.88% with no difference in postcodes, how many % have a cohesive principal diagnosis of ICD10 – Chapter XV?
- Linkage Evaluation: Any chance you could look at false positives by looking for consecutive childbirths within impossible timeframes? For example, 2 childbirth admissions in < 9 months etc…
- Linkage Evaluation: Other false positives could potentially be identified by examining whether there are individuals who had an episode after death.
- Discussion: Over-estimation of events. I think a core reason the current study shows less over-estimation is not a difference in linkage techniques, but a direct effect of looking at obstetric admissions, which are more restricted by pregnancy and relevant timeframes, compared to, for example, presentation in A&E due to falls.
- Only 2.4% involve multiple hospital admissions etc… Any chance you can relate this to national statistics of child-birth? For people giving multiple births, what is the distribution of year-gaps?
- Could you elaborate a bit more on the comparison between Brazilian Mortality database and SIH-SUS please? Might not be directly clear for international readers. (Great discussion with clear example from line 357-366)
Other minor wording suggestions:
- Page 8, Line 242-243, Compare episodes… Could you please refer to the outcome measure here.
- I failed to install the microdatasus package in R 4.4, would be good to update your specifications on which R versions the package would work.
- Page 15, Line 330-333, the Hornbrook quote and discussion should belong to the introduction, not discussion.
- Page 15, Line 340-347, the core message of this paragraph is not clear.
Minor typo:
P11, Line 274, admission.
p12, Table 2, continuous.
Table 3 and 4, footnote 3 Confidence Interval 95% is not used, should be removed.
Page 14, IC10 --> ICD10, same error Page 16.
Recommendation: Resubmit for Review
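The suggestion above to look for consecutive childbirths within impossible timeframes could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' method: `link_id` is a hypothetical identifier produced by the matching routine, each record is a childbirth admission date, and "< 9 months" is approximated as 270 days. A flagged pair is only a candidate false match for manual review, since duplicated records would also be flagged.

```python
from collections import defaultdict
from datetime import date

MIN_GAP_DAYS = 270  # assumed stand-in for "< 9 months"


def implausible_birth_pairs(records, min_gap_days=MIN_GAP_DAYS):
    """Return (link_id, earlier, later) for consecutive childbirth admissions
    that share a link_id and are fewer than min_gap_days apart."""
    dates_by_id = defaultdict(list)
    for link_id, birth_date in records:
        dates_by_id[link_id].append(birth_date)

    flagged = []
    for link_id, dates in dates_by_id.items():
        dates.sort()
        # Compare each childbirth admission with the next one in time.
        for earlier, later in zip(dates, dates[1:]):
            if (later - earlier).days < min_gap_days:
                flagged.append((link_id, earlier, later))
    return flagged


# Toy data, invented for illustration only.
records = [
    ("A", date(2018, 1, 10)),
    ("A", date(2018, 5, 2)),   # 112 days after the previous birth
    ("B", date(2018, 3, 1)),
    ("B", date(2019, 6, 15)),  # well over 270 days later
]
flagged = implausible_birth_pairs(records)
```

With the toy data above, only the pair for identifier "A" is flagged.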
Reviewer C
Tony Stone
Completed 12/08/2024
https://doi.org/10.23889/ijpds.v10i1.2467.review.r1.reviewc
Many thanks for allowing me to review your interesting manuscript. I want to preface my remarks with the comments:
A) I think the manuscript describes an interesting study which should be published;
B) I enjoyed reading your manuscript.
However, I do feel that the manuscript would benefit from some further attention. Specifically:
1) Some of the English (in the context used) is unusual (e.g. "intercurrences", "federative", "univocally", "homogenous") and, at times, makes it difficult to interpret the correct meaning (e.g. "hospital admission discharge"). It would benefit from a native English writer's editing.
2) Setting out the key terminology in the intro and being very consistent with the pertinent terms would make reading/interpretation easier.
3) It would be useful to be provided with the translations of all of the Portuguese phrases which are included. Almost all are, but not all [perhaps only 1] (e.g. "Autorizacao de Internacao Hospitalar"). Some un/partially-translated text appears to have escaped (e.g. "Pos-abortion"). [Please excuse my removal of diacritics].
4) I'd appreciate a bit more description of how the underlying data are recorded, collated, prepared, etc. And a fuller explanation of what fields are recorded, and the completeness of any of the important fields.
5) There is passing reference to 28.5% of the Brazilian population using other (private) medical coverage and these varying by state and age. But, one imagines, these also vary by socio-economic status, academic attainment, and various other factors which are known to affect maternal outcomes (at least in other countries). I think this (bias) is worthy of greater discussion.
6) I'd expect to see more references relating to the clinical subject matter (maternal care and validation of clinical care measures).
7) More clinical background would certainly be helpful, both around:
a) any relevant smaller/cohort studies that might suggest whether the findings of this study (around estimates of % births which result in the mother requiring further care) are sensible (external validation); and,
8) I think the current stated objective of the study is rather vague ("utility" is not well defined). The main (most interesting, to my mind) objective of the study would appear to be to compare maternal outcomes based upon:
admission records relating to a birth
-VS-
episodes in which the first admission related to a birth.
Something which is potentially very important.
9) I do not think the statistical analyses are meaningful. This is a descriptive study (there is no real "intervention"/"exposure"): simply using the descriptive statistics would be very powerful: they will tell the story well enough, especially if some were plotted rather than tabular. In any case, relative difference seems like an odd choice of measure, absolute difference would be much easier to interpret.
10) The definitions/use of the severity indicators "Prolonged length of stay" and "High reimbursement" do not seem sensible to me. It's a truism: enchaining hospital contacts into episodes can only increase values for these.
11) The inclusion/exclusion criteria (illustrated in Fig 2) seem an odd choice. Could you make the two cohorts more similar - I'm also not clear on why they are not almost identical - e.g. only include women admitted in 2018/19 (regardless of year of discharge)? In any case, an explanation about why these cohorts are not identical would be helpful.
12) The results (in Fig 4) would likely be more meaningful if you restricted the analyses to the intersection of the two cohorts but see:
13) I think you may wish to re-read reference 32, line 372. That reference does suggest (yes, imperfect) methods for evaluating linkage quality (or, at least, linkage bias) in the absence of gold standard data. Also see my previous comment, (7a). You may also wish to read the following from the same authors: https://doi.org/10.1093/ije/dyx177. I think the current postcode analysis could be expanded.
14) Finally, there are published reporting standards/checklists for studies making use of observational data [http://dx.doi.org/10.1371/journal.pmed.1001885]; and, even, specifically guidance for information about linking datasets [https://doi.org/10.1093/pubmed/fdx037]. Following one of these (the first is more widespread) would help structure your manuscript and increase its utility to future research (more citations!).
Minor points:
15) Please don't list software packages or describe/name processing actions based on the software used (but, by all means cite the software, etc.). Best to use the word count to explain what processing was done, any assumptions, and what this processing achieved.
16) I don't think the flowcharts are particularly enlightening (see above point about naming processing actions). They could certainly be presented more concisely.
Once again, I enjoyed reading the manuscript and think the study described is interesting. I hope the above suggestions are useful.
Recommendation: Revisions Required
Editor Decision
Oleguer Plana-Ripoll
Decision Date: 15/08/2024
Decision: Resubmit for Review
https://doi.org/10.23889/ijpds.v10i1.2467.review.r1.dec
Dear Jing Wang, Yanhong Jessika Hu, Lana Collins, Anna Fedyukova, Varnika Aggarwal, Fiona Mensah, Jeanie L. Y. Cheong, Melissa Wake:
We have reached a decision regarding your submission to International Journal of Population Data Science, "Study protocol: Generation Victoria (GenV) special care nursery registry".
Please address the attached reviewers' comments and return to us: one clean and one tracked changes version of your revised manuscript, plus a point by point letter of response/rebuttal, by 21/04/2023.
Our decision is to: Resubmit for Review
Kind Regards
Author Response
Kenneth Camargo Jr
Response Date: 24/09/2024
Round 2 Reviews
Reviewer A
Anonymous Reviewer
Completed 15/10/2024
https://doi.org/10.23889/ijpds.v10i1.2467.review.r2.reviewa
Thank you for addressing my comments. I especially find that the distinction of identifying hospital episodes rather than all hospitalisations belonging to the same person is now much clearer.
However, I agree with my fellow reviewers’ wish to further assess the quality of the data linkage and am slightly confused as to why further assessment was not conducted. For example, as suggested by reviewer B, it would be feasible to assess whether some of the intermediate hospitalisations within an episode contain a discharge due to death (i.e., episode continues despite a death occurring in the middle). Moreover, it would be good to add to the limitations that false negatives were not assessed in any manner.
Finally, I would like to reiterate that the manuscript still requires a thorough proofread to avoid spelling mistakes, double articles, inconsistent capitalisation, missing spaces etc., which at certain points make the content difficult to understand.
Thank you again for the opportunity to review this interesting manuscript!
Recommendation: Revisions Required
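The check described above (an episode that continues after a discharge due to death) could be sketched as below. This is a minimal illustration under assumptions: each episode is a chronologically ordered list of admissions, and `died` is a hypothetical boolean flag for discharge due to death; neither name comes from the manuscript or from the SIH-SUS field layout.

```python
def episodes_continuing_after_death(episodes):
    """Return the ids of episodes in which a discharge due to death is
    followed by at least one further admission.

    episodes maps an episode id to its admissions in chronological order;
    each admission is a dict with a boolean 'died' key (hypothetical).
    """
    flagged = []
    for episode_id, admissions in episodes.items():
        # A death recorded on any admission before the last one means the
        # episode carries on after the patient was recorded as dead.
        if any(admission["died"] for admission in admissions[:-1]):
            flagged.append(episode_id)
    return flagged


# Toy data, invented for illustration only.
episodes = {
    "E1": [{"died": False}, {"died": False}],
    "E2": [{"died": True}, {"died": False}],   # continues after a death
    "E3": [{"died": False}, {"died": True}],   # death ends the episode
    "E4": [{"died": False}],
}
flagged = episodes_continuing_after_death(episodes)
```

Episodes flagged this way are candidate false links; the check says nothing about false negatives, which the reviewer asks to be listed as a limitation.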
Reviewer B
Anonymous Reviewer
Completed 25/09/2024
https://doi.org/10.23889/ijpds.v10i1.2467.review.r2.reviewb
I thank the authors for their revisions and resubmission of this important paper. Here are a few further comments on this submission.
Personally, I do not find the change from Hospital admission to Hospitalisation particularly helpful.
Line 92: duplicated “the”
Line 136-138: Thanks for the clarification.
The current wording is a bit unclear: Who recommended? What is this recommendation based on? You have all the answers already. I suggest:
- Add a citation to teenage pregnancies
- Re-word and add citation: We recognize that women aged 10 to 14 present an increased risk of complications, which is an additional reason for their inclusion in studies evaluating obstetric hospitalizations and their outcomes
Line 251: typo “apply”
On same-day-discharge, is it possible for individuals to have multiple episodes (linked) but both discharged on the same day?
Linkage evaluation:
The false match rate evaluation using postal codes as comparison is likely under-estimating the linkage quality, as the linkages were performed using matched municipality, which caps the extent of errors.
As mentioned by a previous reviewer comment, perhaps the authors could consider a comparison of the stability of other available (non-pregnancy related) diagnosis for the linkage evaluation?
I would suggest the authors add a statement on how (and why) they interpreted the 74.7% and 80% agreement on principal diagnosis. Does that suggest a 20% missed match rate? What does it mean if it does? If not, how should users of this linked data be aware of the potential biases?
Line 379: typo “5.4%” – check commas vs full stops when describing %
Line 478: “fewer false links” - than what? what is the comparison?
Recommendation: Revisions Required
Editor Decision
Oleguer Plana-Ripoll
Decision Date: 27/11/2024
Decision: Revisions Required
https://doi.org/10.23889/ijpds.v10i1.2467.review.r2.dec
Dear Claudia Medina Coeli, Rosa Maria Soares Madeira Domingues, Lana Meijinhos, Daniela Medina Coeli Bastos, Rejane Sobrino Pinheiro, Valeria Saraceni, Marcos Augusto Bastos Dias, Natália Santana Paiva, Kenneth Camargo Jr,
We have reached a decision regarding your submission to International Journal of Population Data Science, "Using a deterministic matching computer routine to identify hospital episodes in a Brazilian deidentified administrative database for the analysis of obstetrics hospital admissions".
Our decision is: Revisions Required (a very minor revision - see below)
You added a website to refer to teenage pregnancies. Please provide this as a formal citation and add the author/name of the report (together with the website) in the reference list. Once this change has been made, we will accept the paper for publication.
Kind Regards
Author Response
Kenneth Camargo Jr
Response Date: 18/04/2023
Editor Decision
Oleguer Plana-Ripoll
Decision Date: 02/12/2024
Decision: Revisions Required
https://doi.org/10.23889/ijpds.v10i1.2467.review.r3.dec
Dear Claudia Medina Coeli, Rosa Maria Soares Madeira Domingues, Lana Meijinhos, Daniela Medina Coeli Bastos, Rejane Sobrino Pinheiro, Valeria Saraceni, Marcos Augusto Bastos Dias, Natália Santana Paiva, Kenneth Camargo Jr,
We have reached a decision regarding your submission to International Journal of Population Data Science, "Using a deterministic matching computer routine to identify hospital episodes in a Brazilian deidentified administrative database for the analysis of obstetrics hospital admissions".
Our decision is: Revisions Required (a very minor revision - see below)
You added a website to refer to teenage pregnancies. Please provide this as a formal citation and add the author/name of the report (together with the website) in the reference list. Once this change has been made, we will accept the paper for publication.
Kind Regards