Panning for gold: finding medical treatment data in insurance records
IJPDS (2017) Issue 1, Vol 1:178, Proceedings of the IPDLN Conference (August 2016)


Daniel Avery
Published online: Apr 18, 2017


In our Chinese biobank of half a million people, we use data gathered from health insurance agencies to supplement our follow-up. We have 217,000 participants with insurance records including a breakdown of what the insurance paid for, totalling 1.6 million insurance records and 60 million chargeable items. The objective was to find ways of using this information to enhance our Electronic Health Records (EHRs) by adding usable and reliable treatment data, as a basis for future research.

Machine translations of every charge description were produced so that early investigation could be done by analysts who were not Chinese speakers. Key phrases were produced by specialist clinicians in an iterative process. We began by focussing on haemodialysis-treated end-stage renal disease (ESRD), heart failure, and coronary revascularisation. Using the refined techniques, we then developed key phrase searches that could be tied into ongoing validation procedures elsewhere in the study (e.g. cancer) or validated using existing data from other sources (e.g. death reporting).

Machine translation provided both problems and unexpected solutions. While it could be inaccurate (‘Divine Comedy’, ‘semen’, ‘corpse cuisine’), more often than not it provided unexpected advantages, converting regional, archaic, or otherwise uncommon Chinese terms into the most common English equivalent.
The majority of chargeable elements in our insurance records are not treatment data per se, but instead hospital fees, generic care, and records of tests without result data. This makes identification of relevant treatment data challenging. Targeted key phrase searches proved successful, demonstrating that it was possible to use this data to answer research questions, even teasing out details which would otherwise not be available to us (e.g. ESRD, location and type of revascularisation).
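
As a rough illustration of what a targeted key phrase search over machine-translated charge descriptions might look like, the sketch below flags participants whose charges match any phrase in a category. The phrase lists, the participant IDs, the charge rows, and the function names are all hypothetical; they are not the clinician-curated phrases or the data structures used in the study.

```python
import re
from collections import defaultdict

# Hypothetical key phrases per treatment category (illustrative only,
# not the study's clinician-curated lists).
KEY_PHRASES = {
    "haemodialysis": ["haemodialysis", "hemodialysis", "dialyser"],
    "revascularisation": ["coronary stent", "bypass graft", "angioplasty"],
}


def compile_patterns(phrases_by_category):
    """Build one case-insensitive, word-bounded regex per category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
            re.IGNORECASE,
        )
        for category, phrases in phrases_by_category.items()
    }


def flag_participants(charges, patterns):
    """Map each category to the set of participant IDs with a matching charge.

    `charges` is an iterable of (participant_id, translated_description).
    """
    hits = defaultdict(set)
    for participant_id, description in charges:
        for category, pattern in patterns.items():
            if pattern.search(description):
                hits[category].add(participant_id)
    return dict(hits)


# Made-up charge rows: most are fees or tests, a few are treatments.
charges = [
    ("P001", "Bed fee, general ward"),
    ("P001", "Hemodialysis (per session)"),
    ("P002", "Coronary stent implantation, drug-eluting"),
    ("P003", "Blood test, routine"),
]
flagged = flag_participants(charges, compile_patterns(KEY_PHRASES))
```

Matching on word boundaries keeps short phrases from firing inside longer unrelated words, and collecting IDs per category makes it straightforward to hand each list to a separate validation step.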

Validation of these findings is ongoing. For example, we found that 395 of our participants had been charged for ‘corpse cuisine’ (more accurately, ‘corpse preparation’). Comparing these participants with our death records (an independently gathered source), we confirmed that 326 were known to be dead, and we added the remaining 69 to our list for active follow-up. Similarly, we will be seeking hospital records for the 528 patients who are receiving cancer treatment but have no record of cancer.
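
The cross-check against death records amounts to a set comparison: participants flagged by the charge search are split into those already in the independent source (326 of 395 in the text) and those who are not (the remaining 395 − 326 = 69). A minimal sketch, with made-up IDs and a hypothetical function name, might look like this:

```python
def split_by_death_record(charged_ids, known_dead_ids):
    """Split flagged participants into confirmed deaths and follow-up cases.

    Returns (confirmed, follow_up), where `confirmed` are flagged participants
    present in the independent death records and `follow_up` are the rest.
    """
    charged = set(charged_ids)
    confirmed = charged & set(known_dead_ids)
    follow_up = charged - confirmed
    return confirmed, follow_up


# Made-up IDs for illustration only.
charged = {"P010", "P011", "P012", "P013"}
death_registry = {"P010", "P012", "P099"}
confirmed, follow_up = split_by_death_record(charged, death_registry)
```

The same pattern would apply to the cancer example, with treatment-flagged participants compared against those holding a recorded cancer diagnosis.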

Our methods for dealing with treatment data are still being refined, but early results are looking promising. We are investigating standardisation to ICD-10-PCS codes, developing more treatment-based diagnoses, and feeding our findings back into our ongoing validation program.


Despite the implementation of full-day kindergarten (FDK) in several Canadian provinces, there is little evidence on the long-term outcomes associated with this program. Our objective was to use population-level linked data sources from Manitoba, Canada, to determine whether FDK results in better long-term academic outcomes and reduced inequities in outcomes.


Using data held in the Manitoba Centre for Health Policy Data Repository, we examined provincial reading and numeracy assessments in grades 3, 7, and 8 and a performance index in grade 9 for students in two Manitoba school divisions between 1999 and 2012. In School Division A (SDA), FDK is targeted at the schools with the lowest socioeconomic status (SES); in School Division B (SDB), FDK was gradually introduced universally. SDA FDK students were matched using propensity scores to students in an adjacent school division with similar SES but no FDK; in SDB a stepped-wedge design was used. Logistic regressions accounted for confounders including classroom effects and sex. Gamma sensitivity analyses were used to assess the sensitivity of results to unmeasured confounding. The Kakwani Progressivity Index (KPI) was used to determine how FDK affected equity.
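
The abstract does not say which matching algorithm was used. Purely as an illustration of propensity score matching, the sketch below performs greedy nearest-neighbour matching without replacement, assuming the propensity scores have already been estimated. The caliper, the number of matches per student, the IDs, and the scores are all assumptions made for the example.

```python
def greedy_match(treated, controls, k=1, caliper=0.05):
    """Greedy 1:k nearest-neighbour matching on precomputed propensity scores.

    `treated` and `controls` map an ID to a propensity score in [0, 1].
    Each control is used at most once; a treated unit may receive fewer
    than k matches (or none) if too few controls fall within the caliper.
    """
    available = dict(controls)
    matches = {}
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        # Controls within the caliper, ordered by distance to the treated unit.
        candidates = sorted(
            (abs(c_score - t_score), c_id)
            for c_id, c_score in available.items()
            if abs(c_score - t_score) <= caliper
        )
        chosen = [c_id for _, c_id in candidates[:k]]
        for c_id in chosen:
            del available[c_id]  # match without replacement
        matches[t_id] = chosen
    return matches


# Made-up scores for illustration only.
treated = {"T1": 0.30, "T2": 0.62}
controls = {"C1": 0.28, "C2": 0.33, "C3": 0.60, "C4": 0.90}
matches = greedy_match(treated, controls, k=2, caliper=0.05)
```

Greedy matching is order-dependent, so an analysis of record would more likely rely on an established matching routine and check covariate balance afterwards.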


There were 224-544 children in FDK and 869-1923 non-FDK matches in SDA, depending on the outcome examined; numbers in SDB ranged from 335 to 707 (FDK) and from 222 to 475 (non-FDK). Including interactions, 35 comparisons were examined in SDA and 24 in SDB. None of the outcomes examined in SDB showed statistically significant effects of FDK that were robust to unmeasured confounding. In SDA there were only 3 statistically significant and robust findings of benefits of FDK, all related to math. Comparisons of KPIs for FDK and non-FDK children in both school divisions demonstrated inequities in outcomes associated with SES; however, there were no significant differences in equity between the FDK and non-FDK children for any of the outcomes.


Our findings indicate no apparent benefits of universal FDK, and limited benefits from targeted FDK, specifically long-term improvements in numeracy for low-income girls. No reductions in inequity were found. Decisions regarding FDK implementation should weigh the costs of this program against the limited long-term academic benefits.
