Encoding Diagnostic Codes for Privacy-Preserving Record Linkage
Main Article Content
Diagnostic codes, such as the ICD-10, may be considered as sensitive information. If such codes have to be encoded using current methods for data linkage, all hierarchical information given by the code positions will be lost. We present a technique (HPBFs) for preserving the hierarchical information of the codes while protecting privacy. The new method modifies a widely used Privacy-preserving Record Linkage (PPRL) technique based on Bloom filters for the use with hierarchical codes.
Objectives and Approach
Assessing the similarities of hierarchical codes requires considering the code positions of two codes in a given diagnostic hierarchy. The hierarchical similarities of the original diagnostic code pairs should correspond closely to the similarity of the encoded pairs of the same code.
Furthermore, to assess the hierarchy-preserving properties of an encoding, the impact on similarity measures from differing code positions at all levels of the code hierarchy can be evaluated. A full match of codes should yield a higher similarity than partial matches.
Finally, the new method is tested against ad-hoc solutions as an addition to a standard PPRL setup. This is done using real-world mortality data with a known link status of two databases.
In all applications for encoded ICD codes where either categorical discrimination, relational similarity or linkage quality in a PPRL setting is required, HPBFs outperform other known methods. Lower mean differences and smaller confidence intervals between clear-text codes and encrypted code pairs were observed, indicating better preservation of hierarchical similarities. Finally, using these techniques allows for much better hierarchical discrimination for partial matches.
The new technique yields better linkage results than all other known methods to encrypt hierarchical codes. In all tests, comparing categorical discrimination, relational similarity and PPRL linkage quality, HPBFs outperformed methods currently used.
This work is licensed under a Creative Commons Attribution 4.0 International License.