In Brazil, public healthcare is provided by the Unified Health System (SUS), which maintains multiple administrative databases; the major ones record hospital (SIH) and outpatient (SIA) procedures. Epidemiological information is collected for the entire population in subsystems such as Mortality (SIM), Birth Notifications (SINASC), and Notifiable Diseases (SINAN). Each subsystem has its own information system, which provides information about procedures, clinical data, and drugs dispensed. However, these systems are not linked, which prevents individual-centered analysis.
To describe the methods used, and the results obtained, in determining the parameters needed to execute the probabilistic deduplication of administrative and epidemiological databases in Brazil and in creating the National Health Database Centered on Individual.
A model was developed to encompass the data from SIH, SIA, SIM, SINASC, and SINAN, which differ in format and attributes both from one another and over time. These data comprised 1,331,398,981 records from 2000 to 2015. Probabilistic and deterministic record linkage were used to deduplicate the data. The kappa statistic and clerical review were used to assess the quality of the linkage. The depth-first search graph algorithm was used to generate the individual identifiers.
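One common way to turn pairwise linkage results into identifiers with depth-first search is to treat records as graph nodes and matched pairs as edges, then give every record in the same connected component the same identifier. The following is a minimal sketch under that assumption; the function name `assign_person_ids`, the record labels, and the matched pairs are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def assign_person_ids(record_ids, matched_pairs):
    """Assign one identifier per connected component of the match graph.

    record_ids: iterable of record keys (hypothetical labels here).
    matched_pairs: iterable of (record_a, record_b) pairs judged to be
    the same individual by a deterministic or probabilistic linkage step.
    """
    # Build an undirected adjacency list from the matched pairs.
    adjacency = defaultdict(set)
    for a, b in matched_pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    person_id = {}
    next_id = 0
    for start in record_ids:
        if start in person_id:
            continue  # already reached from an earlier record
        # Iterative depth-first search; an explicit stack avoids
        # recursion limits on very large components.
        stack = [start]
        person_id[start] = next_id
        while stack:
            node = stack.pop()
            for neighbour in adjacency.get(node, ()):
                if neighbour not in person_id:
                    person_id[neighbour] = next_id
                    stack.append(neighbour)
        next_id += 1
    return person_id


# Illustrative data only: three records linked transitively, two unlinked.
records = ["sih-1", "sia-7", "sim-3", "sinasc-9", "sinan-2"]
pairs = [("sih-1", "sia-7"), ("sia-7", "sim-3")]
ids = assign_person_ids(records, pairs)
```

In this sketch, "sih-1" and "sim-3" end up with the same identifier even though they were never compared directly, because both are linked to "sia-7". That transitive closure is what makes a graph traversal a natural fit for identifier generation.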
The deterministic deduplication process resulted in a database with 403,113,527 possible unique individuals. Probabilistic deduplication of that database then identified 175,435,802 unique individuals. The estimated error of this result is 3.3% false positives and 12.3% false negatives.
The National Health Database Centered on Individual was generated and will allow researchers to perform clinical, pharmacological, pharmacoeconomic, real-world evidence, and other studies. The database represents a cohort of global significance, spanning 15 years of historical data while preserving patient privacy. The success of the process described means it can be repeated to add data from future years.
data linkage, record linkage, Brazilian health database, SUS deduplication