Linkage of whole genome sequencing with administrative health, and electronic medical record data for the study of autism spectrum disorder: Feasibility, Opportunities and Challenges


Jennifer Brooks
Evdokia Anagnostou
Farah Rahman
Karen Tu
Lavnaya Uruthiramoorthy
Kirk Nylen
John McLaughlin
Michael Schull
Susan Bronskill


Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder (NDD) that presents with a high degree of heterogeneity (e.g., co-occurrence of other NDDs and other co-morbid conditions), contributing to differential health system needs. Genetics are known to play an important role in ASD and may be associated with different disease trajectories.

Objectives and Approach
In this proof-of-principle project, our objective is to link more than 2,200 children with a confirmed diagnosis of an NDD from the Province of Ontario Neurodevelopmental (POND) Study to administrative health data and electronic medical record (EMR) data in order to identify subgroups of ASD with unique health system trajectories. POND includes detailed phenotype and whole genome sequencing (WGS) data. Identified subgroups will be characterized based on clinical phenotype and genetics. Meeting this goal requires attention to WGS-specific privacy and data issues, and processes that go beyond the traditional requirements for analyzing individual-level administrative health data.

Linkage of WGS data with administrative health data is an emerging area of research. As such, it has presented a number of initial challenges for our study of ASD. Privacy concerns surrounding the use of WGS data and rare-variant analysis are of particular importance. Practical issues included the need for analysts with expertise in administrative data, EMR data and genetic analyses, as well as specialized software and sufficient processing power to analyze WGS data. Transdisciplinary discussions of the scope and significance of the research questions addressed through this linkage were crucial. The identification of genetic determinants of phenotypes and trajectories in ASD could support targeted early interventions; EMR linkage may inform algorithms to identify ASD in broader populations. These approaches could improve both patient outcomes and family experience.

As the cost of genetic sequencing decreases, WGS data will become part of the routine clinical management of patients. Linkage of WGS, EMR and administrative data has tremendous, largely unrealized potential, including for population-level ASD research that improves our ability to predict long-term outcomes associated with ASD.


Most public health-related concepts and outcomes can be defined by their geographic location. The surroundings often strongly influence, or interact with, the phenomena being studied. For this reason, a good understanding of geography and the accurate geographic placement, linking, and aggregation of studied concepts are critical yet often underestimated.

Objectives and Approach

The main objectives of this presentation are: 1) an easy-to-understand review and explanation of geographic delineation markers in common healthcare databases, and 2) a description of methods and pitfalls of geographic data linkages. Common point- and area-defined databases will be described. Nuances of ‘point-to-area’ and ‘area-to-area’ linkages will be discussed, with additional explanations of scale and zone effects. Examples of common linkages between the following spatial delineators will be explained: Postal Code Conversion File (PCCF), small area Canada Census units, and common health system geographies (e.g. sub-regions, LHINs). Frequently committed errors and best practices in geographic data linkages will be discussed.
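
To make the linkage pitfall concrete, the following is a minimal sketch, not the presenters' method, of how two common ways of mapping postal codes to areas can give different area-level counts. The conversion table, weights, area names and case counts are all hypothetical and do not reflect the actual PCCF layout.

```python
from collections import defaultdict

# Hypothetical conversion table: postal code -> list of (area_id, weight).
# Each weight is the assumed share of the postal code's population in that area.
CONVERSION = {
    "A1A": [("area_1", 1.0)],
    "B2B": [("area_1", 0.25), ("area_2", 0.75)],
}

# Hypothetical case counts recorded by postal code.
CASES = {"A1A": 10, "B2B": 8}


def link_single(cases, conversion):
    """Assign all cases of a postal code to its highest-weight area."""
    totals = defaultdict(float)
    for code, n in cases.items():
        area, _ = max(conversion[code], key=lambda pair: pair[1])
        totals[area] += n
    return dict(totals)


def link_weighted(cases, conversion):
    """Split cases of a postal code across areas in proportion to the weights."""
    totals = defaultdict(float)
    for code, n in cases.items():
        for area, weight in conversion[code]:
            totals[area] += n * weight
    return dict(totals)


if __name__ == "__main__":
    print("single link:  ", link_single(CASES, CONVERSION))
    print("weighted link:", link_weighted(CASES, CONVERSION))
```

With these made-up inputs, the two approaches disagree on the count for each area even though the total number of cases is unchanged, which is the kind of linkage-dependent difference the presentation sets out to illustrate.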


Examples of how different methods of geographic data linkage influence simulated study outcomes will be shown.


Improper geographic linkage procedures can lead to incorrect study results. The main take-home messages of this presentation are the need to enhance knowledge of geographic concepts in public health research and to promote correct procedures for spatial placement, linkage and aggregation.


How to Cite
Brooks, J., Anagnostou, E., Rahman, F., Tu, K., Uruthiramoorthy, L., Nylen, K., McLaughlin, J., Schull, M. and Bronskill, S. (2018) “Linkage of whole genome sequencing with administrative health, and electronic medical record data for the study of autism spectrum disorder: Feasibility, Opportunities and Challenges”, International Journal of Population Data Science, 3(4). doi: 10.23889/ijpds.v3i4.739.
