Cohort profile: A population-based record linkage platform to address critical epidemiological evidence gaps in respiratory syncytial virus and other respiratory infections

Minda Sarna
Belaynew Taye
Huong Le
Fiona Giannini
Kathryn Glass
Christopher C Blyth
Peter Richmond
Rebecca Glauert
Avram Levy
Hannah C Moore


The Western Australia (WA) Respiratory Infections Linked Data Platform is a population-based cohort established to investigate the epidemiology of RSV and other respiratory infections in children aged 0-10 years, incorporating microbiological testing patterns, hospital admissions, emergency department presentations, and socio-demographic data.

The cohort was formed through individual linkages between datasets from the WA Department of Health including the Birth and Death Registry, Midwives Notification System (MNS), Hospital Morbidity Data Collection, Emergency Department Data Collection, WA Notifiable Diseases Database, WA Register of Developmental Anomalies, WA Cerebral Palsy Register, WA Antenatal Vaccination Database, WA Family Connections, and PathWest Respiratory Virus Surveillance Data. Hospitalisations and emergency department presentations were temporally linked to routine respiratory viral surveillance data.

The cohort consists of 368,830 WA births between 1 January 2010 and 31 December 2020 with accompanying perinatal and demographic data, and with secondary care follow-up to 30 June 2022. Of these births, 24,660 (6.7%) identify as Aboriginal. A total of 4,077 (1.1%) children died from all causes during the study period (2010-2020), and 9.2% (33,818) of children were born preterm (<37 weeks).

The Respiratory Infections Linked Data Platform enables epidemiological investigations, identifying virus-specific risk groups, risk factors, clinical presentation, viral testing patterns, long-term impacts and accurate measures of viral incidence rates in risk and population sub-groups. This will not only aid in the calculation of cost-effectiveness estimates of interventions such as immunisations, but also provide guidance for design and implementation of such programs to priority groups. The Respiratory Infections Linked Data Platform will also enable evaluation of the direct and indirect effects of maternal and infant vaccines and new therapeutics. Analyses using this platform will also generate epidemiological data needed for other respiratory viruses on the vaccine pipeline such as parainfluenza virus and human metapneumovirus.


  • The Western Australia (WA) Respiratory Infections Linked Data Platform is a population-based cohort established to investigate the epidemiology of RSV and other respiratory illnesses in children aged 0-10 years. The cohort was formed through individual linkages between datasets from the WA Department of Health including the Birth Registry, Midwives Notification System, Hospital Morbidity Data Collection, Emergency Department Data Collection, WA Notifiable Diseases Database, Death Registry, WA Register of Developmental Anomalies, WA Cerebral Palsy Register, WA Antenatal Vaccination Database, WA Family Connections, and PathWest Respiratory Virus Surveillance Data.
  • The cohort consists of 368,830 children born between 01 January 2010 and 31 December 2020, with secondary care follow-up to 30 June 2022.
  • Research on the epidemiological characteristics including clinical presentation, risk factors, viral testing patterns, long-term impacts, and accurate measures of incidence rates in risk and population sub-groups through statistical prediction models will improve our understanding of the long-term burden of RSV and other respiratory infections. Statistical prediction models will estimate the under-ascertainment of RSV infections and thus predict more accurate incidence rates. This will aid in the calculation of cost-effectiveness estimates of any interventions and provide guidance for design and implementation of prevention programs to priority groups.
  • The platform is a static dataset that can be expanded on in the future.


Respiratory syncytial virus (RSV) is the leading cause of pneumonia in children worldwide [1] and is globally responsible for >3.6 million hospitalisations in children aged <5 years, with the highest incidence of RSV-associated hospitalisations occurring in young infants [2]. An Australian report published >10 years ago estimated the annual direct cost of RSV in children aged <5 years to be $24–50 million (surpassing that of influenza for this age) [3]. This has recently been updated to $59–121 million [4]. Pathogen-specific incidence rates are needed to accurately determine burden and associated healthcare costs; however, these rates can be impacted by variable microbiological testing patterns. For example, a statistical prediction model developed by our team, which used RSV-positive hospitalisations and the total number of tests conducted for RSV detection to estimate a ‘predicted rate’, found that previous incidence rates under-estimated the burden in infants by 30–57%, suggesting that ‘true’ RSV hospitalisation rates in those aged <6 months range from 27.9 to 43.7/1000 child-years [5].

In the last 2 years, major changes in both RSV epidemiology and the prevention landscape have occurred. Changes in the typical seasonal patterns of RSV to out-of-season peaks in Australia and other countries following COVID-19 mitigation measures have highlighted the significant burden of RSV and other viruses like human metapneumovirus (hMPV), and resulted in a re-assessment of conventionally accepted thinking regarding seasonality, how immunity is acquired and other transmission dynamics [6, 7]. Additionally, RSV prevention has seen significant advancements, with the development of a single dose long-acting monoclonal antibody (mAb), nirsevimab [8, 9], now given regulatory approval by the European Commission and US Food and Drug Administration for use in newborns and infants for protection against RSV disease [10, 11] and a maternal vaccine [12], now approved for use in the US [13]. Both therapeutics provide passive immunisation to infants, the group experiencing the highest burden. These events have renewed global scientific and public interest in RSV disease and further highlighted gaps in the evidence base related to the country-specific RSV epidemiology [14].

Population sub-groups such as infants born preterm [15], First Nations infants [16], children with comorbidities, and those with congenital anomalies [17] may be suffering from a disproportionate burden of acute lower respiratory infection (ALRI). However, RSV and other respiratory pathogen-specific data in these high-risk groups are limited. In addition, while data exists for infants and young children stratified by age in years or 6-month intervals, data with finer age stratification are needed, particularly to inform the impact of future immunisation measures. Data in older children are also sparse. Epidemiologic research on the clinical presentation, identification of risk factors, viral testing patterns, long-term impacts, and accurate measures of incidence rates through statistical prediction models in risk and population sub-groups will improve our understanding of the long-term burden of RSV and other respiratory infections. This will not only aid in the calculation of cost-effectiveness estimates of any interventions, but also provide guidance for design and implementation of prevention programs to priority groups. Linked population-based datasets also provide a source of real-world data to evaluate the impact of current and future intervention programs.

To address these evidence gaps and needs, we have constructed the Western Australian (WA) Respiratory Infections Linked Data Platform, established through individual-level linkage of population-based state administrative databases and registers via probabilistic matching. The Data Platform is a static platform that will be expanded on in the future. While our current focus is on RSV and the platform was built with RSV in mind, the Respiratory Infections Linked Data Platform will provide a valuable resource for wider epidemiological research on other respiratory viruses and infections in the future. Members of our team have long-standing expertise using linked population data in previous analyses, including the evaluation of maternal influenza and pertussis vaccine safety and effectiveness [18, 19], and direct and indirect effects of maternal [20–22] and childhood [16, 23] vaccines, which has aided the design of this platform and will assist with investigations of all respiratory infections. The Respiratory Infections data platform will enable evaluation of the direct and indirect effects of maternal RSV vaccines and/or monoclonal antibodies on RSV and non-RSV hospitalisations. Subsequent to these research outcomes for RSV, robust data will be needed for other respiratory pathogens on the vaccine pipeline such as parainfluenza virus and hMPV.


Setting and population

WA is the largest state, covering the western third of Australia, and has a population of 2.7 million, of which Aboriginal and/or Torres Strait Islander people (hereafter respectfully referred to as Aboriginal as the accepted term in WA) comprise 6.4% of the population [24]. The majority of the State’s population live in the metropolitan areas of the capital city, Perth, and its surrounds (2.1 million people, 78%) [24]. Climate ranges from tropical in the northern regions to temperate in the metropolitan and southern regions. The winter respiratory virus season typically spans from May to September inclusive.

Data sources and linkage

The WA Respiratory Infections Linked Data Platform comprises a static population-based cohort of births between 2010 and 2020 across WA (Figure 1). The Platform was formed through individual-level linkage of a number of administrative health databases and registries, described in Table 1. Data extraction and linkage of individual records was performed by Data Services at the WA Department of Health. In the absence of a unique national personal identifier in Australia, the Data Services team uses probabilistic linkage processes in a ‘best practice protocol’ to link the same individual in a number of different health databases and registers [25]. Best practice protocol involves the separation of personal demographic data from clinical or service information; linkage of records; removal of personal identifiers; and the assignment of unique encrypted linkage keys, to allow researchers to link individuals between different datasets. Probabilistic linkage compares groups of records using complex non-unique identifiers or field matching algorithms [26]. These algorithms compare common demographic fields (e.g., given name, surname, date of birth, and other relevant fields dependent on the contents and context of the dataset [25]), and provide a similarity weighting index which is positively associated with the likelihood that two or more records belong to the same individual [26]. Clerical review is required to assess potential non-matched records; this process has been shown to reduce the error rate of matching to less than 0.1% [27]. Individuals in the resulting linked datasets are identified through a 16-digit unique alphanumeric code (unique child ID, or “root”) which is included in each dataset provided and allows the same child to be identified in more than one dataset. Similarly, Mother ID allows mothers to be identified in perinatal data, the antenatal vaccine register, and in mapping files linking mother to child.
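
To make the idea of a similarity weight concrete, the following minimal sketch sums per-field log-likelihood ratios in the style of the Fellegi-Sunter model, a standard approach to probabilistic linkage. It is an illustration only and is not the algorithm used by Data Services: the field names, the m and u probabilities, and the example records are all hypothetical.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the source):
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "given_name": (0.95, 0.01),
    "surname": (0.95, 0.005),
    "date_of_birth": (0.98, 0.001),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 likelihood ratios across fields; higher suggests the same person."""
    weight = 0.0
    for field, (m, u) in params.items():
        if rec_a.get(field) == rec_b.get(field):
            weight += math.log2(m / u)              # agreement adds evidence
        else:
            weight += math.log2((1 - m) / (1 - u))  # disagreement subtracts evidence
    return weight


# Hypothetical records from two datasets
birth = {"given_name": "alex", "surname": "nguyen", "date_of_birth": "2015-06-01"}
hospital = {"given_name": "alex", "surname": "nguyen", "date_of_birth": "2015-06-01"}
other = {"given_name": "alex", "surname": "smith", "date_of_birth": "2012-02-10"}

same_weight = match_weight(birth, hospital)
diff_weight = match_weight(birth, other)
print(f"full agreement: {same_weight:.1f}, partial agreement: {diff_weight:.1f}")
```

In a scheme like this, pairs scoring above an upper threshold would be accepted as links, pairs below a lower threshold rejected, and the pairs in between sent to clerical review.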

Figure 1: Establishment of the Respiratory Infections Linked Data Platform, Western Australia 2010–2020.

(The Roots, Records and Time period columns describe data within the respiratory infections data linkage platform.)
Database | Description | Roots (a), N | Records, N | Time period
Birth Cohort Platform File
Birth Registrations | Administered by the Registry of Births, Deaths and Marriages and includes all children born and subsequently registered in Western Australia. | 417,563 | 417,563 | 1 Jan 2010–27 June 2022
Midwives Notification System (MNS) [42] | State perinatal data collection includes demographic maternal and newborn data, maternal medical and obstetric history, information on labour, delivery and birth on 99% of births [28] with gestational age ≥20 weeks and birthweight ≥400 g (where gestational age is unknown) in WA. From July 2016, MNS also records trimester of antenatal influenza and pertussis vaccinations. | 371,387 | 371,387 | 1 Jan 2010–31 Dec 2020
WA Register of Developmental Anomalies (WARDA) [43] | Legally mandated register compiling records of up to 10 major and minor birth defects per individual diagnosed in utero, at, and following birth, and up to the age of 6 years using active and passive case ascertainment to provide accurate information in a timely manner. The quality of data in WARDA has been evaluated and is estimated to be high, with only 1.5% of missing data of the 20 essential variables collected by WARDA [44]. | 20,982 | 20,998 | 1 Jan 2010–30 Jun 2022
WA Cerebral Palsy Register [45] | Sub-section of the WARDA also contains a clinical register on cerebral palsy (CP) and associated impairments. The CP Register data may be delayed as diagnosis of CP, particularly for milder forms of CP, may not be rendered until the brain is fully developed at 3–5 years of age. | 554 | 554 | 12 Jan 2010–25 Apr 2020
Family Connections data | WA Family Connections system contains links between individuals who are related using data stored in the Birth Register and MNS. It is a unique dataset to WA and allows investigations of disease within families. Family connections data contained within the WA Respiratory Pathogens Data Linkage Platform allow the identification of full and half siblings with a date of birth that falls both within and outside of the birth cohort. Mapping files link the child’s ID to mother ID and father ID. Siblings will have the same mother ID. The relationship of half siblings are stated in the mapping files. | 474,315 | 474,315 | 1 Jan 1990–31 Dec 2020
WA Antenatal Vaccinations Database (WAAVD) [46] | State-wide registry of healthcare provider-reported vaccines administered during pregnancy between March 2012 and September 2016, including vaccine brand, batch number, and date of vaccination. Previous validation of WAAVD has demonstrated specificity and positive predictive values exceeding 95% [46]. The database was archived in 2016 and from 2016 onwards antenatal vaccinations are recorded in the perinatal record in the MNS. | 35,428 | 41,244 | 16 Mar 2012–30 Sep 2016
Health Outcome Data (b)
Hospital Morbidity Data collection [47] | Hospitalisation admissions (inpatient) data covered all inpatient separations (discharges, transfers and deaths) from all free-standing hospitals across the state. Data includes dates of admission and separation, the primary diagnosis code (first-listed diagnosis) and 20 secondary diagnosis codes (coded using the ICD-10-AM (c) coding system), ACHI (d) codes for procedures performed during the stay, intensive care unit attendance and associated length of stay. | 241,512 | 588,784 | 1 Jan 2010–31 Mar 2022
Emergency Department Data Collection [48] | Emergency department data contain date of ED presentation and a combination of diagnostic information to describe the presenting complaint. This includes coded diagnosis (using ICD-10-AM codes), symptoms (using the Systematised Nomenclature of Medicine Clinical Terms) or free text. ED activity in all public hospitals and private hospitals under contract with the WA government is captured in this dataset. | 319,107 | 1,495,232 | 4 Jan 2010–30 Jun 2022
WA Notifiable Infectious Diseases Database (WANIDD) (e) | Collates data on all public health notifications of notifiable infections. Recorded information includes disease, date of onset, diagnosis method and organism serotype. For the WA Respiratory Infections Data Linkage Platform, notifications include influenza virus, pertussis, invasive pneumococcal disease and COVID-19. | 10,197 | 10,590 | 30 Jan 2010–22 Jun 2020 (e)
Death Registrations and cause of death (f) | Death registration data (from the National Death Index) includes demographic information, date and cause of death (including contributing causes) on all deaths in WA coded using ICD-10-AM coding. | 4,077 | 4,077 | 1 Jan 2010–31 Dec 2022
PathWest respiratory virus surveillance data | Routine respiratory microbiological test results (positive and negative) performed at all government funded PathWest laboratory services, WA’s sole public pathology provider. Data include the date of specimen collection, specimen type, test, and result. All respiratory specimens were tested using PCR other than the inclusion of cases based on a clinical case definition for pertussis. For the WA RSV Data Linkage Platform, respiratory viruses include RSV, influenza, parainfluenza viruses 1-3, human metapneumovirus, adenovirus, enterovirus, rhinovirus, pre-COVID-19 seasonal coronaviruses, SARS-CoV-2, Chlamydia pneumoniae, Legionella pneumophila and longbeachae, Bordetella pertussis, Mycoplasma pneumoniae, and Pneumocystis jirovecii. Influenza and RSV subtyping is also available. | 48,823 | 74,621 | 15 Jan 2010–9 Dec 2021
Table 1: Data sources incorporated into the WA respiratory infections data linkage platform. (a) Unique child ID. (b) Health outcome data in the Respiratory Infections Data Linkage Platform include data for the birth cohort only. (c) ICD-10-AM, International Statistical Classification of Diseases and Related Health Problems, 10th Revision, Australian Modification. (d) ACHI, Australian Classification of Health Interventions. (e) There were no respiratory virus notifications for influenza, pertussis, or RSV for the remainder of 2020. (f) Cause of death coded by the National Coronial Information System and the Victorian Department of Justice and Community Safety.

The birth cohort was formed by merging all registered births (live and stillbirths) from Birth Register data with a date of birth from 1 January 2010 to 27 June 2022, with perinatal records from the MNS from 1 January 2010 to 31 December 2020, based on the unique child ID. Together, the overall coverage of the birth cohort comprised 371,387 children born in WA (all live and stillbirths) and spanned from 1 January 2010 to 31 December 2020 with complete perinatal data (Figure 2). As data are refreshed, perinatal data for the subsequent years will be provided. Antenatal vaccination data recorded in Western Australia Antenatal Vaccines Database (WAAVD) were linked for each mother and child in the birth cohort by date of vaccination using the conception date (calculated from gestational age at birth and the child’s date of birth recorded in the perinatal data collection) and the date of birth. Gestational age at vaccination in weeks and trimesters were re-calculated from the date of vaccination for vaccinations between March 2012 and September 2016.
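
The date arithmetic described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the procedure applied to WAAVD: the function names and example dates are hypothetical, the conception date is approximated as the date of birth minus gestational age at birth, and the trimester cut-offs are those given in this profile (0–13, 14–26 and 27 or more completed weeks).

```python
from datetime import date, timedelta


def conception_date(date_of_birth, gestational_age_weeks):
    """Approximate start of gestation: date of birth minus gestational age at birth."""
    return date_of_birth - timedelta(weeks=gestational_age_weeks)


def gestational_week_at_vaccination(vaccination_date, date_of_birth, gestational_age_weeks):
    """Completed weeks of gestation at vaccination, or None if outside the pregnancy."""
    start = conception_date(date_of_birth, gestational_age_weeks)
    if not (start <= vaccination_date <= date_of_birth):
        return None  # vaccination does not belong to this pregnancy
    return (vaccination_date - start).days // 7


def trimester(completed_weeks):
    """Trimester from completed weeks, using the cut-offs stated in this profile."""
    if completed_weeks <= 13:
        return 1
    if completed_weeks <= 26:
        return 2
    return 3


# Hypothetical example: term birth at 40 weeks, dose given 10 weeks before birth
dob = date(2015, 6, 1)
vaccinated = dob - timedelta(weeks=10)
week = gestational_week_at_vaccination(vaccinated, dob, 40)
print(week, trimester(week))
```

Restricting to vaccination dates between the estimated conception date and the date of birth is what attaches a dose to the correct pregnancy when a mother has more than one birth in the cohort.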

Figure 2: Time period of data provided for individual datasets and number of records in the Respiratory Infections Linked Data Platform, Western Australia.

The Platform provides several health outcome related outputs to enable a wide range of research. Healthcare utilisation data includes all records, regardless of diagnosis, relating to emergency department (ED) presentations, hospital separations, deaths and associated healthcare measures such as hospital length of stay, in-hospital procedures, intensive care unit (ICU) attendance and length of stay. Respiratory pathogen detections, including notifiable infectious diseases, are also identified. Separate PathWest respiratory pathogen testing records were combined if they related to the same child with the same day of specimen collection. Specimens collected on or after the date of death were considered to be post-mortem specimens. From a combined dataset of all hospital admissions and ED presentations (herein, termed as secondary care episodes), we temporally linked laboratory testing records using the unique child ID and dates of hospital admission and/or presentation and date of specimen collection. These health outcomes are available from birth up to the most recent date at the time of data extraction and linkage for all individuals in the cohort. Longitudinal data over successive years also allows assessment of repeat infections and Family Connections data enable studies assessing risk across siblings within families.
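
The two record-handling rules for PathWest data (combining same-day records for a child and flagging post-mortem specimens) might look like the sketch below. The record layout and field names are hypothetical, and the choice to treat a pathogen as detected if any same-day record is positive is our assumption for illustration, not a rule stated in the text.

```python
from collections import defaultdict
from datetime import date


def combine_same_day(records):
    """Merge test records that share a child ID and specimen collection date."""
    merged = defaultdict(dict)
    for rec in records:
        key = (rec["child_id"], rec["collected"])
        for pathogen, positive in rec["results"].items():
            # Assumption: a pathogen counts as detected if any same-day record is positive.
            merged[key][pathogen] = merged[key].get(pathogen, False) or positive
    return [
        {"child_id": child, "collected": day, "results": results}
        for (child, day), results in sorted(merged.items())
    ]


def is_post_mortem(collected, date_of_death):
    """Specimens collected on or after the date of death are treated as post-mortem."""
    return date_of_death is not None and collected >= date_of_death


# Hypothetical test records for one child
raw_records = [
    {"child_id": "A1", "collected": date(2016, 7, 4), "results": {"RSV": True}},
    {"child_id": "A1", "collected": date(2016, 7, 4), "results": {"influenza": False}},
    {"child_id": "A1", "collected": date(2016, 8, 1), "results": {"RSV": False}},
]
combined = combine_same_day(raw_records)
print(len(raw_records), "records combined into", len(combined))
```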

Due to the wealth of data items available on the MNS and additional indicators available through Data Services (e.g., Aboriginal identification and geocoding information), the cohort can be comprehensively described. This includes demographic structure (number and place of births over time; levels of socio-economic deprivation) and important covariate factors (e.g., gestational age, maternal comorbidities, maternal smoking status, maternal antenatal vaccination status, birth defects, other perinatal factors) that are often used in infectious disease epidemiology research. The MNS has been validated and is estimated to capture 99% of births [28] (live and stillborn) with a gestational age of ≥20 weeks. Furthermore, it is mandated and provides data into national data collections at the Australian Institute of Health and Welfare, and thus, the quality of data is high.


Pregnancy trimesters were defined as first trimester (0–13 completed weeks of gestation), second trimester (14–26 completed weeks), and third trimester (27 or more completed weeks). Gestational age in weeks at birth was used to derive preterm birth categories, defined as extremely preterm (<28 weeks), very preterm (28-31 weeks), moderate to late preterm (32–36 weeks), and term (37 weeks or more) births. Small for gestational age births were defined as birthweight lower than the 10th percentile for liveborn, singleton infants and based on the Australian national birthweight percentiles by sex and gestational age [29]. Low birthweight was defined as a live birth with birthweight less than 2,500g irrespective of gestational age. Proportion optimal birthweight was calculated by WA Data Services [30] and provides a method of assessing appropriateness of intrauterine growth that is less dependent on the health of the reference population or the quality of their morphometric data than is percentile position on a birth weight distribution. Major birth defects were identified and include those known to increase susceptibility to respiratory infection morbidity including congenital heart, lung and neurological diseases and Trisomy 21.
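
The preterm and low birthweight definitions above translate directly into simple classification rules. The sketch below is illustrative only; the function names are hypothetical and the thresholds are taken from the definitions in the preceding paragraph.

```python
def preterm_category(gestational_age_weeks):
    """Classify completed weeks of gestation at birth into the categories defined above."""
    if gestational_age_weeks < 28:
        return "extremely preterm"
    if gestational_age_weeks <= 31:
        return "very preterm"
    if gestational_age_weeks <= 36:
        return "moderate to late preterm"
    return "term"


def is_low_birthweight(birthweight_g, live_birth=True):
    """Low birthweight: a live birth under 2,500 g, irrespective of gestational age."""
    return live_birth and birthweight_g < 2500


for weeks in (26, 30, 35, 39):
    print(weeks, preterm_category(weeks))
```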

Aboriginal children were identified using a derived flag for Aboriginal status from a combination of routine data sources using a pre-validated algorithm [31]. Geocoding data provided by the WA Data Services included the Socio-Economic Indices for Areas (SEIFA) scores to measure socioeconomic status [32]. SEIFA scores are derived from the Australian Bureau of Statistics and were calculated at the Statistical Areas Level 1, based on the boundaries used at the child’s year of birth, for population-based epidemiological and statistical research [32]. Statistical Areas Level 1 are geographic areas built from whole Mesh Blocks and generally have an average population of approximately 400 people. Of the four different SEIFA scores available, we chose to use the Index of Relative Socio-Economic Advantage and Disadvantage, which is derived from 17 different variables including education, occupation, income, housing, disability and family measures collected from the Census of Population and Housing [32]. SEIFA scores are grouped into quintiles (five categories), with the lowest scores representing the most socioeconomically deprived. Remoteness is determined through the Accessibility and Remoteness Index of Australia [33], which uses postcode of residence to categorise all individuals to major cities, inner regional, outer regional, remote, and very remote areas.

The identified exposures and outcomes of interest will differ according to each specified analysis using the Data Linkage Platform. Pathogen specific outcomes measured in the cohort include respiratory viral tests (including positive and negative results) from routine microbiological tests conducted through PathWest and disease notifications from Western Australian Notifiable Diseases Database (WANIDD), including influenza, pertussis, invasive pneumococcal disease and COVID-19. Hospital admissions and separations and ED presentations of interest will be identified using International Classification of Diseases diagnosis codes, version 10, Australian modification (ICD-10-AM). Principal and up to 20 additional diagnosis fields are used to classify outcomes of interest. For ED presentations, a hierarchy of coding was applied whereby the single principal diagnosis ICD code was used in preference over other free-text diagnosis and symptom codes in the following order: a) an ICD code; b) a symptom code; c) diagnosis at discharge text field; d) presenting complaint (symptom) text field; and e) a major diagnostic category (‘diseases and disorders of the respiratory system’) as previously published [34]. Importantly, our datasets include hospital and ED presentations for all causes, enabling the identification of non-specific outcomes (e.g. skin infections, all-cause injuries and trauma) that temporally link to a respiratory viral testing record, notifiable disease or conditions that can be used as negative controls. Procedure codes were based on the Australian Classification of Health Interventions codes, 11th edition. Common diagnoses and procedures relating to respiratory infections and the associated ICD-10-AM codes are shown in Supplementary Table 1.
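
The ED coding hierarchy (a to e) amounts to taking the first populated field in a fixed priority order. A minimal sketch, assuming hypothetical field names for the ED record, is shown below; it is not the published implementation [34].

```python
# Hypothetical field names, listed in the priority order (a) to (e) described above.
ED_FIELD_PRIORITY = [
    "icd_code",                    # a) ICD code
    "symptom_code",                # b) symptom code
    "discharge_diagnosis_text",    # c) diagnosis at discharge text field
    "presenting_complaint_text",   # d) presenting complaint (symptom) text field
    "major_diagnostic_category",   # e) major diagnostic category
]


def ed_diagnosis(record):
    """Return (field, value) for the highest-priority populated field, else (None, None)."""
    for field in ED_FIELD_PRIORITY:
        value = record.get(field)
        if value is not None and str(value).strip():
            return field, value
    return None, None


# Hypothetical ED presentations
coded = {"icd_code": "J21.0", "presenting_complaint_text": "wheeze"}
uncoded = {"icd_code": "", "symptom_code": None, "presenting_complaint_text": "cough and fever"}
print(ed_diagnosis(coded))
print(ed_diagnosis(uncoded))
```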


From the complete cohort of 371,387 births, we removed 2,557 (0.7%) children born in WA but residing elsewhere, as we had no follow-up secondary care data on these children. Of the remaining 368,830 births, 24,660 (6.7%) identify as Aboriginal (Table 2). A total of 4,077 (1.1%) children died during the study period (2010–2020), and 33,818 (9.2%) children were born preterm (<37 weeks). Other perinatal factors and cohort descriptors by Aboriginal status are shown in Table 2. Mothers of Aboriginal children were younger, more likely to smoke during pregnancy, and less likely to have received antenatal influenza or pertussis vaccination. They were also more likely to live in rural and remote locations and be in the lower socioeconomic quintiles.

Characteristic Total N = 368,830 Non-Aboriginal N = 344,170 (93.3%) Aboriginal N = 24,660 (6.7%)
Maternal and socio-demographic factors
Maternal age at birth
 <20 years 11,154 (3.0%) 7,104 (2.1%) 4,050 (16.4%)
 20–24 years 47,723 (12.9%) 39,902 (11.6%) 7,821 (31.7%)
 25–29 years 102,086 (27.7%) 95,574 (27.8%) 6,512 (26.4%)
 30–34 years 126,729 (34.4%) 122,698 (35.7%) 4,031 (16.3%)
 35 or more years 81,138 (22.0%) 78,892 (23.0%) 2,246 (9.1%)
Smoking during pregnancy
 No 332,512 (90.2%) 318,269 (92.5%) 14,243 (57.8%)
 Yes 36,318 (9.8%) 25,901 (7.5%) 10,417 (42.2%)
Maternal history of asthma
 No 336,566 (91.3%) 314,301 (91.3%) 22,265 (90.3%)
 Yes 32,264 (8.7%) 29,869 (8.7%) 2,395 (9.7%)
Maternal Aboriginal status (a)
 Non-Aboriginal 346,266 (93.9%) 342,915 (99.6%) 3,351 (13.6%)
 Aboriginal 22,564 (6.1%) 1,255 (0.4%) 21,309 (86.4%)
Maternal antenatal vaccination
 Influenza 90,316 (24.5%) 84,962 (24.7%) 5,354 (21.7%)
 Pertussis 118,159 (32.0%) 112,115 (32.6%) 6,044 (24.5%)
Socio-economic disadvantage (b)
 0–20% (most disadvantaged) 73,009 (19.8%) 60,455 (17.6%) 12,554 (51.3%)
 21–40% 77,460 (21.0%) 71,953 (20.9%) 5,507 (22.5%)
 41–60% 78,817 (21.4%) 75,044 (21.8%) 3,773 (15.4%)
 61–80% 75,602 (20.5%) 73,716 (21.4%) 1,886 (7.7%)
 81–100% (least disadvantaged) 63,276 (17.2%) 62,548 (18.2%) 728 (3.0%)
 Missing 666 (0.2%) 454 (0.1%) 212 (0.9%)
Remoteness index (c)
 Major city 287,103 (77.9%) 276,935 (80.5%) 10,168 (41.5%)
 Inner regional 29,097 (7.9%) 27,447 (8.0%) 1,650 (6.7%)
 Outer regional 26,896 (7.3%) 23,031 (6.7%) 3,865 (15.8%)
 Remote 16,626 (4.5%) 12,354 (3.6%) 4,272 (17.4%)
 Very remote 8,612 (2.3%) 4,044 (1.2%) 4,568 (18.6%)
 Missing 496 (0.1%) 359 (0.1%) 137 (0.6%)
Child factors
Delivery mode
 Vaginal/instrumental 234,830 (63.7%) 217,024 (63.1%) 17,806 (72.2%)
 Caesarean 134,000 (36.3%) 127,146 (36.9%) 6,854 (27.8%)
Infant sex
 Female 189,728 (51.4%) 176,909 (51.4%) 12,819 (52.0%)
 Male 179,036 (48.5%) 167,208 (48.6%) 11,828 (48.0%)
 Indeterminate 66 (0.0%) 53 (0.0%) 13 (0.1%)
Gestational age
 ≥37 weeks 335,012 (90.8%) 314,159 (91.3%) 20,853 (84.6%)
 32–36 weeks 28,108 (7.6%) 25,089 (7.3%) 3,019 (12.2%)
 28–31 weeks 2,791 (0.8%) 2,412 (0.7%) 379 (1.5%)
 <28 weeks 2,919 (0.8%) 2,510 (0.7%) 409 (1.7%)
Season of birth
 Spring (Sept-Nov) 92,132 (25.0%) 86,196 (25.0%) 5,936 (24.1%)
 Summer (Dec-Feb) 90,351 (24.5%) 84,217 (24.5%) 6,134 (24.9%)
 Autumn (Mar-May) 94,760 (25.7%) 88,315 (25.7%) 6,445 (26.1%)
 Winter (Jun-Aug) 91,587 (24.8%) 85,442 (24.8%) 6,145 (24.9%)
Number of other siblings
 0 99,013 (26.8%) 93,348 (27.1%) 5,665 (23.0%)
 1 113,305 (30.7%) 107,711 (31.3%) 5,594 (22.7%)
 2 or more 156,512 (42.4%) 143,111 (41.6%) 13,401 (54.3%)
Multiple births
 Singleton birth 358,383 (97.2%) 334,434 (97.2%) 23,949 (97.1%)
 Multiple birth 10,447 (2.8%) 9,736 (2.8%) 711 (2.9%)
Table 2: Demographic characteristics of key variables of the study cohort, including proportion of missing data for key variables. (a) Based on the algorithm by Christensen et al. [31]. (b) Australian Bureau of Statistics Socio-economic indexes for areas [32]. (c) Accessibility/Remoteness index of Australia.

Respiratory virus testing data were available for 48,823 children over this time period. Approximately 66% of specimens were tested for RSV, of which 15% were positive. To assess how pathology records link to secondary care episodes, we calculated the proportion of included PathWest records that linked to an episode for the same child within up to 8 days either side of the date of specimen collection (Figure 3). Just over half of laboratory records (39,725; 53.2%) linked to a secondary care episode on the same day, and 56,059 (75.1%) linked within a 48-hour window either side. When the linkage rule was extended to 4 days either side of the date of secondary care admission/presentation, 57,836 (77.5%) laboratory records linked. For our planned analyses, we will use the 4-day inclusion rule to identify secondary care episodes with linked laboratory records, rather than the previously used 2-day rule, because recent analyses have shown a longer duration of RSV viral shedding [35].
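
The temporal linkage rule can be sketched as a windowed match on child ID and dates. The example below is a simplified illustration with hypothetical field names and records; keeping the closest episode when several fall inside the window is our assumption, not a rule stated in the text.

```python
from datetime import date


def link_lab_to_episodes(lab_records, episodes, window_days=4):
    """Map each lab record index to the nearest same-child episode within the window."""
    starts_by_child = {}
    for j, episode in enumerate(episodes):
        starts_by_child.setdefault(episode["child_id"], []).append((episode["start"], j))

    links = {}
    for i, lab in enumerate(lab_records):
        gaps = [
            (abs((lab["collected"] - start).days), j)
            for start, j in starts_by_child.get(lab["child_id"], [])
        ]
        in_window = [gap for gap in gaps if gap[0] <= window_days]
        if in_window:
            links[i] = min(in_window)[1]  # assumption: keep the closest episode
    return links


# Hypothetical secondary care episodes and laboratory records
episodes = [
    {"child_id": "A1", "start": date(2016, 7, 1)},
    {"child_id": "B2", "start": date(2016, 7, 20)},
]
labs = [
    {"child_id": "A1", "collected": date(2016, 7, 1)},   # same day as the episode
    {"child_id": "A1", "collected": date(2016, 7, 4)},   # 3 days after the episode
    {"child_id": "B2", "collected": date(2016, 7, 10)},  # 10 days before the episode
]

for window in (0, 2, 4):
    linked = link_lab_to_episodes(labs, episodes, window)
    print(f"±{window} days: {len(linked)} of {len(labs)} laboratory records linked")
```

Running the function over a range of window widths gives the kind of cumulative linkage proportions reported above for the same-day, 2-day and 4-day rules.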

Figure 3: Linkage of PathWest respiratory virus testing data with hospital admissions and emergency department data (n=74,621 laboratory, n=588,784 hospital and n=1,495,232 emergency department records).

Hospital separations related to birth admissions were removed. Inter-hospital transfers were defined as multiple adjacent hospital admission records of the same person where either a) the admission date was identical to the discharge date of the prior record; or b) the admission date occurred before the discharge date of the prior record; or c) both admission and discharge dates fell within the time span of the prior record. These records were collapsed and considered a single admission. From 588,784 admissions, 24,590 (4.1%) were part of a transfer set of records.
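
Rules (a) to (c) all reduce to the next admission starting on or before the discharge date of the record it follows. A minimal sketch of collapsing such records is given below; the tuple layout and example dates are hypothetical, and comparing against the running collapsed span (rather than only the immediately preceding raw record) is a simplifying assumption.

```python
from datetime import date


def collapse_transfers(admissions):
    """Collapse overlapping or abutting admissions per child into single admissions.

    admissions: iterable of (child_id, admission_date, discharge_date) tuples.
    """
    collapsed = []
    for child, admitted, discharged in sorted(admissions):
        if collapsed and collapsed[-1][0] == child and admitted <= collapsed[-1][2]:
            # Rules (a)-(c): admission falls on or before the prior discharge date.
            prev_child, prev_admitted, prev_discharged = collapsed[-1]
            collapsed[-1] = (prev_child, prev_admitted, max(prev_discharged, discharged))
        else:
            collapsed.append((child, admitted, discharged))
    return collapsed


# Hypothetical admission records
records = [
    ("A1", date(2016, 7, 1), date(2016, 7, 3)),
    ("A1", date(2016, 7, 3), date(2016, 7, 5)),    # (a) admitted on prior discharge date
    ("A1", date(2016, 7, 10), date(2016, 7, 11)),  # separate admission
    ("B2", date(2016, 8, 1), date(2016, 8, 10)),
    ("B2", date(2016, 8, 2), date(2016, 8, 4)),    # (c) nested within prior record
]
single_admissions = collapse_transfers(records)
print(len(records), "records collapsed to", len(single_admissions), "admissions")
```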


The Respiratory Infections Linked Data Platform takes advantage of data linkage capacity in WA, and the availability of State and Commonwealth data sources, to capture population-level RSV and other respiratory virus epidemiology in an Australian setting, focusing on the paediatric population. It will assess the public health impact of current RSV immunisation strategies, future RSV and other respiratory virus prevention strategies, and provide enduring utility to guide ongoing public health policy decisions.

Future updates and planned work

The current iteration of the Respiratory Infections Linked Data platform contains predominantly pre-pandemic data. As RSV became notifiable nationally in 2021/22 (in WA, from 1 August 2021) [36], we envisage a data refresh of the Data Platform in future years with the addition of all-age RSV notifications, during and post COVID-19 periods when RSV seasonality was disrupted (2021-25). This will complement laboratory surveillance data, providing valuable insights on community RSV incidence outside secondary care in all age groups and critical inputs for dynamic models informing RSV population dynamics.

We are also in the process of requesting approval to link Commonwealth datasets, including the Australian Immunisation Register (data on infant and child vaccinations), Medicare Benefits Scheme data (family physician encounters) and Pharmaceutical Benefits Scheme data (data on medication and prescriptions, including antimicrobial agents). The addition of these datasets will allow studies on direct, indirect, and non-specific maternal and childhood vaccine effects on a range of outcomes.

The use of polymerase chain reaction (PCR) testing by diagnostic pathology laboratories surged in 2010 following the influenza pandemic of 2009. In WA, in-house tests were commonly used, which allowed discrimination of subtypes of influenza, RSV and parainfluenza viruses. Over the period covered by this dataset, that coverage has steadily diminished, and the trend is likely to continue. Furthermore, we are now in the era of rapid antigen tests (RATs), which are available for influenza and RSV as well as SARS-CoV-2. As the uptake of RATs increases, they will replace PCR as the primary diagnostic tool for community infections, and infections diagnosed in this way will not be notified. The timing of this primarily pre-pandemic linked dataset is therefore unique: respiratory viruses are identified with unprecedented granularity, and the data represent a truer snapshot of respiratory pathogen testing and positivity. The dataset also provides baseline data against which future data updates can be compared as testing practices change.

Strengths and limitations

We have constructed a population-based linked dataset comprising administrative and registry data from 14 databases to provide a longitudinal analysis of RSV and other respiratory virus activity before COVID-19. However, the platform has some notable limitations. Our study only includes children born and resident in WA from 2010 to the most recent date at the time of request, as we were unlikely to have outcome data on children born in WA but not resident here. Similarly, we did not include children born elsewhere in Australia and residing in WA, as we did not have perinatal data on these children (the primary source of risk factor data for our planned analyses). With respect to immigration, the latest Australian census (2016) estimates that in 2020, 0.3% of WA children younger than 10 years of age were born overseas [37]. Therefore, our study cohort may not be generalisable to immigrant children. We were also unable to capture hospital encounters for our cohort that occurred outside of WA. However, a study linking hospitalisations across four Australian states found that only 0.2% of hospitalisations in WA residents occurred in the other states [38]. RSV was made a notifiable disease in WA in August 2021, after ethics approvals for the Platform were gained and data were extracted for linkage. Hence, our Platform lacks RSV notifications. Importantly and uniquely in Australia, the PathWest data provide the only source of laboratory-confirmed RSV infections for the years prior to 2021, when RSV became notifiable. Over the period 2010-2020, PathWest laboratories consistently covered all public hospital testing (>80% of all hospitalised patients; close to 100% for all children). Coverage of community tests decreased between 2010 and 2020, with private pathology providers increasing their scope of testing, particularly in metropolitan and southern WA. Private pathology data are not currently available for research due to client privacy and confidentiality concerns; however, this is an area of ongoing investigation for our research team. PathWest is often the sole provider in remote WA and has maintained good representation of collection centres in rural locations [39, 40]. Finally, our study includes primarily secondary care data and lacks the primary care data needed to estimate community burden. However, we currently have a program of research in early childcare to address this gap.


The data within the Respiratory Infections Linked Data Platform cannot be shared publicly. Access to the data is subject to approval by data custodians and is provided by Data Services at the WA Department of Health. Use of the data is restricted to named researchers on the approved ethics protocols. Further details on the data platform can be obtained from the lead investigators, while access to the raw data can be requested via Data Services at the WA Department of Health.


The authors wish to thank the staff at WA Data Services of the WA Department of Health. We are also grateful to the data custodians of the data collections used (the West Australian Register of Births, Deaths, and Marriages, the Midwives Notification System, the Hospital Morbidity Data Collection, the Emergency Department Data Collection, the WA Register of Developmental Anomalies, the WA Cerebral Palsy Register, the West Australian Notifiable Diseases Database, the West Australian Antenatal Vaccination Database, and the PathWest Laboratory Medicine respiratory surveillance database) and to staff at the National Coronial Information System and the Victorian Department of Justice and Community Safety. The authors would also like to acknowledge the contribution of the RSV+ Community Reference Group at Telethon Kids Institute for their guidance on this program of work.

HCM is supported by a Stan Perron Charitable Foundation Fellowship and the Future Health Research and Innovation Fund through the WA Near-miss Awards program. The funding bodies had no role in the design, conduct, analysis or interpretation of the study or decision to publish.

Ethics statement

Ethical approval was granted by the WA Department of Health Human Research Ethics Committee [Project ID: RGS4675] and the Western Australian Aboriginal Health Ethics Committee [Project ID: 1138]. Obtaining individual consent for population-level data is impractical. Therefore, a waiver of consent was approved by the WA Department of Health Human Research Ethics Committee in accordance with state and national privacy legislation. This allowed the use of administrative data containing personal information for approved health research (NHMRC guidelines, section 95, Privacy Act 1988) [41]. As per our approved data application, all research outputs will be approved by the WA Data Services prior to publication, no raw data will be presented and all cell sizes of less than 5 will be suppressed.

Conflicts of interest

HCM has received institutional honoraria from advisory committees sponsored by Merck Sharp & Dohme (Australia) Pty. Ltd, Pfizer and Sanofi for other work unrelated to this analysis. HCM, BT and PR have received funding from Sanofi-Aventis in the form of an externally sponsored collaboration agreement. PR has received institutional honoraria from advisory committees sponsored by GSK, Pfizer, Merck, AstraZeneca, and Novavax. PR also receives funding from Merck Sharp & Dohme (Australia) Pty. Ltd and GSK. HCM and MS have received travel funding from Seqirus.


CP Cerebral Palsy
ED Emergency Department
hMPV Human Metapneumovirus
ICD International Classification of Diseases
ICU Intensive Care Unit
mAb Monoclonal Antibody
MNS Midwives Notification System
RSV Respiratory Syncytial Virus
SEIFA Socio-economic Index for Areas
WA Western Australia
WAAVD West Australian Antenatal Vaccines Database
WANIDD West Australian Notifiable Infectious Diseases Database
WARDA West Australian Register of Developmental Anomalies


  1. Pratt MTG, Abdalla T, Richmond PC, et al. Prevalence of respiratory viruses in community-acquired pneumonia in children: a systematic review and meta-analysis. Lancet Child Adolesc Health 2022; 6:555–70. 10.1016/S2352-4642(22)00092-X

  2. Li Y, Wang X, Blau DM, et al. Global, regional, and national disease burden estimates of acute lower respiratory infections due to respiratory syncytial virus in children younger than 5 years in 2019: a systematic analysis. Lancet 2022; 399:2047–64. 10.1016/S0140-6736(22)00478-0

  3. Ranmuthugala G, Brown L, Lidbury BA. Respiratory syncytial virus–the unrecognised cause of health and economic burden among young children in Australia. Commun Dis Intell Q Rep 2011; 35:177–84.

  4. Brusco NK, Alafaci A, Tuckerman J, et al. The 2018 annual cost burden for children under five years of age hospitalised with respiratory syncytial virus in Australia. Commun Dis Intell 2022; 46:1–23. 10.33321/cdi.2022.46.5

  5. Gebremedhin AT, Hogan AB, Blyth CC, Glass K, Moore HC. Developing a prediction model to estimate the true burden of respiratory syncytial virus (RSV) in hospitalised children in Western Australia. Scientific Reports 2022; 12:332–44. 10.1038/s41598-021-04080-3

  6. Eden JS, Sikazwe C, Xie R, et al. Off-season RSV epidemics in Australia after easing of COVID-19 restrictions. Nat Commun 2022; 13:2884. 10.1038/s41467-022-30485-3

  7. Foley DA, Sikazwe CT, Minney-Smith CA, et al. An Unusual Resurgence of Human Metapneumovirus in Western Australia Following the Reduction of Non-Pharmaceutical Interventions to Prevent SARS-CoV-2 Transmission. Viruses 2022; 14:2135–49. 10.3390/v14102135

  8. Griffin MP, Yuan Y, Takas T, et al. Single-Dose Nirsevimab for Prevention of RSV in Preterm Infants. N Engl J Med 2020; 383:415–25. 10.1056/NEJMoa1913556

  9. Hammitt LL, Dagan R, Yuan Y, et al. Nirsevimab for Prevention of RSV in Healthy Late-Preterm and Term Infants. N Engl J Med 2022; 386:837–46. 10.1056/NEJMoa2110275

  10. Joint Committee on Vaccination and Immunisation. Respiratory Syncytial Virus (RSV) immunisation programme: JCVI advice, 7 June 2023 – GOV.UK. Available at:

  11. Melgar M, Britton A, Roper LE, et al. Use of Respiratory Syncytial Virus Vaccines in Older Adults: Recommendations of the Advisory Committee on Immunization Practices – United States, 2023. MMWR Morb Mortal Wkly Rep 2023; 72:793–801. 10.15585/mmwr.mm7229a4

  12. Marchant A, Sadarangani M, Garand M, et al. Maternal immunisation: collaborating with mother nature. Lancet Infect Dis 2017; 17:e197–e208. 10.1016/S1473-3099(17)30229-3

  13. US Food and Drug Administration. FDA approves first vaccine for pregnant individuals to prevent RSV in infants, 2023.

  14. Rice E, Oakes DB, Holland C, Moore HC, Blyth CC. Respiratory syncytial virus in children: epidemiology and clinical impact post-COVID-19. Curr Opin Infect Dis 2023. 10.1097/qco.0000000000000967

  15. Sarna M, Gebremedhin AT, Richmond P, Levy A, Glass K, Moore HC. Determining the true incidence of seasonal respiratory syncytial virus-confirmed hospitalizations in preterm and term infants in Western Australia. Vaccine 2023; 41:5216–20. 10.1016/j.vaccine.2023.07.014

  16. Le H, Gidding H, Blyth CC, Richmond P, Moore HC. Pneumococcal Conjugate Vaccines Are Protective Against Respiratory Syncytial Virus Hospitalizations in Infants: A Population-Based Observational Study. Open Forum Infect Dis 2023; 10:ofad199. 10.1093/ofid/ofad199

  17. Jama-Alol KA, Moore HC, Jacoby P, Bower C, Lehmann D. Morbidity due to acute lower respiratory infection in children with birth defects: a total population-based linked data study. BMC pediatrics 2014; 14:80. 10.1186/1471-2431-14-80

  18. Sarna M, Pereira GF, Foo D, Baynam GS, Regan AK. The risk of major structural birth defects associated with seasonal influenza vaccination during pregnancy: A population-based cohort study. Birth Defects Res 2022; 114:1244–56. 10.1002/bdr2.2049

  19. Regan AK, Moore HC, Binks MJ, et al. Maternal Pertussis Vaccination, Infant Immunization, and Risk of Pertussis. Pediatrics 2023; 152. 10.1542/peds.2023-062664

  20. Foo D, Sarna M, Pereira G, Moore HC, Regan AK. Maternal influenza vaccination and child mortality: Longitudinal, population-based linked cohort study. Vaccine 2022; 40:3732–6. 10.1016/j.vaccine.2022.05.030

  21. Foo D, Sarna M, Pereira G, Moore HC, Regan AK. Prenatal influenza vaccination and allergic and autoimmune diseases in childhood: A longitudinal, population-based linked cohort study. PLoS Med 2022; 19:e1003963. 10.1371/journal.pmed.1003963

  22. Foo D, Sarna M, Pereira G, Moore HC, Regan AK. Association between maternal influenza vaccination and neurodevelopmental disorders in childhood: a longitudinal, population-based linked cohort study. Arch Dis Child 2023; 108:647–53. 10.1136/archdischild-2022-324269

  23. Le H, de Klerk N, Blyth CC, Gidding H, Fathima P, Moore HC. Non-specific benefit of seasonal influenza vaccine on respiratory syncytial virus-hospitalisations in children: An instrumental variable approach using population-based data. Vaccine 2023; 41:5029–36. 10.1016/j.vaccine.2023.06.085

  24. Australian Bureau of Statistics. Snapshot of Western Australia. Available at:

  25. Kelman CW, Bass AJ, Holman CD. Research use of linked health data–a best practice protocol. Aust NZ J Public Health 2002; 26:251–5. 10.1111/j.1467-842x.2002.tb00682.x

  26. Eitelhuber TW, Thackray J, Hodges S, Alan J. Fit for purpose - developing a software platform to support the modern challenges of data linkage in Western Australia. Int J Popul Data Sci 2018; 3:435. 10.23889/ijpds.v3i3.435

  27. Holman CD, Bass AJ, Rouse IL, Hobbs MS. Population-based linkage of health records in Western Australia: development of a health services research linked database. Aust N Z J Public Health 1999; 23:453–9. 10.1111/j.1467-842x.1999.tb01297.x

  28. V G, V D. Validation study of the Western Australian Midwives’ Notification System 1992. Perth, WA: Western Australia Department of Health, 1994.

  29. Dobbins TA, Sullivan EA, Roberts CL, Simpson JM. Australian national birthweight percentiles by sex and gestational age, 1998-2007. Med J Aust 2012; 197:291–4. 10.5694/mja11.11331

  30. Blair EM, Liu Y, de Klerk NH, Lawrence DM. Optimal fetal growth for the Caucasian singleton and assessment of appropriateness of fetal growth: an analysis of a total population perinatal database. BMC pediatrics 2005; 5:13. 10.1186/1471-2431-5-13

  31. Christensen D, Davis G, Draper G, et al. Evidence for the use of an algorithm in resolving inconsistent and missing Indigenous status in administrative data collections. Aust J Soc Issues 2014; 49:423–43. 10.1002/j.1839-4655.2014.tb00322.x

  32. Australian Bureau of Statistics. Socio-Economic Indexes for Areas (SEIFA) 2016 Technical Paper: Australian Bureau of Statistics Canberra, Australia, 2018.

  33. Australian Bureau of Statistics. The Australian Statistical Geography Standard (ASGS) Remoteness Structure. Available at:

  34. Barnes R, Blyth CC, de Klerk N, et al. Geographical disparities in emergency department presentations for acute respiratory infections and risk factors for presenting: a population-based cohort study of Western Australian children. BMJ Open 2019; 9:e025360. 10.1136/bmjopen-2018-025360

  35. Munywoki PK, Koech DC, Agoti CN, et al. Influence of age, severity of infection, and co-infection on the duration of respiratory syncytial virus (RSV) shedding. Epidemiol Infect 2015; 143:804–12. 10.1017/S0950268814001393

  36. Department of Health Australian Government. Respiratory syncytial virus (RSV) infection. Available at:

  37. Australian Bureau of Statistics. Migration, Australia, 2014-15: cat. no. 3412.0, ABS, Canberra. Available at:

  38. Spilsbury K, Rosman D, Alan J, Boyd JH, Ferrante AM, Semmens JB. Cross border hospital use: analysis using data linkage across four Australian states. Med J Aust 2015; 202:582–6. 10.5694/mja14.01414

  39. Lim FJ, Blyth CC, Fathima P, de Klerk N, Moore HC. Record linkage study of the pathogen-specific burden of respiratory viruses in children. Influenza Other Respir Viruses 2017; 11:502–10. 10.1111/irv.12508

  40. Anonymous. PathWest Locations. Available at:

  41. Government of Australia. National Health and Medical Research Council Guidelines under Section 95 of the Privacy Act 1988. In: Health Do, ed, 2015.

  42. Anonymous. The Midwives Notification System. Available at:

  43. Department of Health WA. The West Australian register of developmental anomalies:1980-2014. Available at:

  44. Nembhard WN, Bower C. Evaluation of the Western Australian Register of Developmental Anomalies: Thirty-five years of surveillance. Birth Defects Res A Clin Mol Teratol 2016; 106:894–904. 10.1002/bdra.23575

  45. Anonymous. Cerebral Palsy Register. Available at:

  46. Regan AK, Mak DB, Moore HC, et al. Surveillance of antenatal influenza vaccination: validity of current systems and recommendations for improvement. BMC Public Health 2015; 15:1155–62. 10.1186/s12889-015-2234-z

  47. Government of Western Australia: Department of Health. Hospital Morbidity Data System. Reference Manual Part A: Contacts, Hospital Responsibilities, Data Element Definitions. Available at:

  48. Government of Western Australia: Department of Health. Process For Emergency Department Patient Statistics–Emergency Department Data Collections, 2018.

Article Details

How to Cite
Sarna, M. (Minda), Taye, B., Le, H., Giannini, F., Glass, K., Blyth, C., Richmond, P., Glauert, R., Levy, A. and Moore, H. (2024) “Cohort profile: A population-based record linkage platform to address critical epidemiological evidence gaps in respiratory syncytial virus and other respiratory infections”, International Journal of Population Data Science, 9(2). doi: 10.23889/ijpds.v9i2.2376.
