UK Longitudinal Linkage Collaboration (UK LLC): The National Trusted Research Environment for Longitudinal Research


Andy Boyd
Katharine M. Evans
https://orcid.org/0000-0002-9819-1049
Emma Turner
Robin Flaig
Jacqui Oakley
Kirsteen C. Campbell
Richard Thomas
Stela McLachlan
Matthew Crane
To view the complete list of authors, please see the PDF

Abstract

Introduction
The UK Longitudinal Linkage Collaboration (UK LLC) is the national Trusted Research Environment (TRE) for the UK's longitudinal research community, supporting the UK's unparalleled collection of Longitudinal Population Studies (LPS). Initially set up as a COVID-19 research resource, UK LLC is now a generic database for any research for the public good.


Objectives
UK LLC supports longitudinal research by providing record linkage and TRE services.


Methods
The UK LLC partnership provides a secure analytics environment, a trusted third-party linkage processor and a comprehensive governance framework to minimise risks to participant confidentiality. UK LLC is ISO 27001 certified and accredited by the UK Statistics Authority as a processor under the Digital Economy Act. The active involvement of members of UK LLC's public involvement programme ensures that UK LLC is acceptable to LPS participants and the wider public. All UK LPS are eligible for inclusion. Researchers can apply to access the TRE via an approach that fulfils the needs of the LPS and the linked data owners, and that includes a review by public contributors.


Results
Twenty-two LPS have so far joined UK LLC. Where permissions allow, participants are linked to their National Health Service (NHS) England, NHS Wales and place-based records, with work ongoing to link to NHS Scotland and to non-health administrative records, including those held by the Department for Work and Pensions and His Majesty's (HM) Revenue and Customs. UK LLC Explore allows potential researchers to discover the breadth of data available in the TRE. All applications are listed on UK LLC's publicly accessible Data Use Register.


Conclusions
UK LLC enables researchers to interrogate pooled LPS participant data that are systematically linked to diverse records. UK LLC remains open to additional LPS joining the partnership and will increase the breadth of data to support the longitudinal research community and attract increasing numbers of researchers across multiple disciplines, government departments and industry.

Introduction

The UK Longitudinal Linkage Collaboration (UK LLC) is the national Trusted Research Environment (TRE) for the UK’s longitudinal research community. It is designed to support the UK’s unparalleled collection of Longitudinal Population Studies (LPS) by providing record linkage, secure analysis and data curation services. LPS follow the lives of participant volunteers over time, including over whole lifetimes, generations of families and households. Data collected include in-depth measures of physical and mental health, lifestyle, environmental and socio-economic measures, alongside biological samples. LPS therefore provide a depth of data that gives unique insights into population wellbeing, behaviours and development. The primary funders of UK LPS share the strategic objective of maximising the value of LPS by ensuring that their data are FAIR (findable, accessible, interoperable and reusable) [1] and are enhanced by linking participants to their routinely collected health and administrative records and to geo-spatial ‘place-based’ data about the environment, properties and neighbourhoods in which they live.

UK LLC is a growing partnership of many of the UK’s most established LPS. With a strong public contribution, it is led by the Universities of Bristol and Edinburgh, in collaboration with Swansea University; the University of Leicester; City St George’s, University of London; and UCL. UK LLC has four main objectives:

  1. To provide record linkage services to LPS. This includes bringing together de-identified data from the LPS and systematically linking these data to the participants’ health, administrative, and environment and neighbourhood data in the UK LLC TRE in a manner that complies with all relevant legal, information security and regulatory frameworks, is publicly acceptable and is efficient for all parties.
  2. To provide a secure generic research database – the UK LLC TRE – that supports efficient access from large numbers of UK-based approved researchers for approved projects, so they can conduct research for the public good using diverse data from one or more LPS linked to diverse participant records.
  3. To drive improvements in research equity by co-locating many LPS’ populations into one location to create a highly heterogeneous UK-wide sample, with increased statistical power to study ‘rare’ exposures/outcomes and to consider differential outcomes across diverse and under-served population sub-groups.
  4. To offer the UK LLC TRE as part of an efficient, responsive and secure UK data science capability to support investigation of emerging policy questions and to meet future crises such as pandemics, the impacts of climate change or economic shocks.

UK LLC was established in 2020 as part of the COVID-19 Longitudinal Health and Wellbeing National Core Study to fulfil these objectives and to support the longitudinal research community in transitioning to a TRE way of working. Initially supported by HM Treasury to underpin high-priority COVID-19 research questions, UK LLC is now funded by UK Research and Innovation through the Economic and Social Research Council and the Medical Research Council.

Currently, 22 LPS have joined UK LLC, drawn from a collection of around 100 UK LPS with a combined estimated population size of >3 million individuals. UK LLC remains open to additional LPS joining the partnership through a formalised onboarding process. The UK LLC TRE is likely to host data about >500,000 LPS participants (similar in scale to extensively used flagship databases such as UK Biobank), with a highly heterogeneous sample and data often collected from pregnancy or soon after birth and across multiple generations. These full life-course data have significant advantages for research on critical health or social science questions, for example the impacts of in utero or early life exposures on later outcomes, the generational transmission of inequalities, or the factors affecting key transition periods, such as the move from education into employment. In addition, data hosted in the UK LLC TRE could provide significant benefits when used in combination with other databases. For example, UK LLC could provide an effective means to replicate findings initially made in UK Biobank or other large volunteer databases (or vice versa), or, through access to granular LPS data that are not present in routine records (such as health or social behaviours), to conduct deep phenotyping for analysis protocols designed to leverage the strengths of whole population databases (i.e. high statistical power and high representativeness, but limited depth of data).

The UK LLC protocol has been iteratively developed with key stakeholders, including public and participant contributors and national data owners. This paper summarises version 2.0 of UK LLC’s protocol [2] and describes a generic research databank for longitudinal research for the public good. It replaces version 1.0 of the protocol which established UK LLC as a COVID-19 research resource [3]. This protocol paper describes UK LLC in its widest terms, including partner organisations; legal and regulatory basis; underlying technical infrastructure; role of public/participant contributors; ethico-legal basis; and approach to reproducible research. The paper then summarises the current set of contributing LPS; the types of data available; the mechanisms for making the data discoverable and available to approved researchers; and the breadth of the anticipated scientific programme.

Methods

UK LLC trusted research environment

The Trusted Research Environment (TRE) concept is established across academic [4, 5], National Health Service (NHS) [6] and wider government stakeholders [7], and organisations devoted to promoting privacy and trustworthiness of data use [8, 9], as an appropriate mechanism for population health science for the public good, whilst controlling the risks of confidentiality breaches and data misuse. Most UK TREs, including UK LLC, have adopted the ‘Five Safes’ governance framework [10] for the design and management of a TRE. The UK LLC TRE comprises a secure analytics environment, a trusted third-party linkage processor and a comprehensive governance framework implementing technological, data, security and ethico-legal governance controls to minimise risks to participant confidentiality.

UK LLC TRE partnership

UK LLC contracts Swansea University to provide the Secure eResearch Platform (SeRP UK) infrastructure for the UK LLC TRE, and SeRP UK’s partner, NHS Digital Health and Care Wales (DHCW), to conduct trusted third-party linkage services. The Universities of Bristol and Edinburgh manage the governance, data curation, data application and public involvement aspects of UK LLC (and lead the programme as a whole), with the University of Bristol acting as Data Controller. The University of Leicester and City St George’s, University of London provide environmental exposure modelling and geo-coding expertise; Swansea University provides additional data curation expertise; and UCL provides expert interdisciplinary guidance in longitudinal research. The active involvement of contributing LPS, and of participants and the public via UK LLC’s Public and Participant Involvement and Engagement (PPIE) programme, in the design and operation of UK LLC ensures that the system meets the collective and interdisciplinary needs of LPS in a manner acceptable to participants and the wider public.

The UK LLC TRE partnership works alongside, and often in collaboration with, wider UK data science investments, including other TREs (OpenSAFELY, Office for National Statistics, Secure Anonymised Information Linkage (SAIL) Databank), data owners (NHS, Department for Education, Department for Work and Pensions and HM Revenue and Customs), LPS infrastructures (CLOSER, UK Data Service, Dementias Platform UK), and funders and data science enabling programmes (Health Data Research UK and Administrative Data Research UK) (Figure 1).

Figure 1: The wider UK LLC TRE partnership. ¹Lead organisations: University of Bristol and University of Edinburgh. ²Partner organisations: City St George’s, University of London; Digital Health and Care Wales (DHCW); Secure eResearch Platform (SeRP UK); Swansea University; UCL; University of Leicester. ³Data users: Researchers interested in accessing data in the UK LLC TRE. ⁴Data community: Other data science organisations and infrastructures. LPS: Longitudinal Population Studies; PPIE: Public and Participant Involvement and Engagement.

Legal and regulatory basis

Both UK LLC and collaborating LPS maintain a UK General Data Protection Regulation (GDPR) lawful basis for holding and processing data and a condition for processing special category data. For UK LLC these are Article 6(1)(e) and Article 9(2)(j), respectively. The use of participant personal data for record linkage purposes engages the Common Law Duty of Confidentiality, which LPS meet either by gaining explicit participant consent or by using opt-out objection mechanisms coupled with regulatory approvals and enabling legislation, which differ across the UK. For NHS England and Wales records, UK LLC fully respects any participant objections recorded in the respective NHS national data opt-out schemes, unless overridden by explicit consent. NHS Wales flows de-identified data into the SAIL Databank (based on SeRP UK infrastructure) in such a way that they are not personal data whilst within the protective controls of the SAIL Databank and SeRP infrastructure (which includes the UK LLC TRE). The flow of NHS data in Scotland into the UK LLC TRE will be assessed via a ‘public interest’ test conducted by the NHS Scotland Public Benefit and Privacy Panel for Health and Social Care. The legal basis for the linkage and use of non-health data will be section 64 of the Digital Economy Act 2017.
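The opt-out rule for NHS England and Wales records reduces to a simple decision: a recorded national data opt-out is respected unless the participant has given explicit consent. The following minimal sketch encodes only that rule; the class and field names are hypothetical, and it deliberately ignores the LPS-level permission flags and the separate arrangements for Scotland.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NhsLinkageStatus:
    """Hypothetical per-participant flags; field names are illustrative only."""
    study_id: str
    national_opt_out: bool  # objection recorded in the NHS national data opt-out scheme
    explicit_consent: bool  # explicit consent to NHS record linkage given to the LPS


def may_link_nhs_records(p: NhsLinkageStatus) -> bool:
    """A recorded national data opt-out is respected unless explicit consent overrides it."""
    return p.explicit_consent or not p.national_opt_out


cohort = [
    NhsLinkageStatus("A1", national_opt_out=False, explicit_consent=False),
    NhsLinkageStatus("A2", national_opt_out=True, explicit_consent=False),
    NhsLinkageStatus("A3", national_opt_out=True, explicit_consent=True),
]
eligible = [p.study_id for p in cohort if may_link_nhs_records(p)]
print(eligible)  # ['A1', 'A3']
```

Participant A1 is eligible here only because this sketch assumes an opt-out model; for an LPS relying on explicit consent, a missing consent would also exclude the participant.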

Participant and public involvement and engagement (PPIE)

UK LLC has embedded public contributors, including LPS participants and members of the public, across its work. Many of UK LLC’s safeguards and system design decisions were informed by extensive PPIE conducted by individual LPS with their participants over many years. UK LLC has recruited a diverse group of public contributors who input to: (i) UK LLC Strategic Advisory Committee (strategy and future-facing guidance); (ii) UK LLC Data Access Public Review Panel (review of applications to access the UK LLC TRE); (iii) UK LLC Public Advisory Group (insights on new initiatives and review of UK LLC communication materials); and (iv) UK LLC Public Involvement Network (wider public perceptions on UK LLC activity). By embedding PPIE in its operations, UK LLC aims to include the public voice in design and decision making, thereby listening to concerns and demonstrating that it is trustworthy.

Working in partnership with LPS and their participants

The trust relationship between an LPS and its participants is crucial to maintaining ongoing participation. LPS need to demonstrate to their participants how they maintain their custodian role within the UK LLC TRE. The framing and ‘rules’ relating to this are specific to each LPS, because LPS differ in key ways: purpose; start date; longevity; sample composition; and prior assurances to participants. These assurances are drawn from all information given to participants over time, including consent materials; some LPS may also have provided more formalised ‘social contracts’ (a set of rules regarding data use, e.g. ALSPAC [11] and TEDS [12]).

Through PPIE, UK LLC co-develops generic fair processing materials, which detail: (i) the parties and data involved; (ii) the way in which the data flow and are combined; (iii) the intended purpose for the data; (iv) how UK LLC minimises risks; (v) how UK LLC respects existing objections; and (vi) how a participant can register an objection. These materials are shared with LPS for them to tailor and implement before any data flow into the UK LLC TRE. LPS remain the Data Controllers of their data, which means that each LPS can decide which datasets its participants are linked to and which projects are permitted access to its data.

Split-file processing and record linkage

The SeRP UK-DHCW partnership offers ‘split file’ processing, which enables the UK LLC TRE to be a fully de-identified environment in which hosted data are effectively anonymous to all approved researchers and system administrators. DHCW, as UK LLC’s trusted third party for linkage, handles only LPS participants’ identifiers and has no access to any individual attribute data, while SeRP UK and UK LLC have access only to de-identified LPS data and linked records (Figure 2).

Figure 2: UK LLC architecture for ‘split file’ management of participant identifiers and de-identified LPS and linked records.

Using this methodology, LPS data managers split the LPS data into a file of personal identifiers (‘File 1’) and a file of de-identified attribute data (‘File 2’). LPS send File 1 to DHCW; this file includes current and historic names, date of birth, NHS ID, current and historic address data and permission status flags. Distinct permission flags are provided for each linked data source and for the use of address data for geo-modelling. Permission flags can be set at the LPS or participant level to indicate permissions/objections. Refreshed File 1s are sent to DHCW each quarter, enabling UK LLC to enact withdrawals/dissents, to update personal identifiers to reflect name and address changes, and to add new participants for LPS with active recruitment.
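The split performed by LPS data managers can be illustrated as partitioning each participant record into identifier fields (File 1) and attribute fields (File 2), joined only by a pseudonymous key. This is a minimal sketch under stated assumptions: the field names, the random `link_id` key and the use of Python's `secrets` module are illustrative choices, not the actual LPS or SeRP UK file specification.

```python
from __future__ import annotations

import secrets
from typing import Any

# Hypothetical names for the identifier columns that go into File 1.
IDENTIFIER_FIELDS = {"names", "date_of_birth", "nhs_id", "addresses", "permission_flags"}


def split_record(record: dict[str, Any], link_id: str) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split one participant record into a File 1 row (identifiers) and a
    File 2 row (attributes) that share only a pseudonymous link_id."""
    file1: dict[str, Any] = {"link_id": link_id}
    file2: dict[str, Any] = {"link_id": link_id}
    for field, value in record.items():
        target = file1 if field in IDENTIFIER_FIELDS else file2
        target[field] = value
    return file1, file2


def split_file(records: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Apply split_record to every record, issuing a fresh random link_id per record."""
    file1_rows, file2_rows = [], []
    for record in records:
        f1, f2 = split_record(record, secrets.token_hex(8))
        file1_rows.append(f1)
        file2_rows.append(f2)
    return file1_rows, file2_rows


# Fictitious example record (not real participant data).
records = [{
    "names": ["Jo Example"],
    "date_of_birth": "1970-04-05",
    "nhs_id": "9990000000",
    "addresses": ["1 Example Street"],
    "permission_flags": {"nhs_england": True, "geo_modelling": False},
    "bmi": 24.1,
    "current_smoker": False,
}]
file1, file2 = split_file(records)
```

Because only `link_id` appears in both outputs, the recipient of File 1 never sees attributes such as `bmi`, and the recipient of File 2 never sees the NHS ID or address.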

DHCW then act as the linkage ‘broker’, facilitating linkages by sending reformatted and permission-filtered files to linked data owners and UK LLC’s geo-modeller for processing. These organisations create and send File 2s containing de-identified linked attribute data to SeRP UK for import into the UK LLC TRE. The File 2s sent to SeRP UK by LPS are processed in the same way.
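The broker step can be pictured as filtering File 1 by the permission flag for one linked data source, then passing on only the identifiers that source needs. In this sketch the function name, the default field list and the "missing flag means not permitted" rule are assumptions; the actual reformatting required by each data owner (or by the geo-modeller, which needs addresses rather than NHS IDs) is not described here.

```python
from __future__ import annotations

from typing import Any, Iterable

# Hypothetical minimal identifier set sent to a linked data owner.
DEFAULT_FIELDS = ("link_id", "nhs_id", "date_of_birth")


def extract_for_owner(file1_rows: Iterable[dict[str, Any]], source: str,
                      fields: tuple[str, ...] = DEFAULT_FIELDS) -> list[dict[str, Any]]:
    """Return reformatted rows for one linked data source, keeping only participants
    whose permission flag for that source is True. A missing flag is treated as
    'not permitted' (default deny)."""
    return [
        {f: row[f] for f in fields if f in row}
        for row in file1_rows
        if row.get("permission_flags", {}).get(source, False)
    ]


file1_rows = [
    {"link_id": "p1", "nhs_id": "9990000001", "date_of_birth": "1958-03-03",
     "names": ["A"], "permission_flags": {"nhs_england": True, "geo_modelling": True}},
    {"link_id": "p2", "nhs_id": "9990000002", "date_of_birth": "1970-04-05",
     "names": ["B"], "permission_flags": {"nhs_england": False, "geo_modelling": True}},
    {"link_id": "p3", "nhs_id": "9990000003", "date_of_birth": "1991-06-01",
     "names": ["C"], "permission_flags": {}},
]
for_nhs_england = extract_for_owner(file1_rows, "nhs_england")
print([r["link_id"] for r in for_nhs_england])  # ['p1']
```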

Secure analytics environment

The UK LLC TRE comprises a virtual desktop infrastructure running a Microsoft Windows operating system and containing standard analytics software (including R and Stata statistical software, Python programming tools, GitLab and Jupyter Notebooks documentation tools, and Microsoft Office software). Data are hosted in a Microsoft SQL Server relational database, accessible through database management software such as SQL Server Management Studio and Eclipse, and through ODBC connections to statistical software. This Windows environment is linked to a high-volume storage facility (2 PB capacity) and to a High-Performance Computing cluster (~1,800 processing cores and ~14 TB of memory) to enable computationally intensive analysis. All computing, processing and data storage take place on Swansea University servers, meaning that the data always remain in the UK.

The environment functions as a ‘reading library’ where the system is configured to allow approved researchers to remotely access the UK LLC TRE to conduct analyses, but blocks any mechanism for taking data out. Researchers can request the export of aggregated analytical outputs from the TRE following statistical disclosure control assessment by a team at SAIL Databank. SeRP UK, SAIL Databank, DHCW and UK LLC all have information security management systems that are certified to ISO 27001 international standard for information security. In addition, UK LLC, SAIL Databank and DHCW are accredited by the UK Statistics Authority as processing environments under the Digital Economy Act 2017.
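Statistical disclosure control of exported outputs commonly includes suppressing small counts. The sketch below shows only primary small-cell suppression; the threshold of 10 is a hypothetical value, since the rules applied by the SAIL Databank team are not stated here, and real output checking also covers secondary suppression and differencing between outputs.

```python
from typing import Dict, Optional

# Hypothetical threshold: the actual SAIL Databank disclosure rules are not given here.
MIN_CELL_COUNT = 10


def suppress_small_cells(counts: Dict[str, int],
                         threshold: int = MIN_CELL_COUNT) -> Dict[str, Optional[int]]:
    """Primary suppression only: replace any non-zero count below `threshold`
    with None so that it is not released."""
    return {k: (None if 0 < v < threshold else v) for k, v in counts.items()}


table = {"asthma": 152, "copd": 7, "none_recorded": 0}
print(suppress_small_cells(table))  # {'asthma': 152, 'copd': None, 'none_recorded': 0}
```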

Reproducible and reusable research

UK LLC requires all approved researchers to make analytical syntax, code lists, derived variables and project documentation available to future users in the UK LLC GitHub and GitLab, as appropriate. These research tools are catalogued and made discoverable and accessible to future UK LLC users. UK LLC has developed ‘helper’ syntax and software for commonly conducted tasks, e.g. extracting data from the UK LLC database into a researcher’s preferred statistical software package and annotating these data with metadata labels. Using both self-reported and linked information, UK LLC also derives a file of core socio-economic and demographic indicators that are harmonised in definition and encoding across all contributing LPS to expedite cross-LPS research. UK LLC’s Team Data Science approach is informed by the UK Reproducibility Network [13] and the FAIR Guiding Principles for scientific data management and stewardship [1].
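Harmonising an indicator across LPS amounts to mapping each study's own coding onto a shared concept and encoding, then attaching metadata labels. The sketch below assumes a hypothetical housing tenure item coded differently in two fictitious studies (`LPS_A`, `LPS_B`); the codes, labels and the missing code of -9 are illustrative, not UK LLC's actual harmonised specification.

```python
from typing import Any, Dict

# Hypothetical source codings of housing tenure in two LPS.
SOURCE_TO_CONCEPT: Dict[str, Dict[Any, str]] = {
    "LPS_A": {1: "owner", 2: "renter"},
    "LPS_B": {"OWN": "owner", "RENT_SOCIAL": "renter", "RENT_PRIVATE": "renter"},
}
# Hypothetical harmonised encoding and metadata value labels.
CONCEPT_TO_CODE = {"owner": 1, "renter": 2}
MISSING = -9
VALUE_LABELS = {1: "Owner-occupied", 2: "Rented", MISSING: "Missing/unmapped"}


def harmonise(lps: str, raw_value: Any) -> int:
    """Map an LPS-specific code to the shared encoding; unknown values become MISSING."""
    concept = SOURCE_TO_CONCEPT.get(lps, {}).get(raw_value)
    return CONCEPT_TO_CODE.get(concept, MISSING)


def label(code: int) -> str:
    """Attach the metadata label for a harmonised code."""
    return VALUE_LABELS[code]


rows = [("LPS_A", 2), ("LPS_B", "OWN"), ("LPS_B", "UNKNOWN")]
print([(lps, label(harmonise(lps, v))) for lps, v in rows])
# [('LPS_A', 'Rented'), ('LPS_B', 'Owner-occupied'), ('LPS_B', 'Missing/unmapped')]
```

Routing every study through a shared concept keeps the mapping tables small as new LPS join: each new study adds one entry to `SOURCE_TO_CONCEPT` without touching the harmonised encoding.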

Contributing LPS and participant sample

LPS are eligible for inclusion in UK LLC if they have (or had) direct follow-up with participants and linkage permissions. There are no eligibility criteria regarding the number of participants in an LPS, because it is recognised that some LPS with small sample sizes either hold unique and important data or include under-served groups. At the time of writing, 22 LPS participate in UK LLC (Table 1), including pan-UK studies, studies based in England, Scotland or Northern Ireland alone, and studies with biomedical and/or social science backgrounds. Detailed descriptions of LPS cohorts are provided in the Supplementary Materials. All 22 LPS that participated in the initial COVID-19 programme chose to continue their involvement in the new phase of UK LLC (the 2023–2028 funding period). Substantial numbers of additional LPS are now seeking to join the UK LLC partnership, and the first will onboard during 2025.

Name¹ | Owner | Coverage | Cohort | Years
AIRWAVE [23] | Imperial College London | E, S, W | 53,280 police officers and staff, ≥17 years, recruited 2004–2015 | 2004–
ALSPAC [24] | University of Bristol | E | c. 14,000 pregnant women recruited 1991–1992 | 1991–
BCS70 [25] | UCL | E, S, W | c. 17,000 babies born in a single week of 1970 | 1970–
BIB [26] | Bradford Teaching Hospitals NHS Foundation Trust | E | 12,453 women (3,443 partners) with 13,776 pregnancies at Bradford Royal Infirmary, 2007–2010 | 2007–
ELSA [27] | UCL | E | c. 18,000 adults, ≥50 years, recruitment ongoing | 2002–
EPICN [28] | University of Cambridge | E | c. 30,000 adults, 40–79 years, recruited 1993–1998 | 1993–
EXCEED [29] | University of Leicester | E | c. 11,000 adults, recruitment ongoing | 2013–
FENLAND [30] | University of Cambridge | E | 12,435 adults born 1950–1975 | 2005–
GENSCOT [31] | University of Edinburgh | S | c. 24,000 people, ≥12 years, recruitment ongoing | 2006–
GLAD [32] | King’s College London | UK | c. 40,000 people, ≥16 years, recruitment ongoing | 2018–
MCS [33] | UCL | UK | 18,818 babies born in 2000–2002 | 2000–
NCDS58 [34] | UCL | E, S, W | 17,415 babies born in a single week in 1958 | 1958–
NEXTSTEP [35] | UCL | E | c. 16,000 people born 1989–1990, recruited in 2004 | 2004–
NICOLA [36] | Queen’s University Belfast | NI | c. 8,500 adults aged ≥50 years, recruitment ongoing | 2013–
NIHRBIO_COPING [37] | University of Cambridge | UK | c. 150,000 people aged ≥16 years, recruitment ongoing² | 2020–
NSHD46 [38] | UCL | E, S, W | 5,362 babies born in a single week in 1946 | 1946–
SABRE [39] | UCL | E | 4,858 adults aged 40–69 years, recruited 1988–1991 | 1988–
TEDS [40] (includes E-Risk) | King’s College London | E, W | 13,759 pairs of twins born 1994–1996 | 1994–
TRACKC19 | University of Cambridge | E | Up to 90,000 adults previously recruited into INTERVAL [41], COMPARE [42] and STRIDES [43] | 2020–
TWINSUK [44] | King’s College London | UK | c. 15,000 adults who are identical or non-identical twins, recruitment ongoing | 1992–
UKHLS [45] | University of Essex | UK | c. 40,000 households recruited in 2009 | 2009–
UK-REACH [46] | University of Leicester | UK | c. 18,000 healthcare workers recruited 2021–2022 | 2020–2045
Table 1: Key information about the 22 LPS that contribute to the UK LLC TRE. ¹See the Supplementary Materials for additional key references. ²Recruited from the general NIHR BioResource (including c. 14,000 participants from the COPING study). AIRWAVE: The Airwave Health Monitoring Study; ALSPAC: Avon Longitudinal Study of Parents and Children; BCS70: 1970 British Cohort Study; BIB: Born in Bradford; E: England; ELSA: The English Longitudinal Study of Ageing; EPICN: The European Prospective Investigation into Cancer Norfolk Study; E-Risk: Environmental Risk Longitudinal Twin Study; EXCEED: Extended Cohort for E-health, Environment and DNA; FENLAND: The Fenland Study; GENSCOT: Generation Scotland; GLAD: Genetic Links to Anxiety and Depression Study; MCS: The Millennium Cohort Study; NCDS58: 1958 National Child Development Study; NEXTSTEP: The Next Steps Study; NI: Northern Ireland; NICOLA: Northern Ireland Cohort for the Longitudinal Study of Ageing; NIHRBIO_COPING: NIHR BioResource COVID-19 Psychiatry and Neurological Genetics Study; NSHD46: MRC National Survey of Health and Development Cohort; S: Scotland; SABRE: Southall and Brent Revisited; TEDS: The Twins Early Development Study; TRACKC19: TRACK-COVID Study; UKHLS: Understanding Society – the UK Household Longitudinal Study; UK-REACH: UK Research study into Ethnicity And COVID-19 outcomes in Healthcare workers; W: Wales

Managing the complexity of the participant sample

Dynamic denominator

The UK LLC denominator is the total of all participants whose data are provided by the contributing LPS. The denominator is highly complex and dynamic, and can be assessed at multiple levels, because: (i) the sample provided by each LPS will change as new participants join or existing participants withdraw or revoke consent; (ii) new LPS may join UK LLC and some LPS may withdraw; (iii) participants are known to take part in multiple LPS, so each new LPS joining UK LLC may change the number of participants identified as the same unique individual; and (iv) participants’ changing interactions with NHS and government services (e.g. within-UK migration) may result in their appearance in new datasets.

The UK LLC denominator is fixed on a quarterly basis (in line with LPS refreshes of File 1s and the linkages based on these). Each quarter, UK LLC establishes a ‘data freeze’ of the UK LLC sample, which is critical to interpreting the resource, and provisions data to approved researchers based on this headline denominator and appropriate participant permissions [14]. The data tables relating to each quarterly fix are retained for archiving and reproducibility purposes.
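A quarterly data freeze can be thought of as a snapshot computed from the enrolments active at that point, reporting both the count per LPS and the number of unique individuals once multiple membership is taken into account. This is a minimal sketch: `person_key` stands in for a hypothetical de-identified key for the matched individual, and the quarter label format is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Enrolment:
    lps: str
    person_key: str        # hypothetical de-identified key for the matched individual
    withdrawn: bool = False


def freeze_denominator(quarter: str, enrolments: Iterable[Enrolment]) -> dict:
    """Snapshot the denominator for one quarter: active enrolments per LPS,
    total active enrolments, and unique individuals after de-duplication."""
    active = [e for e in enrolments if not e.withdrawn]
    return {
        "quarter": quarter,
        "per_lps": dict(Counter(e.lps for e in active)),
        "enrolments": len(active),
        "unique_individuals": len({e.person_key for e in active}),
    }


snapshot = freeze_denominator("2025Q1", [
    Enrolment("LPS_A", "p1"),
    Enrolment("LPS_A", "p2"),
    Enrolment("LPS_B", "p1"),                  # same person enrolled in two LPS
    Enrolment("LPS_B", "p3", withdrawn=True),  # withdrawn before the freeze
])
print(snapshot)
# {'quarter': '2025Q1', 'per_lps': {'LPS_A': 2, 'LPS_B': 1}, 'enrolments': 3, 'unique_individuals': 2}
```

The gap between `enrolments` and `unique_individuals` is exactly the multiple-membership effect described in point (iii) above, which is why both figures are worth archiving with each freeze.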

Multiple membership of LPS and relationships

UK LLC will develop a de-identified participant ‘register’ for the denominator to enable researchers to understand which participants are active in multiple LPS, how different participants are related to one another, and how participants relate to households. This is crucial because joint analyses of data across multiple LPS will typically assume that the samples are independent, an assumption that overlapping membership would violate. The overlap of participant samples may also introduce consent/permission ambiguities where permissions to link to and use routine records are set in contrasting ways across different LPS. UK LLC will work with LPS to establish the most effective methodology to overcome these challenges and will provide guidance to researchers.
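A register of this kind makes two checks straightforward: which individuals appear in more than one LPS, and which of those have contrasting permissions for the same linked source. The sketch below assumes hypothetical `(person_key, lps, permitted)` rows for a single linked source; it only flags conflicts and does not decide how to resolve them, which the text leaves to future methodology.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def membership_by_person(rows: Iterable[Tuple[str, str, bool]]) -> Dict[str, Dict[str, bool]]:
    """Index rows of (person_key, lps, permitted_for_source) by person."""
    by_person: Dict[str, Dict[str, bool]] = defaultdict(dict)
    for person, lps, permitted in rows:
        by_person[person][lps] = permitted
    return by_person


def multi_members(by_person: Dict[str, Dict[str, bool]]) -> List[str]:
    """Individuals enrolled in more than one LPS."""
    return sorted(p for p, lps_map in by_person.items() if len(lps_map) > 1)


def permission_conflicts(by_person: Dict[str, Dict[str, bool]]) -> List[str]:
    """Multi-LPS individuals whose permissions for this source disagree across LPS."""
    return sorted(p for p, lps_map in by_person.items()
                  if len(lps_map) > 1 and len(set(lps_map.values())) > 1)


rows = [
    ("p1", "LPS_A", True), ("p1", "LPS_B", False),  # conflicting permissions
    ("p2", "LPS_A", True), ("p2", "LPS_C", True),   # consistent permissions
    ("p3", "LPS_B", True),                          # single LPS
]
index = membership_by_person(rows)
print(multi_members(index), permission_conflicts(index))  # ['p1', 'p2'] ['p1']
```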

Data types

LPS data

LPS data cover most conceivable topics and include detailed information about subjects that are discussed openly in society through to those that are highly sensitive. These are collected directly by LPS from participants via face-to-face or remote interviews/assessments or via linkage to routine records or novel data sources (such as social media posts, personal sensors, images). LPS ensure the anonymity of all data provided to UK LLC, for example, by providing metrics that characterise the mood of a social media post (rather than providing the actual wording) and by putting time stamps into bands. LPS data in the UK LLC TRE include:

  • Broad ranging quantitative data on participants’ demographic, socio-economic and health status (physical and mental); family and life-course indicators; and information on diverse behaviours, aspirations and outcomes
  • Existing assayed biological information (e.g. blood group type, biomarkers such as serology)
  • Derived genetic, metabolomic, proteomic and epigenetic information
  • Specific COVID-19 data collections (questionnaires and assayed biological data)
  • Participant consent/opt-out status and history.

LPS data could extend to other classes of de-identified data, such as derived information from qualitative studies or image data (e.g. brain or organ MRIs, DEXA bone density scans, retina images) or wider smart data (e.g. sensor data from wearable devices).
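Banding time stamps, mentioned above as one way LPS ensure anonymity, replaces an exact moment with a coarse period. The sketch below uses calendar month plus a broad part of the day; these particular bands are hypothetical, as the text does not specify how individual LPS band their time stamps.

```python
from datetime import datetime


def band_timestamp(ts: datetime) -> str:
    """Coarsen an exact timestamp to calendar month plus a broad part of the day."""
    if 6 <= ts.hour < 12:
        part = "morning"
    elif 12 <= ts.hour < 18:
        part = "afternoon"
    elif ts.hour >= 18:
        part = "evening"
    else:  # 00:00 to 05:59
        part = "night"
    return f"{ts:%Y-%m} {part}"


print(band_timestamp(datetime(2021, 3, 14, 9, 26, 51)))  # 2021-03 morning
```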

Linked health data

Ideally, UK LLC would: (i) extract individuals’ full life-course records to support longitudinal assessments and to build healthcare use pathways; (ii) refresh the extract on a timely basis, to allow ongoing assessment of healthcare use and outcomes and so that new LPS survey data can be linked to data from an equivalent time period; and (iii) achieve as complete coverage of the LPS population as possible. However, coverage is limited by restrictions in data availability (e.g. limited temporal coverage), and we recognise that some vulnerable and marginalised groups are systematically under-represented [15] because of limitations in linked records (e.g. specific inclusion/exclusion criteria) and for governance reasons.

All data extracted under version 1.0 of the UK LLC protocol for COVID research purposes [3] (the ‘historical data’) will be retained because they are directly relevant for the new research purpose and will support ongoing COVID research.

The principles of the linkage mechanism to NHS records are the same for all four nations’ health authorities (see Figure 2), with DHCW providing lists of participant identifiers to NHS England (for English NHS records), Public Health Scotland (for Scottish NHS records) and the SAIL Databank (for Welsh NHS records), respectively. NHS authorities apply opt-outs as appropriate. Updates are provided on a quarterly basis. UK LLC intends to develop governance approvals to allow linkage to Northern Irish NHS records as the devolved authorities establish mechanisms to enable this. NHS data in the UK LLC TRE include:

  • Demographic data such as date of birth, sex and entry and exit to NHS services
  • Mortality and cancer registry records, and audits such as stroke, cardiac and intensive care
  • Primary care data
  • Secondary care data including hospital inpatients (includes intensive care and maternity), outpatients and accident and emergency records
  • Mental health data about people in contact with community mental health care services
  • Community services data including breastfeeding, nutrition, care event and screening
  • Medicines dispensed in the community
  • COVID-19 specific datasets, including testing, vaccination and outcomes.

Linked administrative data

UK LLC is an interdisciplinary database that supports research into individuals’ socio-economic outcomes and understanding of health/socio-economic interactions. UK LLC will link administrative data via the Office for National Statistics and in agreement with the source data owners. Updates will be provided on an annual basis. Data from the Department for Work and Pensions, HM Revenue and Customs, and Department for Education are anticipated to flow into the UK LLC TRE, including:

  • HM Revenue and Customs: Data items relating to employment payments; workplace pensions; employment cessation payments; nature and source of income; pensions; and share schemes
  • Department for Work and Pensions: Customer Information System; Benefits and Income Data; National Benefits Database; and Child Benefit Extract
  • Department for Education: National Pupil Database; and Pupil Level Annual School Census ages 4-18 years.

Linked place-based data

UK LLC has commissioned the University of Leicester and City St George’s, University of London to model a number of environmental exposure estimates and place-based data, such as information about houses and neighbourhoods. These estimates combine existing environmental sensor readings with other inputs (e.g. traffic count data, weather pattern data) to model pollution and other environmental exposures, which are then mapped to participants’ addresses. The file sent to UK LLC’s geo-modeller contains only pooled address data at a 1:3 ratio of LPS participants’ addresses to masking addresses. Although these addresses are still defined as personal data (all address data are considered personal data, whether in the public domain or not), this use does not breach participants’ confidentiality because nothing sensitive or confidential can be inferred from the list.
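The 1:3 masking step can be sketched as adding three decoy addresses for every participant address and shuffling the pooled list, so the geo-modeller cannot tell which entries belong to participants. The source of the masking addresses and the function below are assumptions made for illustration; the text does not describe how masking addresses are selected.

```python
import random
from typing import List, Optional, Sequence


def build_masked_address_file(participant_addresses: Sequence[str],
                              candidate_masks: Sequence[str],
                              ratio: int = 3,
                              seed: Optional[int] = None) -> List[str]:
    """Pool participant addresses with `ratio` masking addresses per participant
    address and shuffle, so the recipient cannot tell which entries are participants."""
    rng = random.Random(seed)
    participants = list(dict.fromkeys(participant_addresses))  # de-duplicate, keep order
    pool = sorted(set(candidate_masks) - set(participants))    # masks must not be participant addresses
    needed = ratio * len(participants)
    if needed > len(pool):
        raise ValueError("not enough masking addresses for the requested ratio")
    combined = participants + rng.sample(pool, needed)
    rng.shuffle(combined)
    return combined


participants = ["1 A Road", "2 B Road"]
masks = [f"{n} Mask Lane" for n in range(1, 21)]
pooled = build_masked_address_file(participants, masks, seed=42)
print(len(pooled))  # 8
```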

UK LLC also collates and processes place-based public domain data for inclusion in the UK LLC TRE. All these data are processed to ensure they are de-identified prior to ingest, with no geographies other than English region and devolved nation being available within the TRE. Environmental and place-based data will include:

  • Geospatial modelling and public domain datasets: these are assigned to a property, postcode or higher-level geography (e.g. a lower super output area or region)
  • Modelled environmental exposure estimates of pollution, climate data (e.g. temperature, rainfall, pollen) and noise
  • Modelled access to green and blue space and measures estimating the ‘walkability’ around a property or area
  • Information about the neighbourhood (e.g. building density, land use characteristics, deprivation indices, crime rates, provision of services and availability of ‘hazards’ such as fast food outlets or gambling shops)
  • Information about the property (e.g. building age, type, building floor, sale dates, value and energy performance records).
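The rule that no geography finer than English region or devolved nation enters the TRE can be sketched as a transformation applied before ingest: exposure values already assigned at a fine geography are kept, while the fine geography itself is dropped and replaced by a coarse label. The field names, lookup table and codes below are fictitious and illustrative only.

```python
from typing import Any, Dict, Mapping

# Hypothetical fine-geography fields that must not enter the TRE.
FINE_GEOGRAPHY_FIELDS = ("uprn", "postcode", "lsoa_code")


def coarsen_geography(row: Mapping[str, Any], lsoa_to_region: Mapping[str, str]) -> Dict[str, Any]:
    """Keep the exposure values already assigned at fine geography, drop the fine
    geography itself and add only the English region or devolved nation."""
    out = {k: v for k, v in row.items() if k not in FINE_GEOGRAPHY_FIELDS}
    out["region_or_nation"] = lsoa_to_region.get(row.get("lsoa_code"), "Unknown")
    return out


lookup = {"LSOA_X1": "North West", "LSOA_W9": "Wales"}  # fictitious codes
row = {"link_id": "p1", "postcode": "XX1 1XX", "lsoa_code": "LSOA_X1",
       "annual_mean_no2": 21.4, "green_space_within_300m": True}
print(coarsen_geography(row, lookup))
# {'link_id': 'p1', 'annual_mean_no2': 21.4, 'green_space_within_300m': True, 'region_or_nation': 'North West'}
```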

Discovery and access

Discovery

UK LLC is designed to be a discoverable and accessible resource for public good research. UK LLC is promoted through the contributing LPS to their research users, and through data science networks, longitudinal study resources and funders. UK LLC Explore [16] is UK LLC’s discoverability and data selection portal; it is populated with metadata using existing automated metadata extraction software and enriched through an API to the Catalogue of Mental Health Measures [17]. UK LLC has also developed a data documentation resource, the UK LLC Guidebook [18], which is enriched through an API to the HDR UK Innovation Gateway [19] and will be expanded and maintained in collaboration with data owners and researchers. There are also plans to pull information from CLOSER Discovery [20] and the UK Data Archive [21]. These approaches avoid duplication, minimise the burden on LPS and promote interoperability and federation.
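The enrichment pattern described above, where locally extracted metadata are supplemented from an external catalogue rather than re-entered by hand, can be sketched as a gap-filling merge. This is a hypothetical illustration: it does not call any real catalogue API; the `external` dictionary stands in for records already retrieved from a source such as the Catalogue of Mental Health Measures, and the variable keys and the `_filled_from` provenance field are assumptions.

```python
import copy
from typing import Any, Dict

Metadata = Dict[str, Dict[str, Any]]


def enrich_metadata(local: Metadata, external: Metadata, source: str) -> Metadata:
    """Fill empty fields in local metadata from an external catalogue.

    Locally extracted values always win; external values only fill gaps, and each
    filled field is recorded under a hypothetical '_filled_from' key.
    """
    enriched = copy.deepcopy(local)  # never mutate the caller's records
    for key, record in enriched.items():
        for field, value in external.get(key, {}).items():
            if record.get(field) in (None, ""):
                record[field] = copy.deepcopy(value)
                record.setdefault("_filled_from", {})[field] = source
    return enriched
```

Records that exist only in the external catalogue are not added, so the local portal remains the single list of available variables while the catalogue contributes descriptive detail with its provenance recorded.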

Applying to access the UK LLC TRE

Applications to access data in the UK LLC TRE are processed using a novel delegated and distributed approach that satisfies the needs of the contributing LPS and the third-party data owners, and includes a review by members of the public. Researchers first submit an expression of interest, through which the proposed project is assessed for feasibility, and then work with UK LLC to develop their full application, which includes a list of the datasets they would like to access. The application and data request are reviewed by each contributing LPS’s data access committee and by the UK LLC Data Access Committee. If an application is approved, researchers will only have access to the datasets approved by the data access committees.
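The final rule above, that researchers only receive the datasets the committees approved, behaves like a default-deny filter. The sketch below assumes, for illustration only, that each LPS committee rules on its own study’s datasets and that the UK LLC Data Access Committee rules on every requested dataset; the dataset identifiers and function name are hypothetical, and review of linked third-party records is not modelled.

```python
from typing import Dict, Set


def granted_datasets(
    requested: Set[str],
    owner_of: Dict[str, str],
    lps_approved: Dict[str, Set[str]],
    ukllc_approved: Set[str],
) -> Set[str]:
    """Return the requested datasets that every relevant committee approved."""
    granted: Set[str] = set()
    for dataset in requested:
        owner = owner_of.get(dataset)
        if owner is None:
            continue  # unknown provenance: deny by default
        approved_by_owner = dataset in lps_approved.get(owner, set())
        if approved_by_owner and dataset in ukllc_approved:
            granted.add(dataset)
    return granted
```

Denying anything not positively approved means a missing or partial committee decision narrows access rather than widening it.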

Applications are screened using the ‘Five Safes’ framework [10]. All researchers must hold valid Office for National Statistics Accredited Researcher status, be based in the UK and be employed by an organisation with sufficient capacity to support good governance in research (although future enhancements will be made to extend access internationally where LPS and other data owners permit this). All projects must be for public good, ethically sound and not for profit-making purposes. All applications are listed on the publicly accessible UK LLC Data Use Register [22]. Prior to access to the UK LLC TRE, all researchers must sign a UK LLC Data User Responsibilities Agreement and their research is controlled using a Data Access Agreement (a legal contract) between each researcher’s organisation and the University of Bristol.
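The hard eligibility criteria listed above can be expressed as a simple checklist. This is a hypothetical sketch only: the data classes, field names and the use of an accreditation expiry date to represent "valid" status are assumptions, and the committee judgements made under the Five Safes framework are not reducible to such a check.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Researcher:
    name: str
    accreditation_expires: Optional[date]  # None if not ONS accredited
    based_in_uk: bool
    employer_supports_governance: bool


@dataclass
class Project:
    public_good: bool
    ethically_sound: bool
    for_profit: bool


def screen_application(researchers: List[Researcher], project: Project, today: date) -> List[str]:
    """Return unmet eligibility criteria; an empty list means none were found."""
    problems: List[str] = []
    if not researchers:
        problems.append("application names no researchers")
    for r in researchers:
        if r.accreditation_expires is None or r.accreditation_expires < today:
            problems.append(f"{r.name}: no valid ONS Accredited Researcher status")
        if not r.based_in_uk:
            problems.append(f"{r.name}: not based in the UK")
        if not r.employer_supports_governance:
            problems.append(f"{r.name}: employer cannot support research governance")
    if not project.public_good:
        problems.append("project: not for public good")
    if not project.ethically_sound:
        problems.append("project: not ethically sound")
    if project.for_profit:
        problems.append("project: for profit-making purposes")
    return problems
```

Returning every unmet criterion, rather than stopping at the first, mirrors how feedback would be useful to an applicant preparing a full application.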

Scientific programme

UK LLC is a generic research database with an explicit remit to support longitudinal public good research across the breadth of the sector, spanning both biomedical and social science research. As such, UK LLC is a FAIR resource that is open to applications from any legitimate UK-based researcher. Currently identified research themes include: investigating health and social inequalities; occupational health outcomes and assessment of outcomes in under-served groups; an ongoing COVID-19 programme; and a methodological programme, e.g. understanding LPS population coverage, bias and data quality, and TRE ways of working such as data integration and harmonisation. These themes are non-exclusive and indicative of the wider potential of the resource.

Conclusion

UK LLC is a novel national TRE for the longitudinal research community, of globally unique depth and breadth, that will enable world-class research across the UK. For the first time, it allows researchers to interrogate pooled participant data from many contributing LPS, systematically linked to diverse routine records, making it possible to study rare exposures and outcomes, including in under-served groups. UK LLC’s bespoke governance framework enables equivalent permissions across all contributing LPS, whilst maintaining the core aspects of the trust relationship between participants and their LPS.

UK LLC will expand LPS membership and increase the breadth of data, sample size and diversity to support the wider LPS community and attract increasing numbers of researchers across disciplines, UK government departments and industry. UK LLC will build new functionality to enable multi-omics analyses and replicate key derived data and data formatting from other LPS and resources (e.g. UK Biobank or whole-population databases) to enable efficient cross-TRE replication analyses. As researchers use the UK LLC TRE, they will help to create an ever-increasing knowledge base that will enable transparent reproducible research and iteratively improve ease of use and functionality.

UK LLC will continue to work with the public and LPS participants to ensure the design and operation of UK LLC is developed with public input and that all activities of UK LLC are transparent to the public.

Acknowledgements

We wish to recognise and thank all LPS participants and the LPS staff that are part of the UK LLC partnership. A full list of acknowledgements, including support for each LPS, is provided in the Supplementary Materials. We thank the National Health Service (NHS) and particularly NHS England for their work in curating LPS participants’ health records and for making these available for public good research designed to improve health services. In particular, we thank Garry Coleman for his input into the design of the UK LLC protocol and Mujiba Ejaz for her invaluable support and hard work on UK LLC governance. We also thank Helen Buckles, Oliver Smith, John Wigglesworth, Louise Dunn and Abigail Lucas for all their contributions. We thank the Ordnance Survey for providing AddressBase Plus. We thank the Administrative Data Research UK (ADR UK) and Office for National Statistics (ONS) teams for their contribution to developing non-health administrative linkages, in particular Emma Gordon (ADR UK), Emily Oliver (ADR UK), Rachel Huck (ONS), Roya Shahrokni (ONS), Jen Donald (ONS), Leah Quinn (ONS); Graham Knox and Mike Daly at the Department for Work and Pensions; Mark Barry, Angela Martindale, Nike Ogunlade, Tracy Holland, Richard Millington at HM Revenue and Customs; and David Burnett at the Department for Education. We thank Carol Morris (Public Health Scotland) and the much-missed Dermot O’Reilly (ADR Northern Ireland) for their help in understanding linkages in Scotland and Northern Ireland, respectively. We thank Vicki Bowles and Claire Ainley (VWV Ltd); Clare Smith, Henry Stuart and Adam Taylor (University of Bristol); and Cynthia McNerney, Rob Garlick and Sharon Heyes (Swansea University) for their contributions to developing the UK LLC governance framework. 
We thank the funders of UK longitudinal research for their guidance and contributions, particularly Mary De Silva (then Wellcome Trust), Joe McNamara and Catherine Moody (Medical Research Council) and Bridget Taylor and Rebecca Perring (Economic and Social Research Council). We thank all current and past members of the UK LLC Participant and Public Involvement and Engagement (PPIE) Programme who have played invaluable roles in shaping the design of our processes and with operational decision-making. We remember and thank Dolapo (Della) Ogunleye whose constructive challenges to improve diversity in UK LLC and longitudinal research remain with us and are an important influence on our future work. Finally, we thank Sir Patrick Vallance and all of those who worked to establish the National Core Studies (NCS) programme.

Funding

UK LLC is a UK Research and Innovation (UKRI) funded infrastructure with co-funding from the Medical Research Council (MR/X021556/1) and Economic and Social Research Council (ES/X000567/1). The initial funding which established UK LLC was provided by the UKRI-funded Longitudinal Health and Wellbeing National Core Study led by UCL and University of Bristol (MC_PC_20030 and MC_PC_20059). The protocol development was informed by ABoyd’s secondment to the Economic and Social Research Council (ES/S016732/1) to scope options for improving the inclusiveness of UK longitudinal research through increased use of population data; funding from the Wellcome Trust (221574/Z/20/Z) for a secretariat to co-ordinate the response to the COVID-19 pandemic; and funding for the ALSPAC birth cohort study which is core funded by the UK Medical Research Council, the Wellcome Trust and the University of Bristol (217065/Z/19/Z). The onward design, implementation and operations of UK LLC have been supported by staff from all the contributing LPS. Acknowledgements and funding for each contributing LPS are provided in the Supplementary Materials. UK LLC is a member of the DATAMIND consortium which is addressing infrastructure needs for mental health research (MRC: MR/W014386/1) and is also supported by National Institute for Health and Care Research (NIHR) funding to establish greater functionality to support occupational health research (NIHR: NIHR20671). UK LLC is also a member of the HDR UK Social and Environmental Determinants of Health driver programme, which is supporting enhanced geo-spatial linkage capabilities (HDRUK2023.0029). The views expressed are those of the authors and not necessarily those of any of the funders or the organisations contributing data into the UK LLC.

Conflicts of interest

J Danesh serves on scientific advisory boards for AstraZeneca, Novartis and UK Biobank, and has received multiple grants from academic, charitable and industry sources outside the submitted work. M Tobin, A Guyatt and C John have a funded research collaboration with Orion for collaborative research projects outside the submitted work. No other conflicts of interest were disclosed.

Ethics and informed consent

UK LLC has ethical approval from the Health Research Authority (HRA) Research Ethics Committee (Haydock Committee; ref: 20/NW/0446). Each contributing LPS has its own independent ethical basis and has established a legal and ethical basis for participant involvement in UK LLC. Participant objections and withdrawal are always upheld. UK LLC has approval from the HRA Confidentiality Advisory Group (ref: 21/CAG/0044) to support the flow of identifiers from LPS to NHS Digital Health and Care Wales where an LPS’ legal basis relies on section 251 support.

Author contributions

Authors ABoyd and RF led the development of the UK LLC concept, and the initial protocol was co-developed with AC, DF, DFG, JG, AH, JM, M-PG, CO, DP, ASanchez, CSteves, MT, NJT, ST, AWong. The concept was fully developed with substantial input from data managers and principal investigators across the Vanguard group of LPS including ABritten, AButterworth, CB, EB, NC, GB, LB, MB, SBoatman, SBristow, AC, JD, JD-M, KD, EDA, TCE, EF, AG, AHeard, DFG, M-PG, MH, CJ, FK, NK, JK, CL, GL, AM, BM, CMM, DM, MMumme, CN, KN, ZO, DP, GBP, JP, ASanchez, AScott, ASteptoe, CSteves, CSudlow, GS, MT, NJT, TT, LV, AWatmuff, AWong, MW, NWalker, NWareham, JW, TW, DY. These authors also implemented the LPS-side functionality of UK LLC. The UK LLC infrastructure was developed by the UK LLC team: KME, ELT, JO, KCC, RT, SM, MC, SBerman, RW and infrastructure collaborators: DF, CO, ST, KA, ALG, JG, AHansell, HL-G. The overarching design of the Longitudinal Health and Wellbeing National Core Study – which informed the shape of this protocol – was led by NC and JS with contributions from: ABoyd, GBP, CSteves, NJT. The UK LLC public contributors were involved in the design of key UK LLC functionality described in this protocol: SC, MG, RHarmston, RHill, SC-H, RK, MMcKenzie, SM, YR, DS, KW. ABoyd drafted this protocol, with authors KME and ELT conducting significant editing. All other authors contributed to the manuscript and agreed its final form. NC secured the initial National Core Studies funding which supported the establishment and ABoyd has led subsequent funding rounds. ABoyd is responsible for UK LLC and is the guarantor for this manuscript.

Data availability statement

Data in the UK LLC Trusted Research Environment (TRE) cannot be used or shared outside this environment. UK-based researchers who hold valid Office for National Statistics (ONS) Accredited Researcher status and are employed by an organisation that can support good governance in research can apply to access the UK LLC TRE (see the process outlined in the UK LLC Data Access and Acceptable Use Policy: https://ukllc.ac.uk/governance). Researchers can explore the data available in the UK LLC TRE using UK LLC’s discoverability and data selection portal (https://explore.ukllc.ac.uk/).

Abbreviations

DHCW NHS Digital Health and Care Wales
FAIR Findable, Accessible, Interoperable, Reusable
GDPR General Data Protection Regulation
HM His Majesty’s
LPS Longitudinal Population Study
NHS National Health Service
PPIE Participant and Public Involvement and Engagement
SAIL Secure Anonymised Information Linkage
SeRP Secure eResearch Platform
TRE Trusted Research Environment
UK LLC UK Longitudinal Linkage Collaboration

References

  1. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3(1):160018. 10.1038/sdata.2016.18

  2. University of Bristol & University of Edinburgh. UK Longitudinal Linkage Collaboration: The National Trusted Research Environment for Longitudinal Research (Research Protocol V2.0). Zenodo; 2023. 10.5281/zenodo.10868470

  3. University of Bristol & University of Edinburgh. The Longitudinal Linkage Collaboration: a platform for longitudinal COVID-19 research (Research Protocol V1.0). Zenodo; 2020. 10.5281/zenodo.10868639

  4. Burton PR, Murtagh MJ, Boyd A, Williams JB, Dove ES, Wallace SE, et al. Data Safe Havens in health research and healthcare. Bioinformatics. 2015;31(20):3241-8. 10.1093/bioinformatics/btv279

  5. UK Health Data Research Alliance and NHSX. Building Trusted Research Environments - Principles and Best Practices; towards TRE ecosystems. 2021. 10.5281/zenodo.5767586

  6. Department of Health & Social Care. Data saves lives: reshaping health and social care with data. Department of Health & Social Care; 2022. https://www.gov.uk/government/publications/data-saves-lives-reshaping-health-and-social-care-with-data.

  7. Office for National Statistics. About the Secure Research Service: ONS; 2024. https://www.ons.gov.uk/aboutus/whatwedo/statistics/requestingstatistics/secureresearchservice/aboutthesecureresearchservice.

  8. Harrison T. Putting the Trust in Trusted Research Environments London: Understanding Patient Data; 2020. https://web.archive.org/web/20220810164752/https:/understandingpatientdata.org.uk/news/putting-trust-trusted-research-environments.

  9. medConfidential. Analysis and Inputs Reporting: medConfidential; 2020. https://medconfidential.org/2020/.

  10. Desai T, Ritchie F, Welpton R. Five Safes: designing data access for research. Bristol: University of the West of England; 2016. https://uwe-repository.worktribe.com/output/914745

  11. ALSPAC. Our commitment to you 2024. https://www.bristol.ac.uk/alspac/participants/our-commitment-to-you/.

  12. TEDS. The TEDS Promise 2024. https://www.teds.ac.uk/participants/tedspromise.

  13. UK Reproducibility Network. The UK Reproducibility Network (UKRN) 2024. https://www.ukrn.org/.

  14. Berman S, Evans K, Thomas R, Crane M, McLachlan S, Whitehorn R, et al. Summary Profile of the UK LLC Resource: Data Freeze 1. Zenodo; 2022. 10.5281/zenodo.10890836

  15. Boyd A. Understanding Population Data for Inclusive Longitudinal Research. Bristol, UK: University of Bristol; 2021. https://www.ukri.org/publications/understanding-population-data-for-inclusive-longitudinal-research/

  16. UKLLC. UK LLC Explore: UK LLC’s discoverability and data selection portal 2024. https://explore.ukllc.ac.uk/.

  17. KCL. Catalogue of Mental Health Measures 2024. https://www.cataloguementalhealth.ac.uk/.

  18. UK Longitudinal Linkage Collaboration. UK LLC’s linked data documentation and user guide 2024. https://guidebook.ukllc.ac.uk/.

  19. Health Data Research UK. HDR UK Innovation Gateway 2024. https://healthdatagateway.org/en.

  20. UCL. CLOSER Discovery 2024. https://discovery.closer.ac.uk/.

  21. UKDA. UK Data Archive 2024. https://www.data-archive.ac.uk/.

  22. UK Longitudinal Linkage Collaboration. UK Longitudinal Linkage Collaboration (UK LLC) 2024. https://ukllc.ac.uk/.

  23. Elliott P, Vergnaud AC, Singh D, Neasham D, Spear J, Heard A. The Airwave Health Monitoring Study of police officers and staff in Great Britain: rationale, design and methods. Environ Res. 2014;134:280-5. 10.1016/j.envres.2014.07.025

  24. Boyd A, Golding J, Macleod J, Lawlor DA, Fraser A, Henderson J, et al. Cohort Profile: the ’children of the 90s’–the index offspring of the Avon Longitudinal Study of Parents and Children. Int J Epidemiol. 2013;42(1):111-27. 10.1093/ije/dys064

  25. Sullivan A, Brown M, Hamer M, Ploubidis GB. Cohort Profile Update: The 1970 British Cohort Study (BCS70). Int J Epidemiol. 2023;52(3):e179-e86. 10.1093/ije/dyac148

  26. Wright J, Small N, Raynor P, Tuffnell D, Bhopal R, Cameron N, et al. Cohort Profile: the Born in Bradford multi-ethnic family cohort study. Int J Epidemiol. 2013;42(4):978-91. 10.1093/ije/dys112

  27. Steptoe A, Breeze E, Banks J, Nazroo J. Cohort profile: the English longitudinal study of ageing. Int J Epidemiol. 2013;42(6):1640-8. 10.1093/ije/dys168

  28. Hayat SA, Luben R, Keevil VL, Moore S, Dalzell N, Bhaniani A, et al. Cohort Profile: A prospective cohort study of objective physical and cognitive capability and visual health in an ageing population of men and women in Norfolk (EPIC-Norfolk 3). Int J Epidemiol. 2013;43(4):1063-72. 10.1093/ije/dyt086

  29. John C, Reeve NF, Free RC, Williams AT, Ntalla I, Farmaki AE, et al. Cohort Profile: Extended Cohort for E-health, Environment and DNA (EXCEED). Int J Epidemiol. 2019;48(3):678-9j. 10.1093/ije/dyz073

  30. Lindsay T, Westgate K, Wijndaele K, Hollidge S, Kerrison N, Forouhi N, et al. Descriptive epidemiology of physical activity energy expenditure in UK adults (The Fenland study). IJBNPA. 2019;16:126. 10.1186/s12966-019-0882-6

  31. Smith BH, Campbell A, Linksted P, Fitzpatrick B, Jackson C, Kerr SM, et al. Cohort Profile: Generation Scotland: Scottish Family Health Study (GS:SFHS). The study, its participants and their potential for genetic research on health and illness. Int J Epidemiol. 2013;42(3):689-700. 10.1093/ije/dys084

  32. Davies MR, Kalsi G, Armour C, Jones IR, McIntosh AM, Smith DJ, et al. The Genetic Links to Anxiety and Depression (GLAD) Study: Online recruitment into the largest recontactable study of depression and anxiety. Behav Res Ther. 2019;123:103503. 10.1016/j.brat.2019.103503

  33. Connelly R, Platt L. Cohort Profile: UK Millennium Cohort Study (MCS). Int J Epidemiol. 2014;43(6):1719-25. 10.1093/ije/dyu001

  34. Power C, Elliott J. Cohort profile: 1958 British birth cohort (National Child Development Study). Int J Epidemiol. 2005;35(1):34-41. 10.1093/ije/dyi183

  35. Next Steps. A guide to the re-deposit of sweeps 1 to 7 datasets. Institute of Education, UCL; 2020. https://doc.ukdataservice.ac.uk/doc/5545/mrdoc/pdf/next_steps_userguide_to_the_redeposit_of_sweeps_1to7_may2020.pdf

  36. Neville C, Burns F, Cruise S, Scott A, O’Reilly D, Kee F, Young I. Cohort Profile: The Northern Ireland Cohort for the Longitudinal Study of Ageing (NICOLA). Int J Epidemiol. 2023;52(4):e211-e21. 10.1093/ije/dyad026

  37. Young K, Purves KL, Hübel C, Davies MR, Thompson KN, Bristow S, et al. Depression, anxiety and PTSD symptoms before and during the COVID-19 pandemic in the UK. Psychol Med. 2023;53(12):5428-41. 10.1017/S0033291722002501

  38. Kuh D, Pierce M, Adams J, Deanfield J, Ekelund U, Friberg P, et al. Cohort profile: updating the cohort profile for the MRC National Survey of Health and Development: a new clinic-based data collection for ageing research. Int J Epidemiol. 2011;40(1):e1-9. 10.1093/ije/dyq231

  39. Jones S, Tillin T, Park C, Williams S, Rapala A, Al Saikhan L, et al. Cohort Profile Update: Southall and Brent Revisited (SABRE) study: a UK population-based comparison of cardiovascular disease and diabetes in people of European, South Asian and African Caribbean heritage. Int J Epidemiol. 2020;49(5):1441-2e. 10.1093/ije/dyaa135

  40. Lockhart C, Bright J, Ahmadzadeh Y, Breen G, Bristow S, Boyd A, et al. Twins Early Development Study (TEDS): A genetically sensitive investigation of mental health outcomes in the mid-twenties. JCPP Adv. 2023;3(2):e12154. 10.1002/jcv2.12154

  41. Di Angelantonio E, Thompson SG, Kaptoge S, Moore C, Walker M, Armitage J, et al. Efficiency and safety of varying the frequency of whole blood donation (INTERVAL): a randomised trial of 45 000 donors. Lancet. 2017;390(10110):2360-71. 10.1016/s0140-6736(17)31928-1

  42. Bell S, Sweeting M, Ramond A, Chung R, Kaptoge S, Walker M, et al. Comparison of four methods to measure haemoglobin concentrations in whole blood donors (COMPARE): A diagnostic accuracy study. Transfus Med. 2021;31:94-103. 10.1111/tme.12750

  43. McMahon A, Kaptoge S, Walker M, Mehenny S, Gilchrist PT, Sambrook J, et al. Evaluation of interventions to prevent vasovagal reactions among whole blood donors: rationale and design of a large cluster randomised trial. Trials. 2023;24(1):512. 10.1186/s13063-023-07473-z

  44. Verdi S, Abbasian G, Bowyer RCE, Lachance G, Yarand D, Christofidou P, et al. TwinsUK: The UK Adult Twin Registry Update. Twin Res Hum Genet. 2019;22(6):523-9. 10.1017/thg.2019.65

  45. Buck N, McFall S. Understanding Society: design overview. Longitudinal and Life Course Studies. 2012;3(1):5-17. 10.14301/llcs.v3i1.159

  46. Bryant L, Free RC, Woolf K, Melbourne C, Guyatt AL, John C, et al. Cohort Profile: The United Kingdom Research study into Ethnicity and COVID-19 outcomes in Healthcare workers (UK-REACH). Int J Epidemiol. 2022;52(1):e38-e45. 10.1093/ije/dyac171

Article Details

How to Cite
Boyd, A., Evans, K., Turner, E., Flaig, R., Oakley, J., Campbell, K., Thomas, R., McLachlan, S., Crane, M. et al. (2025) “UK Longitudinal Linkage Collaboration (UK LLC): The National Trusted Research Environment for Longitudinal Research”, International Journal of Population Data Science, 10(1). doi: 10.23889/ijpds.v10i1.2468.