A population level study of SARS-CoV-2 prevalence amongst people experiencing homelessness in Wales, UK

Abstract Introduction Prior research into the prevalence of SARS-CoV-2 infection amongst people experiencing homelessness (PEH) largely relates to people in communal forms of temporary accommodation in contexts where this type of accommodation remained a major part of the response to homelessness during the COVID-19 pandemic. Little is known about the prevalence of SARS-CoV-2 amongst PEH more broadly, and in a policy and practice context that favoured self-contained accommodation, such as Wales, UK. Objective Describe the prevalence of SARS-CoV-2 amongst PEH in Wales, UK, using routinely collected administrative data from the Secure Anonymised Information Linkage Databank. Methods Routinely collected data were used to identify PEH in Wales between 1st March 2020 and 1st March 2021. Using SARS-CoV-2 pathology testing data, prevalence rates were generated for PEH and three comparator groups: (1) the not-homeless population; (2) a cohort ‘exact matched’ for age, sex, local authority and area deprivation; and (3) a matched comparison group created using these same variables and Propensity Score Matching (PSM). Three logistic regressions were run on samples containing each of the comparator groups to explore the effect of experiencing homelessness on testing positive for SARS-CoV-2. Results The prevalence of SARS-CoV-2 infection amongst PEH was 5.0%, compared to 5.6% in the not-homeless population. For the exact matched and PSM comparator groups, prevalence was 6.9% and 6.7%, respectively. Logistic regression found that the odds of SARS-CoV-2 infection amongst PEH were 0.9 times those of people not experiencing homelessness in the general population. The odds ratios for SARS-CoV-2 infection amongst PEH were 0.75 and 0.73 where the ‘not-homeless’ comparators were drawn from the exact match and PSM samples, respectively. Conclusion Our analysis revealed that a year into the COVID-19 pandemic, the prevalence of SARS-CoV-2 amongst PEH in Wales was lower than in the general population.
A policy response to homelessness that moved away from communal accommodation may be partly responsible for the reduced SARS-CoV-2 infection amongst PEH.


Introduction
The onset of the Coronavirus (COVID-19) disease pandemic, caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) virus, prompted widespread concern about the potential impact of the virus on people experiencing homelessness (PEH) [1]. Concerns centred upon people who were literally roofless and those in communal forms of accommodation, such as shelters and hostels, where facilities and air space were shared. It was feared these environments could hamper a person's ability to adhere to public health instructions regarding hand hygiene, maintaining physical distancing, and isolation when symptomatic or following a positive SARS-CoV-2 test [1,2]. The COVID-19 pandemic triggered unprecedented policy and practice responses, with homelessness being reframed as an urgent public health crisis for the very first time [3].
Several studies have begun to shed light on the different approaches taken in assisting PEH over the course of the pandemic [4,5]. Three types of response have been identified internationally [5]: (1) continued use and expansion of shelters, whilst introducing protective measures such as physical distancing and Personal Protective Equipment, (2) continued use of shelters, supplemented by a broadened accommodation offer, including self-contained or single-room accommodation, and (3) widespread commissioning of self-contained or single-room provision and the closure of communal forms of accommodation.
The focus of this article is Wales, a UK nation where the response to PEH during the pandemic followed the third of these response types. We aim to describe the prevalence of SARS-CoV-2 amongst PEH in Wales and consider the extent to which the response to homelessness may have led to the prevalence rates identified.

Background and study rationale
Homelessness can be conceptualised as a lack of housing which is habitable, secure, and private [6]. People experiencing homelessness (PEH) can include those living on the street ('roofless'), people living in temporary or crisis accommodation, such as hostels and shelters ('houseless'), and people living in insecure accommodation, for example 'sofa surfing', or severely inadequate accommodation ('housing excluded'). The extant literature on SARS-CoV-2 prevalence amongst PEH largely supports the initial concerns around high rates of infection amongst this group of people. A systematic review of 37 mainly shelter-based studies of SARS-CoV-2 infection amongst PEH found that prevalence peaked in the context of shelter outbreaks: 31.5% compared to 2.3% without a local outbreak [7]. The same systematic review was also able to generate pooled estimates for SARS-CoV-2 prevalence amongst staff working in shelters, which were 14.8% and 1.5% in the context of an outbreak and no outbreak, respectively. Several studies, mainly from outside the US, have sought to compare SARS-CoV-2 prevalence amongst PEH using shelters to other groups of PEH, and the general 'not-homeless' population.
In Atlanta, Georgia, testing for SARS-CoV-2 infection amongst people in shelters, those who were homeless and unsheltered (roofless), and staff members at shelters during April-May 2020 found prevalence rates of 2.1%, 0.5%, and 1.8%, respectively [8]. A study using random sample SARS-CoV-2 antibody testing in different forms of emergency/crisis service in Paris found prevalence rates of 27.8% at foodbanks, 50.5% in emergency shelters, and 88.7% in workers' residences (temporary housing for migrants) [9]. In a second European study, in Denmark, a cross-sectional nationwide examination of SARS-CoV-2 antibodies was conducted amongst people in shelters, shelter workers, and the general population in November 2020 [10]. The prevalence of antibodies was similar among homeless people in shelters (6.8%) and those working in shelters (6.3%), whilst prevalence rates of both groups were significantly higher than the general population (2.9%). Finally, in Ontario, Canada, population-based analysis using administrative data found that between January and July 2020, people with recent experiences of homelessness had higher rates of testing positive for SARS-CoV-2 than 'community-dwelling' people: 2.01 per 100 person-years compared to 0.39 per 100 person-years, respectively [11].
Current information on SARS-CoV-2 prevalence amongst PEH is based on a narrow definition of homelessness, largely drawing on houseless people in shelters and hostels. The high rates of infection found amongst PEH are therefore unsurprising given that the extant evidence base relates to PEH in high-risk communal accommodation, where there are large volumes of people in close contact, and a high degree of churn as clients enter and exit accommodation. However, as indicated previously, homelessness is more than people staying in shelters and hostels. Our study therefore sought to contribute to understanding SARS-CoV-2 prevalence amongst a wider population of PEH than those who are houseless and staying in shelters and hostels.
Additionally, the literature on SARS-CoV-2 infection amongst PEH originates from countries and regions, such as the United States, that continued to use communal forms of accommodation as a response to homelessness during the COVID-19 pandemic (response types 1 and 2). In the UK however, where self-contained and single-room provision constituted the primary response to homelessness during the pandemic, there is little published information on the prevalence of SARS-CoV-2 amongst PEH. One early UK study produced a prevalence estimate of 4.1% amongst PEH in England in May 2020 [12]. However, this estimate was based on modelling of possible infection rates amongst PEH in hostels, shelters, and living on the streets, rather than observational data. Therefore, the second contribution of this paper is to explore SARS-CoV-2 prevalence in a very different COVID-19 homelessness policy and practice setting to the extant literature, which we now discuss to situate our analysis.

Policy and practice context relevant to the study
The Housing (Wales) Act 2014 places a duty on local authorities to take reasonable steps to assist all households at risk of homelessness, or already homeless. Whilst this legislation results in positive outcomes for many people, there is ultimately no duty to provide temporary or settled accommodation unless a household is in 'priority need' for assistance, a legal category that primarily includes households with children [13,14]. The COVID-19 homelessness response in Wales resulted in important changes to this legislative and policy context, consisting of two main phases.
Beginning in March 2020, the first phase of Wales' COVID-19 response included £10m of support and clear guidance from the Welsh Government, directing local authorities to ensure everyone was suitably accommodated; this included statutory guidance that extended priority need status to people sleeping rough. The local authority response primarily focused on sourcing additional temporary accommodation, including hotels, bed and breakfasts, holiday lets, university accommodation, and social housing. To a lesser extent, some existing temporary accommodation, such as night shelters, was decommissioned or adapted so that there were no shared rooms. A further £40m was made available by the Welsh Government in June 2020 to support the second phase recovery response. Crucially, guidance that accompanied this phase stated that short term accommodation must meet minimum expectations and it was no longer acceptable to offer emergency floor space, to provide tents, or to use sleeping 'pods' (portable self-contained cabins) [15, p. 9].
For people who were roofless, such as those living on the streets, changes to priority need meant that they now had access to emergency accommodation for which they may not previously have been eligible. Guidance provided to local authorities also indicated that those in insecure accommodation, such as sofa surfers or those leaving prisons, who may become roofless, were also considered vulnerable and eligible for temporary accommodation [16]. For people already in temporary accommodation, the closure of communal settings and the sourcing of self-contained accommodation meant that they now had access to improved housing that reduced household mixing and facilitated self-isolation. In summary, the changes instigated by Welsh Government during the pandemic were extensive and had far-reaching impacts on all segments of PEH in Wales. This paper provides the first empirical insights into the potential impacts of these changes on SARS-CoV-2 infection.

Methods
This study used administrative data sets made available via the Secure Anonymised Information Linkage (SAIL) databank. Data sets included: Emergency Department Data Set (EDDS); Patient Episode Database for Wales (PEDW); Primary Care (GP) data; Substance Misuse Data Set (SMDS); the Welsh Demographic Service (WDS); and SARS-CoV-2 pathology test results. GP data represent information from interactions with primary care providers in Wales, whilst the EDDS and PEDW reflect secondary healthcare interactions. The SMDS data relate to interactions with Welsh Government-funded substance use services in Wales. The WDS contains information on people registered with a General Practitioner in Wales.
All data sets were linked using a unique identifier for each person in Wales, the Anonymised Linkage Field (ALF), which is assigned to all data within SAIL [17]. ALFs are assigned either deterministically, based on NHS Wales numbers, or probabilistically based on personal identifiers, such as name, date of birth, postcode. We only retained records that had an ALF-match accuracy of greater than 90%, or where ALFs were matched deterministically.
The study period of interest was the 1st March 2020 up to and including 1st March 2021. Several processes were involved in preparing data prior to analysis, including: the creation of a population spine for Wales to provide a stable cohort of people for later analysis; linking to SARS-CoV-2 pathology testing data as the main outcome of interest; flagging people within the population spine who had experienced homelessness; and creating a series of 'not-homeless' comparison groups. Each of these processes is now discussed.

Creating a resident population spine
We used the WDS to create a population spine, within which PEH were flagged, and which was used to generate 'not-homeless' comparator groups for analysis. The WDS contains a unique residential identifier, known as the Residential Anonymised Linkage Field (RALF), alongside information about when a person registered and de-registered as living at that RALF, and a date of death. The cohort used in this analysis included all people who were alive and had an entry on the WDS which covered the 1st March 2020.
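The cohort rule can be illustrated with a short Python sketch. It is not the study's code: the field names, the use of `None` for an open-ended registration, the inclusive boundaries, and the treatment of a death recorded on the snapshot date are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional

SNAPSHOT = date(2020, 3, 1)  # cohort entry date stated in the text


def in_population_spine(registered_from: date,
                        registered_to: Optional[date],
                        date_of_death: Optional[date],
                        snapshot: date = SNAPSHOT) -> bool:
    """Return True if a residence registration covers the snapshot date
    and the person is alive on it.

    Assumptions (not stated in the source): an open-ended registration is
    stored as None, interval boundaries are inclusive, and a death recorded
    on the snapshot date itself counts as not alive.
    """
    covers = registered_from <= snapshot and (
        registered_to is None or registered_to >= snapshot
    )
    alive = date_of_death is None or date_of_death > snapshot
    return covers and alive
```

Someone who registered after 1st March 2020, or whose registration ended before that date, would fall outside the spine under this rule.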

Main outcome measure
Having a positive test for SARS-CoV-2 was flagged in the resident population spine using both Polymerase Chain Reaction (PCR) and antibody serology test results. We included all positive test results where a sample was taken between 1st March 2020 up to and including 1st March 2021. As such, our analysis covers infection from the start of the SARS-CoV-2 outbreak (the first recorded case in Wales being on 28th February 2020 [18]) up to and including both the initial and second waves, which peaked in April 2020 and December 2020, respectively.
Though Lateral Flow Device (LFD) testing data are available within SAIL for linkage studies, we chose not to use LFD data. Meta-analysis of LFD testing has found that it has poor sensitivity in detecting SARS-CoV-2 infection [19], and potentially provides high false positive rates when community case rates are low [20]. Furthermore, because SARS-CoV-2 testing policy in Wales is that a positive lateral flow test be followed by a confirmatory PCR test, people with SARS-CoV-2 identified through positive LFD tests would, in most cases, be recorded under the more reliable PCR testing process [21].
From the original SARS-CoV-2 pathology testing data sets, 145,539 (5%) records were excluded from analysis due to either a lack of a linkage field (ALF), or because the linkage quality was low (fuzzy matching rate <90%). Most records (99.96%) were removed because there was no match. Where a person had multiple tests during the period of interest, if any of these tests were positive, then they were flagged as having had SARS-CoV-2.
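The 'any positive test' rule can be sketched as follows. This is an illustrative Python example under assumed inputs: test records are represented as hypothetical `(person_id, sample_date, is_positive)` tuples, which is not how the SAIL tables are described in this paper.

```python
from datetime import date

STUDY_START = date(2020, 3, 1)
STUDY_END = date(2021, 3, 1)  # inclusive, as stated in the text


def flag_ever_positive(tests):
    """Map each tested person to True if any in-window test was positive.

    `tests` is an iterable of (person_id, sample_date, is_positive) tuples;
    this record layout is a hypothetical stand-in for the linked testing data.
    People with no test inside the study window do not appear in the result.
    """
    flags = {}
    for person_id, sample_date, is_positive in tests:
        if not (STUDY_START <= sample_date <= STUDY_END):
            continue  # sample taken outside the study period
        flags[person_id] = flags.get(person_id, False) or is_positive
    return flags


example_tests = [
    ("A", date(2020, 4, 10), False),
    ("A", date(2020, 12, 2), True),   # one positive is enough to flag A
    ("B", date(2020, 6, 1), False),
    ("C", date(2021, 3, 2), True),    # outside the window, so ignored
]
example_flags = flag_ever_positive(example_tests)
```

The same dictionary also gives the 'tested at least once' indicator used in the descriptive tables, since every key corresponds to a person with at least one in-window test.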

Identifying people experiencing homelessness (PEH)
There is a lack of national individual-level homelessness data in Wales which was problematic given the study's aim of exploring the prevalence of SARS-CoV-2 amongst PEH. PEH in Wales were identified using non-housing related administrative data. We initially identified GP data and SMDS data as sources of information on PEH through exploration and analysis using data assets in the SAIL Databank [22]. We then cross-referenced and extended these initial data sources and codes, drawing on published peer reviewed studies from Canada, England, and Wales which use administrative data to identify PEH [23][24][25]. Based on this literature, we added further codes for extracting PEH from the GP data, along with additional sources (EDDS and PEDW). As our analysis represented a rapid response to COVID-19, we were unable to undertake further validation of data sources and codes beyond using those identified by studies that had undergone a peer review process. Table 1 provides an overview of variables and codes within each data set that relate to experiences of homelessness ('homelessness event codes'), and the initial number of people identified as having experienced homelessness. Given the range of homelessness event codes, the population of PEH included in this analysis were people without housing living on the streets ('roofless'), people living in hostels, shelters, and other temporary accommodation such as bed and breakfasts ('houseless'), and people in insecure housing situations, such as 'sofa surfers' [6]. Though event codes related to inadequate housing were included, this accounted for a relatively small proportion of PEH. The population of PEH included in our analysis are therefore in line with international definitions of extreme homelessness [26].
Experiences of homelessness were identified in the health care data through their recording within diagnosis fields, with each health data set using a different coding system. PEDW uses the International Classification of Diseases 10th Revision, which includes a series of codes related to factors influencing health status and contact with health services. General Practitioners use 'Read codes' to record all aspects of interactions with patients, which can include social elements relevant to a diagnosis. Emergency departments use a specialised system of coding for diagnoses which allows up to six diagnosis codes per entry, with any homelessness related codes entered being eligible for inclusion. In the SMDS, people in receipt of support from a government-funded substance use service were explicitly asked their housing status on two occasions: upon initial assessment and during Treatment Outcomes Profiles conducted semi-regularly during receipt of treatment [27]. We drew on both these measures of homelessness from the SMDS data; this differs slightly from other uses of the SMDS to identify PEH in Wales [25], which drew on housing status from the initial assessment alone.
Several of the event codes, such as 'Z590 - Homelessness' in the EDDS, relate to a general state of homelessness without the ability to differentiate roofless, houseless, or housing excluded. However, most codes identify specific forms of homelessness, such as '13FL. - Living rough' in the GP data, or, as in the case of the SMDS, contained additional information for use by the clinician when coding the housing situation. In the SMDS, examples of situations that related to an 'urgent housing problem' were living on the streets, using hostel accommodation or sleeping in different accommodation each night; whilst for 'housing problems', examples provided to the clinician covered housing situations that were insecure, such as staying with others as a short-term guest, accessing temporary accommodation for short-term stays, or squatting [27].
For the purposes of this analysis, we retained all homelessness events that were recorded during the study period. As homelessness was recorded on a (health) event basis, rather than having a start and end date to the person's homelessness, we were not able to assess whether the person was experiencing homelessness at the time of infection with SARS-CoV-2. Our analysis therefore relates to people who had recently experienced homelessness, though we refer to them as people experiencing homelessness, or PEH, for ease. PEH were flagged in the resident population spine if they were recorded as having any homelessness event codes during the study period. There were an initial 7,006 event codes that related to homelessness, belonging to 4,049 unique individuals. Consistent with similar uses of these data sets to identify PEH in Wales [22,25], the SMDS contributed the greatest number of potential homeless people to the analysis, with 73% of the initial number of unique individuals (n = 2,943) being present in the SMDS.

Creating the 'not-homeless' comparator groups
An initial comparison group included all people in the resident population spine who were not flagged as having experienced homelessness. However, consistent with what is known about homeless populations, there were significant differences between the homeless and not-homeless groups in terms of age, gender, and socio-economic status (discussed in detail in the findings section). We therefore adopted further methods of creating a comparison group that had similar characteristics to the homeless group. However, only a limited number of characteristics that may have influenced positivity rates were available in the administrative data.
There is evidence to suggest that SARS-CoV-2 infection is higher amongst men, those from older age groups, and those living in more deprived areas [28][29][30][31]. Specific to Wales, there were also regional disparities in SARS-CoV-2 infection due to localised outbreaks which may have driven increases in positivity rates and increased testing through community testing drives. Though there is also consistent evidence for differences in SARS-CoV-2 infection and mortality by ethnicity [31], ethnicity data were not consistently available across all of the data sets provided and were therefore not included.
An initial matched comparator group retained people in the not-homeless population with the exact same characteristics as the homeless group, based on age, gender, local authority of residence and 2019 Welsh Index of Multiple Deprivation (WIMD) quartile. This group is referred to as the 'exact matched' comparator. We then adopted Propensity Score Matching (PSM) to create our final comparator group. This approach used a logistic regression predicting whether a person had experienced homelessness to generate propensity scores as an indicator of similarity between people. The pre-specified 'caliper', which is the maximum tolerated difference in propensity scores accepted as a match, was set at 0.1. PSM was based on the same variables used in the exact matching process. Furthermore, given there were many possible matches to choose from in the not-homeless population, we extracted five people from the general not-homeless population for every homeless person. The R package MatchIt [32] was used to undertake PSM.
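To make the caliper and the 5:1 ratio concrete, the following Python sketch performs a simple greedy nearest-neighbour match on propensity scores. It is not MatchIt's algorithm and not the study's code. It assumes the propensity scores have already been estimated, treats the caliper as an absolute difference in scores (MatchIt can also interpret it in standard deviation units), and matches without replacement in input order; none of these choices is confirmed by the text.

```python
def match_with_caliper(homeless_scores, comparator_scores, ratio=5, caliper=0.1):
    """Greedy nearest-neighbour matching on precomputed propensity scores.

    Both arguments are dicts mapping a person id to a propensity score.
    Each homeless person receives up to `ratio` comparators whose score lies
    within `caliper` (absolute difference, an assumption); matched comparators
    are removed from the pool, i.e. matching is without replacement.
    """
    available = dict(comparator_scores)
    matches = {}
    for person_id, score in homeless_scores.items():
        # Comparators inside the caliper, nearest first.
        candidates = sorted(
            (abs(c_score - score), c_id)
            for c_id, c_score in available.items()
            if abs(c_score - score) <= caliper
        )
        chosen = [c_id for _, c_id in candidates[:ratio]]
        for c_id in chosen:
            del available[c_id]
        matches[person_id] = chosen
    return matches


# Illustrative scores only; these are not values from the study.
example_matches = match_with_caliper(
    {"h1": 0.30},
    {"c1": 0.31, "c2": 0.29, "c3": 0.45, "c4": 0.35,
     "c5": 0.22, "c6": 0.33, "c7": 0.28},
)
```

In this toy example `c3` is rejected by the caliper and `c5`, although inside the caliper, is dropped because five closer comparators exist.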
To avoid matching to people who were likely to be of similar socio-economic characteristics to PEH by virtue of living at the same residence, but whose homelessness status could not be assumed, we excluded people living in the same RALF as a PEH on the 1st March 2020.

Analysis
We initially report the basic characteristics of the homeless and not-homeless comparison groups in terms of age (years), gender, local authority of residence on the 1st March 2020, 2019 WIMD quartile for the Lower Super Output Area of their residence, and proportion of people who had been tested for SARS-CoV-2 at least once during the study period. Percentages and means, with standard deviations, are provided where appropriate. Standardised differences are reported to provide an indication of the difference in distribution of data between the homeless and each not-homeless comparator group separately [33]. Standardised differences with a value of 0.1 or more are taken to indicate imbalance between the homeless and not-homeless comparator [34].
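For readers unfamiliar with standardised differences, the sketch below uses the commonly cited pooled-variance formulas for a binary indicator and for a continuous variable. Whether the study applied exactly these forms, particularly for multi-category variables such as local authority, is an assumption.

```python
import math


def std_diff_binary(p1: float, p2: float) -> float:
    """Standardised difference between two proportions."""
    pooled_var = (p1 * (1 - p1) + p2 * (1 - p2)) / 2
    return (p1 - p2) / math.sqrt(pooled_var)


def std_diff_continuous(mean1: float, sd1: float,
                        mean2: float, sd2: float) -> float:
    """Standardised difference between two means."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd


# Share living in the most deprived WIMD quartile, as reported in the
# Results: 49.6% of PEH versus 25.6% of the not-homeless population.
most_deprived = std_diff_binary(0.496, 0.256)
```

Treating the most deprived quartile as a single binary indicator, the reported percentages give a standardised difference of roughly 0.51, well above the 0.1 threshold for imbalance.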
Prevalence rates for testing positive for SARS-CoV-2 were calculated as the total number of people having a positive PCR or antibody test from 1st March 2020 up to and including 1st March 2021, divided by the total number of people within each group. Confidence intervals for point estimates of prevalence were calculated using the Agresti-Coull method, as appropriate for binomial data and larger sample sizes [35]. SARS-CoV-2 prevalence was calculated for the homeless, three not-homeless comparator groups, and the total resident population within the population spine.
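A minimal Python sketch of the Agresti-Coull interval is given below for reference. The function name and the example counts are illustrative assumptions; the counts are not taken from the study.

```python
import math
from statistics import NormalDist


def agresti_coull(positives: int, total: int, level: float = 0.95):
    """Agresti-Coull confidence interval for a binomial proportion.

    Adds z^2/2 pseudo-successes and z^2/2 pseudo-failures, then applies the
    usual normal-approximation interval around the adjusted proportion.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    n_adj = total + z ** 2
    p_adj = (positives + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to the valid range for a proportion.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Illustrative counts only (50 positives among 1,000 people).
low, high = agresti_coull(50, 1000)
```

The interval is centred on the adjusted proportion rather than on the raw prevalence, so it is slightly asymmetric around the observed value when prevalence is low.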
Logistic regressions were run to explore the effects of having recently experienced homelessness on the probability of having a positive SARS-CoV-2 test. Three regressions were run, one for each of the comparison groups (population, exact matched, and PSM). We adjusted for age, gender, local authority of residence and 2019 WIMD quartile to control for any remaining confounding from the matching variables. The odds ratios, 95% confidence intervals, and indicators of significance levels are provided for the characteristics included in the regressions, though the main interest is in the odds ratios for experiencing homelessness. As the regressions using the population comparator and the exact match comparator are 'population level', consideration of probability values is of lesser importance, because the odds ratios reflect actual outcomes of the population rather than samples intended to represent a population.
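The general form of such a model can be written as follows. This is a sketch under assumptions: the symbols are introduced here for illustration, age is shown as a single continuous term although its coding is not specified in the text, and local authority and WIMD quartile enter as sets of indicator variables.

```latex
\[
\operatorname{logit}\,\Pr(Y_i = 1)
  = \beta_0
  + \beta_1 H_i
  + \beta_2\,\mathrm{age}_i
  + \beta_3\,\mathrm{male}_i
  + \sum_{k} \gamma_k\,\mathrm{LA}_{ik}
  + \sum_{q} \delta_q\,\mathrm{WIMD}_{iq},
\qquad
\mathrm{OR}_{\text{homeless}} = e^{\beta_1}
\]
```

Here $Y_i$ indicates a positive SARS-CoV-2 test for person $i$ and $H_i$ indicates a recorded homelessness event during the study period, so an odds ratio below 1 corresponds to $\beta_1 < 0$.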

Results
Of the initial 4,049 PEH during the study period, 3,153 had an entry in the population spine. Table 2 summarises the characteristics of the homeless and three not-homeless comparator groups. The homeless group were similar to PEH previously identified using administrative data sources [22][23][24][25], whilst conforming to known profiles of PEH from surveys conducted in the UK [36]. PEH were disproportionately male (67.9%), in their 30s to 40s (mean age of 38.5 years old ±14.0 years), and more concentrated in urban than rural local authorities, with Cardiff, Swansea and Newport representing the major urban authorities in Wales. There was a large imbalance (Std. diff >0.1) between the homeless and not-homeless population comparator in terms of area-level deprivation, as measured by the 2019 WIMD quartile. Homeless people were almost twice as likely to come from the most deprived areas as the not-homeless population (49.6% compared to 25.6%). Subsequent matching exercises reduced the imbalance between the homeless and not-homeless groups, as indicated by the reduction in the standardised differences for each of the main variables of interest in both the exact match and PSM cohorts.
For the total resident population in the population spine, the prevalence of SARS-CoV-2 was 5.6%, with 167,540 people testing positive by 1st March 2021 (Table 3). Published statistics, based on PCR testing data only, indicate that 193,415 people resident in a health board in Wales had tested positive for SARS-CoV-2 between 1st March 2020 and 1st March 2021 [37]. This difference in testing numbers between our cohort and published figures may largely be due to the removal of records within the SARS-CoV-2 testing data during data processing; we discuss the implications of this in the strengths and limitations section.
Having recently experienced homelessness was found to be associated with lower odds of testing positive for SARS-CoV-2 by 1st March 2021 in all three logistic regression models (Table 4). In the model based on population level data, having a homeless event during the study period was associated with a roughly 10% reduction in the odds of having a positive test result. The odds ratios were lower still in the models using the exact match and PSM comparators (0.75 and 0.73, respectively). This implies that people who were homeless showed even lower odds of testing positive when compared to people of a similar age, sex, area deprivation, and geographic breakdown as the homeless group, despite the higher incidence of testing amongst PEH.
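As a rough plausibility check, crude (unadjusted) odds ratios can be derived from the rounded prevalence figures reported in this paper. The sketch below is illustrative only: it is not the adjusted model, and small discrepancies from the reported odds ratios are expected because of rounding and covariate adjustment.

```python
def odds(p: float) -> float:
    """Convert a proportion to odds."""
    return p / (1 - p)


def crude_odds_ratio(p_group: float, p_comparator: float) -> float:
    """Unadjusted odds ratio from two prevalence proportions."""
    return odds(p_group) / odds(p_comparator)


PEH_PREVALENCE = 0.050  # rounded prevalence amongst PEH

crude = {
    "population": crude_odds_ratio(PEH_PREVALENCE, 0.056),
    "exact match": crude_odds_ratio(PEH_PREVALENCE, 0.069),
    "PSM": crude_odds_ratio(PEH_PREVALENCE, 0.067),
}
```

These crude values come out at approximately 0.89, 0.71 and 0.73, which is broadly in line with the adjusted odds ratios of 0.90, 0.75 and 0.73.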

Discussion
Our study found that the prevalence of SARS-CoV-2 amongst PEH was lower than that for the not-homeless population, 5.0% compared to 5.6%, along with a reduced risk of testing positive (O.R. 0.90). This finding runs counter to other studies that compared prevalence rates to the general population, which found that PEH were more likely to have a positive SARS-CoV-2 test result [10,11]. Our contrasting findings may be attributable to the PEH in our analysis which potentially encompassed people who were roofless and in insecure housing situations. This definition is much broader than houseless PEH using shelters which dominates the literature. The inclusion of different groups of PEH, who may experience differing levels of risk of exposure to SARS-CoV-2 than those in shelters, may have diluted the prevalence rate, compared to if we had focused only on houseless PEH. The much lower use of communal forms of accommodation in Wales during the COVID-19 pandemic compared to other countries may also have contributed to a lower prevalence of SARS-CoV-2 amongst PEH in Wales.
There is a large body of evidence highlighting how communal forms of homelessness accommodation can lead to transmission of communicable diseases such as H1N1/swine flu [2], MRSA [38] and tuberculosis [39]. Furthermore, a recent systematic review concluded that use of accommodation that satisfies basic needs such as a bed and food, as is the case with shelters, can do more harm than no intervention at all [40]. The Welsh policy response, centring on the widespread commissioning of self-contained or single-room provision and the closure of communal forms of accommodation, may have had a protective effect in limiting the spread of SARS-CoV-2. As the majority of the PEH in our study were identified in the SMDS, and were therefore largely roofless, houseless, or in insecure accommodation, we have a greater level of certainty that they would be beneficiaries of these policy changes. However, the role of communal accommodation alone is unlikely to explain lower prevalence rates, and does not explain the increase in this difference when comparing to not-homeless comparators with similar characteristics (the exact match and PSM groups). Here we offer two hypotheses that relate to the isolation of homeless people and homelessness as a lack of privacy and control. PEH can experience diminished social networks, largely the result of withdrawing or being pushed away from friends and family members [41][42][43][44]. This isolation may have acted to reduce the number of social contacts and interactions amongst PEH, thereby slowing the transmission of SARS-CoV-2 amongst this group to a greater extent than people who were not-homeless. In addition to reduced social connections, homelessness can be conceptualised in part as a lack of privacy and choice to engage in social relations and for people to conduct their lives as they see fit [6].
PEH are often subject to a higher degree of surveillance and control over their private lives than not-homeless people, by virtue of having to interact with systems to meet their most basic needs such as shelter and food [45]. During the COVID-19 pandemic, this control manifested in a concerted effort by local authorities and third sector organisations to enforce and encourage PEH to conform to physical distancing requirements [5]. This may have led to a greater (forced) adherence to COVID-19 guidance amongst PEH, when compared to the not-homeless population, to continue to be permitted access to accommodation and basic necessities.

Study benefits and limitations
The use of non-housing data has meant that a national cohort of PEH could be identified, thereby compensating for the lack of a national individual level homelessness data collection in Wales. However, there are limitations to our analysis arising from the construction of the cohort of PEH. Homelessness was identified through data related to the recording of events, rather than periods of homelessness. Some of the people identified as PEH may therefore have been housed at the time they were infected. Use of data from health and substance use services to identify PEH may also mean that the homeless population described in this analysis have complex (health) needs and experiences. PEH identified through health data may have been accessing these services for SARS-CoV-2 related reasons. Furthermore, as the majority of PEH were identified in the SMDS, they presumably also had some level of substance use issue. Combined, these factors may have led to an inflation of prevalence rates amongst the PEH included in this study compared to PEH identified through housing services data, for example.
As the majority of PEH were present in the SMDS, PEH in our analysis were therefore likely to be predominantly people who were at least initially roofless, houseless, and 'sofa surfers' living in insecure accommodation-as these are the groups defined as homeless/experiencing housing issues in the SMDS [26]. Though not as broad a definition of homelessness as possible [6,26], the PEH in our study combine several groups not included in predominantly shelter-based studies of SARS-CoV-2 prevalence amongst homeless populations. Due to the generalised nature of some of the homelessness event codes, such as 'Z590 -Homelessness', we were unable to drill down into different sub-groups of PEH, who may have had different prevalence rates.
The construction of the population spine using GP registration data meant that people not registered with a GP on 1st March 2020 were excluded from our analysis. This had the effect of reducing the number of PEH from 4,049 to 3,153 people. Missingness from the population spine may be because people moved into Wales after 1st March 2020, or because they moved into Wales prior to 1st March 2020 and did not de-register from their GP elsewhere in the UK, or because their GP cancelled their registration, meaning that the person has no entry for the snapshot date [46]. Alternatively, not being registered with a GP may be a sign of barriers to accessing public services. Given the range of reasons for missingness, we cannot adequately hypothesise whether presence/absence of people in the population spine is associated with risk of SARS-CoV-2 infection, or the potential direction of any bias from this missingness.
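To give a sense of the scale of this exclusion, the short sketch below works through the arithmetic using only the two counts reported above. The variable and function names are illustrative and are not taken from the study's analysis code.

```python
# Counts reported in the text above.
peh_identified = 4049  # PEH identified before applying the population spine
peh_retained = 3153    # PEH registered with a GP on 1st March 2020


def exclusion_summary(identified: int, retained: int) -> tuple[int, float]:
    """Return the number excluded and the percentage of those identified."""
    excluded = identified - retained
    return excluded, round(100 * excluded / identified, 1)


excluded, pct_excluded = exclusion_summary(peh_identified, peh_retained)
print(f"{excluded} PEH excluded ({pct_excluded}% of those identified)")
```

On these figures, 896 people, a little over a fifth of those initially identified as PEH, were not present in the population spine.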
Though we have been able to provide an estimate for the prevalence of SARS-CoV-2 amongst PEH, our outcome measure is based on administrative testing data and therefore largely reflects testing practices. During the initial peak in SARS-CoV-2 infection in the UK, testing focused on symptomatic people who were hospitalised, and key health and social care workers, with increasing community testing in response to local outbreaks in later phases of the pandemic. There will therefore be an under-estimation of the 'true' prevalence of infection from earlier in the pandemic, when people were symptomatic but not hospitalised. In addition, the measured prevalence of SARS-CoV-2 infection will be lower than the actual rates throughout the study period, as administrative data, by definition, will not contain people who were asymptomatic and untested.
Related to national testing practices, previous research has found that PEH can face barriers to accessing healthcare, for example due to the need for an address when accessing primary care [47]. These barriers may have led to reduced access to SARS-CoV-2 testing amongst PEH, and as a result potentially lower prevalence of SARS-CoV-2 than might have been seen without barriers to services, or at least with barriers equal to those faced by the housed population. However, we observed that PEH in our study had higher rates of testing, and lower SARS-CoV-2 prevalence. The fact that our population of PEH were identified through healthcare and substance use services may have biased our estimate towards people able to access services, and therefore who were more able to access SARS-CoV-2 testing.
A final limitation comes from the quality of linkage for the SARS-CoV-2 pathology testing data, where 145,539 testing records were unavailable for linkage, of which 13,532 (9%) were positive for SARS-CoV-2 infection. The reduction in testing records potentially led to a reduced ability to detect SARS-CoV-2 infection. Unfortunately, removing these records was unavoidable where there was no ALF assigned, as linkage was not possible, and was necessary where the ALF match rate was low (<90%), as retaining these records may have led to increased false matches. As this missingness is likely to affect other studies using SARS-CoV-2 testing data with hard-to-reach (small) populations, further exploration is required to understand the exact nature of the lack of ALF matches.
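The share of unlinked records that were positive can be checked from the two counts reported above. The sketch below is illustrative only: it uses hypothetical variable names and does not draw on the study's data or code.

```python
# Counts reported in the text above.
unlinked_records = 145539   # testing records unavailable for linkage
unlinked_positive = 13532   # of these, records positive for SARS-CoV-2


def percent_share(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of `whole` accounted for by `part`, rounded to `digits`."""
    return round(100 * part / whole, digits)


positive_share = percent_share(unlinked_positive, unlinked_records)
print(f"{positive_share}% of unlinked testing records were positive")
```

To one decimal place this gives 9.3%, consistent with the rounded figure of 9% quoted above.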

Conclusion
This study provides new insights and adds to a nascent international evidence base on the prevalence of SARS-CoV-2 amongst PEH in the UK. Its findings have important policy implications. Globally, the message is clear: the avoidance of communal living spaces will help reduce transmission of SARS-CoV-2 and its possible future variants, and help control other infectious disease outbreaks. Taken together with evidence on the wider harms of such accommodation, a rapid shift away from the use of hostels and shelters in our response to homelessness is called for. In Wales there is a commitment not to return to such provision and it will be important to monitor the impacts of such an important policy shift.