Including household effects in Big Data research: the experience of building a longitudinal residence algorithm using linked administrative data in Wales

Karen Susan Tingay, Matthew Roberts, Charles B.A. Musselwhite
Published online: Nov 20, 2018


The effect of the wider social environment on physical and emotional health has long been an area of study. Accounting for the impact of the individual's immediate environment, such as living with a smoker or caring for a chronically ill child, could reduce confounding effects in health-related research. Surveys, including the UK Census, are beginning to collect data on household composition. However, these surveys are expensive and time-consuming and, as such, are completed by only a subsection of the population. Large-scale, linked databanks, such as the SAIL Databank at Swansea University, which hold routinely collected secondary-use clinical and administrative datasets, are broader in scope, both in the nature of the data held and in the population covered. The SAIL Databank includes demographic data and a geographic indicator that make it possible to identify groups of people who share accommodation and, in some cases, the familial relationships among them. This paper describes a method for creating households, including considerations for how that information can be securely shared for research purposes. This approach has broad implications in Wales and beyond, opening up possibilities for more detailed population-level research that includes consideration of residential social interactions.
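
The general idea of deriving households from a shared geographic indicator can be illustrated with a minimal sketch. This is not the authors' algorithm: the record layout (an anonymised person identifier, a residence identifier, and start and end dates of residence) and the function names `households_on` and `co_residents` are assumptions made purely for illustration, and the data are invented.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (person_id, residence_id, start_date, end_date).
# The field layout is an assumption, not the structure of any real dataset.
records = [
    ("p1", "r100", date(2010, 1, 1), date(2015, 6, 30)),
    ("p2", "r100", date(2012, 3, 1), date(2018, 1, 31)),
    ("p3", "r100", date(2016, 1, 1), date(2018, 1, 31)),
    ("p4", "r200", date(2011, 1, 1), date(2013, 12, 31)),
]


def households_on(records, as_of):
    """Group people recorded at the same residence on a given date."""
    groups = defaultdict(set)
    for person_id, residence_id, start, end in records:
        if start <= as_of <= end:
            groups[residence_id].add(person_id)
    return dict(groups)


def co_residents(records, person_id):
    """Return everyone whose stay at a residence overlapped this person's stay there."""
    own_stays = [r for r in records if r[0] == person_id]
    found = set()
    for _, residence_id, start, end in own_stays:
        for other_id, other_residence, other_start, other_end in records:
            same_place = other_residence == residence_id
            # Two closed date intervals overlap when each starts before the other ends.
            overlaps = other_start <= end and start <= other_end
            if other_id != person_id and same_place and overlaps:
                found.add(other_id)
    return found
```

Calling `households_on(records, date(2013, 1, 1))` gives a cross-sectional view of who shared each residence on that date, while `co_residents(records, "p2")` gives a longitudinal view of everyone who ever lived alongside one person. A real implementation would also need to handle open-ended stays, data quality issues, and disclosure control before any output is shared.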

