Household Matching for the 2021 Census
Abstract
Matching households between the UK census and the census coverage survey (CCS) is an essential prerequisite for estimating census overcount and undercount. Census quality requirements are extremely high. In 2021, we aim to produce outputs within a year of census day (previously, this was within 16 months). Clerical searching is very time-consuming, so to meet these challenging timelines we need to increase the automatic match rate.
Matching is done at both the household and the person level. In 2011, automatic household linkage relied primarily on a derived head of household (HOH) alongside other household variables, but difficulties in assigning the same HOH to a household on both the census and the CCS reduced the effectiveness of the method. To address this, we designed a method that combines household-level variables with sets of individual person data and applies deterministic match-keys to link households. Associative matching was then applied to find some of the remaining households: it used the households of already-matched people to produce candidate household matches for clerical review, a process substantially quicker than clerical searching.
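The match-key cascade and the associative step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method as implemented for the census: the `Person` and `Household` records, the use of postcode as the household variable, and the two keys `strict_key` and `loose_key` are all hypothetical. Each key is applied in turn to the households still unmatched, and a pair is accepted only when its key value is unique on both the census and the CCS side. The person-level matches passed to `associative_candidates` are assumed to come from a separate person-matching stage.

```python
from collections import Counter
from collections.abc import Hashable
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Person:
    person_id: str
    forename: str
    birth_year: int
    sex: str


@dataclass
class Household:
    hh_id: str
    postcode: str  # hypothetical household-level variable
    people: List[Person]


def _pc(postcode: str) -> str:
    # Normalise spacing and case so "AB1 2CD" and "ab12cd" agree.
    return postcode.replace(" ", "").upper()


def strict_key(hh: Household) -> Hashable:
    # Household variable plus the full, order-independent set of member attributes.
    members = tuple(sorted((p.forename.upper(), p.birth_year, p.sex) for p in hh.people))
    return (_pc(hh.postcode), members)


def loose_key(hh: Household) -> Hashable:
    # Looser variant: forename initial and birth year only, tolerating name variants.
    members = tuple(sorted((p.forename[:1].upper(), p.birth_year) for p in hh.people))
    return (_pc(hh.postcode), members)


def run_match_keys(census: List[Household], ccs: List[Household],
                   keys: List[Callable[[Household], Hashable]]) -> Dict[str, str]:
    matches: Dict[str, str] = {}  # census hh_id -> CCS hh_id
    for key_fn in keys:
        taken = set(matches.values())
        c_left = [h for h in census if h.hh_id not in matches]
        s_left = [h for h in ccs if h.hh_id not in taken]
        c_counts = Counter(key_fn(h) for h in c_left)
        s_counts = Counter(key_fn(h) for h in s_left)
        s_by_key = {key_fn(h): h.hh_id for h in s_left}
        for h in c_left:
            k = key_fn(h)
            # Accept only keys that identify exactly one household on each side.
            if c_counts[k] == 1 and s_counts[k] == 1:
                matches[h.hh_id] = s_by_key[k]
    return matches


def associative_candidates(person_matches: Dict[str, str], census_hh_of: Dict[str, str],
                           ccs_hh_of: Dict[str, str],
                           hh_matches: Dict[str, str]) -> List[Tuple[str, str, int]]:
    # Count matched people shared by each pair of still-unmatched households.
    taken = set(hh_matches.values())
    shared: Counter = Counter()
    for c_pid, s_pid in person_matches.items():
        c_hh, s_hh = census_hh_of[c_pid], ccs_hh_of[s_pid]
        if c_hh not in hh_matches and s_hh not in taken:
            shared[(c_hh, s_hh)] += 1
    # Pairs sharing more matched people are listed first for clerical review.
    return sorted(((c, s, n) for (c, s), n in shared.items()), key=lambda t: -t[2])


# Hypothetical toy data.
census = [
    Household("C1", "AB1 2CD", [Person("c1a", "Anna", 1980, "F"), Person("c1b", "Ben", 1978, "M")]),
    Household("C2", "AB1 2CE", [Person("c2a", "Chris", 1990, "M")]),
    Household("C3", "AB1 2CF", [Person("c3a", "Dana", 1965, "F"), Person("c3b", "Eli", 2001, "M")]),
]
ccs = [
    Household("S1", "ab12cd", [Person("s1a", "Anna", 1980, "F"), Person("s1b", "Ben", 1978, "M")]),
    Household("S2", "AB1 2CE", [Person("s2a", "Christopher", 1990, "M")]),
    Household("S3", "AB1 2CG", [Person("s3a", "Dana", 1965, "F")]),
]
hh_matches = run_match_keys(census, ccs, [strict_key, loose_key])
person_matches = {"c1a": "s1a", "c1b": "s1b", "c2a": "s2a", "c3a": "s3a"}  # assumed person-stage output
census_hh_of = {p.person_id: h.hh_id for h in census for p in h.people}
ccs_hh_of = {p.person_id: h.hh_id for h in ccs for p in h.people}
candidates = associative_candidates(person_matches, census_hh_of, ccs_hh_of, hh_matches)
print(hh_matches)  # {'C1': 'S1', 'C2': 'S2'}
print(candidates)  # [('C3', 'S3', 1)]
```

Requiring a key value to be unique on both sides is one common way to keep deterministic matching precise, and running looser keys only on the residue means they cannot override a stricter match. In the toy data, `C3` and `S3` fail both keys because of a differing postcode and a missing person, but one shared matched person puts the pair forward as a clerical candidate rather than leaving it for clerical searching.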
Using the 2011 census as test data, in which 264,882 matches were found (60% of them automatically), our new methods matched 94% of households through deterministic match-keys, with a precision of 99.99%. A further 3% were linked through association and 5% were sent for clerical matching, leaving only 1,600 matches (about 0.6%) to be found through resource-heavy clerical searching.
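Precision here is the share of automatically declared links that are correct, and the match rate is the share of all known matches that a method finds. A minimal Python sketch of both measures, assuming a gold standard of confirmed household pairs (for example, the clerically resolved 2011 matches) is available as a dictionary; the function name `precision_recall` and the toy identifiers are hypothetical.

```python
from typing import Dict, Tuple


def precision_recall(declared: Dict[str, str], truth: Dict[str, str]) -> Tuple[float, float]:
    """Precision and recall of declared household links against a gold standard.

    declared: census household id -> CCS household id, as output by automatic matching.
    truth:    census household id -> CCS household id, e.g. clerically confirmed matches.
    """
    # A declared link counts as correct only if the gold standard pairs the same households.
    correct = sum(1 for c_id, s_id in declared.items() if truth.get(c_id) == s_id)
    precision = correct / len(declared) if declared else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: one of three declared links is wrong, and one true match is missed.
declared = {"C1": "S1", "C2": "S2", "C3": "S9"}
truth = {"C1": "S1", "C2": "S2", "C3": "S3", "C4": "S4"}
p, r = precision_recall(declared, truth)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.500
```

For scale, if about 94% of the 264,882 matches (roughly 249,000 households) were made by match-keys, a precision of 99.99% would correspond to roughly 25 incorrect links.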
By using match-keys built from sets of person data in addition to household variables, together with associative matching based on person matches, we have substantially increased the number of matches made automatically on test data from the 2011 Census. This will reduce the resources needed for clerical matching and searching in 2021, enabling us to meet shorter timelines and maintain higher quality.