Using Integrated Administrative Data to Support a Field Enumeration Census: The Case of the New Zealand 2018 Census

Christine Bycroft
Abby Morgan
Nathaniel Matheson-Dunning


The New Zealand 2018 Census faced major challenges when implementing a new collection model, leading to a lower-than-expected response rate. The scale of the non-response meant that the planned methods for unit imputation were unlikely to be suitable, so new methods were developed that use administrative sources to count people who had been missed by the census field collection.

Objectives and Approach
This innovative approach was made possible by the extensive linked administrative data available in Stats NZ’s Integrated Data Infrastructure (IDI), and built on previous research that had developed an administrative New Zealand resident population. New statistical methods were developed to account for known limitations of the administrative sources.

Administrative records were successfully incorporated into the census dataset, producing New Zealand’s first ‘combined census’. Assessment against benchmark population distributions and sensitivity analysis indicate that the results are close to what is expected. Stats NZ is confident it has compiled a census dataset that will provide census usually resident population counts and electoral counts of acceptable quality.

Legal, security and privacy issues have been carefully assessed and managed, although there remain outstanding questions about ‘social license’.

Conclusion / Implications
Administrative data does include many people who are typically hard to count through census field collection, and its use offers a significant quality improvement over previous census methods of non-response adjustment. A similar approach will be used to mitigate non-response in the next census.

The lower-than-expected response rates and the new data and methods have brought increased scrutiny of the census results. A different error structure from previous censuses (including improved counts of formerly under-counted ethnic groups) has disrupted time series for many variables.

