Intercensal and Postcensal Estimation of Population Size for Small Geographic Areas in the United States

Abstract
Introduction: Population estimation techniques are often used to provide updated data for a current year. However, estimates for small geographic units, such as census tracts in the United States, are typically not available. Yet there are growing demands from local policy making, program planning, and evaluation practitioners for such data because small area population estimates are more useful than those for larger geographic areas.
Objectives: To estimate population sizes at the census block level by subgroups (age, sex, and race/ethnicity) so that the population data can be aggregated up to any target small geographic area.
Methods: We estimated population sizes by subgroups at the census block level using an intercensal approach for the years between 2000 and 2010 and a postcensal approach for the years following the 2010 decennial census (2011-2017). We then aggregated the data to the county level (intercensal approach) and the incorporated place level (postcensal approach) and compared our estimates to the corresponding US Census Bureau (the Census) estimates.
Results: Overall, our intercensal estimates were close to the Census’ population estimates at the county level for the years 2000-2010; yet there were substantive errors in counties where population sizes experienced sudden changes. Our postcensal estimates were also close to the Census’ population estimates at the incorporated place level for years closer to the 2010 decennial census.
Conclusion: The approaches presented here can be used to estimate population sizes for any small geographic area based on census blocks. The advantages and disadvantages of their application in public health practice should be considered.


Introduction
Population size plays a critical role in both the public and private sectors for purposes such as allocation of resources and funds, environmental planning, public service delivery, estimation of disease rates, and measurement of the association between environmental exposure and health outcomes. Census data are a traditional and reliable source of population size for multiple geographic levels in the United States and other countries. However, counting the population every fifth or tenth year fails to capture year-to-year changes in population size and in geographic units. Additionally, the need for small area population estimates, with their demographic characteristics, that can be aggregated to all higher geographies continues to grow. These estimates can be used as denominators for rates, controls for demographic surveys, evidence to guide administrative planning, inputs to small area estimation, and in many other applications. To fill these gaps, population estimation techniques have been developed and used worldwide for the intervening years (see reviews [1,2]).
The cohort component method [3] is a standard demographic method that is widely used in many countries. It accounts for births, deaths, and net migration. The Census Bureau (the Census) in the United States uses this method to produce annual population estimates by subgroups (age, sex, and race/ethnicity) at the national, state, and county level (n=3,142) based on the most recent census [4]. However, this method is difficult to apply to smaller areas because vital statistics and migration data are not easily available for areas below the county level. Therefore, the Census uses a distributive housing unit method [5,6] to produce total population estimates for minor civil divisions (e.g., towns and townships) and incorporated places (e.g., cities, boroughs, and villages). However, that method does not provide subgroup-specific population estimates directly, and local jurisdictional boundaries do not necessarily align with census boundaries. Other approaches to estimating small area populations and their demographic characteristics include the iterative proportional fitting approach [7,8], the censal-ratio method [9], statistical techniques that range from simple linear change to complex models [10-12], and approaches utilizing remote sensing and geographic information system (GIS) technologies [13,14]. Each approach has its own advantages and disadvantages, but most of these approaches are complex, which may limit their application in practice.
In the present study, we estimated population sizes for small areas by demographic groups (age, sex, and race/ethnicity) as they change over time. Intercensal estimates are calculated using data for which both a beginning and an end year are reported, and postcensal estimates are calculated using data with only a beginning year. Precensal estimates could also be calculated with this method when only an end year of data is available. We generated these estimates at the census block level because the block is the smallest geographic unit [15] in the U.S. Census geographic hierarchy [16], which allows us to aggregate up to any small area geography, such as census tracts, counties, and ZIP Code Tabulation Areas. First, we calculated block-level intercensal estimates of population size by subgroups using 2000 and 2010 census population data. To evaluate the accuracy of these estimates, we aggregated them to the county level by subgroups so that they could be compared with the Census' county-level population estimates. Second, we calculated block-level postcensal estimates of population size by subgroups for the years 2011-2017 following the 2010 decennial census. Similarly, we aggregated these estimates to the incorporated place level so that they could be compared with estimates provided by the Census.

Intercensal estimation
One of the challenges encountered in intercensal estimation for small areas is the change in geographic boundaries between decennial censuses. For the intercensal period between 2000 and 2010, a block in 2000 may have been divided into multiple parts to form several new blocks in 2010 or may have merged with other partial blocks to form a new block in 2010. To account for these changes, we redistributed the 2010 block-level population counts to the 2000 decennial census using the Census 2000 Tabulation Block to 2010 Census Tabulation Block Relationship File [17]. AREALAND_INT and AREALAND_2000 are variables in the Relationship File that indicate, respectively, the land area of the intersection shared by the 2000 and 2010 blocks represented by the record and the 2000 land area. We used the ratio of AREALAND_INT to AREALAND_2000 as a weight on the 2000 population count for each block part and then summed the weighted parts to obtain the population for each block. The ratio of the population of each block part to the population of that block was then applied to the block-level population counts by subgroups (age group in 5-year intervals, sex, and race/ethnicity) in the 2000 decennial census data. Next, we used the two decennial census population counts by subgroups (Pop_2000 and Pop_2010, respectively) as the estimate base and assigned weights (1-n/10 as weight_2000 and n/10 as weight_2010, n = 1, 2, ..., 9) to each intercensal year based on how close it is to each of the two decennial censuses. For any intercensal year, we estimated the population as Pop_2000*weight_2000 + Pop_2010*weight_2010. For example, the population in 2001 was estimated as Pop_2000*0.9 + Pop_2010*0.1. Finally, we aggregated these block-level population estimates by subgroups to total county-level population estimates and compared them with those provided by the Census [18] for each intercensal year.
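The two steps above (area-weighted reallocation and linear weighting between census years) can be illustrated with a minimal sketch. It assumes that the 2000 counts are apportioned to 2010 block geography by the AREALAND_INT/AREALAND_2000 ratio, which is one reading of the procedure. The record keys `block_2000` and `block_2010` and both function names are hypothetical, and skipping records with zero 2000 land area is a simplification of ours, not a rule stated in the text.

```python
from collections import defaultdict


def reallocate_2000_to_2010(pop_2000, relationship_rows):
    """Apportion 2000 block counts onto 2010 blocks by land-area share.

    pop_2000: dict mapping a 2000 block id to its population count.
    relationship_rows: iterable of dicts, one per block part, with the keys
        'block_2000', 'block_2010' (hypothetical names), 'AREALAND_INT',
        and 'AREALAND_2000'.
    """
    pop_on_2010_blocks = defaultdict(float)
    for row in relationship_rows:
        land_2000 = row["AREALAND_2000"]
        if land_2000 == 0:
            # Assumption: a block with no land area contributes nothing.
            continue
        weight = row["AREALAND_INT"] / land_2000
        part_pop = pop_2000.get(row["block_2000"], 0) * weight
        pop_on_2010_blocks[row["block_2010"]] += part_pop
    return dict(pop_on_2010_blocks)


def intercensal_estimate(pop_2000, pop_2010, year):
    """Weight the two census counts by closeness to each census year."""
    n = year - 2000
    if not 0 <= n <= 10:
        raise ValueError("year must lie between 2000 and 2010")
    weight_2000 = 1 - n / 10
    weight_2010 = n / 10
    return pop_2000 * weight_2000 + pop_2010 * weight_2010


# Illustrative (made-up) records: block A is split 30/70 between X and Y,
# and block B falls entirely inside Y.
rows = [
    {"block_2000": "A", "block_2010": "X", "AREALAND_INT": 30, "AREALAND_2000": 100},
    {"block_2000": "A", "block_2010": "Y", "AREALAND_INT": 70, "AREALAND_2000": 100},
    {"block_2000": "B", "block_2010": "Y", "AREALAND_INT": 50, "AREALAND_2000": 50},
]
reallocated = reallocate_2000_to_2010({"A": 200, "B": 40}, rows)
estimate_2001 = intercensal_estimate(100, 200, 2001)
```

Because the estimate is a weighted sum of the two census counts, it can be computed per block and per subgroup and then summed to any larger geography without changing the result.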
We used the absolute percent error (APE), a commonly used measure, to quantify the difference between our estimates and those from the Census. We then selected the middle year, 2005, to compare the two sets of county-level population estimates by age group (10-year intervals), sex, and race/ethnicity. Because the population size in some subgroups could be very small or even zero, for this portion of the analysis we used the ratio of the mean absolute error (MAE, the mean of the absolute differences between our estimates and the Census' estimates) to the mean of the Census' estimates rather than the APE.
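As a minimal sketch of the two error measures, the code below assumes the conventional definition of APE (absolute difference divided by the reference value, times 100), which the text does not spell out. The function names are hypothetical.

```python
def absolute_percent_error(estimate, reference):
    """APE of one estimate against a reference value, in percent."""
    if reference == 0:
        # APE is undefined for a zero reference, which is why the
        # subgroup comparison uses the MAE-to-mean ratio instead.
        raise ValueError("reference must be non-zero")
    return abs(estimate - reference) / reference * 100


def mae_to_mean_ratio(estimates, references):
    """Mean absolute error divided by the mean of the reference values."""
    if len(estimates) != len(references) or not references:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(e - r) for e, r in zip(estimates, references)) / len(references)
    mean_reference = sum(references) / len(references)
    return mae / mean_reference


# Made-up numbers for illustration only.
ape = absolute_percent_error(105, 100)
ratio = mae_to_mean_ratio([0, 12], [2, 10])
```

The ratio stays defined when individual reference values are zero, as long as their mean is not, which suits sparse subgroups.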

Postcensal estimation
In this approach, we estimated the block-level population by subgroups for a current year ($N_{bi,\text{current}}$) for the years 2011 to 2017. The Census has both block-level and county-level population data by subgroups for 2010 ($N_{bi,2010}$ and $N_{ci,2010}$, respectively) based on the April 1, 2010 census counts [19]. We assumed that the proportion of the population by subgroups at the block level within a county in 2010 ($P_{bi} = N_{bi,2010} / N_{ci,2010}$) remained the same in subsequent years. For 2011-2017, we applied this proportion to the Census postcensal county-level estimates to arrive at block-level population estimates by subgroup ($N_{bi,\text{current}} = P_{bi} \times N_{ci,\text{current}}$). Then $N_{bi,\text{current}}$ was aggregated to the incorporated place level ($N_{pi,\text{current}}$) for each year and compared with the Census data. The difference between our postcensal estimates and the Census estimates was measured by APE. Both the county-level population estimates by subgroups and the incorporated place-level estimates were downloaded from the Census' website [20].

Table 1 compares the county-level intercensal population estimates generated from our block-based method with those produced by the Census and presents the APEs for 3,143 US counties. Overall, the two sets of estimates were very close. Relatively higher errors were found in the middle years (farther from either census year), and the highest error at the 90th percentile was 3.6% in 2006. The maximum APEs were especially high for 2006 (192.4%) and 2007 (91.8%). Some of these discrepancies could be explained by sudden changes in population size due to natural disasters, such as Hurricane Katrina. In the 2005 comparison by subgroups (Table 2), higher errors were associated with subgroups of small population size, such as American Indian/Alaska Natives in the ≥60 years age groups, and Native Hawaiian/Pacific Islanders and two or more races in all age groups. The distribution of errors was quite similar between males and females, though.
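The postcensal allocation described in this section can be sketched as follows, for one county and one subgroup. The function names and the `block_to_place` lookup are hypothetical. Returning `None` for an undefined share is our own convention for the case in which the 2010 county subgroup count is zero.

```python
from collections import defaultdict


def block_shares(block_pop_2010, county_pop_2010):
    """P_bi = N_bi_2010 / N_ci_2010 for each block of one county and subgroup."""
    if county_pop_2010 == 0:
        # The share is undefined when the 2010 county subgroup count is zero.
        return {block: None for block in block_pop_2010}
    return {block: pop / county_pop_2010 for block, pop in block_pop_2010.items()}


def postcensal_block_estimates(shares, county_pop_current):
    """N_bi_current = P_bi * N_ci_current, skipping undefined shares."""
    return {
        block: share * county_pop_current
        for block, share in shares.items()
        if share is not None
    }


def aggregate_to_place(block_estimates, block_to_place):
    """Sum block estimates within each incorporated place.

    block_to_place: hypothetical lookup from block id to place id; blocks
    outside any incorporated place are simply absent from it.
    """
    place_totals = defaultdict(float)
    for block, value in block_estimates.items():
        place = block_to_place.get(block)
        if place is not None:
            place_totals[place] += value
    return dict(place_totals)


# Made-up numbers: two blocks holding 30% and 70% of a county subgroup in 2010.
shares = block_shares({"b1": 30, "b2": 70}, 100)
blocks_current = postcensal_block_estimates(shares, 110)
places_current = aggregate_to_place(blocks_current, {"b1": "P", "b2": "P"})
```

Because the shares within a county sum to one, the block estimates reproduce the county-level postcensal total exactly. Any error at the place level therefore comes from the fixed-share assumption or from differences in the 2010 base.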

Main results
Table 3 presents the distributions of our census block-based postcensal population estimates, the Census' postcensal population estimates, and the APEs by incorporated place for 2011-2017. There were 19,471 incorporated places in total; however, 18 were excluded because they were formed after the 2010 decennial census. Generally, the distributions of the two sets of estimates were close in magnitude and the errors were in a reasonable range.
Table 3 also shows that error levels increase with each passing year, which indicates that our assumption that the ratio of block-level to county-level population size ($P_{bi}$) calculated in 2010 remains the same becomes less valid further from 2010. The large discrepancies in 2016 and 2017 shown in Table 3 may stem from the different 2010 estimate bases used. Our postcensal estimation was based on the 2010 census counts, while the Census adjusted this base to reflect changes to the 2010 census population from the Count Question Resolution program, legal boundary and other geographic updates, and edits to the race categories [1]. Table 4 shows how this difference in 2010 values can have a profound effect on postcensal estimates for a few select incorporated places.

Discussion
In the present study, we used an intercensal approach to estimate the population sizes of subgroups at the block level for years between two decennial census years (2000 and 2010) and a postcensal approach for the years 2011-2017. Both approaches were conducted at the block level to allow aggregation to any target small area by subgroups.
To estimate population sizes for small areas with demographic characteristics, one needs to consider not only the estimation errors but also the constraints of data sources, approach assumptions, complexity, and cost. Compared with most of the other methods in the literature mentioned above, the methods outlined in this study have some notable advantages. First, decennial census data are publicly available for all census blocks across the United States, which provides local jurisdictions with a reliable source of population data. Second, the assumptions are relatively straightforward, making these methods relatively easy to implement. Third, these methods produce block-level population estimates, which provide more flexibility in generating population estimates for any higher geographic unit or locally customized geographic area. This advantage allows local health departments to produce population estimates for calculating disease rates, mapping disease burdens, or performing other disease surveillance activities for any geographic unit within their jurisdiction.

Some precautions should be noted when using the present procedures to estimate population size for small areas. First, our intercensal estimation assumed that changes in county-level population size are linear between the two decennial census years, but sudden changes in population size may be caused by natural or human-made disasters, the introduction of new residential subdivisions, and gentrification of existing domiciles. Relatedly, we assumed that any changes in county-level population size are uniformly distributed across the county, which will produce overestimates in some blocks within a county and underestimates in others, depending on which areas within the county experienced these sudden population changes. Second, the accuracy of estimation was affected when some subgroup population sizes were too small, such as the Native Hawaiian/Pacific Islander population.
This problem could be addressed in future research by combining population groups, combining census blocks, or incorporating race data from other data sources, such as the American Community Survey. Third, when calculating the percentages in both procedures, the population size for a certain subgroup could be zero in the 2010 decennial census but non-zero in the following years. In such a situation, we were not able to obtain percentages. Finally, as our postcensal estimation relied solely on the demographic pattern of the 2010 decennial census, the results may be more useful for years close to the 2010 decennial census than for more distant years.

Conclusion
In this study, we estimated intercensal and postcensal subgroup population sizes at the block level. Local jurisdictions can use this method to calculate estimates that can be easily aggregated to any target small area. The method can be generalized to a wide variety of applications, such as calculating vital rates and small area estimation.