Income is one of the most important measures of well-being, but it is notoriously difficult to measure accurately. In the United States, income data are available from surveys, tax records, and government programs, but each of these sources has important strengths and major limitations when used alone.
We link multiple data sources to develop the Comprehensive Income Dataset (CID), a prototype for a restricted micro-level dataset that combines the demographic detail of survey data with the accuracy of administrative measures.
By incorporating information on nearly all taxable income, tax credits, and cash and in-kind government transfers, the CID surpasses previous efforts to provide an accurate and comprehensive measure of income for individuals, families, and households in the United States. We also evaluate the accuracy of different income sources and imputation methods.
Although the CID is still in development, we envision it enhancing Census Bureau surveys and statistics by enabling the investigation of measurement error, improving imputation methods, and augmenting surveys with the best possible estimates of income. It can also be used for policy-related research, such as forecasting and simulating changes in programs and taxes. Finally, the CID has substantial advantages over other sources for analyzing numerous research topics, including poverty, inequality, mobility, and the distributional consequences of government transfers and taxes.