Researchers from the University of Toronto’s Urban Data Research Centre have published a paper describing a new approach to representing the definitions of census attributes, such as education, age and sex, using ontologies and linked data. The approach enables census data to be integrated with other data sources that are also represented as linked data. The result is a richer representation of urban data to support evidence-based decision-making.

In an increasingly data-driven society, census data has become critically important to the decision-making processes in many organisations. Governments use population and demographic data to inform the planning of transportation infrastructure, disaster mitigation, welfare, health provision, educational services, and more. Private sector businesses can use census data to understand the marketing environment and adapt their products and services to better suit the needs of the population. Non-profit organisations can use census data to plan community programs that respond to the needs of the population and to identify neighbourhoods that may need additional services or support. Yet integrating data from the Canadian Census can be challenging for these organisations because the definitions of the census characteristics are written in natural language, which makes it difficult for software to interpret and use the data correctly.

The meaning of census data lies not just in the numbers, but also in the definitions of the attributes/characteristics being measured – if you want to understand a census number, you first need to read and understand the attribute’s definition. This is especially important if you want to integrate census data with other sources of data such as OpenStreetMap.

The article ‘Semantically Interoperable Census Data: Unlocking the Semantics of Census Data Using Ontologies and Linked Data,’ published in the International Journal of Population Data Science (IJPDS), details the design of the Canadian Census Ontology. It enables data from the Canadian Census of Population to be represented as linked data by building upon established ontology-based standards for city indicators and data, such as ISO/IEC 21972 and the ISO/IEC 5087 series.  This allows the natural language descriptions of the census characteristics to be broken down and expressed as a graph defined by ontology entities and properties, thereby making the data machine-interpretable. 
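To make the idea of breaking a definition down into a graph more concrete, here is a minimal sketch in plain Python. It assumes a made-up characteristic and made-up identifiers (everything prefixed `ex:`); none of these names are taken from the Canadian Census Ontology or the ISO/IEC standards, and the paper's actual modelling is richer than this.

```python
# Hypothetical identifiers throughout: none of these names come from the
# Canadian Census Ontology or the ISO/IEC standards it builds on.
Triple = tuple[str, str, str]

# Illustrative natural-language definition being decomposed:
# "Number of persons aged 15 years and over living in a given area."
triples: list[Triple] = [
    ("ex:Pop15Plus", "rdf:type", "ex:CensusCharacteristic"),
    ("ex:Pop15Plus", "ex:measures", "ex:PopulationCount"),
    ("ex:Pop15Plus", "ex:countsMembersOf", "ex:Person"),
    ("ex:Pop15Plus", "ex:hasAgeConstraint", "ex:Age15OrOver"),
    ("ex:Age15OrOver", "ex:minimumAgeInYears", "15"),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def describe(graph, subject):
    """Collect the predicate/object pairs attached to one subject."""
    return {p: o for _, p, o in match(graph, s=subject)}
```

Once the definition is held as subject-predicate-object statements, a program can ask what the characteristic counts or which age constraint applies by matching patterns, rather than by parsing a sentence.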

The effectiveness of the ontology is demonstrated by using the SPARQL query language to answer census-related questions and by visualising the results with automatically generated choropleth maps and other metric devices.
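As a rough illustration of the step from query results to a choropleth map, the sketch below assigns each area a colour class by rank (quantile binning). This is one common classification scheme and is an assumption here, not the method described in the paper; the area names and values are invented for the example.

```python
def quantile_classes(values_by_area, n_classes=4):
    """Assign each area a class index from 0 to n_classes - 1 by rank.

    Areas are sorted by value and split into groups of roughly equal size.
    Note that areas with tied values may fall into different classes.
    """
    ranked = sorted(values_by_area, key=values_by_area.get)
    n = len(ranked)
    return {area: (rank * n_classes) // n for rank, area in enumerate(ranked)}


# Invented query results: a percentage per (hypothetical) area.
results = {"Area A": 12.5, "Area B": 48.0, "Area C": 30.1, "Area D": 22.7}

classes = quantile_classes(results, n_classes=4)
```

A mapping library would then translate each class index into a fill colour for the corresponding area polygon.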

Next, the authors are pursuing the use of large language models (LLMs) to map census characteristic definitions onto the ontology and to serve as a natural language query interface. This will enable users unfamiliar with SPARQL to ask questions that the knowledge graph can answer.

Research Associate and lead author Anderson Wong added, “The Canadian Census Ontology is part of our goal of creating a City Digital Twin knowledge graph that integrates city data from multiple sources. We hope that our work will also help other organizations better integrate census data with their organizational data.”

 

Anderson Wong, Research Associate, Urban Data Research Centre, University of Toronto, Canada

Wong, A., Fox, M. and Katsumi, M. (2024) “Semantically Interoperable Census Data: Unlocking the Semantics of Census Data Using Ontologies and Linked Data”, International Journal of Population Data Science, 9(1). doi: 10.23889/ijpds.v9i1.2378.