
A synthetic population for agent-based modelling in Canada – Scientific Data
Zoning system
The synthetic population generation uses the multi-level spatial zoning system defined by Statistics Canada [44]. At the highest level, the study area comprises the whole of Canada, which is divided into 10 provinces and 3 territories. Each province or territory is divided into census subdivisions (CSD), the general term for municipalities or areas treated as municipal equivalents for statistical purposes. All CSDs are further divided into dissemination areas (DA), small geographic units each with an average population of 400 to 700 persons based on data from the previous census. Each DA is further divided into dissemination blocks (DB), but only census population and dwelling count data are available at this scale. DAs are the smallest standard geographic areas for which all census data are disseminated. The synthetic individuals are produced for the whole of Canada and are localised at the DA scale.
Inputs
Two publicly available data sources, outlined in Table 1, are used as input: 2016 census data and 2018 population projections. Tables 2 to 6 present example extracts of the input files.
2016 Census data
The 2016 census data were released in various ways. For this work, we used four outputs from the 2016 census:
- The Individual PUMF [45]. This microdata file provides access to non-aggregated data on the characteristics of individuals in the Canadian population. The file contains a 2.7% sample of the Canadian population and provides access to 930,421 anonymised individual records from the 2016 Census questionnaire. Each individual in this sample has 123 variables, a unique identifier and an individual weighting factor. Individuals in the PUMF are localised at the province level (with the three territories gathered into one group) to preserve confidentiality.
- The Hierarchical PUMF [46]. Similarly to the Individual PUMF, this file provides access to non-aggregated data for a 1% sample of Canadian households. The file contains 343,330 individual records related to 140,705 households, and thus allows the study of individuals in relation to their households. Each individual record is restricted to the province level and consists of 95 variables, a unique identifier, a household identifier and an individual weighting factor.
- The Census Profile [47]. This file contains aggregate population counts for various variables (age, sex, education, households, income, etc.) and for various levels of geography, including provinces and territories, CSD and DA. We used the census profile with counts disseminated at the DA level. As input, we used the census profile split into six files, region by region, to avoid loading a 5 GB file at once.
- The Geographic Attribute File [48]. This file contains information at the DB level, based on 2016 Census standard geographic areas, with correspondences from DB to higher levels. It is thus useful for obtaining the complete geographic hierarchy of areas, with the codes and names used for each level of the hierarchy. For example, the codes for all DAs belonging to a CSD or a province can be obtained from this file.
It should be noted that the PUMF files do not include people living in institutions or collective dwellings such as hospitals, nursing homes, penitentiaries or student residences. These people are estimated to represent 1.9% of the Canadian population according to the 2016 Census, more than half of them living in nursing homes or residences for senior citizens. People living in collective dwellings are counted in the synthetic population but are assigned to private households and have attributes from the PUMF, i.e. attributes from people not living in collective dwellings. If the dataset is used to study people living in collective dwellings, it will therefore be necessary to adapt the synthetic population, especially when generating the households.
Moreover, to protect the confidentiality of individuals, areas with a population of fewer than 40 persons are not present in the census profile data, and census profile counts are randomly rounded either up or down to a multiple of 5 or 10.
2018 Population projections
The second data source is population projections for provinces and territories [49]. The national statistical agency of Canada develops population projections by age and sex every five years for provinces and territories, based on various assumptions about population growth. The latest projections were developed in 2018, for 2018 to 2043. The population projections give a perspective on the future demography of the Canadian population according to nine scenarios. Each scenario is built on assumptions about the main components of population growth (fertility, life expectancy at birth, interprovincial migration, immigration and emigration). Five medium-growth scenarios (M1, M2, M3, M4 and M5) reflect different internal migration patterns observed in the past, low-growth (LG) and high-growth (HG) scenarios explore either lower or higher population growth than in the medium-growth scenarios, and fast-aging (FA) and slow-aging (SA) scenarios consider either faster or slower population aging than in the medium-growth scenarios.
We generated a synthetic population for each projection scenario to ensure that the model can be applied to all possible use cases. For the dataset validation we used the LG scenario, which is based on the following assumptions: the fertility rate reaches 1.4 children per woman in 2042/2043; life expectancy at birth reaches 82.6 years for males and 86.6 years for females in 2042/2043; interprovincial migration is based on linear interpolation from recently observed migration rates to rates observed over a long period, reached in 2030/2031, with rates remaining constant thereafter; the immigration rate reaches 0.65% in 2042/2043; the annual number of non-permanent residents reaches 1,259,300 in 2043; the net emigration rate reaches 0.17% in 2042/2043.
All the input data sources used to generate the synthetic population are publicly accessible through the Statistics Canada Catalogue and can be downloaded from the sources listed in Table 1. The PUMF have been published under the Statistics Canada Open Licence since October 2018. They can be ordered free of charge from the Statistics Canada Catalogue [45,46] or downloaded from Abacus [50,51], a repository of open data hosted by UBC Library. The input .csv file for the population projections can be downloaded through the Statistics Canada Catalogue by selecting "Download options" and then "CSV – Download entire table" for the table "Projected population, by projection scenario, age and sex, as of July 1".
Workflow
The overall workflow for producing the synthetic populations in this study is detailed in Fig. 1. The population synthesis consists of four sequential steps: (1) generation of a base synthetic population of individuals for 2016, (2) projection of the base synthetic population to the future years 2021, 2023 and 2030, (3) assignment of individuals to households and (4) assignment of household types. In Fig. 1, the scripts for each step are shown in blue, and external data sources and input/output data for each script are shown in orange. To the right of each script, the script parameters and one example of parameter values are given. Each workflow step is described below.
Fig. 1: Four-step workflow for producing the synthetic population. Each of the four scripts (in blue) takes as input (in orange) files from the 2016 census, from the population projections and an output from the previous script, as well as some parameters (in grey).
Base synthetic population generation
The first step involves synthesising a population province by province for the base year 2016, at the DA level. The QISI method, which combines IPF and QIS, is used to synthesise an integral population DA by DA. Population synthesis for one province is carried out as described in Algorithm 1.
Algorithm 1
Population synthesis algorithm
Seed initialisation
The weighted individuals localised in the province from the 2016 Individual PUMF are used to initialise the seed. Because convergence problems can occur when one of the rows is zero and the marginal total is nonzero, we allowed the zero states in the seed to be occupied with a small probability. The individual variables in the seed are: age group, sex, highest degree, labour force status, household size, total income and household responsibility.
Marginals initialisation
The aggregate counts by DA for each variable are loaded from the 2016 Census Profile and are used as marginals (i.e. target totals) in the IPF procedure. Sometimes the subtotal for a variable is not available at the DA level. In that case, the distribution of the variable at the province level is used to infer the DA subtotal.
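As a rough illustration of this fallback, the sketch below spreads a DA total over the categories of a variable in proportion to the province-level counts. Proportional allocation is an assumption about how the province distribution is applied; the function name `infer_da_subtotals` and the numbers are hypothetical.

```python
def infer_da_subtotals(da_total, province_counts):
    """Spread a DA total over categories in proportion to province-level counts."""
    province_total = sum(province_counts.values())
    return {
        category: da_total * count / province_total
        for category, count in province_counts.items()
    }


# Illustrative numbers only, not census values.
province_household_size = {"1": 300_000, "2": 500_000, "3+": 200_000}
da_estimate = infer_da_subtotals(500, province_household_size)
```

The result holds expected (possibly fractional) counts; any rounding to integers would be a separate step.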
The marginals loaded for each DA are: total population, total number of households, total population by sex, total population by age group, total population by age group and sex, total population by household size, total population by highest degree, total population by labour force status, and total population by income group.
The Individual PUMF variable categories and the Census Profile variable categories do not always match; e.g. the age group categories in the PUMF include 5 to 6 years and 7 to 9 years, whereas the Census Profile reports counts for 5 to 9 years. We therefore used unified variable categories. The correspondence between the categories used in the Individual PUMF, in the Census Profile, and in the synthetic population is detailed in Tables 7 to 13.
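The harmonisation of categories can be pictured as a simple lookup. The sketch below covers only the age example quoted above; the dictionary name, the function name and the label "10 to 14 years" are hypothetical, and the real correspondence is the one given in the paper's tables.

```python
# Hypothetical lookup covering only the example from the text.
PUMF_TO_UNIFIED_AGE = {
    "5 to 6 years": "5 to 9 years",
    "7 to 9 years": "5 to 9 years",
}


def unify_age_group(pumf_label: str) -> str:
    """Map a PUMF age label to the unified category.

    Labels without an entry are assumed to match already and pass through unchanged.
    """
    return PUMF_TO_UNIFIED_AGE.get(pumf_label, pumf_label)


unified = [
    unify_age_group(label)
    for label in ["5 to 6 years", "7 to 9 years", "10 to 14 years"]
]
```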
Marginals matching
The sum of the subtotals for each variable must be equal to the DA total population count in order to apply IPF. However, the categories of some variables in the Census Profile report counts only for the population aged 15 years and over. To match the total population count, we added the population count for the age group 0 to 14 years to the category "No certificate, diploma or degree" for the Highest degree variable, to the category "Not in labour force" for the Labour force status variable, and to the category "<$20,000" for the Total income variable.
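A minimal sketch of this adjustment, with made-up counts: the population aged 0 to 14 is added to one designated category so that the variable sums to the whole population. The function name and the category "Employed"/"Unemployed" labels are illustrative assumptions.

```python
def extend_to_total_population(counts_15_plus, default_category, pop_0_14):
    """Return a copy of the 15+ counts with the 0-14 population added to one category."""
    extended = dict(counts_15_plus)
    extended[default_category] = extended.get(default_category, 0) + pop_0_14
    return extended


# Illustrative numbers only, not census values.
labour_15_plus = {"Employed": 300, "Unemployed": 20, "Not in labour force": 130}
labour_all = extend_to_total_population(
    labour_15_plus, "Not in labour force", pop_0_14=90
)
```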
Moreover, due to missing data and the random rounding of variables to preserve confidentiality, variable totals do not always match the DA total population count. Total population counts by sex, by age group, by age group and sex, by household size, by highest degree, by labour force status, and by income group have therefore been adjusted to match the total population count. The marginals matching process is done for each variable by iteratively increasing or decreasing the variable marginals following the province marginals distribution, until the sum of the variable marginals matches the DA total population.
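One possible reading of this iterative adjustment is sketched below: the marginal is changed one unit at a time, with the category drawn according to the province distribution, until the sum equals the DA total. The unit step, the random draw and the name `match_marginal` are assumptions of this sketch, not details confirmed by the text.

```python
import numpy as np


def match_marginal(marginal, province_dist, total, rng):
    """Adjust integer counts one unit at a time until they sum to `total`."""
    m = np.asarray(marginal, dtype=int).copy()
    p = np.asarray(province_dist, dtype=float)
    p = p / p.sum()
    while m.sum() != total:
        step = 1 if m.sum() < total else -1
        if step == -1:
            # Only categories with a positive count can be decreased.
            w = np.where(m > 0, p, 0.0)
            if w.sum() == 0:
                w = (m > 0).astype(float)
        else:
            w = p
        k = rng.choice(len(m), p=w / w.sum())
        m[k] += step
    return m


rng = np.random.default_rng(0)
# Illustrative counts that sum to 395 because of rounding; the DA total is 400.
adjusted = match_marginal([120, 135, 140], [0.30, 0.35, 0.35], 400, rng)
```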
Quasirandom integer sampling of IPF (QISI)
The QISI algorithm first constructs a probability distribution for individuals, constrained to the marginal sums in every dimension, using IPF. QISI then samples the integral population using quasirandom integer sampling without replacement. We used the implementation from the humanleague package [42], developed for micro-synthesising populations from marginal and seed data.
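To make the IPF stage concrete, here is a minimal two-dimensional sketch in plain NumPy. It is not the humanleague implementation and it omits the quasirandom integer sampling stage entirely; the function name `ipf_2d` and the numbers are illustrative. The seed is strictly positive, in the spirit of giving zero cells a small probability.

```python
import numpy as np


def ipf_2d(seed, row_totals, col_totals, max_iter=1000, tol=1e-9):
    """Iterative proportional fitting of a 2-D seed to row and column totals."""
    table = np.asarray(seed, dtype=float).copy()
    rows = np.asarray(row_totals, dtype=float)
    cols = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        table *= (rows / table.sum(axis=1))[:, None]  # fit row sums
        table *= (cols / table.sum(axis=0))[None, :]  # fit column sums
        if np.abs(table.sum(axis=1) - rows).max() < tol:
            break
    return table


# Illustrative seed (e.g. sex by age group) and marginals summing to 100.
seed = np.array([[10.0, 5.0], [4.0, 8.0]])
fitted = ipf_2d(seed, row_totals=[60, 40], col_totals=[55, 45])
```

The fitted table is generally fractional, which is why a separate integerisation step such as QIS is needed before individuals can be generated.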
Population projection
Population projections published by Canada's national statistical agency are available by age and sex for each province or territory, for each year from 2018 to 2042, and for nine population growth scenarios. We projected the 2016 base synthetic population to the future years 2021, 2023 and 2030, province by province, according to each scenario.
For each scenario, each province, and each projection year, we calculated the difference in population by age group and sex between 2016 and the projection year. Then, for each age group and sex, we applied a resampling, randomly duplicating or deleting individuals from the 2016 population in that age and sex group to match the population of the projection year. Algorithm 2 details this method.
Algorithm 2
Population projection algorithm
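The resampling for a single age and sex group can be sketched as follows. This is an illustrative reading of the duplicate-or-delete idea, not a transcription of Algorithm 2; the function name `resample_group` is hypothetical and the group is assumed to be non-empty when it has to grow.

```python
import numpy as np


def resample_group(ids, target, rng):
    """Duplicate or delete individuals at random so the group has `target` members."""
    ids = np.asarray(ids)
    diff = target - len(ids)
    if diff > 0:
        # Growth: append randomly chosen duplicates of existing individuals.
        extra = rng.choice(ids, size=diff, replace=True)
        return np.concatenate([ids, extra])
    if diff < 0:
        # Decline: keep a random subset of the existing individuals.
        keep = rng.choice(len(ids), size=target, replace=False)
        return ids[np.sort(keep)]
    return ids


rng = np.random.default_rng(1)
base_group = np.arange(10)  # indices of 2016 individuals in one age/sex group
grown = resample_group(base_group, 13, rng)
shrunk = resample_group(base_group, 6, rng)
```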
Household assignment
The third step consists of assigning the synthetic individuals to households. This step is carried out for each scenario, each projection year and each province or territory, according to Algorithm 3. At this step, an age attribute is added to each synthetic individual when the synthetic population is loaded. The age attribute is randomly drawn within the age group range of the individual. For individuals aged 0 to 84, a uniform distribution over the age group range is used. For individuals aged 85 and over, a geometric distribution with success probability p = 0.2 is used, to reflect the rapid population decline in this age group.
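The age draw can be sketched with NumPy as below. The offset applied to the geometric draw (so that the youngest possible age is 85) and the absence of an upper cap are assumptions of this sketch; the function name `draw_age` is hypothetical.

```python
import numpy as np


def draw_age(low, high, rng, p=0.2):
    """Draw an integer age for the group [low, high]; high=None marks the 85+ group."""
    if high is None:
        # Geometric draws start at 1, so subtract 1 to let the youngest age be `low`.
        return low + int(rng.geometric(p)) - 1
    # Uniform over the closed range; the upper bound of integers() is exclusive.
    return int(rng.integers(low, high + 1))


rng = np.random.default_rng(0)
ages_30_34 = [draw_age(30, 34, rng) for _ in range(200)]
ages_85_plus = [draw_age(85, None, rng) for _ in range(200)]
```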
Households initialisation
For each DA, the number of households to be assigned is given by the number of synthetic individuals identified as primary household maintainer. For each DA, we then create one household per individual identified as primary household maintainer.
Household size determination
Then, for each household, we get the household size from the primary maintainer's attributes in order to know how many members must be assigned to this household. If the household size is one person, the household contains only the primary maintainer and is complete. If the household has more than one person, it must be completed with non-responsible individuals.
Households completion
Each household is completed with non-responsible individuals. The non-responsible individuals are grouped by household size attribute, so that they are assigned to a household of the corresponding size. The non-responsible individuals are classified by age group either as young (age <19 years) or as adult. Young individuals are assigned to households as a priority, to avoid ending up with a high (and so unrealistic) number of young individuals not assigned to any household.
The distribution of the non-responsible individuals' age group and sex by the primary maintainer's age group and sex is inferred from the Hierarchical PUMF, for each household size. A non-responsible individual is linked to a household by randomly sampling one individual among the non-responsible individuals, according to the distribution defined by census microdata. For example, a 2-person household with a male primary maintainer aged 80 to 84 is more likely to include a female aged 80 than a female aged 0 to 4. This preserves the distribution of household structures from the 2016 Census. If household structure is important information for the considered use case, the assignment process should be further refined. It could take into account the occupational status, education and income of individuals when assigning them to households, and include shared flats and elderly residences, for a more exhaustive representation of household relationships.
When an individual is added to a household, their HID attribute is set to the household identifier and the individual is removed from the pool of unassigned individuals.
Remaining individuals assignment
Big households (5 persons or more) in the DA are then completed with non-responsible individuals who should be in big households and who were not assigned in the previous step. Finally, households that are not complete are filled with unassigned non-responsible individuals according to the distribution defined by census microdata. After the household assignment process, each individual has an additional age attribute and an HID attribute related to their household. In some DAs, a small number of households will not be complete or a small number of individuals will not be assigned to a household (because the number and sizes of households do not exactly match the individuals count). The unassigned individuals have an HID attribute equal to -1.
Algorithm 3
Household assignment algorithm
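The overall shape of the household assignment can be sketched as follows. This is a deliberately simplified illustration, not Algorithm 3: members are picked uniformly at random within the young-first priority rather than from the distribution inferred from the Hierarchical PUMF, and the later passes for big and incomplete households are omitted. The dictionary keys, the function name and the use of -1 as the unassigned sentinel are assumptions of this sketch.

```python
import random

UNASSIGNED = -1  # sentinel for individuals left without a household


def assign_households(people, rng):
    """Set person['hid'] in place and return the number of households created.

    `people` is a list of dicts with keys 'maintainer' (bool),
    'hh_size' (int) and 'age' (int).
    """
    for person in people:
        person["hid"] = UNASSIGNED
    maintainers = [p for p in people if p["maintainer"]]

    # Pool of non-responsible individuals, grouped by household-size attribute.
    pool = {}
    for person in people:
        if not person["maintainer"]:
            pool.setdefault(person["hh_size"], []).append(person)
    for members in pool.values():
        rng.shuffle(members)
        # Stable sort puts young individuals (<19) last, so pop() serves them first.
        members.sort(key=lambda p: p["age"] < 19)

    # One household per primary maintainer, completed from the matching pool.
    for hid, head in enumerate(maintainers):
        head["hid"] = hid
        candidates = pool.get(head["hh_size"], [])
        for _ in range(head["hh_size"] - 1):
            if not candidates:
                break  # household stays incomplete
            candidates.pop()["hid"] = hid
    return len(maintainers)


people = [
    {"maintainer": True, "hh_size": 1, "age": 40},
    {"maintainer": True, "hh_size": 3, "age": 45},
    {"maintainer": False, "hh_size": 3, "age": 10},
    {"maintainer": False, "hh_size": 3, "age": 42},
    {"maintainer": False, "hh_size": 3, "age": 8},
]
n_households = assign_households(people, random.Random(0))
```

In this toy input the three-person household takes the two young individuals first, so the remaining adult stays unassigned, which mirrors the small residual of unassigned individuals described above.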
Household type assignment
A final step consists of assigning a type to each household. The household type is inferred from the number of members in the household and from their ages. This step is carried out for each scenario, each projection year and each province or territory.
Households census categorisation
Statistics Canada classifies households into nine types: one-census-family households without additional persons (couple without children, couple with children, lone-parent family); one-census-family households with additional persons (couple without children, couple with children, lone-parent family); multiple-census-family households; and non-census-family households (one-person household, two-or-more-person non-census-family household). A census family is defined as a married couple, a common-law couple or a lone parent with at least one child living in the same dwelling. Census family households contain at least one census family. Non-census-family households are either one person living alone or at least two persons who live together but do not constitute a census family.
Households simplified categorisation
We defined the following simplified categories for the household type: One-person household, Couples without children, Couples with children, One-parent-family and Other kind of household. We assigned the four most classical household types (83% of individuals in the 2016 census), namely One-person household, Couples without children, Couples with children, and One-parent-family, following simplistic rules regarding individuals' ages. Other household structures (shared accommodation, more complex family households, etc.) are considered as Other kind of household. This process is simplistic in that it does not take into account couples with a large age difference, stepfamilies with little age difference between an adult and one of the children, or individuals living in a household with no family relationship.
Household type assignment process
Algorithm 4 describes the assignment process. Households composed of one individual are one-person households. Households composed of two members with an age difference of more than 16 years are assumed to be one-parent family households. Otherwise, if both members are aged more than 16, the household is presumed to be a couple without children. For households with 3 to 6 members, the following assumptions are applied. If the two oldest members are aged more than 16 and the other members are aged less than 16, or if the two oldest members have an age difference of more than 16 years with the last member, the household is a couple with children. Otherwise, if the oldest member has an age difference of more than 16 years with the other members, who are all aged less than 16, then the household is a one-parent family. All households left unassigned after this process are considered to be other kinds of households.
Algorithm 4
Household type assignment algorithm
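The age-based rules described for the household type can be written as a small function. This is a sketch of one reading of the text, not a transcription of Algorithm 4: in particular, "the last member" is assumed to mean the youngest member, and the function name `household_type` is hypothetical.

```python
def household_type(ages):
    """Classify a household from the ages of its members using the simplified rules."""
    a = sorted(ages, reverse=True)  # oldest first
    n = len(a)
    if n == 1:
        return "One-person household"
    if n == 2:
        if a[0] - a[1] > 16:
            return "One-parent-family"
        if a[1] > 16:  # both members are over 16, since a[0] >= a[1]
            return "Couples without children"
        return "Other kind of household"
    if 3 <= n <= 6:
        oldest, second, others = a[0], a[1], a[2:]
        adults_with_minors = second > 16 and all(x < 16 for x in others)
        # Assumption: "last member" is the youngest; second - youngest > 16
        # implies the same for the oldest member.
        both_much_older = second - a[-1] > 16
        if adults_with_minors or both_much_older:
            return "Couples with children"
        if all(oldest - x > 16 and x < 16 for x in a[1:]):
            return "One-parent-family"
    return "Other kind of household"


examples = {
    "single": household_type([30]),
    "parent_child": household_type([40, 10]),
    "couple": household_type([35, 33]),
    "family": household_type([40, 38, 10]),
    "lone_parent": household_type([40, 10, 5]),
    "flatshare": household_type([30, 28, 27]),
}
```

The last example shows the limitation noted in the text: three adults of similar age fall into the catch-all category because no age rule applies.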

