6 July 2022 - Up-to-date small-area population figures are indispensable for political decision-making. Intercensal population updates allow the provision of up-to-date numbers of inhabitants at the geographical municipality level. The number of inhabitants is continuously updated on the basis of the 2011 Census using data from the statistics on births and deaths and migration statistics. A new experimental approach is used in addition to intercensal population updates so that the growing demand for smaller-area population figures can be met in the short term.
The analyses carried out so far on the representation of the population by means of mobile network data generally show that the distribution of the population can be represented well and in a timely manner using the mobile network data available (Hadam et al. 2020). Based on this, a project on "Experimental georeferenced population figure based on intercensal population updates and mobile network data" is now being carried out. It examines whether, and to what extent, mobile network data can be used to redistribute the population data available from the intercensal population updates from the municipality level to 1x1 km grid cells covering the whole of Germany. The first official georeferenced population figures are expected to be available in late 2022 for reference year 2021, and new data sources and methods will have to be developed for annual updating. The time gap is bridged by using mobile network data as a makeshift solution, and the result is published as experimental statistics.
Data basis
For this purpose, mobile network data from the Telefónica Deutschland network are used which are processed and made available by Teralytics as a data provider. Mobile network data are available for the whole of Germany on an INSPIRE-compliant 1x1 km grid. The place of origin of all mobile network signals recorded is determined using the first and the last signal within 24 hours to allow representation of the resident population in Germany by means of mobile network data. If the first and the last mobile network signal within a day are sent from within the same grid cell, that cell is taken as the place of residence. This allows us to determine the potential resident population through mobile network data. Also, mobile network activities were extrapolated using intercensal population updates of the same year. In compliance with the data protection rules, only anonymised values based on a minimum of five mobile network activities per grid cell are transmitted to the Federal Statistical Office so that it is not possible to derive information on individual devices or individuals. The evaluation is available for all weekdays as an average of at least eight selected weeks, excluding vacation and holidays, of a specific year.
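The residence rule and the anonymisation threshold described above can be sketched as follows. This is only an illustration: the actual processing is done by the data provider and is not disclosed in detail, so the input layout (`device_id`, `timestamp`, `cell_id`), the function names and the equation of one assigned device with one "activity" are all assumptions.

```python
from collections import Counter

MIN_ACTIVITIES = 5  # minimum per grid cell, as stated in the text


def home_cells(signals):
    """Assign a device to a grid cell if its first and last signal of the
    day come from the same cell.

    signals: iterable of (device_id, timestamp, cell_id) for one day
             (hypothetical layout).
    Returns {device_id: cell_id} for the devices that satisfy the rule.
    """
    first, last = {}, {}
    for device, ts, cell in signals:
        if device not in first or ts < first[device][0]:
            first[device] = (ts, cell)
        if device not in last or ts > last[device][0]:
            last[device] = (ts, cell)
    return {d: first[d][1] for d in first if first[d][1] == last[d][1]}


def cell_counts(homes, min_activities=MIN_ACTIVITIES):
    """Count assigned devices per cell and drop cells below the threshold."""
    counts = Counter(homes.values())
    return {cell: n for cell, n in counts.items() if n >= min_activities}
```

In this sketch, a device that is seen in cell `c1` in the morning, in `c2` at noon and in `c1` again at night counts towards `c1`, while a device whose day starts and ends in different cells is not assigned at all.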
Method
In a distribution procedure, the results of intercensal population updates are redistributed from the municipality level to a smaller-area level.
First, the 1x1 km grid cells of the mobile network data are allocated to municipalities, taking their area coverage as a basis. Mobile network activities are also distributed over overlapping municipalities according to their area proportions, so that they are not entirely allocated to one municipality. Biases and uncertainties in the results caused by overlapping grid cells are thus reduced because specific proportions of the mobile network data are allocated to municipalities according to their proportions in the municipality areas.
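A minimal sketch of this area-proportional allocation is shown below. It assumes that the overlap area between each grid cell and each municipality has already been computed in a GIS step; the dictionary layout and the function name are hypothetical, and normalising by the sum of the overlap areas (rather than by the full cell area) is an assumption made here.

```python
def allocate_to_municipalities(cell_activities, overlaps):
    """Split the activities of each grid cell over the municipalities it
    overlaps, in proportion to the overlapping area.

    cell_activities: {cell_id: mobile network activities}
    overlaps:        {cell_id: {municipality_id: overlap area}}
    Returns {(municipality_id, cell_id): allocated activities}.
    """
    allocated = {}
    for cell, activities in cell_activities.items():
        areas = overlaps[cell]
        total_area = sum(areas.values())
        for municipality, area in areas.items():
            allocated[(municipality, cell)] = activities * area / total_area
    return allocated
```

For example, a cell with 100 activities that lies three quarters in municipality A and one quarter in municipality B contributes 75 activities to A and 25 to B.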
Based on this, group-specific selection probabilities are determined from the mobile network data for the 1x1 km grid cells, that is, the probability of each cell being selected from the potential resident population. The selection probability of a grid cell is the ratio of the mobile network activities in that cell to the total mobile network activities in the relevant municipality, which forms the “group”. Put simply, it is the cell's share of the mobile network activities in the group. The selection probabilities of all grid cells in a municipality, or group, therefore sum to 1. The number of inhabitants per municipality as obtained from intercensal population updates is then multiplied by the calculated selection probabilities and thereby distributed over the small-area 1x1 km grid cells. Mobile network data thus provide a spatial distribution of the population within a municipality, which can supplement the existing intercensal population updates.
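The calculation for a single municipality can be sketched as follows. The function and variable names are illustrative, and the numbers in the usage note are made up.

```python
def distribute_population(activities, population):
    """Distribute a municipality's population over its grid cells.

    activities: {cell_id: mobile network activities allocated to this
                 municipality}
    population: number of inhabitants of the municipality from the
                intercensal population updates
    Returns (selection probabilities, unrounded population per cell).
    """
    total = sum(activities.values())
    if total <= 0:
        raise ValueError("municipality has no mobile network activities")
    # Selection probability = share of the municipality's activities per cell
    probabilities = {cell: a / total for cell, a in activities.items()}
    # The probabilities sum to 1, so the cell values sum to the population
    distributed = {cell: population * p for cell, p in probabilities.items()}
    return probabilities, distributed
```

With activities of 30, 60 and 10 in three cells and 2,000 inhabitants, the selection probabilities are 0.3, 0.6 and 0.1, giving roughly 600, 1,200 and 200 inhabitants per cell before rounding.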
In a last step, the population data in a small-area distribution per municipality are rounded using the official population figure of the relevant municipality. The resulting data are then added to the data from intercensal population updates, which provides the number of inhabitants per municipality. The result is an experimental georeferenced population figure whose key elements correspond to those of official intercensal population updates.
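The text does not state which rounding rule is applied. One common way to round cell values to whole numbers while preserving the municipality total is the largest-remainder method, sketched below purely as an assumption and not as the procedure actually used.

```python
import math


def round_to_total(values, total):
    """Round cell values to integers so that they sum exactly to `total`.

    Largest-remainder method (an assumed rule, not confirmed by the source):
    round every value down, then give the missing units to the cells with
    the largest fractional parts.
    """
    rounded = {cell: math.floor(v) for cell, v in values.items()}
    shortfall = total - sum(rounded.values())
    if not 0 <= shortfall <= len(values):
        raise ValueError("values do not sum approximately to total")
    by_remainder = sorted(
        values, key=lambda cell: values[cell] - rounded[cell], reverse=True
    )
    for cell in by_remainder[:shortfall]:
        rounded[cell] += 1
    return rounded
```

For unrounded values of 10.6, 20.3 and 5.1 and an official figure of 36, rounding down gives 35, and the one remaining inhabitant goes to the cell with the largest remainder (10.6 becomes 11).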
Experimental georeferenced population figure
Interactive grid map on 'Experimental georeferenced population figure' | © Statistisches Bundesamt (Destatis) 2022
The experimental georeferenced population figure is available on the basis of 1x1 km and 10x10 km grid cells and is visualised by an interactive grid map. However, the interactive grid map is only available as a German language application. The experimental georeferenced population figure determined is shown for every grid cell if the cells are currently filled with mobile network activities and are not subject to anonymisation. In addition, grid cells containing an experimental georeferenced population figure of 3 or smaller are kept confidential and the values are indicated as an interval from 0 to 3.
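The confidentiality rule for small values can be illustrated with a short sketch. The function name and the string used for the interval are assumptions; only the threshold of 3 comes from the text.

```python
CONFIDENTIALITY_LIMIT = 3  # figures of 3 or smaller are not shown exactly


def display_value(population):
    """Return the value shown for a grid cell: the figure itself, or the
    interval "0-3" if the figure is 3 or smaller."""
    if population <= CONFIDENTIALITY_LIMIT:
        return "0-3"
    return str(population)
```

A cell with a figure of 2 would thus be shown as "0-3", while a cell with a figure of 4 would be shown as "4".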
The interactive map shows the experimental georeferenced population figure at a small-area level. The figures are shown on a colour scale in ascending order: light-coloured cells have a low experimental georeferenced population figure, while dark-coloured cells have a higher figure. This also allows the current population distribution to be compared between regions. As expected, there are marked differences in the regional distribution of the experimental georeferenced population figure between urban and rural areas.
The experimental georeferenced population figures for 1x1 km grid cells can be downloaded within the interactive grid map application only for the sub-area selected. The population grid of 1x1 km grid cells for the whole of Germany is therefore additionally provided here as two download files, one as at December 2019 and one as at December 2020.
Conclusion
The demand for up-to-date georeferenced population figures can be met by means of the experimental georeferenced population figure. Because the figure is experimental, the benefits of the mobile network data and their comprehensive, small-area availability can be used for the distribution of population data below municipality level.
The nearly perfect correlation (Pearson correlation coefficient of 0.999) between the population data from intercensal population updates and the extrapolated mobile network activities at municipality level suggests high information value of the mobile network data for the total population and supports the reliability of that data source. Overall, this suggests high quality of the results. A final quality assessment of the results will however be possible only after an in-depth validation and plausibility check still to be conducted.
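For reference, the Pearson correlation coefficient between two series, such as population figures and extrapolated mobile network activities per municipality, can be computed as in the following sketch. The function is a plain textbook implementation and is not taken from the project.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)
```

A value close to 1 means that municipalities with more inhabitants also show proportionally more extrapolated activities. It says nothing by itself about the distribution within a municipality.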
To better assess the plausibility of the results, various geodata from German land surveying are used and combined. The goal is to determine which grid cells show residential land or residential use and, consequently, whether resident population is to be expected, or can be ruled out, in the relevant grid cells. For the purpose, the datasets “Amtliche Hausumringe Deutschland” (HU-DE) and “Haushalte-Einwohner-Bund” (HH-EW-Bund) from the Federal Agency for Cartography and Geodesy are used. The HU-DE dataset enables building layout plans to be filtered explicitly by residential buildings. The geodata of the HH-EW-Bund, which is produced by infas 360 GmbH, provide an estimate of the number of households and inhabitants per address point.
Examining the two data sources enables us to determine for each grid cell whether the allocated experimental georeferenced population figure and its level are plausible or whether the figure has been misallocated. Overall, for reference year 2019, the allocation of experimental georeferenced population figures was found to be plausible in 27.5% of the grid cells, partly plausible in 37.2%, and implausible in roughly 35.3%. The latter applies especially to sparsely populated regions. As shown in Figure 1 (2019), uncertainties clearly arise regarding the allocation in rural, less densely populated areas due to the small-area processing of mobile network activities performed by the data provider (for detailed results and further information see Hadam (2022)).
Based on these findings, the processing of the mobile network data was adjusted for the subsequent reference year 2020 in cooperation with the data provider. In the small-area processing of mobile network activities, from the level of the original mobile network cells to the 1x1 km grid cells used here, uninhabited areas are now excluded. This is done using additional land use information: the "Digitales Landbedeckungsmodell für Deutschland (LBM-DE)" was used to filter such areas.
Finally, applying land use information within the scope of mobile network data processing enables much more plausible results to be produced (see Figure 1 (2020)). Overall, 67.8% of the grid cells and the allocated experimental population figures are now considered plausible, 22.1% partially plausible, and only 10.1% implausible.
Figure 1 Visualisation of plausibility checking for reference years 2019 and 2020
It was also examined to what extent the described distribution procedure can be applied to socio-demographic data such as age and sex. As has been shown by the Federal Statistical Office (2021), the socio-demographic data of the mobile network operators are subject to considerable biases, which can also be found in the procedure described here. As especially people lacking legal capacity and prepaid customers are not included, the experimental georeferenced population figure cannot be shown in a plausible breakdown by age group and sex.
Overall, however, assessing the quality of the results still involves reservations. Because mobile network activities of just one network provider in Germany were used, deviations and uncertainties arise in the results. This is mainly due to regional market shares and to the data processing methodology applied by the data provider, which is not disclosed in detail.
Further feasibility studies, however, especially on the desired utilisation of mobile network data for the production of official statistics, will require anonymised individual data from all mobile network operators in Germany. This would further increase the nationwide representativeness and quality of the data. It is also necessary to create a legal basis in order to permanently secure the access to privately held data and enable their integration into official statistics production in the long term.
References
Hadam, S., Schmid, T., Simm, J. (2020): Kleinräumige Prädiktion von Bevölkerungszahlen basierend auf Mobilfunkdaten aus Deutschland. In: Klumpe, B., Schröder, J., Zwick, M. (eds.) Qualität bei zusammengeführten Daten – Befragungsdaten, Administrative Daten, Neue digitale Daten: Miteinander besser? Springer, Wiesbaden, pp. 27-44.
Statistisches Bundesamt (2021): Comparison of the mobile network data structures of two mobile network operators
Hadam, S. (2023): Experimentelle georeferenzierte Bevölkerungszahl auf Basis der Bevölkerungsfortschreibung und Mobilfunkdaten. Submitted for publication in AStA Wirtschafts- und Sozialstatistisches Archiv.