Experimental statistics: Mobile network data representing the population

EXSTAT

Status as of 30 December 2019

Benefit of mobile network data for official statistics

Reliable information on the distribution of the population and the number of inhabitants in a country, at the lowest possible geographical level, is indispensable for evidence-based policy-making. Detailed information on the distribution of the population during the day is decisive not only in the case of disasters, epidemics or conflicts (Deville et al. 2014); it also plays an essential role in regional planning, for example when decisions are taken on transport and education infrastructure or on the cultural and social landscape. A current challenge for population statistics is to move from a static to a dynamic (that is, time-sensitive) representation of the population. Such a dynamic representation cannot be achieved with traditional survey data. New digital data, such as mobile network data, have the potential to meet this challenge. Owing to their fine temporal and spatial resolution, mobile network data can help to reflect population dynamics and to provide more timely information on the population.

Data basis: mobile network data

To research the use of mobile network data for official statistics, the Federal Statistical Office entered into a cooperation with T-Systems International GmbH and Motionlogic GmbH (both wholly-owned subsidiaries of Deutsche Telekom AG) in September 2017. The conceptual designs of the planned feasibility studies were developed in coordination with the Federal Network Agency, the Federal Commissioner for Data Protection and Freedom of Information, and T-Systems. A primary objective has been to use mobile network data to provide a valid picture and estimate of Germany’s daytime and resident population. The population figures of the 2011 Census have been used as a benchmark to check the representativeness of the data.

The test dataset available to the Federal Statistical Office contains anonymised, aggregated mobile network activity data for Telekom customers in the Land of Nordrhein-Westfalen (NRW). A mobile network activity is an event or signal generated at a cell tower, which is triggered when a mobile device stays for a minimum length of time in the area studied, also called a geometry. All signals generated while the mobile device is neither switched off nor in flight mode are evaluated. These signalling data, as they are called, are produced automatically and register only the location of the cell tower to which a mobile device is connected at a specific time. The test data contain the average activities in a so-called statistical week, which consists of 24-hour days selected from the months of April, May and September 2017. The week comprises five types of day: Monday, Tuesday to Thursday, Friday, Saturday and Sunday. In addition, information is available on socio-demographic characteristics, such as age group and sex, of contract mobile network customers.

To provide a picture of the resident and working population by means of mobile network data, short mobile network activities (for instance commuter movements) have to be filtered out. Therefore the dataset used only contains mobile network activities with a minimum dwell time of two hours.

In compliance with data protection rules, only anonymised values based on a minimum of 30 mobile network activities per geometry were transmitted to the Federal Statistical Office. This pre-processing makes it impossible to draw conclusions about individual devices or persons.
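The two pre-processing rules described above (a minimum dwell time of two hours and a minimum of 30 activities per geometry) can be illustrated with a minimal sketch. The record layout of `(geometry_id, dwell_minutes)` pairs, the function name and the toy data are assumptions made for illustration only; the actual pre-processing was carried out by the data provider and is not documented in detail here.

```python
from collections import Counter

MIN_DWELL_MINUTES = 120  # two-hour minimum dwell time mentioned in the text
MIN_ACTIVITIES = 30      # minimum number of activities per geometry mentioned in the text


def aggregate_activities(events):
    """Count long-dwell activities per geometry and suppress small cells.

    `events` is an iterable of (geometry_id, dwell_minutes) pairs.
    This record layout is a hypothetical one chosen for the example.
    """
    counts = Counter(
        geometry_id
        for geometry_id, dwell_minutes in events
        if dwell_minutes >= MIN_DWELL_MINUTES  # drop short stays, e.g. commuting
    )
    # Keep only geometries that reach the minimum cell size.
    return {g: n for g, n in counts.items() if n >= MIN_ACTIVITIES}


# Hypothetical toy input: geometry "A" has 35 long stays and 50 short ones,
# geometry "B" has only 10 long stays and is therefore suppressed.
events = [("A", 180)] * 35 + [("A", 20)] * 50 + [("B", 240)] * 10
result = aggregate_activities(events)
print(result)
```

In this sketch a suppressed geometry simply disappears from the output; whether the provider drops such cells or flags them in another way is not stated in the text.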

Mobile network data providing information on the population

To get a first idea of the correlation, that is, the relationship between the mobile network activities in 2017 and the population figures of the 2011 Census, the two datasets for Nordrhein-Westfalen were compared with each other, broken down by type of day and time of day. The Pearson correlation coefficient measures the strength of the linear relationship between the mobile network activities and the population count per hour and geometry. Coefficients close to 1 indicate a close linear relationship between the two values. Figure 1 shows these coefficients for all days of the week by hour of the day. The values reveal a high correlation of over 0.8 between mobile network activities and population figures during the evening hours and throughout Saturday and Sunday. This suggests strong regional similarities between the distribution of the resident population and the distribution of mobile network users during these periods. On weekdays, the correlation falls below 0.7 during the day, which indicates stronger regional differences between the two data sources and suggests that the mobile network activities then reflect the daytime population rather than the resident population. Figure 1 shows clear changes in the correlation over the course of the day and week, which in turn are due to changes in the distribution of mobile network activities.
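As an illustration of the comparison described above, the following sketch computes the Pearson correlation coefficient between activity counts and census population across geometries, separately for each hour. The data structure (a mapping from hour to a list of counts, one per geometry) and all numbers are hypothetical and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def correlation_by_hour(activities_by_hour, population):
    """Correlate activities with census population for each hour.

    `activities_by_hour` maps an hour to a list of activity counts,
    one per geometry, in the same order as `population`.
    """
    return {hour: pearson(counts, population)
            for hour, counts in activities_by_hour.items()}


# Hypothetical toy data for four geometries.
population = [1000, 2000, 3000, 4000]
activities_by_hour = {
    12: [900, 800, 2500, 1500],  # midday: many people away from home
    21: [100, 200, 300, 400],    # evening: proportional to population
}
r = correlation_by_hour(activities_by_hour, population)
best_hour = max(r, key=r.get)  # hour with the highest correlation
print(best_hour)
```

In the toy data the evening hour is exactly proportional to the population, so it yields the highest coefficient; real data would of course show the more gradual pattern described for Figure 1.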

Due to the high correlation, the period between 8 p.m. and 11 p.m. on a statistical Sunday was selected to represent the resident population by mobile network data. The assumption is that mobile network users are at their place of residence during this period and that their mobile devices are likely to be switched on.

For the purpose of further analysis regarding the resident population, the mobile network activities were converted by means of kernel density estimation and calibration, as this permits a direct comparison with the population figures of the 2011 Census (see Schmid et al. (2019) and Hadam et al. (2020) for more information). Subsequently, the average values of the evening activities were evaluated, in direct comparison with the population figures of the 2011 Census, for a total of 31 administrative districts and 396 municipalities including 22 towns not attached to an administrative district in Nordrhein-Westfalen, in order to largely maintain the small-area approach. The following refers only to the results obtained at district level.
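The actual conversion is described in the cited papers and is not reproduced here. Purely to illustrate what a calibration step can look like, the sketch below uses a simple proportional rescaling, which is a deliberately simplified stand-in and not the authors' method: activity counts are multiplied by one common factor so that their total matches a known census total. The function name, area names and figures are hypothetical.

```python
def calibrate(activities, census_total):
    """Rescale activity counts so that they sum to `census_total`.

    Simplified proportional calibration for illustration only;
    it does not include any kernel density estimation.
    """
    total = sum(activities.values())
    if total <= 0:
        raise ValueError("activity total must be positive")
    factor = census_total / total
    return {area: count * factor for area, count in activities.items()}


# Hypothetical toy data: two districts and a census total of 1000 inhabitants.
activities = {"district_1": 200.0, "district_2": 300.0}
calibrated = calibrate(activities, census_total=1000.0)
print(calibrated)
```

After such a rescaling the calibrated values are on the same scale as population figures, which is what makes a direct area-by-area comparison with census counts meaningful.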

Figure 2 shows, at the level of administrative districts, the geographical distribution of the population as identified in the 2011 Census (left) and of the mobile network activities in 2017 (right). There seems to be no apparent visual or regional difference between mobile network activities and population figures. This means that the distribution of mobile network activities is very similar to the distribution of the population as established in the Census.

To quantify the comparison, the geographical differences between the two data sources were determined as shown in Figure 3. It illustrates the regional differences between mobile network activities and population figures at district level. The areas highlighted in red indicate that the (calibrated) mobile network activities counted there exceeded the established number of inhabitants. The areas highlighted in blue, by contrast, indicate regions where fewer (calibrated) mobile network activities were registered. A precise calculation of the difference between mobile network activities and population figures has confirmed the regional differences shown in Figure 3. For 51% of the districts, the population estimates based on the mobile network activities on Sunday evening are sufficiently accurate. At municipality level, however, the population estimates based on mobile network data are acceptable for only 33% of the municipalities. Also, the number of inhabitants is on average represented far better at district level than at the level of municipalities (see Schmid et al. (2019) and Hadam et al. (2020) for detailed results and more information).
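A comparison of this kind can be sketched as follows. The text does not state which accuracy criterion was used to call an estimate "sufficiently accurate", so the 10% tolerance below is a hypothetical value, as are the function names and the toy figures; the cited papers describe the actual evaluation.

```python
def relative_deviation(estimate, census):
    """Signed relative deviation of an estimate from the census figure."""
    if census <= 0:
        raise ValueError("census population must be positive")
    return (estimate - census) / census


def classify(estimates, census, tolerance):
    """Label each area 'ok', 'over' or 'under' for a given tolerance."""
    labels = {}
    for area, estimate in estimates.items():
        deviation = relative_deviation(estimate, census[area])
        if abs(deviation) <= tolerance:
            labels[area] = "ok"
        elif deviation > 0:
            labels[area] = "over"   # more activities than inhabitants (red)
        else:
            labels[area] = "under"  # fewer activities than inhabitants (blue)
    return labels


TOLERANCE = 0.10  # hypothetical threshold, not taken from the source

# Hypothetical toy data for four areas.
census = {"a": 1000, "b": 2000, "c": 3000, "d": 4000}
estimates = {"a": 1300, "b": 2100, "c": 2400, "d": 4000}

labels = classify(estimates, census, TOLERANCE)
share_ok = sum(label == "ok" for label in labels.values()) / len(labels)
print(labels, share_ok)
```

The share of areas labelled "ok" corresponds in spirit to the percentages quoted above, although the resulting value depends entirely on the tolerance chosen.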

The regional differences shown in Figure 3 can largely be explained by the fact that Deutsche Telekom's share of the mobile communications market is higher in rural areas and comparatively low in urbanised areas. This is due to Telekom’s extensive network, which provides better coverage in rural regions.

Within the framework of the EU "City data from LFS and big data" project, the Federal Statistical Office had an additional set of mobile network data at its disposal, which contained the average mobile network activities at municipality and district level for a statistical Sunday evening in 2018 for the Federal Republic of Germany as a whole (European Commission 2019). For the first time, this made it possible to map and estimate the resident population across Germany on the basis of mobile network data. Like Figure 3, Figure 4 highlights the regional differences between the number of inhabitants and the mobile network activities. The key results and the pattern obtained for the Federal Republic of Germany are similar to those identified for Nordrhein-Westfalen. Based on the mobile network data of the Deutsche Telekom network, the population in rural regions is also overestimated in Germany as a whole, while it is underestimated in urbanised areas (see for example Berlin and Brandenburg). Also, the number of inhabitants is represented more accurately at district level than at municipality level in the whole of Germany (see European Commission 2019 for detailed results and more information).

Conclusion

The results generally show that mobile network data can, to a certain extent, provide a good picture of the population. The differences observed between the population represented by mobile network data and the figures based on census data may partly be explained by the time gap between the mobile network data from 2017/2018 and the census data from 2011; however, they may also result from the extrapolation method used by the data provider. Irrespective of the underlying geometry, the redistribution of mobile network activities permits additional areas of interest to be studied and explored. At the same time, this approach introduces additional uncertainty into the mobile network activities, and the effect of this uncertainty grows as the geometries become smaller. This is especially noticeable in the population estimates at municipality level, which are considerably less accurate than those at district level. Furthermore, the mobile network activities of only one mobile network provider in Germany were analysed. The resulting differences, caused by the provider's regional market shares, are also visible in the spatial distribution of the mobile network activities. The socio-demographic characteristics likewise reflect the customer structure of the network provider. As it is essential that the data are representative across the whole country, further steps will be required to obtain, to the extent possible, data from all network providers in Germany.

References

Deville, P. et al. (2014): Dynamic population mapping using mobile network data. Proceedings of the National Academy of Sciences, 111 (45), 15888-15893

European Commission (2019): City data from LFS and Big Data (Final report)

Schmid, T., Hadam, S., Salvati, N., Bruckschen, F., Zbiranski, T. (2019): Kleinräumige Prädiktion von Indikatoren basierend auf Mobilfunkdaten. 13th scientific symposium on "Qualität bei zusammengeführten Daten – Befragungsdaten, Administrative Daten, Neue digitale Daten: Miteinander besser?"

Hadam, S., Schmid, T., Simm, J. (2020): Kleinräumige Prädiktion von Bevölkerungszahlen basierend auf Mobilfunkdaten aus Deutschland. In: Klumpe, B., Schröder, J., Zwick, M. (eds.): Qualität bei zusammengeführten Daten – Befragungsdaten, Administrative Daten, Neue digitale Daten: Miteinander besser? Springer, Wiesbaden, pp. 27-44