Conference on Foundations and Advances of Machine Learning in Official Statistics, 3rd to 5th April, 2024

Session 4.2 Applied ML II

Nowcasting for Local Population Counts (Births, Deaths, Migration) via a Self-Developed Application

Kerstin Erfurth* 1


Provisional monthly results report, for example, the number of births registered by the registry offices in a given month, that is, by "registration date". Births and their registrations, however, often do not fall in the same calendar month. This form of reporting has so far been sufficient for publishing results during the year. Final birth figures by month of birth - i.e. by "event date" - are only available in the annual results, with a considerable time lag. Demand is nevertheless growing for current data that can be interpreted intuitively. To meet it, the Berlin-Brandenburg Statistical Office has developed an estimation procedure that allows a provisional birth count by actual month of birth to be published after just two months. This makes it possible to switch provisional reporting from the registration date to the event date.
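The difference between the two reporting conventions can be illustrated with a small Python sketch. The records below are invented for illustration only and are not data from the Statistical Office; each hypothetical record pairs the month of birth (event date) with the month in which the birth was registered (registration date).

```python
from collections import Counter

# Hypothetical birth records: (event_month, registration_month).
# These values are made up purely to illustrate the two counting conventions.
births = [
    ("2024-01", "2024-01"),
    ("2024-01", "2024-02"),  # born in January, registered in February
    ("2024-01", "2024-02"),
    ("2024-02", "2024-02"),
    ("2024-02", "2024-02"),
    ("2024-02", "2024-03"),  # born in February, registered in March
]

# Provisional reporting: count by the month of registration.
by_registration = Counter(reg for _, reg in births)

# Final reporting: count by the month in which the birth took place.
by_event = Counter(event for event, _ in births)

for month in sorted(set(by_registration) | set(by_event)):
    print(month, "registered:", by_registration[month], "born:", by_event[month])
```

In this toy example, February shows four registrations but only three births, because two January births were registered late and one February birth had not yet been registered by the end of the month.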

To this end, a model was sought that can extract knowledge from available but incomplete data and use it for an estimate. This knowledge can be used to create a forecast for the recent past - a so-called nowcast. Nowcasting estimates can give a better indication of the current situation than raw, incomplete case counts. The approach presented here estimates population figures for the most recent period by regression. The method learns from the provisional figures of a subset of registry offices in the recent past whose counts allow a good forecast of the final total. The subset is selected on the basis of a strong past correlation between the cases available at a given point in time and the final number of cases. To integrate this procedure into practical work, a tool was developed that visualises the approach and also enables comparisons with other procedures. The application is being used for the first time for the publication of population statistics, and further machine learning methods are planned to be added in the future. The aim of this presentation is to demonstrate the current status of the application.
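The general idea of the regression approach can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Statistical Office's implementation: the abstract does not specify the correlation threshold, how the selected offices' counts are combined, or the form of the regression. Here the threshold `min_corr`, the summing of the selected offices' counts, the simple linear fit, the function names and all numbers are hypothetical.

```python
import numpy as np

def select_offices(provisional, final_total, min_corr=0.9):
    """Indices of offices whose past provisional counts correlate
    strongly with the past final totals (threshold is an assumption)."""
    corrs = np.array([
        np.corrcoef(provisional[:, j], final_total)[0, 1]
        for j in range(provisional.shape[1])
    ])
    return np.flatnonzero(corrs >= min_corr)

def fit_nowcast(provisional, final_total, offices):
    """Fit final_total ~ slope * (sum of selected offices) + intercept."""
    x = provisional[:, offices].sum(axis=1)
    slope, intercept = np.polyfit(x, final_total, deg=1)
    return slope, intercept

def nowcast(current_provisional, offices, slope, intercept):
    """Estimate the final total for the most recent month."""
    return slope * current_provisional[offices].sum() + intercept

# Invented history: rows are past months, columns are registry offices.
# Each entry is the count available at the cut-off date for that month.
provisional = np.array([
    [100, 200, 5],
    [110, 220, 3],
    [ 90, 180, 5],
    [105, 210, 3],
    [ 95, 190, 5],
    [120, 240, 3],
], dtype=float)
# Invented final totals for the same months, known only after a delay.
final_total = np.array([1000, 1100, 900, 1050, 950, 1200], dtype=float)

offices = select_offices(provisional, final_total)
slope, intercept = fit_nowcast(provisional, final_total, offices)

# Provisional counts for the current month, final total not yet known.
current = np.array([105, 210, 4], dtype=float)
estimate = nowcast(current, offices, slope, intercept)
print("selected offices:", offices.tolist(), "estimate:", round(estimate))
```

In this invented data set the first two offices track the final total closely and are selected, while the third is excluded because its early counts carry no useful signal about the final number.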

*: Speaker

1: Statistical Office, Berlin-Brandenburg - Germany