Conference on Foundations and Advances of Machine Learning in Official Statistics, 3rd to 5th April, 2024

Session 2.2 Streamlining Processes and Data Integration

Streamlining Business Functions in Official Statistical Production with Machine Learning

Sandra Barragan¹, Carlos Sáez¹, David Salgado¹*, Luis Sanguiao¹


The use of artificial intelligence and machine learning techniques in the production of official statistics should be aimed at improving quality across many of its dimensions. Accordingly, statistical learning models and similar techniques should help statistical offices meet many of the traditional demands placed on the production and release of multiple outputs that serve users' and stakeholders' needs.

In this work we present several ongoing proposals at Statistics Spain to increase the frequency, timeliness, granularity, accuracy, and cost-efficiency of survey statistics (possibly combined with administrative data). Here we outline the core ideas behind our use of statistical learning models. Firstly, to increase accuracy, we delve into model-assisted estimation using ML algorithms. Secondly, we illustrate the use of random forests to compute local scores in selective editing, making error detection in categorical variables more efficient. Thirdly, we use gradient boosting algorithms to impute survey microdata early, during data collection and data editing, in order to produce early estimates in Short-Term Business Statistics. Fourthly, we integrate survey and administrative data to impute target variables in the Structural Business Survey beyond the probability sample, aiming at higher granularity. Fifthly, we pursue the same idea in Short-Term Business Statistics, exploiting patterns in historical microdata sets to reduce response burden. Finally, we show how to use random forests to time-disaggregate sampling designs in order to produce weekly and monthly aggregates from quarterly survey data.
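As an illustration of the first proposal, model-assisted estimation can be written in the standard difference-estimator form t_hat = sum over k in U of m_hat(x_k) + sum over k in s of d_k * (y_k - m_hat(x_k)), where U is the population, s the sample, d_k = 1/pi_k the design weights, and m_hat any ML predictor (for instance a random forest) evaluated on auxiliary variables known for every unit. The sketch below assumes this textbook form; it is not necessarily the exact estimator used at Statistics Spain, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def model_assisted_total(
    m_hat: Sequence[float],
    sample_idx: Sequence[int],
    y_sample: Sequence[float],
    d_sample: Sequence[float],
) -> float:
    """Model-assisted (difference) estimator of a population total.

    Hypothetical helper for illustration.
    m_hat:      predictions m_hat(x_k) for every unit k in the population U,
                e.g. from an ML regressor (assumption).
    sample_idx: positions in U of the sampled units s.
    y_sample:   observed y_k for k in s, in the same order as sample_idx.
    d_sample:   design weights d_k = 1 / pi_k for k in s.
    """
    if not (len(sample_idx) == len(y_sample) == len(d_sample)):
        raise ValueError("sample vectors must have equal length")
    # Projection term: sum of the model predictions over the whole population.
    projection = sum(m_hat)
    # Correction term: design-weighted sum of the sample residuals.
    correction = sum(
        d * (y - m_hat[k]) for k, y, d in zip(sample_idx, y_sample, d_sample)
    )
    return projection + correction


# Toy population of 4 units, 2 of them sampled with weight 2.
print(model_assisted_total([10.0, 20.0, 30.0, 40.0], [0, 2], [12.0, 27.0], [2.0, 2.0]))
```

With these toy numbers the projection term is 100.0 and the residual correction is -2.0, so the call prints 98.0. The design choice is that the ML model only supplies predictions: the weighted residual term keeps the estimator (approximately, when m_hat is fitted on the sample, e.g. via cross-fitting) design-unbiased, while better predictions reduce its variance.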

Beyond the technical details, we focus on the strategic vision linking these proposals to quality and to the modernisation of official statistical production.

*: Speaker

1: Statistics Spain