Conference on Foundations and Advances of Machine Learning in Official Statistics, 3rd to 5th April 2024

Session 4.2 Processes

The process for machine learning implementation at Statistics Sweden

Jens Malmros*¹

Abstract

The production of statistics at Statistics Sweden is governed by the Swedish Process Model, which is similar to the Generic Statistical Business Process Model (GSBPM). The Process Model is operationalized in the Process Support System (PSS), which describes the phases and subprocesses of statistical production. To facilitate the use of machine learning for improving efficiency in processes such as imputation, editing, and coding, Statistics Sweden has developed a process for machine learning implementation, designed to be integrated into the Process Model and the PSS.

The process for machine learning implementation was introduced as an overarching process in the PSS in 2023. The process is partly based on CRISP-DM (the Cross-Industry Standard Process for Data Mining) and can be used to support the development and implementation of machine learning applications in official statistics production. In the initial subprocess, user needs, conditions, risks, and demands are mapped. This is followed by subprocesses for the development and validation of machine learning models, which result in the selection of a model that serves as input to the final deployment and production subprocesses. Each subprocess is described in terms of its realisation, inputs, and outputs. The outputs typically serve as inputs to subsequent subprocesses.

The development of the process was motivated by the process-based workflow at Statistics Sweden and by previous experience of machine learning development. The process is still under development in some areas, for example model maintenance. It is currently used in several machine learning projects, where it has proven useful.

1: Statistics Sweden