Conference on Foundations and Advances of Machine Learning in Official Statistics, 3rd to 5th April, 2024

Session 3.2 Quality, Fairness and Reproducibility

Quality Dimensions of Machine Learning in Official Statistics

Younes Saidani 1, Florian Dumpert* 1

Abstract

Official statistics distinguishes itself through the legally stipulated requirement to ensure the quality of its publications. To this end, it adheres to European quality frameworks, which are operationalised at the national level in the form of quality manuals. Hitherto, these have been designed and interpreted with the requirements of “classical” statistical production processes in mind. Thus, in order to ensure continued adherence to quality standards, a tailored quality framework must be developed to accompany the increasing use of machine learning (ML) methods in official statistics. This paper makes three contributions to the development of such a quality framework for the use of ML in official statistics: (1) It identifies relevant quality dimensions for ML by analysing the quality principles contained in the European Statistics Code of Practice and (2) fleshes them out in light of the methodological peculiarities of ML. Unlike previous works, (2a) robustness is proposed as a stand-alone quality dimension, (2b) machine learning operations (MLOps) and fairness are discussed as two cross-cutting issues with relevance to most quality dimensions, and (2c) suggestions are made as to how quality assurance can be conducted in practice for each quality dimension. This work provides the conceptual groundwork for embedding ML quality indicators in the quality management systems used by official statistics for assessment and reporting, thus ensuring that the quality standard of official statistics continues to be met when new statistical procedures are used.

*: Speaker

1: Federal Statistical Office - Germany