Conference on Foundations and Advances of Machine Learning in Official Statistics, 3rd to 5th April, 2024

Session 3.2 Quality, Fairness and Reproducibility

Legal Implications of the use of Machine Learning in Official Statistics

Leon Krög* 1


In its landmark Census Decision, the German Federal Constitutional Court ruled that National Statistical Organizations must constantly question their processing methods to ensure that these methods keep pace with modern developments.

One such development is machine learning (ML), which can be a beneficial tool for Statistical Organizations by allowing more efficient processing of vast amounts of data.

However, statistics dealing with personal data must comply with the requirements of the General Data Protection Regulation (GDPR) and the national laws that regulate statistics (such as the German Bundesstatistikgesetz). Both require that data used to generate public statistics cannot be attributed to natural persons. Some argue that the use of ML could pose a threat to data protection because of its capability to analyze large amounts of data and recognize patterns in them, turning anonymous data back into personal data. Since anonymization and aggregation of data are conditions for privileges in the processing of official statistical data, the use of machine learning could be limited by data protection law. The present work therefore analyzes whether the use of ML really threatens anonymous statistical data from a legal point of view.

Furthermore, the EU is currently working on a legal framework to regulate artificial intelligence according to the risk it poses, grouping AI applications into four risk levels. This work therefore also assesses how the use of ML in official statistics would be classified under the risk framework of the recent AI Act. By doing so, it analyzes to what extent the use of ML in official statistics would be regulated by European law.

*: Speaker

1: University of Mannheim - Germany