Conference on Foundations and Advances of Machine Learning in Official Statistics, 3rd to 5th April, 2024

Session 3.1 From Text to Code

NACE-Coding with Machine Learning at the Federal Statistical Office

Susanne Wegner*1, Elias Minther1

Abstract

In the realm of official statistics, the categorization of economic activities plays a crucial role in analyzing and interpreting data. The NACE (Nomenclature générale des activités économiques dans les Communautés européennes) coding system is widely utilized for this purpose, but manual coding by domain experts is time-consuming and prone to inconsistencies. Thus, there is a growing need to automate the NACE coding process to address these challenges.

This talk focuses on leveraging Machine Learning (ML) techniques for automated NACE coding in official statistics. By harnessing the power of ML, we aim to optimize the coding process, increase its efficiency, and improve the quality of the resulting classifications.

We propose an approach that combines several ML algorithms and techniques for NACE coding. We experiment with Support Vector Machines (SVM), Logistic Regression, Naïve Bayes (NB), hierarchical approaches, and Large Language Models (LLMs), compare their performance, and discuss our findings.
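To make the classification task concrete, the following is a minimal sketch of one of the listed model families, a multinomial Naïve Bayes text classifier with add-one smoothing, written in plain Python. It is not the authors' implementation: the training descriptions are invented toy data, the function names (`tokenize`, `train_nb`, `predict`) are hypothetical, and the codes are used only as illustrative class labels.

```python
import math
from collections import Counter, defaultdict

# Toy, invented training data: (activity description, class label).
# The labels are used purely as illustrative NACE-style codes.
TRAIN = [
    ("bakery producing bread and fresh pastry", "10.71"),
    ("manufacture of bread rolls and cakes", "10.71"),
    ("software development and programming services", "62.01"),
    ("custom programming of business software", "62.01"),
    ("restaurant serving meals and drinks", "56.10"),
    ("pizza restaurant with table service", "56.10"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train_nb(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, code in examples:
        class_counts[code] += 1
        for tok in tokenize(text):
            word_counts[code][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for code, n_docs in class_counts.items():
        # Log prior: share of training documents in this class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[code].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood of the word.
            score += math.log(
                (word_counts[code][tok] + 1) / (total_words + len(vocab))
            )
        scores[code] = score
    return max(scores, key=scores.get)


model = train_nb(TRAIN)
prediction = predict(model, "small bakery selling bread and cakes")
print(prediction)
```

A realistic setup would train on a large set of expert-coded descriptions, evaluate on held-out data, and compare this baseline against the other model families under the same conditions.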

Our results highlight both the potential and the limitations of ML algorithms and techniques, and point to ways to further improve the quality of automated NACE coding.

*: Speaker

1: Federal Statistical Office - Germany