Events: 15th Scientific Conference, 20 and 21 June 2024

Data Collection, Data Quality, and Data Ethics in Times of Artificial Intelligence

Using the Large Language Model BERT to categorize open-ended responses to the "most important political problem" in the German Longitudinal Election Study (GLES)

Dr. Julia Weiß, Jan Marquardt

GESIS – Leibniz Institute for the Social Sciences in Mannheim


Open-ended survey questions are essential for capturing trends that researchers cannot foresee, but the resulting unstructured text is difficult to handle. To be usable quantitatively, such responses must be categorized, a process that is costly and time-consuming, particularly for large datasets. In the German Longitudinal Election Study (GLES), the surveys conducted between 2018 and 2022 produced nearly 400,000 uncoded mentions, which prompted us to explore new coding methods. Our objective was to evaluate different machine learning approaches and identify the most efficient and cost-effective one that still delivers high coding quality, as a long-term solution for coding open-ended mentions of the "most important political problem" in the GLES.

Before 2018, GLES data were coded manually. Moving to a (partially) automated process first required a thorough revision of the codebook. We then used the dataset of nearly 400,000 open responses to the question about the "most important political problem" from GLES surveys conducted between 2018 and 2022. Coding was automated with the large language model BERT (Bidirectional Encoder Representations from Transformers). Before arriving at the final implementation, we tested a range of aspects: hyperparameter tuning, reducing the size of the "other" category, simulations with different amounts of training data, quality control across survey modes, and the use of training data from 2017.

The revised codebook already shows high quality and consistency, with a Fleiss' kappa of 0.90 for agreement on individual codes. Based on this codebook, 43,000 mentions were coded manually and served as the training dataset for BERT. The final BERT-based coding of the nearly 400,000 mentions performed very well, with a 0/1 loss of 0.069, a micro F1 score of 0.946, and a macro F1 score of 0.878.
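
For readers less familiar with the three evaluation figures, the following is a minimal sketch of how a 0/1 loss, a micro F1 score, and a macro F1 score can be computed. It is an illustration only, not the evaluation code used for the GLES data. It assumes that each mention may carry a set of codes (a multi-label setup), which the abstract does not state explicitly. The function names, the example codes, and the toy data are all hypothetical.

```python
from collections import Counter


def zero_one_loss(y_true, y_pred):
    """Share of mentions whose predicted code set is not exactly the true set."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if set(t) != set(p))
    return wrong / len(y_true)


def _f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_scores(y_true, y_pred):
    """Return (micro F1, macro F1) over all codes that occur in the data."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        t, p = set(t), set(p)
        for code in t & p:
            tp[code] += 1  # code assigned and correct
        for code in p - t:
            fp[code] += 1  # code assigned but not in the manual coding
        for code in t - p:
            fn[code] += 1  # code in the manual coding but missed
    codes = set(tp) | set(fp) | set(fn)
    # Micro F1 pools the counts over all codes, so frequent codes dominate.
    micro = _f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro F1 averages the per-code F1, so rare codes weigh as much as frequent ones.
    macro = sum(_f1(tp[c], fp[c], fn[c]) for c in codes) / len(codes) if codes else 0.0
    return micro, macro


if __name__ == "__main__":
    # Hypothetical toy data: manual codes vs. model codes for four mentions.
    y_true = [{"migration"}, {"climate"}, {"economy", "pensions"}, {"climate"}]
    y_pred = [{"migration"}, {"climate"}, {"economy"}, {"migration"}]
    micro, macro = f1_scores(y_true, y_pred)
    print(f"0/1 loss: {zero_one_loss(y_true, y_pred):.3f}")
    print(f"micro F1: {micro:.3f}, macro F1: {macro:.3f}")
```

Because macro F1 gives every code the same weight, a macro score below the micro score, as reported here, is what one would expect if rarer codes are predicted less reliably than frequent ones.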

These results demonstrate the effectiveness of the (partially) automated coding approach, which rests on the accuracy enabled by the refined codebook and on the robust performance of BERT. The shift to a language-model-based approach marks a departure from traditional manual methods and prioritizes efficiency in the coding process.