15th Scientific Conference, 20 and 21 June 2024

Data Collection, Data Quality, and Data Ethics in Times of Artificial Intelligence

Evaluating Machine Learning Algorithms to Detect Interviewer Falsification

Silvia Schwanhäuser (1), Joseph Sakshaug (1, 2, 3), Yuliya Kosyakova (1, 4), Natalja Menold (5), Peter Winkler (5)


Interviewers play a vital role in the quality of survey data by motivating participation and addressing respondents' inquiries. At the same time, interviewer-administered surveys are inherently susceptible to fraudulent interviewer behavior. Even small amounts of falsified data can severely bias estimation results. Consequently, identifying falsified interviews is an important part of the quality control process. However, control procedures such as re-interviewing or monitoring can be time-consuming. Other data-driven detection methods can flag suspicious patterns in the survey data, but their implementation is cumbersome and costly.

An understudied detection approach is the use of machine learning algorithms. Although some studies propose unsupervised algorithms such as cluster analysis or principal component analysis, there is hardly any literature on otherwise widely used supervised algorithms such as neural networks or decision trees. This is mainly due to the lack of appropriate test and training data that contain enough falsifiers and falsified interviews to evaluate the respective algorithms.

This study overcomes this limitation and examines the application of machine learning algorithms for detecting falsifications using both experimental and survey data. The experimental data were collected specifically to study falsifications, while the survey data were obtained from two large nationally representative German panel surveys that included falsified interviews. We evaluate the effectiveness of various supervised algorithms in order to detect (future) falsifications in different data sets. We use a variety of well-known algorithms such as regression models, decision trees, support vector machines, and neural networks.

Our findings suggest that most algorithms can easily re-identify similar types or strategies of fraudulent behavior. However, changes in the fabrication strategies of falsifiers pose a challenge to all algorithms. Therefore, supervised algorithms can help to quickly identify known falsification patterns, but may not be suitable as the sole quality control procedure.
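The point about changing fabrication strategies can be illustrated with a minimal sketch on purely synthetic data. Everything in it is an assumption made for illustration and not taken from the study: the two numeric features stand for hypothetical falsification indicators, the 3.0 shift is arbitrary, and a one-split decision tree (a stump) stands in for the supervised algorithms named above. The classifier is trained on interviews falsified with one strategy and then evaluated on held-out interviews using the same strategy and on interviews using a different one.

```python
import random

def simulate(n, shift_feature, rng):
    """Draw n hypothetical interviews; falsified ones are shifted on one feature."""
    rows = []
    for i in range(n):
        falsified = i % 2  # alternate real (0) and falsified (1) interviews
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        if falsified:
            x[shift_feature] += 3.0  # assumed, arbitrary effect of the strategy
        rows.append((x, falsified))
    return rows

def fit_stump(rows):
    """Fit a one-split decision tree by exhaustive search over thresholds."""
    best = None
    for feature in range(2):
        for threshold in sorted(x[feature] for x, _ in rows):
            for sign in (1, -1):
                correct = sum(
                    int(sign * (x[feature] - threshold) > 0) == y
                    for x, y in rows
                )
                if best is None or correct > best[0]:
                    best = (correct, feature, threshold, sign)
    return best[1:]  # (feature, threshold, sign)

def predict(stump, x):
    """Return 1 if the stump flags the interview as falsified, else 0."""
    feature, threshold, sign = stump
    return int(sign * (x[feature] - threshold) > 0)

def recall(stump, rows):
    """Share of falsified interviews that the stump flags."""
    flags = [predict(stump, x) for x, y in rows if y == 1]
    return sum(flags) / len(flags)

rng = random.Random(0)
train_known = simulate(400, shift_feature=0, rng=rng)  # known strategy
test_known = simulate(400, shift_feature=0, rng=rng)   # same strategy, new data
test_new = simulate(400, shift_feature=1, rng=rng)     # changed strategy

stump = fit_stump(train_known)
recall_known = recall(stump, test_known)
recall_new = recall(stump, test_new)
print(f"recall, known strategy: {recall_known:.2f}")
print(f"recall, new strategy:   {recall_new:.2f}")
```

In this toy setup the stump splits on the feature that the training falsifiers manipulated, so it has no signal for falsifiers who manipulate the other feature. Real survey data and richer models will behave less starkly, so the sketch only shows the mechanism and says nothing about the size of the effect.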

1: Institute for Employment Research (IAB), Germany

2: University of Mannheim, Germany

3: University of Munich, Germany

4: University of Bamberg, Germany

5: University of Dresden (TU-Dresden), Germany

6: University of Giessen, Germany