15th Scientific Conference, 20 and 21 June 2024

Data collection, data quality and data ethics in the age of artificial intelligence

Exploring ML algorithms in social surveys to improve respondent experience and data quality, exemplified by the Adult Education Survey

Katharina Rossbach, Eirik Fredborg


Designing surveys whose questions are clearly and comprehensibly formulated, and easy to answer, can be challenging. Another dimension to consider when designing a survey is the sample, which often consists of various demographic groups with different abilities and knowledge. Tailoring the questionnaire design to each group is difficult and often too time-consuming: first, one has to determine which groups require a different survey design; then it is necessary to map how each group understands the questions and to adjust the questions accordingly. In most of our surveys, we have therefore opted for a single questionnaire that fits all respondents, with a few exceptions such as the survey on participation in cultural activities, where we adjusted some questions for children.

However, AI technology may open up new possibilities. We could create algorithms that analyse respondents’ behaviour and characteristics in order to design dynamic questionnaires. Depending on the survey at hand, this could help us design more user-friendly surveys based on respondent characteristics, preferences and previously available data.

One survey for which we would like to evaluate the incorporation of ML algorithms is the Adult Education Survey (AES). The survey is conducted every six years and contains a section on participation in non-formal education that is quite demanding for respondents. The questions at the beginning of this section ask about the number and names of courses, workshops, private lessons and work-related training the respondents have taken part in during the last 12 months. However, respondents struggle to differentiate in various ways: between workshops and courses, or whether private lessons such as driving lessons should be counted as one activity or several. This leads to miscounting the number of activities. Since follow-up questions are asked about randomly selected activities, some respondents run into trouble: for example, they enter the same name for three activities and then receive follow-up questions for each, thinking they are being asked the same questions repeatedly. This can frustrate respondents and lead to drop-off, or they may simply click through the survey in order to finish it; either way, data quality suffers.
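The duplicate-name problem described above could already be detected at data-entry time with a simple check, before any ML model is involved. The following sketch is a minimal, hypothetical illustration (the function name and normalisation rules are our assumptions, not part of the actual AES instrument); a flagged duplicate could trigger a clarifying prompt instead of repetitive follow-up questions.

```python
def flag_duplicate_activities(activity_names):
    """Return activity names entered more than once (case- and
    whitespace-insensitive), which would later trigger seemingly
    repetitive follow-up questions for the respondent."""
    counts = {}
    for name in activity_names:
        key = name.strip().lower()  # normalise before comparing
        counts[key] = counts.get(key, 0) + 1
    return {name: n for name, n in counts.items() if n > 1}
```

For example, a respondent who lists "Excel course" twice with slightly different spelling would be flagged, and the questionnaire could ask whether these are really two distinct activities.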

In this paper, we will outline ideas for how algorithms can be set up to improve data quality and the respondent experience for non-formal educational activities in the AES. First, we will explore an algorithm for predicting which answers given by respondents lead to potential problems, such as low data quality or drop-off, in the follow-up questions. Second, we will evaluate another algorithm for identifying low data quality or non-response based on respondent characteristics such as being a student, language skills and age. Finally, we will describe potential restrictions, ethical issues and challenges involved in using such algorithms, as well as how to handle them.
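To make the second idea concrete, a rule-based baseline could score drop-off risk from the respondent characteristics mentioned above; a trained ML model would eventually replace the hand-picked weights. All field names, weights and the threshold in this sketch are hypothetical placeholders, not values from the AES.

```python
def dropoff_risk(respondent):
    """Toy risk score in [0, 1] for drop-off or low data quality.
    Weights are illustrative assumptions only."""
    score = 0.0
    if respondent.get("is_student"):
        score += 0.2  # hypothetical: students may rush through
    if respondent.get("language_skills") == "low":
        score += 0.4  # hypothetical: comprehension problems
    if respondent.get("age", 0) >= 70:
        score += 0.2  # hypothetical: less familiarity with web surveys
    if respondent.get("n_duplicate_activity_names", 0) > 0:
        score += 0.3  # duplicate names already entered (see above)
    return min(score, 1.0)

def needs_simplified_flow(respondent, threshold=0.5):
    """Route high-risk respondents to a simplified question flow."""
    return dropoff_risk(respondent) >= threshold
```

In a production setting, the score would come from a model trained on paradata from earlier AES waves rather than from fixed rules, which is precisely the kind of design whose ethical implications the paper goes on to discuss.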