Conference on Foundations and Advances of Machine Learning in Official Statistics, 3rd to 5th April, 2024

Session 4.1 Measurement Error and Sampling

Incorporating machine learning in capture-recapture estimation of survey measurement error

Joep Burger*1, Jonas Klingwort1, Maaike Walraad2


Capture-recapture (CRC) is currently considered a promising method for using non-probability samples to estimate survey measurement error. In previous studies, we derived adjusted survey estimates using CRC by combining probability-based survey data (as the initial data source) and non-probability road sensor data (as the secondary data source). The design-based survey estimate was considerably lower than the CRC estimates, which are based on multiple data sources and statistical models. A likely explanation is measurement error in the survey, which is conceivable given the response burden of diary questionnaires. This paper explores the potential of machine learning as a more flexible alternative to the commonly used regression models as the basis for a number of CRC estimators. Moreover, we report on the potential impact of the quality of the non-probability source degrading over time. In particular, we study differences in prediction quality, point estimates, variance estimates, and estimates of measurement error over five years. Results show that machine learning clearly outperforms the regression models in prediction quality, but the resulting CRC point estimates remain largely unaffected. Log-linear estimators in combination with machine learning models seem more sensitive to a declining number of working sensors than the Lincoln-Petersen estimator, the Huggins estimator, and log-linear estimators with regression models.
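For readers unfamiliar with CRC, the simplest of the estimators named above, the Lincoln-Petersen estimator, can be sketched in a few lines. This is a minimal illustration of the textbook two-source formula, not the authors' implementation. The function names and the counts in the example are hypothetical and are not taken from the study. Here n1 stands for units reported in the survey, n2 for units observed by the sensors, and m for units found in both sources.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Textbook two-source estimate of the total: N_hat = n1 * n2 / m.

    n1: units captured by the first source (e.g. survey reports)
    n2: units captured by the second source (e.g. sensor observations)
    m:  units captured by both sources (linked records)
    """
    if m <= 0:
        raise ValueError("m must be positive; the estimator is undefined without overlap")
    if m > min(n1, n2):
        raise ValueError("m cannot exceed either source count")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant, which is defined even when m == 0."""
    if m < 0 or m > min(n1, n2):
        raise ValueError("m must lie between 0 and min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    n1, n2, m = 400, 300, 120
    print(lincoln_petersen(n1, n2, m))  # 400 * 300 / 120 = 1000.0
    print(chapman(n1, n2, m))
```

With these made-up counts, the survey alone captures 400 units while the estimated total is 1000, which illustrates how a CRC estimate can exceed a survey-based figure when the survey underreports. The Huggins and log-linear estimators mentioned in the abstract additionally model capture probabilities with covariates and are not covered by this sketch.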

*: Speaker

1: Statistics Netherlands

2: Utrecht University - The Netherlands