Conference on Foundations and Advances of Machine Learning in Official Statistics, 3rd to 5th April, 2024

Session 3.3 Methodology I

Challenges in constructing confidence intervals around point estimates for resampling based performance

Sebastian Fischer1, Hannah Schulz-Kümpel1, Roman Hornung* 2


When evaluating the performance of ML models, a typical metric is the generalization error, i.e. the expected loss between the observed outcome and the prediction for a new data point. Accordingly, a variety of point estimates for the generalization error have been proposed and implemented, all based on applying some resampling procedure to the data on which the model is fit. While these point estimates are generally quite useful and unobjectionable given a standard i.i.d. assumption for the data sample, the same does not necessarily hold for the methods that have been proposed to construct confidence intervals around them. Indeed, some such intervals put forward in the literature do not even satisfy the standard definition of a confidence interval. We have identified some central complexities regarding the conceptual modelling, proof of asymptotic behaviour, and empirical evaluation of point estimates and corresponding "confidence intervals" for the generalization error, chief among them the dependency structures created by applying resampling procedures. Building on this, we have developed a methodological framework and empirical study design to systematically and extensively compare existing methods in this context. In this talk, we will summarize the methodological complexities that must be understood to make the necessary statistical considerations for an inference setting as complex as that of the generalization error, and discuss selected results of the empirical method comparison.
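
As a rough illustration of the setting described above, the following sketch computes a k-fold cross-validation point estimate of the generalization error together with a naive normal-approximation interval. It is not the authors' framework or any of the methods they compare. The "model" (predicting the training-fold mean), the squared-error loss, the simulated data, and the function names `kfold_losses` and `naive_wald_interval` are all assumptions made for illustration. The interval treats the per-observation losses as independent, which is exactly the kind of assumption that the dependency structures created by resampling call into question.

```python
import math
import random
import statistics


def kfold_losses(y, k, seed=0):
    """Return one squared-error loss per observation from k-fold CV.

    The 'model' is a hypothetical stand-in: it predicts the mean of the
    training folds for every held-out observation.
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint test sets
    losses = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [y[i] for i in idx if i not in held_out]
        pred = statistics.fmean(train)
        losses.extend((y[i] - pred) ** 2 for i in test_idx)
    return losses


def naive_wald_interval(losses, z=1.96):
    """Point estimate and naive normal-approximation interval.

    Assumption for illustration only: the losses are treated as
    independent, although the training sets of different folds overlap.
    """
    n = len(losses)
    estimate = statistics.fmean(losses)
    se = statistics.stdev(losses) / math.sqrt(n)
    return estimate, (estimate - z * se, estimate + z * se)


# Simulated i.i.d. data (an assumption; not data from the study).
rng = random.Random(42)
y = [rng.gauss(0.0, 1.0) for _ in range(200)]

losses = kfold_losses(y, k=5)
estimate, (lower, upper) = naive_wald_interval(losses)
print(f"CV estimate: {estimate:.3f}, naive 95% interval: [{lower:.3f}, {upper:.3f}]")
```

Because every observation is used both for training and for testing, and the training sets of different folds overlap, the losses are not independent. The nominal 95% level of such an interval therefore need not match its actual coverage.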

*: Speaker

1: LMU Munich, Munich Center for Machine Learning - Germany

2: LMU Munich - Germany