Navigating model drift in an occupation classification study
Susie Jentoft*1
Abstract
Data underlying any machine learning model is prone to change over time, a process called model drift. This phenomenon is often overlooked when models are established. The extent of this change and its effect on model performance should be monitored to avoid a decline in predictive performance. This study explores model drift in a small case study of occupation classification based on text variables from the Norwegian Labour Force Survey. We carry out a comprehensive investigation of model drift covering drift detection, drift understanding and drift adaptation. First, we test drift detection with the Kolmogorov-Smirnov test, together with a novel multivariate approach based on the RV coefficient to explore local changes within occupation classes. Drift mitigation is explored using four adaptive methods: fixed windows, weighting, Hoeffding adaptive trees and a new targeted matching approach for creating training data. Feature drift was detected by both descriptive and statistical methods in one of the groups explored: a model covering occupations of very different natures. Using RV values, drift was visualized and observed within classes for several of the occupations investigated. Model performance decreased slightly when models were trained on a fixed, early period. Methods designed specifically for learning under drift did not outperform a generic approach using all data. However, within classes where gradual drift was observed, an adaptive weighting algorithm performed best. For the occupation class that showed a recurrent drift pattern, the novel targeted matching algorithm performed slightly better than the other methods. Further investigation of how these methods perform on larger classification models is recommended to generalize these findings.
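To illustrate the univariate detection step, the two-sample Kolmogorov-Smirnov test compares the empirical distribution of a single feature in two time periods. The sketch below is a minimal pure-Python version under stated assumptions, not the study's implementation. The feature (word counts of a free-text answer), the sample values, and the function names `ks_statistic` and `ks_drift` are hypothetical. The decision rule uses the standard large-sample critical value c(α)·sqrt((n+m)/(nm)) instead of an exact p-value, so it is only approximate for small samples.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic D = sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions, so the
    # supremum is attained at one of the pooled sample points.
    for x in a + b:
        fa = bisect_right(a, x) / n
        fb = bisect_right(b, x) / m
        d = max(d, abs(fa - fb))
    return d


def ks_drift(a, b, alpha=0.05):
    """Flag drift if D exceeds the asymptotic critical value at level alpha."""
    n, m = len(a), len(b)
    d = ks_statistic(a, b)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return d, critical, d > critical


if __name__ == "__main__":
    # Hypothetical word counts of an occupation description in two periods.
    early = [3, 4, 4, 5, 5, 6, 6, 7]
    late = [6, 7, 7, 8, 9, 9, 10, 11]
    d, crit, drift = ks_drift(early, late)
    print(f"D={d:.3f}, critical={crit:.3f}, drift={drift}")
```

A test like this is run per feature, so it can only detect changes in marginal distributions. This limitation motivates a multivariate measure such as the RV coefficient for examining changes within occupation classes.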