UDC 007:681.512.2
CORRECTION OF COVARIANT DRIFT OF THE CONCEPT FOR ENSEMBLES OF MACHINE LEARNING MODELS
I. Yu. Kashirin, Dr. Sc. (Tech.), full professor, RSREU, Ryazan, Russia;
orcid.org/0000-0003-1694-7410
The article discusses a new approach to detecting and correcting one type of data drift in machine learning models, namely covariant concept drift. The approach assumes that the machine learning model is built as an ensemble of models at several levels. The ensemble is assembled with a compositional bagging method: weak models of the same type are first used as ensemble components, and then a number of iterations raise the accuracy of the resulting model to a level acceptable for the forecasting problem in terms of both accuracy and computational complexity. Various formulas for concept drift are studied, based on the conditional and unconditional probabilities of the target variable given the feature vector of the input dataset. The notions of positive and negative concept drift are introduced, depending on which forecasting class the drift belongs to. The new approach uses a conceptual knowledge base of the subject area, which makes it possible to classify the elements of the feature vector a priori in the form of a generic taxonomy. The classified feature vector is a hierarchical structure from which a bootstrap algorithm can form subsamples of features (folds) for preliminary training of the first- and second-level weak models. The folds can then be ordered, and the resulting order can be used to detect and compensate for covariant concept drift in the working model. For the experimental study, the air transportation services subject area from the Kaggle international repository is used. The software was implemented in the Spyder v.4 environment for Python. The experimental results show that the new approach is effective in correcting concept drift. The aim of the work is to develop a new approach to identifying and correcting covariant concept drift that makes drift correction possible in ensembles of machine learning models.
Key words: concept drift, ensembles of machine learning models, big data, prediction accuracy, knowledge base, ontological knowledge models.
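The abstract refers to drift formulas built on the conditional and unconditional probabilities of the target variable, and to positive and negative drift tied to a forecasting class. The Python sketch below (Python is the implementation language named in the abstract) shows one possible way to estimate such quantities from a reference window and a working window. It is an illustration under stated assumptions, not the author's method: features are taken as discrete values, probabilities are estimated by relative frequencies, total variation distance is used as the magnitude of the change in P(x), and the sign of the per-class change in P(y = c) is assumed to mark positive or negative drift. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Relative-frequency estimate of an unconditional distribution, e.g. P(x) or P(y)."""
    counts = Counter(values)
    n = len(values)
    return {v: k / n for v, k in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def feature_drift(ref_pairs, new_pairs):
    """Covariate component of drift: change of the unconditional feature distribution P(x)."""
    return total_variation(distribution([x for x, _ in ref_pairs]),
                           distribution([x for x, _ in new_pairs]))


def prior_drift(ref_pairs, new_pairs):
    """Signed change of P(y = c) for every class c.

    Assumed convention (hypothetical): a positive value is 'positive drift'
    for class c, a negative value is 'negative drift'.
    """
    p_ref = distribution([y for _, y in ref_pairs])
    p_new = distribution([y for _, y in new_pairs])
    return {c: p_new.get(c, 0.0) - p_ref.get(c, 0.0) for c in set(p_ref) | set(p_new)}


def conditional(pairs):
    """Conditional distributions P(y | x) for each observed discrete feature value x."""
    by_x = defaultdict(Counter)
    for x, y in pairs:
        by_x[x][y] += 1
    result = {}
    for x, counts in by_x.items():
        total = sum(counts.values())
        result[x] = {c: k / total for c, k in counts.items()}
    return result


def conditional_drift(ref_pairs, new_pairs, cls):
    """Signed change of P(y = cls | x) for every feature value x seen in both windows."""
    q_ref, q_new = conditional(ref_pairs), conditional(new_pairs)
    return {x: q_new[x].get(cls, 0.0) - q_ref[x].get(cls, 0.0)
            for x in set(q_ref) & set(q_new)}


# Hypothetical toy windows of (feature value, class label) pairs.
ref = [("eco", "neutral")] * 6 + [("eco", "satisfied")] * 2 + [("business", "satisfied")] * 2
new = ([("eco", "neutral")] * 3 + [("eco", "satisfied")] * 1
       + [("business", "satisfied")] * 4 + [("business", "neutral")] * 2)

print("P(x) drift (TV):", round(feature_drift(ref, new), 3))
print("P(y) drift per class:", {c: round(d, 3) for c, d in prior_drift(ref, new).items()})
print("P(satisfied | x) drift:",
      {x: round(d, 3) for x, d in conditional_drift(ref, new, "satisfied").items()})
```

In this toy case P(x) moves strongly toward "business" while P(satisfied | eco) stays the same, which is the covariate (input-distribution) part of the drift; the drop in P(satisfied | business) is a change in the conditional concept itself. Relative frequencies were chosen only for simplicity; continuous features would need binning or density estimates.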