UDC 007:681.512.2
VECTORIZATION OF TEXT BASED ON ICF+ ONTOLOGY IN ENSEMBLES OF ML MODELS TO CLASSIFY ELECTRONIC RESOURCES
I. Yu. Kashirin, Dr. Sc. (Tech.), Professor of the Department of VPM, Ryazan State Technical University, Ryazan, Russia;
orcid.org/0000-0003-1694-7410
This paper considers an original technology for designing and applying machine learning models and their ensembles to the classification and complex analysis of English-language political texts from domestic and pro-Western electronic media. An end-to-end example of a software implementation in Python v.3.10 and Anaconda v.2.1 is presented. The implementation uses the following components: a search retriever, Python patterns, and intelligent insertion of special tokens. The effectiveness of the technology is confirmed by a series of practical experiments on the binary classification of news articles by ideological orientation into pro-Western and pro-Russian. The results of the study will be useful in forecasting crisis political situations. The aim of the work is to present a new method of ontological vectorization of political news that allows social situations to be analyzed and predicted at various levels of detail.
Key words: BERT models, ontological models, text vectorization, tokenizer, retriever, political news, ensembles of ML models, forecasting, semantic similarity.