UDC 004.891
REFINING CENTROIDS OF VECTOR REPRESENTATIONS OF REGULAR EXPRESSIONS USING HYBRID OPTIMIZATION ALGORITHMS
L. A. Demidova, Dr. in technical sciences, Full Professor, Department of Corporate Information Systems, Institute of Information Technologies, MIREA – Russian Technological University, Moscow, Russia; orcid.org/0000-0003-4516-3746
N. A. Moroshkin, post-graduate student, Department of Corporate Information Systems, Institute of Information Technologies, MIREA – Russian Technological University, Moscow, Russia; orcid.org/0009-0002-8787-2452
The article addresses the problem of clustering vector representations of abstract syntax trees of regular expressions. The representations are produced by the BERT model and clustered with the standard fuzzy C-means algorithm and its modifications. The main object of the study is a family of hybrid optimization algorithms for refining cluster centroids, each combining a gradient optimization method (GD, Adam, or RMSProp) with an evolutionary algorithm (the classical Differential Evolution (DE) algorithm or one of its modifications, L-SRTDE and L-SHADE-RSP). The aim of the study is to determine the feasibility of using hybrid algorithms to optimize cluster centroids for the standard fuzzy C-means algorithm and its modifications when clustering vector representations of regular expressions, taking into account their structural features. The study provides a comparative analysis of optimization approaches for refining cluster centroids, applying gradient methods and evolutionary algorithms both individually and as parts of a hybrid optimization algorithm. Cluster analysis was performed on vector representations of regular expressions in a 32-dimensional space constructed with the UMAP nonlinear dimensionality reduction algorithm. Clustering quality was assessed using the cluster silhouette index. The experimental results confirm the feasibility of using hybrid optimization algorithms that combine gradient methods and evolutionary algorithms to refine cluster centroids for the standard fuzzy C-means algorithm and its modifications. The proposed hybrid optimization algorithms separate the vector representations of regular expressions more accurately, which improves the quality of the clustering solution.
Key words: regular expressions, fuzzy clustering, GD, differential evolution, L-SRTDE, L-SHADE-RSP.
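As an illustration of the kind of hybrid scheme the abstract describes, the sketch below refines fuzzy C-means centroids in two stages: a global search followed by a local gradient search. It is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how the two optimizers are coupled, so the sequential "DE first, then GD" order is an assumption. Classical DE/rand/1/bin and plain gradient descent stand in for the other methods named above (L-SRTDE, L-SHADE-RSP, Adam, RMSProp), and all function names and hyperparameter values are hypothetical.

```python
import numpy as np

def memberships(X, V, m=2.0, eps=1e-12):
    """Optimal fuzzy memberships u_ik for fixed centroids V (shape (c, d))."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + eps  # (n, c)
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm_objective(X, V, m=2.0):
    """Fuzzy C-means objective J = sum_i sum_k u_ik^m * ||x_i - v_k||^2."""
    U = memberships(X, V, m)
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return float((U ** m * d2).sum())

def de_stage(X, c, m=2.0, pop_size=20, gens=50, F=0.5, CR=0.9, seed=None):
    """Classical DE/rand/1/bin over centroid matrices (assumed global stage)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    pop = rng.uniform(lo, hi, size=(pop_size, c, d))
    fit = np.array([fcm_objective(X, p, m) for p in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, r3 = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - r3), lo, hi)
            mask = rng.random((c, d)) < CR
            mask.flat[rng.integers(c * d)] = True  # keep at least one mutant gene
            trial = np.where(mask, mutant, pop[i])
            f = fcm_objective(X, trial, m)
            if f <= fit[i]:  # greedy selection
                pop[i], fit[i] = trial, f
    best = int(np.argmin(fit))
    return pop[best].copy(), float(fit[best])

def gd_stage(X, V, m=2.0, lr=0.005, steps=200):
    """Plain gradient descent on J (assumed local stage).

    With memberships held at their optimum, dJ/dv_k = 2 * sum_i u_ik^m (v_k - x_i).
    The step is stable when lr is below 1 / n, where n is the number of points.
    """
    V = V.copy()
    for _ in range(steps):
        W = memberships(X, V, m) ** m                      # (n, c)
        grad = 2.0 * (W.sum(axis=0)[:, None] * V - W.T @ X)
        V -= lr * grad
    return V

def hybrid_refine(X, c, m=2.0, seed=None):
    """Run the DE stage, then polish its best candidate with gradient descent."""
    V_de, f_de = de_stage(X, c, m=m, seed=seed)
    V = gd_stage(X, V_de, m=m)
    return V, f_de, fcm_objective(X, V, m)
```

In this arrangement the evolutionary stage supplies a starting point that is less sensitive to initialization, and the gradient stage performs the fine adjustment of the centroids. A silhouette index could then be computed on the hardened memberships (the cluster of highest membership per point) to compare variants, as the study does.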
