UDC 004.85
AN APPROACH TO REGULAR EXPRESSION ANALYSIS USING GRAPH NEURAL NETWORKS AND CONTRASTIVE LEARNING
L. A. Demidova, Dr. in technical sciences, Full Professor, Professor at the Department of Corporate Information Systems, Institute of Information Technologies, MIREA – Russian Technological University, Moscow, Russia; orcid.org/0000-0003-4516-3746
V. E. Zhuravlev, Post-graduate Student at the Department of Corporate Information Systems, Institute of Information Technologies, MIREA – Russian Technological University, Moscow, Russia; orcid.org/0009-0008-2942-0312
The paper explores an approach to feature extraction from regular expressions using graph neural networks and contrastive learning. A novel method for constructing graph representations of regular expressions from their textual form is proposed. The resulting graphs preserve both the semantic and the structural properties of the original regular expressions. To analyze these graph representations, a machine learning model is introduced that combines a graph neural network with global aggregation based on an attention mechanism. Model parameters are optimized using contrastive learning in a self-supervised paradigm, where similar graphs are generated automatically through random augmentations. The experiments use a dataset of several thousand regular expressions collected from the Regex101 website. The final model, trained on a dedicated training subset, is evaluated on the quality and interpretability of the vector representations it produces for regular expressions. To assess this, clustering is performed on the validation subset; the results demonstrate high-quality feature extraction from regular expressions and confirm the effectiveness of graph neural networks and contrastive learning.
Key words: regular expressions, machine learning, feature extraction, representation learning, graph neural networks, attention mechanism, contrastive learning, clustering, k-means.
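
The contrastive objective mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the loss function, so the NT-Xent (normalized temperature-scaled cross-entropy) loss used here is an assumption, as are the function names `cosine` and `nt_xent_loss` and the default temperature. The inputs stand for embeddings of two randomly augmented views of each regular expression graph; the graph neural network that would produce them is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss over a batch of paired embeddings (assumed objective).

    view_a[i] and view_b[i] are embeddings of two augmentations of graph i
    and form the positive pair; all other embeddings in the batch act as
    negatives. The temperature value is a hypothetical default.
    """
    embeddings = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same graph
        # Similarities to every other embedding, including the positive one.
        logits = [cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(2 * n) if j != i]
        pos_logit = cosine(embeddings[i], embeddings[positive]) / temperature
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


# Usage: two graphs, each embedded under two augmentations.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]
loss = nt_xent_loss(view_a, view_b)
```

Under this assumed objective, the loss decreases as the two views of the same graph move closer together relative to the views of other graphs, which is the property that makes automatically generated augmentations usable as a training signal without labels.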
