RSREU Internet portal: https://rsreu.ru

UDC 004.891.3

OVERVIEW OF METHODS FOR CLASSIFYING SOUNDS OF URBAN ENVIRONMENT

G. M. Mkrtchyan, postgraduate student, Assistant of the Department of MCaIT, MTUCI, Moscow, Russia; orcid.org/0000-0002-5802-5513

N. A. Kravchenko, student of MTUCI, Moscow, Russia; orcid.org/0009-0006-8897-2331

Classifying urban environmental sounds is a complex task that shares features with both image classification and natural language processing. The article describes methods of audio data preparation and presents several deep neural network architectures used to classify urban sounds, such as 1DCNN, ESResNet, AST and PaSST. The advantages and disadvantages of these architectures are discussed. Knowledge distillation and knowledge transfer, which are used to improve the effectiveness of these models, are also considered. The aim of the work is to compare the results of model training on several datasets, including ESC-50, UrbanSound8K and FSD50K, using the mAP and Accuracy metrics.

Keywords: convolutional neural network, end-to-end 1DCNN, ESResNet, AST, PaSST, knowledge transfer, knowledge distillation, UrbanSound8K, ESC-50, FSD50K, audio signal classification, feature extraction, spectrogram, datasets, evaluation metrics.
