
UDC 004.934:681.518

ACOUSTIC DESCRIPTORS OF HARMONIC SPEECH STRUCTURE FOR EMOTION ASSESSMENT

O. V. Melnik, Dr. Sc. (Technical Sciences), full professor, RSREU, Ryazan, Russia;

orcid.org/0000-0002-3513-2180

S. I. Babaev, Ph.D. (Technical Sciences), associate professor, RSREU, Ryazan, Russia;

orcid.org/0000-0001-5829-8223

M. N. Saraev, post-graduate student, RSREU, Ryazan, Russia;

orcid.org/0009-0006-5118-3478

The article reviews classical acoustic descriptors based on the harmonic structure of speech that are used for automatic assessment of emotional states (neutral state vs. stress). The aim of the work is to systematize methods for analyzing the harmonic structure of speech, reveal their physiological basis, and assess their informativeness with respect to emotional changes. The key methods considered are harmonic-to-noise ratio (HNR) analysis, fundamental frequency (F0) estimation, parameters characterizing period and amplitude instability (Jitter and Shimmer), spectral analysis based on the short-time Fourier transform (STFT), cepstral analysis, and formant analysis. Their extraction algorithms and sensitivity to emotional changes are described. Particular emphasis is placed on physiologically interpretable parameters (F0, HNR, Jitter, and Shimmer) and on the fundamental methods underlying their calculation: spectral and cepstral analysis. The limitations of each method are highlighted, and recommendations for selecting descriptors are provided. The practical significance of the work lies in demonstrating the applicability of the methods on illustrative material: a paired comparison (neutral vs. stress) showed characteristic changes, namely a decrease in HNR, increases in Jitter and Shimmer, an increase in signal energy (MFCC0), and greater formant variability (F1–F4). This confirms the sensitivity of the descriptors to emotional stress and supports the use of a combined feature set. The article will be useful to specialists in signal processing, psycholinguistics, and emotion recognition systems.

Keywords: speech harmonics, HNR method, fundamental frequency estimation, Jitter, Shimmer, STFT analysis, formant analysis, cepstral analysis, emotion assessment, stress.
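
As an illustration of the period and amplitude instability parameters named in the abstract, the sketch below computes Jitter and Shimmer in their commonly used "local" form: the mean absolute difference between consecutive glottal periods (or peak amplitudes) divided by the mean period (or amplitude). This is a minimal sketch under stated assumptions, not necessarily the extraction algorithm used in the article: the function names `local_jitter` and `local_shimmer` are hypothetical, and the per-cycle periods and amplitudes are assumed to have been measured already by some pitch-marking step.

```python
from statistics import mean


def _local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods):
    """Cycle-to-cycle instability of the glottal period (dimensionless ratio).

    `periods` is a sequence of consecutive period durations in seconds
    (assumed input; how they are measured is outside this sketch).
    """
    return _local_perturbation(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle instability of the peak amplitude (dimensionless ratio).

    `amplitudes` is a sequence of per-cycle peak amplitudes in linear units
    (assumed input).
    """
    return _local_perturbation(amplitudes)


if __name__ == "__main__":
    # Made-up illustrative values, not data from the article.
    steady_periods = [0.0100, 0.0100, 0.0100, 0.0100]
    unstable_periods = [0.0100, 0.0110, 0.0100, 0.0110]
    print(f"jitter (steady):   {local_jitter(steady_periods):.4f}")
    print(f"jitter (unstable): {local_jitter(unstable_periods):.4f}")
```

Both measures are ratios, so they are often reported as percentages. A perfectly periodic signal gives zero, and larger values indicate greater instability, which is the direction of change the abstract reports for stress.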
