A method for extracting the semantic features of speech signal recognition based on empirical wavelet transform

Oleksandr Lavrynenko, Denys Bakhtiiarov, Vitalii Kurushkin, Serhii Zavhorodnii, Veniamin Antonov, Petro Stanko

Abstract


The subject of this study is methods for improving the efficiency of semantic coding of speech signals. The purpose of this study is to develop a method that improves this efficiency. Coding efficiency here means reducing the information transmission rate for a given probability of error-free recognition of the semantic features of speech signals, which significantly reduces the required source bandwidth and thereby increases the available communication channel bandwidth. To achieve this goal, the following scientific tasks were solved: (1) investigate the known method for improving the efficiency of semantic coding of speech signals based on mel-frequency cepstral coefficients; (2) substantiate the effectiveness of the adaptive empirical wavelet transform for multiscale analysis and semantic coding of speech signals; (3) develop a method of semantic coding of speech signals based on the adaptive empirical wavelet transform with subsequent Hilbert spectral analysis and optimal thresholding; and (4) perform an objective quantitative assessment of the efficiency gain of the developed method over the existing one.

The following scientific results were obtained. A method of semantic coding of speech signals based on the empirical wavelet transform was developed for the first time; it differs from existing methods by constructing a set of adaptive bandpass Meyer wavelet filters and then applying Hilbert spectral analysis to find the instantaneous amplitudes and frequencies of the internal empirical mode functions, which allows the semantic features of speech signals to be identified and the efficiency of their coding to be increased. For the first time, the adaptive empirical wavelet transform is proposed for multiscale analysis and semantic coding of speech signals, which increases the efficiency of spectral analysis by decomposing the high-frequency speech oscillation into its low-frequency components, namely the internal empirical modes. The method of semantic coding of speech signals based on mel-frequency cepstral coefficients was further developed by applying the basic principles of adaptive spectral analysis via the empirical wavelet transform, which increases the efficiency of that method.

Conclusions. We developed a method for semantic coding of speech signals based on the empirical wavelet transform that reduces the encoding rate from 320 to 192 bit/s and the required bandwidth from 40 to 24 Hz at a probability of error-free recognition of approximately 0.96 (96%) and a signal-to-noise ratio of 48 dB; by this measure, its efficiency is 1.6 times higher than that of the existing method. We also developed an algorithm for semantic coding of speech signals based on the empirical wavelet transform and implemented it in MATLAB R2022b.
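To make the processing chain concrete, a minimal MATLAB sketch of the described pipeline is given below. It assumes the Wavelet and Signal Processing Toolboxes; the input file name, the number of extracted modes, and the soft-threshold rule are illustrative assumptions, not the parameters published by the authors.

% Minimal sketch: EWT decomposition, Hilbert spectral analysis, thresholding.
% Assumes Wavelet Toolbox (ewt) and Signal Processing Toolbox (hilbert).
[x, fs] = audioread('speech.wav');           % hypothetical input recording
x = x(:, 1);                                 % use a single channel

% 1. Adaptive empirical wavelet transform: each column of mra is an
%    internal empirical mode obtained through an adaptive Meyer-type
%    bandpass filter bank built from the signal spectrum.
mra = ewt(x, 'MaxNumPeaks', 5);              % number of modes is an assumption

% 2. Hilbert spectral analysis: instantaneous amplitude and frequency
%    of every empirical mode.
numModes = size(mra, 2);
instAmp  = zeros(size(mra));
instFreq = zeros(size(mra, 1) - 1, numModes);
for k = 1:numModes
    z = hilbert(mra(:, k));                  % analytic signal of mode k
    instAmp(:, k)  = abs(z);                 % instantaneous amplitude
    instFreq(:, k) = diff(unwrap(angle(z))) * fs / (2*pi);   % frequency, Hz
end

% 3. Soft thresholding of the amplitude features. The universal threshold
%    below is only a stand-in for the optimal thresholding used in the
%    paper, whose exact rule is not stated in the abstract.
thr = median(abs(instAmp(:))) / 0.6745 * sqrt(2 * log(numel(instAmp)));
features = sign(instAmp) .* max(abs(instAmp) - thr, 0);

Quantization and entropy coding of these thresholded features would then determine the achievable bit rate; the figures of 192 bit/s and 0.96 recognition probability quoted above refer to the authors' full method, not to this sketch.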

Keywords


semantic features of speech signals; mel-frequency cepstral coefficients; adaptive spectral analysis; empirical wavelet transform; adaptive Meyer wavelet filters; internal empirical mode functions; Hilbert spectral analysis; optimal threshold processing
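For comparison with the existing method referred to in the abstract, the baseline mel-frequency cepstral coefficient features can be obtained with the Audio Toolbox, for example as follows; the number of coefficients is an illustrative assumption, not the authors' setting.

% Baseline MFCC features (Audio Toolbox); 13 coefficients per frame is an
% illustrative choice, not necessarily the configuration used in the paper.
[x, fs] = audioread('speech.wav');           % same hypothetical recording
coeffs = mfcc(x(:, 1), fs, 'NumCoeffs', 13); % one row of coefficients per frame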



DOI: https://doi.org/10.32620/reks.2023.3.09
