A method for extracting the semantic features of speech signal recognition based on empirical wavelet transform
DOI: https://doi.org/10.32620/reks.2023.3.09