Emotion recognition of human speech using deep learning method and MFCC features
Abstract
Keywords
DOI: https://doi.org/10.32620/reks.2022.4.13