Reading Tools

Indexing metadata

Emotion recognition of human speech using deep learning method and MFCC features


Dublin Core		PKP Metadata Items	Metadata for this Document

1.	Title	Title of document	Emotion recognition of human speech using deep learning method and MFCC features

2.	Creator	Author's name, affiliation, country	Sumon Kumar Hazra; Jashore University of Science and Technology; Bangladesh

2.	Creator	Author's name, affiliation, country	Romana Rahman Ema; Jashore University of Science and Technology; Bangladesh

2.	Creator	Author's name, affiliation, country	Syed Md. Galib; Jashore University of Science and Technology; Bangladesh

2.	Creator	Author's name, affiliation, country	Shalauddin Kabir; Jashore University of Science and Technology; Bangladesh

2.	Creator	Author's name, affiliation, country	Nasim Adnan; Jashore University of Science and Technology; Bangladesh

3.	Subject	Discipline(s)

3.	Subject	Keyword(s)	speech emotion recognition (SER); deep learning method; advanced AI; mel frequency cepstral coefficients (MFCCs); audio data

3.	Subject	Subject classification	004.934.032.26

4.	Description	Abstract	Subject matter: Speech emotion recognition (SER) is an active research topic. Its purpose is to enable interaction between humans and computers through speech and emotion. To recognize speech emotions, this paper uses five deep learning models: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Artificial Neural Network (ANN), Multi-Layer Perceptron (MLP), and a merged CNN and LSTM network (CNN-LSTM). The Toronto Emotional Speech Set (TESS), Surrey Audio-Visual Expressed Emotion (SAVEE), and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) datasets were used for this system. The models were trained on three merged combinations: TESS+SAVEE, TESS+RAVDESS, and TESS+SAVEE+RAVDESS. These datasets contain numerous audio recordings spoken in English by both male and female speakers. This paper classifies seven emotions (sadness, happiness, anger, fear, disgust, neutral, and surprise); identifying all seven for both male and female data is challenging. Most previous work has used male-only or female-only speech, and work on combined male-female datasets has reported low accuracy in emotion detection tasks. To train a deep learning model on audio data, features must first be extracted by a feature extraction technique. Mel Frequency Cepstral Coefficients (MFCCs) extract the necessary features from the audio data for speech emotion classification. After training the five models on the three merged datasets, the best accuracy of 84.35% is achieved by CNN-LSTM with the TESS+SAVEE dataset.

5.	Publisher	Organizing agency, location	National Aerospace University "Kharkiv Aviation Institute"

6.	Contributor	Sponsor(s)

7.	Date	(YYYY-MM-DD)	2022-11-29

8.	Type	Status & genre	Peer-reviewed Article

8.	Type	Type

9.	Format	File format	PDF

10.	Identifier	Uniform Resource Identifier	https://nti.khai.edu/ojs/index.php/reks/article/view/reks.2022.4.13

10.	Identifier	Digital Object Identifier (DOI)	https://doi.org/10.32620/reks.2022.4.13

11.	Source	Title; vol., no. (year)	Radioelectronic and Computer Systems; No 4 (2022): Radioelectronic and Computer Systems

12.	Language	English=en	en

14.	Coverage	Geo-spatial location, chronological period, research sample (gender, age, etc.)

15.	Rights	Copyright and permissions	Copyright (c) 2022 Sumon Kumar Hazra, Romana Rahman Ema, Syed Md. Galib, Shalauddin Kabir, Nasim Adnan
