| |
| Dublin Core |
PKP Metadata Items |
Metadata for this Document |
| |
| 1. |
Title |
Title of document |
Emotion recognition of human speech using deep learning method and MFCC features |
| |
| 2. |
Creator |
Author's name, affiliation, country |
Sumon Kumar Hazra; Jashore University of Science and Technology; Bangladesh |
| |
| 2. |
Creator |
Author's name, affiliation, country |
Romana Rahman Ema; Jashore University of Science and Technology; Bangladesh |
| |
| 2. |
Creator |
Author's name, affiliation, country |
Syed Md. Galib; Jashore University of Science and Technology; Bangladesh |
| |
| 2. |
Creator |
Author's name, affiliation, country |
Shalauddin Kabir; Jashore University of Science and Technology; Bangladesh |
| |
| 2. |
Creator |
Author's name, affiliation, country |
Nasim Adnan; Jashore University of Science and Technology; Bangladesh |
| |
| 3. |
Subject |
Discipline(s) |
|
| |
| 3. |
Subject |
Keyword(s) |
speech emotion recognition (SER); deep learning method; advanced AI; mel frequency cepstral coefficients (MFCCs); audio data |
| |
| 3. |
Subject |
Subject classification |
004.934.032.26 |
| |
| 4. |
Description |
Abstract |
Subject matter: Speech emotion recognition (SER) is an active research topic. Its purpose is to enable interaction between humans and computers through speech and emotion. To recognize speech emotions, this paper uses five deep learning models: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Artificial Neural Network (ANN), Multi-Layer Perceptron (MLP), and a merged CNN and LSTM network (CNN-LSTM). The Toronto Emotional Speech Set (TESS), Surrey Audio-Visual Expressed Emotion (SAVEE), and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) datasets were used for this system. The models were trained on three merged combinations: TESS+SAVEE, TESS+RAVDESS, and TESS+SAVEE+RAVDESS. These datasets contain numerous audio recordings spoken in English by both male and female speakers. This paper classifies seven emotions (sadness, happiness, anger, fear, disgust, neutral, and surprise); identifying all seven across both male and female speech is challenging. Most previous work has used male-only or female-only speech, and work on combined male-female datasets has reported low accuracy in emotion detection tasks. To train a deep learning model on audio data, features must first be extracted by a feature extraction technique. Mel Frequency Cepstral Coefficients (MFCCs) are used to extract the features needed for speech emotion classification. After training the five models on the three merged datasets, the best accuracy of 84.35 % is achieved by CNN-LSTM with the TESS+SAVEE dataset. |
| |
| 5. |
Publisher |
Organizing agency, location |
National Aerospace University "Kharkiv Aviation Institute" |
| |
| 6. |
Contributor |
Sponsor(s) |
|
| |
| 7. |
Date |
(YYYY-MM-DD) |
2022-11-29 |
| |
| 8. |
Type |
Status & genre |
Peer-reviewed Article |
| |
| 8. |
Type |
Type |
|
| |
| 9. |
Format |
File format |
PDF |
| |
| 10. |
Identifier |
Uniform Resource Identifier |
https://nti.khai.edu/ojs/index.php/reks/article/view/reks.2022.4.13 |
| |
| 10. |
Identifier |
Digital Object Identifier (DOI) |
https://doi.org/10.32620/reks.2022.4.13 |
| |
| 11. |
Source |
Title; vol., no. (year) |
Radioelectronic and Computer Systems; No 4 (2022): Radioelectronic and Computer Systems |
| |
| 12. |
Language |
English=en |
en |
| |
| 14. |
Coverage |
Geo-spatial location, chronological period, research sample (gender, age, etc.) |
|
| |
| 15. |
Rights |
Copyright and permissions |
Copyright (c) 2022 Sumon Kumar Hazra, Romana Rahman Ema, Syed Md. Galib, Shalauddin Kabir, Nasim Adnan |