By: Akshay Shetye, Shruti Chavan, Akshay Parab, Kaushik Patil, and Sumitra Kulkarni
The study “faces and speech recognition” underscores the critical role of speech emotion recognition (SER) and its diverse applications across fields such as medicine, human-computer interaction, and customer service. SER has gained significant importance in cognitive psychology because of its potential to enhance user experience, improve patient care, and optimize the interaction between users and products. The process involves identifying and extracting essential features from speech signals, which are key to accurately recognizing emotional states.

The study further explores the range of classification algorithms employed to categorize these features, showing a transition from traditional techniques, such as voice- and energy-based analysis, to more advanced deep learning methods. These modern approaches draw on big data and neural network architectures to significantly improve the accuracy, reliability, and robustness of SER systems. Review articles in this field often draw on data from SER research, offering valuable insights into the challenges and intricacies involved in designing and implementing such systems.

The study traces the evolution of SER technology, emphasizing the benefits of deep learning, particularly its ability to learn directly from raw data, as well as the challenges posed by factors such as large log files and the need to accommodate a variety of devices. Comprehensive reviews serve as crucial resources for researchers, practitioners, and policymakers, enabling a deeper understanding of the current state of SER technologies, including their strengths, limitations, and areas for future development. These insights are pivotal in driving the field forward, facilitating the development of innovative applications, and maximizing the potential of SER technologies in real-world scenarios.
Keywords: Feature extraction, machine learning, classification algorithm, natural language processing, speech emotion recognition
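The pipeline summarized above (extract features from the speech signal, then classify them into emotional states) can be illustrated with a deliberately minimal sketch. This is not any system from the reviewed literature: the synthetic "utterances", the two toy features (mean frame energy and zero-crossing rate, both common prosodic descriptors), the emotion labels, and the nearest-centroid classifier are all illustrative assumptions chosen to keep the example self-contained.

```python
import numpy as np

def extract_features(signal, frame_len=400):
    """Toy prosodic features: mean frame energy and zero-crossing rate."""
    frames = signal[: len(signal) // frame_len * frame_len].reshape(-1, frame_len)
    energy = np.mean(frames ** 2, axis=1).mean()           # loudness proxy
    zcr = np.mean(np.abs(np.diff(np.sign(signal)))) / 2    # pitch proxy
    return np.array([energy, zcr])

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000)  # 1 s of "speech" at 16 kHz

def utterance(f0, amp):
    """Synthetic stand-in for a recorded utterance: a noisy sine tone."""
    return amp * np.sin(2 * np.pi * f0 * t) + 0.01 * rng.standard_normal(t.size)

# Hypothetical training prototypes: "angry" = loud and high-pitched,
# "calm" = soft and low-pitched. Real systems learn these from corpora.
train = {"angry": utterance(300, 0.9), "calm": utterance(120, 0.3)}
centroids = {label: extract_features(sig) for label, sig in train.items()}

def classify(signal):
    """Nearest-centroid classification in the toy feature space."""
    feats = extract_features(signal)
    return min(centroids, key=lambda c: np.linalg.norm(feats - centroids[c]))

print(classify(utterance(280, 0.8)))  # prints "angry"
```

In practice the hand-crafted features here are replaced by richer descriptors (e.g., spectral or cepstral coefficients) or, in the deep learning approaches the study highlights, learned directly from the raw waveform, and the centroid rule is replaced by a trained neural classifier.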
References:
- Kamińska D, Sapiński T, Anbarjafari G. Efficiency of chosen speech descriptors in relation to emotion recognition. EURASIP J Audio Speech Music Process. 2017;2017:1–9. doi:10.1186/s13636-017-0100-x.
- Avots E, Sapiński T, Bachmann M, Kamińska D. Audiovisual emotion recognition in wild. Machine Vision and Applications. 2019;30(5):975–985. doi:10.1007/s00138-018-0960-9.
- Baishya R. Unique solution of unpolarized evolution equations. Int J Res Appl Sci Eng Technol. 2020;8(4):499–509.
- Poria S, Cambria E, Bajpai R, Hussain A. A review of affective computing: From unimodal analysis to multimodal fusion. Inf Fusion. 2017;37:98–125. doi:10.1016/j.inffus.2017.02.003.
- Caliskan A, Bryson JJ, Narayanan A. Semantics derived automatically from language corpora contain human-like biases. Science. 2017;356(6334):183–186. doi:10.1126/science.aal4230.
- Cho J, Pappagari R, Kulkarni P, Villalba J, Carmiel Y, Dehak N. Deep neural networks for emotion recognition combining audio and transcripts. Interspeech. 2018;247–251.
- Zheng L, Li Q, Ban H, Liu S. Speech emotion recognition based on convolution neural network combined with random forest. Chinese Control and Decision Conference (CCDC). Shenyang, China: 2018, Jun 9–11. 4143–4147. IEEE. doi:10.1109/CCDC.2018.8407844.
- Weißkirchen N, Bock R, Wendemuth A. Recognition of emotional speech with convolutional neural networks by means of spectral estimates. Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW). San Antonio, TX, USA: 2017, Oct 23–26. 50–55. IEEE. doi:10.1109/ACIIW.2017.8272585.
- Pandey SK, Shekhawat HS, Prasanna SM. Deep learning techniques for speech emotion recognition: A review. International Conference Radioelektronika (RADIOELEKTRONIKA). Pardubice, Czech Republic: 2019, Apr 16–18. 1–6. IEEE. doi:10.1109/RADIOELEK.2019.8733432.
- Liu Y, Zhou M, Cao H, Liu H. Speech emotion recognition based on deep learning: A comprehensive survey. IEEE Trans Affect Comput. 2023;1–1.
- Trinh Van L, Dao Thi Le T, Le Xuan T, Castelli E. Emotional speech recognition using deep neural networks. Sensors. 2022;22(4):1414. doi:10.3390/s22041414.
- Lieskovská E, Jakubec M, Jarina R, Chmulík M. A review on speech emotion recognition using deep learning and attention mechanism. Electronics. 2021;10(10):1163. doi:10.3390/electronics10101163.
- Wani TM, Gunawan TS, Qadri SA, Kartiwi M, Ambikairajah E. A comprehensive review of speech emotion recognition systems. IEEE Access. 2021;9:47795–47814. doi:10.1109/ACCESS.2021.3068045.
- Pan J, Fang W, Zhang Z, Chen B, Zhang Z, Wang S. Multimodal emotion recognition based on facial expressions, speech, and EEG. IEEE Open J Eng Med Biol. 2023;5:396–403. doi:10.1109/OJEMB.2023.3240280.
- Han J, Zhang Z, Pantic M, Schuller B. Internet of emotional people: Towards continual affective computing cross cultures via audiovisual signals. Future Gener Comput Syst. 2021;114:294–306. doi:10.1016/j.future.2020.08.002.