Speech Emotion Recognition
Guided by:
Mrs. R.K. Patole
111707049-Pragya Sharma
141807005-Kanchan Itankar
141807008-Saniya Shaikh
141807009-Triveni Vyavahare
Aim
Speech Emotion
Recognition using
Machine Learning
[1] Speech Emotion Recognition using Neural Network and MLP Classifier (Jerry
Joy, Aparna Kannan, Shreya Ram, S. Rama)
● MLP Classifier
● 5 features extracted: MFCC, Contrast, Mel Spectrogram, Chroma
and Tonnetz
● Accuracy 70.28%
[2] Voice Emotion Recognition using CNN and Decision Tree (Navya Damodar,
Vani H Y, Anusuya M A.)
● Decision Tree, CNN
● MFCCs extracted
● Accuracy 72% CNN, 63% Decision Tree
Literature Review
● To build a model that recognizes emotion from speech using the librosa and
sklearn libraries and the RAVDESS dataset.
● To present a classification model of emotion elicited by speech, based on an
MLP (multilayer perceptron) neural network and acoustic features such as the
Mel Frequency Cepstral Coefficients (MFCC). The model is trained to
classify eight different emotions (calm, happy, fearful, disgust, angry, neutral,
surprised, sad).
Objective
Applications
Business Marketing
Suicide prevention
Voice Assistant
● As human beings, speech is among the most natural ways we express ourselves. We depend
on it so much that we recognize its importance when resorting to other forms of
communication, such as emails and text messages, where we often use emojis to express the
emotions associated with the message. Because emotions play a vital role in communication,
detecting and analyzing them is of vital importance in today’s digital world of remote
communication.
● Emotion detection is a challenging task because emotions are subjective; there is no
common consensus on how to measure them. We define a Speech Emotion Recognition
system as a collection of methodologies that process and classify speech signals to detect
the emotions embedded in them.
Motivation
● Human-machine interaction is widely used nowadays in many applications, and speech is one
of its main media. One of the main challenges in human-machine interaction is the detection of
emotion from speech.
● Emotion can play an important role in decision making. Emotion can also be detected from
various physiological signals. If emotion can be recognized properly from speech, a system can
act accordingly. Emotion is identified by extracting features, or different characteristics, from
the speech, and the system must be trained on a large speech database to be accurate.
● The emotional speech dataset RAVDESS is selected, emotion-specific features are extracted
from its recordings, and finally an MLP classification model is used to recognize the emotions.
Introduction
System Block Diagram
Methodology
Preprocessing → Feature Extraction → Classification
1.Preprocessing
Removal of unwanted noise from the speech signal.
➢ Silence removal
➢ Background noise removal
➢ Windowing
➢ Normalization
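A minimal numpy sketch of two of these steps, peak normalization and silence removal by an amplitude threshold (the threshold value here is an illustrative assumption; a library routine such as librosa.effects.trim does the equivalent with a dB threshold):

```python
import numpy as np

def trim_silence(y, threshold=0.01):
    """Peak-normalise a signal, then trim leading/trailing silence.

    `threshold` is a fraction of the peak amplitude, chosen for illustration.
    """
    y = y / (np.max(np.abs(y)) + 1e-12)           # normalisation to [-1, 1]
    voiced = np.where(np.abs(y) > threshold)[0]   # samples above threshold
    if voiced.size == 0:
        return y                                  # all silence: nothing to trim
    return y[voiced[0]:voiced[-1] + 1]            # silence removal
```

Windowing happens later, at feature-extraction time, when the signal is cut into short overlapping frames.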
2.Feature Extraction
● Extract features from the audio file.
● These features capture how we speak:
➢ Pitch
➢ Loudness
➢ Rhythm, etc.
Dataset
Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset.
● [3] The RAVDESS dataset contains recordings of 24 actors (12 male and 12
female), numbered 01 to 24, speaking with a North American accent.
● All emotional expressions are uttered at two levels of intensity, normal and strong,
except for the ‘neutral’ emotion, which is produced only at normal intensity. The
portion of RAVDESS that we use therefore contains 60 trials for each of the 24
actors, 1440 files in total.
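RAVDESS encodes these properties in each filename as seven two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor), so labels can be recovered with a small parser; the emotion code table below follows the dataset's documentation:

```python
# Emotion codes as documented for the RAVDESS dataset.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess(filename):
    """Extract emotion and actor metadata from a RAVDESS filename
    such as '03-01-06-01-02-01-12.wav'."""
    parts = filename.replace(".wav", "").split("-")
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    return {
        "emotion": EMOTIONS[emotion],
        "intensity": "normal" if intensity == "01" else "strong",
        "actor": int(actor),
        # Odd-numbered actors are male, even-numbered are female.
        "gender": "female" if int(actor) % 2 == 0 else "male",
    }
```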
[1] Training process workflow
[1] Testing process workflow
3.Classification
● Map the extracted features to the corresponding emotions
Multilayer Perceptron
Multi-Layer Perceptron Classifier
● A multilayer perceptron (MLP) is a class of feedforward
artificial neural network (ANN).
● An MLP consists of at least three layers of nodes: an input
layer, one or more hidden layers, and an output layer.
● MLPs are suitable for classification problems, where each
input is assigned a class or label.
Building the MLP Classifier involves the following steps:
1. Initialisation of the MLP Classifier.
2. Training the neural network.
3. Prediction.
4. Accuracy calculation.
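The four steps above can be sketched with scikit-learn's MLPClassifier. The synthetic features, hidden-layer size, and iteration count below are placeholders for illustration, not the deck's actual configuration:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 40))     # stand-in for 40 MFCC-style features per clip
y = rng.integers(0, 8, size=240)   # 8 emotion labels (0..7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 1. Initialisation of the MLP Classifier
clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=300, random_state=0)
# 2. Training the neural network
clf.fit(X_train, y_train)
# 3. Prediction
pred = clf.predict(X_test)
# 4. Accuracy calculation
acc = accuracy_score(y_test, pred)
```

With real RAVDESS features in place of the random array, the same four calls reproduce the training pipeline described above.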
Multi-Layer Perceptron Classifier
Fig:- Multi-Layer Perceptron Classifier
Feature Extraction
From the audio data we extract three key features, namely:
● MFCC (Mel Frequency Cepstral Coefficients)
● Mel Spectrogram
● Chroma
MFCC (Mel Frequency Cepstral Coefficients)
MFCCs are obtained by taking the discrete cosine transform of the log mel-scale spectrum of
each frame; they compactly describe the spectral envelope and are among the most widely
used features in speech processing.
Mel Spectrogram
A Fast Fourier Transform is computed over overlapping windowed segments of the signal,
yielding the spectrogram. A mel spectrogram is simply a spectrogram whose frequency axis
is mapped onto the mel scale.
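A numpy sketch of both ideas: an FFT over overlapping Hann-windowed frames gives the magnitude spectrogram, and the standard formula maps frequency in Hz onto the mel scale (the frame and hop sizes here are illustrative choices):

```python
import numpy as np

def hz_to_mel(f):
    """Standard mel-scale mapping: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def spectrogram(y, n_fft=512, hop=128):
    """Magnitude spectrogram from overlapping Hann-windowed frames."""
    window = np.hanning(n_fft)
    frames = np.array([y[i:i + n_fft] * window
                       for i in range(0, len(y) - n_fft + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # (freq_bins, time_frames)
```

A mel spectrogram is then obtained by pooling the spectrogram's frequency bins through a bank of triangular filters spaced evenly on the mel axis.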
Chroma
A chroma vector is typically a 12-element feature vector indicating how much of the
signal’s energy falls into each pitch class of the standard chromatic scale.
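A toy numpy illustration of that folding: each spectral component is assigned to one of the 12 pitch classes by its distance in semitones from A4 = 440 Hz (a real chroma implementation, e.g. librosa.feature.chroma_stft, performs this over the full STFT):

```python
import numpy as np

def chroma_from_peaks(freqs_hz, magnitudes):
    """Fold spectral components into a 12-bin pitch-class vector (C=0 ... B=11)."""
    chroma = np.zeros(12)
    for f, m in zip(freqs_hz, magnitudes):
        if f <= 0:
            continue
        midi = 69 + 12 * np.log2(f / 440.0)   # MIDI note number (A4 = 69)
        chroma[int(round(midi)) % 12] += m    # octaves collapse to one class
    return chroma
```

Notice that 440 Hz and 880 Hz (A4 and A5) land in the same bin: chroma is octave-invariant by construction.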
Fig:- MFCC and Chroma feature plots
Accuracy
Classification Matrix
Confusion Matrix
1.angry
2.calm
3.disgust
4.fearful
5.happy
6.neutral
7.sad
8.surprised
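A confusion matrix over these eight labels can be computed with scikit-learn; the ground-truth and predicted values below are fabricated toy data to show the mechanics, not the model's actual output:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

LABELS = ["angry", "calm", "disgust", "fearful",
          "happy", "neutral", "sad", "surprised"]

# Toy values illustrating the calm/neutral and happy/surprised confusions.
y_true = ["calm", "calm", "happy", "neutral",
          "sad", "surprised", "angry", "fearful"]
y_pred = ["calm", "neutral", "surprised", "calm",
          "sad", "happy", "angry", "fearful"]

# Rows are true labels, columns predictions; the diagonal counts correct cases.
cm = confusion_matrix(y_true, y_pred, labels=LABELS)
```

Off-diagonal mass concentrated in the calm/neutral and happy/surprised cells is exactly the pattern described in the conclusions below.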
● The proposed model achieved an accuracy of 66.67%.
● Calm was the best identified emotion.
● The model gets confused between similar emotions, such as calm/neutral and happy/surprised.
● We tested the model on our own voice file for the sentence “Dogs are sitting by the door” and it
identified the emotion correctly.
Conclusion
Future Work
● The system could take into consideration multiple speakers from different geographic locations
speaking with different accents.
● Though a standard feed-forward MLP is a powerful tool for classification problems, CNN and
RNN models could be trained on larger datasets with more computational power, and the
results compared.
● Studies show that people with autism have difficulty expressing their emotions explicitly;
combining image- and speech-based emotion processing in real time could be of great assistance.
References
[1] Jerry Joy, Aparna Kannan, Shreya Ram, S. Rama. Speech Emotion Recognition using Neural
Network and MLP Classifier. IJESC, April 2020.
[2] Navya Damodar, Vani H Y, Anusuya M A. Voice Emotion Recognition using CNN and
Decision Tree. International Journal of Innovative Technology and Exploring Engineering
(IJITEE), October 2019.
[3] RAVDESS Dataset: https://zenodo.org/record/1188976#.X5r20ogzZPZ
[4] MLP/CNN/RNN Classification:
https://machinelearningmastery.com/when-to-use-mlp-cnn-and-rnn-neural-networks/
[5] MFCC: https://medium.com/prathena/the-dummys-guide-to-mfcc-aceab2450fd
