Speech Emotion Detector

View the video rundown here

Detect emotion in an audio clip

Trained on Ryerson Audio-Visual Database of Emotional Speech and Song dataset. Audio clips from the RADVESS dataset are classified as:

emotions={
  '01':'neutral',
  '02':'calm',
  '03':'happy',
  '04':'sad',
  '05':'angry',
  '06':'fearful',
  '07':'disgust',
  '08':'surprised'
}

We will use Machine Learning to classify a subsection of them as either: ('calm', 'happy', 'fearful', 'disgust')

First we use librosa library for analyzing and extracting audio features. Next, we read in the data and train a model. After we evaluate the model we can use it for predicting the emotion of other audio clips.

Note: file_name[6] refers to the gender of the speaker. Might it be a stretch to also predict the gender of a speaker?

We can read in WAV files, preferrably in mono but conversion is handled as well.

Steps: Ask for the upload, check file extension, save audio file, run predicitons and graphs, save graphs, display graphs and predictions.

Github