II.4.2 Measuring Emotions from Facial Expressions

Thanks to deep learning libraries, AI gives us powerful tools to recognize emotions from facial images in as little as five lines of Python code. These libraries first detect the faces in a larger picture and put a bounding box around each of them, and then apply deep learning models based on Convolutional Neural Networks (CNNs). In the final step, the emotion shown on the face is recognized by a model trained on thousands of face pictures that have been prelabeled with the displayed emotion.
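As a minimal sketch of such a "few lines of Python" system, the snippet below uses the third-party deepface package, which bundles face detection and a pretrained CNN emotion classifier. The package name, the call to DeepFace.analyze, and the file name "face.jpg" are assumptions for illustration, not part of the text.

```python
def recognize_emotion(image_path):
    # Assumes: pip install deepface (downloads model weights on first use).
    from deepface import DeepFace
    # actions=["emotion"] restricts the analysis to emotion classification.
    result = DeepFace.analyze(img_path=image_path, actions=["emotion"])
    # Return only the emotion likelihoods, not the face image itself.
    return result[0]["emotion"]

# likelihoods = recognize_emotion("face.jpg")  # a dict of emotion likelihoods
```

Note that only the per-emotion likelihoods are kept, in line with the privacy consideration discussed below: the face cannot be reconstructed from them.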

Figure 41. Steps of Facial Emotion Recognition

Figure 41 shows the steps necessary to compute the emotions. In the first step, once the picture is taken, the face has to be detected in it. This is frequently done using Haar cascades, classifiers built from simple features computed as sums and differences of rectangular image regions. Next, the face is rotated to align the eyes horizontally and cropped to the minimal area necessary to express and recognize an emotion. The image is then down-sampled to discard unnecessary information and speed up processing, and its intensity is normalized so that edges are recognized more reliably.

Finally, a classifier is built using Convolutional Neural Networks trained on face images that have been manually prelabeled. We mostly use the six basic emotions defined by Paul Ekman, namely happiness, sadness, anger, fear, disgust, and surprise. Additionally, a seventh category, neutral, is added, as we found that this increases emotion recognition accuracy. There are many publicly available face datasets with thousands of pre-labeled faces, for instance the Cohn-Kanade dataset. Instead of building a single classifier that predicts seven different emotional states, it can be better to build seven binary classifiers, each predicting exactly one emotional state (e.g., happy or not, surprised or not, etc.). In the end, only the likelihood that a particular emotion was recognized is stored, which also fully respects individual privacy, as a face cannot be reconstructed from an emotion likelihood. Such a system should reach an accuracy of 70 to 80% on a prelabeled test dataset.
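The preprocessing steps above can be sketched in a few lines of NumPy. The functions below are a simplified illustration, not the implementation used in the system: a rectangle sum over an integral image (the building block of Haar features), average-pooling down-sampling, and zero-mean/unit-variance intensity normalization.

```python
import numpy as np

def rect_sum(integral, top, left, bottom, right):
    """Sum of pixel values in a rectangle in O(1) via the integral image.
    Haar features are differences of such rectangle sums."""
    total = integral[bottom, right]
    if top > 0:
        total -= integral[top - 1, right]
    if left > 0:
        total -= integral[bottom, left - 1]
    if top > 0 and left > 0:
        total += integral[top - 1, left - 1]
    return total

def downsample(img, factor=2):
    """Average-pool the image by `factor` to discard unneeded detail."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor
    return img[:h, :w].reshape(h // factor, factor,
                               w // factor, factor).mean(axis=(1, 3))

def normalize(img):
    """Shift to zero mean and unit variance so edges stand out
    regardless of overall lighting."""
    return (img - img.mean()) / (img.std() + 1e-8)

# Toy 4x4 "image" to exercise the pipeline.
img = np.arange(16, dtype=float).reshape(4, 4)
integral = img.cumsum(axis=0).cumsum(axis=1)
top_left_block = rect_sum(integral, 0, 0, 1, 1)  # sum of the 2x2 corner
small = normalize(downsample(img))               # down-sampled, normalized
```

In a real pipeline these operations run on the cropped, eye-aligned face region before it is fed to the CNN classifier.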

We have used our facial emotion recognition system to recognize the emotions of the audience in a theater play, of musicians and the audience in a concert, of participants in Zoom meetings, and of participants in face-to-face meetings.
