Dimitris Fanis

Emotion Recognition in Movie Synopses

Movies can evoke strong and deep emotions in their viewers, and these emotions vary. Is the notion of emotion correlated with users' watchlists? This Machine Learning project automatically predicts a set of emotions for each movie and then tries to answer that question through statistical analysis.

Project Overview

What makes a movie suspenseful, dramatic, or sci-fi, for example, and which words in its plot signal this? Answering that question would help extract significant information from narrative texts and build a system that produces emotional tags automatically.

This dissertation project was submitted in partial fulfillment of the requirements for the degree of MSc Information Management at Strathclyde University (August 2020). It was awarded a distinction and, according to mention reports on academia.edu, has been cited in other academic papers.

The aim was to identify, define, and automatically predict a set of emotions in movies based on their abstracts and metadata. This involved a series of steps: data collection and cleaning, descriptive and inferential statistics, data preprocessing, and Machine Learning and Deep Learning models, with a core focus on Natural Language Processing (NLP).

The problem was framed as multi-label classification, since a single movie can carry more than one emotion. The result was a set of predicted emotions for 55,577 unlabelled movies, together with the identification of correlations between the predicted emotions and users' ratings and preferences.
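To make the multi-label framing concrete, here is a minimal sketch using Scikit-Learn. It is not the dissertation's actual model: the synopses, the emotion tags, and the choice of TF-IDF features with one logistic regression per label are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy synopses and emotion tags, invented purely for illustration.
synopses = [
    "A detective hunts a killer through dark streets as the clock runs out.",
    "Two friends fall in love during a sunny summer by the sea.",
    "A family mourns a loss and slowly learns to love again.",
    "A killer stalks a family trapped in a dark house.",
]
tags = [
    ["suspense"],
    ["joy", "romance"],
    ["sadness", "romance"],
    ["suspense", "fear"],
]

# Turn the tag lists into a binary indicator matrix: one column per emotion.
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(tags)

# TF-IDF text features feeding one independent binary classifier per emotion.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(synopses, y)

# Predict emotions for an unlabelled synopsis; a movie may get zero or more tags.
unlabelled = ["A killer waits in the dark house."]
predicted = model.predict(unlabelled)
predicted_tags = binarizer.inverse_transform(predicted)
print(predicted_tags)
```

Each emotion gets its own yes/no decision, so a movie can receive several tags, or none if no classifier is confident enough.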

Overall, various correlation tests indicated a strong relationship between users' watchlists and the corresponding emotional tags. It was concluded that emotion can be an important feature in the movie industry, helping recommender systems and advertising companies generate, find, and place more personalized content.
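As an illustration of what one such test could look like, the sketch below runs a chi-square test of independence between a predicted emotion and watchlist membership. The specific tests used in the dissertation are not listed here, so the choice of test, the use of SciPy, the column names, and the data are all assumptions.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: one predicted emotion per movie and whether the movie
# appears in a user's watchlist. Values are invented for illustration.
movies = pd.DataFrame({
    "emotion": ["joy", "joy", "fear", "fear", "sadness", "sadness", "joy", "fear"],
    "in_watchlist": [True, True, False, False, True, False, True, False],
})

# Contingency table: rows are emotions, columns are watchlist membership.
table = pd.crosstab(movies["emotion"], movies["in_watchlist"])

# Null hypothesis: emotion and watchlist membership are independent.
# Note: a table this small is far too sparse for a reliable chi-square test;
# it only demonstrates the mechanics.
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

A small p-value on real data would suggest that watchlist membership depends on the predicted emotion, which is the kind of relationship the analysis set out to detect.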

Tools & Technologies Used

Data Science
Machine Learning
Deep Learning
Pandas
NumPy
Scikit-Learn
TensorFlow
Natural Language Processing (NLP)
Sentiment Analysis / Topic Modeling / Named Entity Recognition (NER) / Non-Negative Matrix Factorization (NMF)
Feature Engineering
Multi-label Classification
Statistical Analysis
Hypothesis Testing
APIs