Automatic Detection of Emotions in Text

Our research is the first attempt to offer a solution for detecting emotions in Hungarian texts. Emotion analysis is most popular in the behavioral sciences and psychology; in recent years, however, it has also started to spread in the field of natural language processing (NLP).

The background

It is important to distinguish emotion analysis from the more widely used sentiment analysis. Sentiment analysis typically classifies a text as positive, negative or neutral, whereas emotion analysis aims to extract specific emotional states from it. Detecting emotions is extremely hard: they come and go quickly, and they are usually associated with extra-linguistic cues such as facial expressions and tone of voice.

In the Internet era, analyzing and extracting emotions from texts is becoming more and more important, not just because it is fascinating and challenging for NLP experts but also because of its growing economic relevance, for example in measuring customer satisfaction.

Our research group hypothesizes that words with emotional meaning or content should be the best markers of the speaker’s/writer’s emotional intent, so we have constructed a Hungarian Emotion Dictionary. The dictionary consists of six sub-dictionaries, one for each of Ekman’s six basic emotions: sadness, anger, disgust, fear, surprise and joy. Our team manually annotated several blog posts and their comments to test how well the dictionaries work for emotion analysis.
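To make the dictionary-based idea concrete, here is a minimal sketch of how a text could be matched against the six sub-dictionaries. The word lists below are tiny hypothetical stand-ins, not the actual Hungarian Emotion Dictionary, and the exact-match lookup is an assumption; a real system for Hungarian would almost certainly need stemming or lemmatization to handle inflected forms.

```python
import re

# Hypothetical toy sub-dictionaries, one per Ekman emotion.
# These few words are illustrative only, not the real dictionary.
EMOTION_DICT = {
    "sadness": {"szomorú", "bánat"},
    "anger": {"dühös", "harag"},
    "disgust": {"undor"},
    "fear": {"félelem", "retteg"},
    "surprise": {"meglepetés"},
    "joy": {"öröm", "boldog"},
}


def count_emotions(text):
    """Count how many tokens of `text` appear in each sub-dictionary."""
    # Lowercase and split into word tokens (\w matches accented letters).
    tokens = re.findall(r"\w+", text.lower())
    counts = {emotion: 0 for emotion in EMOTION_DICT}
    for token in tokens:
        for emotion, words in EMOTION_DICT.items():
            # Assumption: exact match on the surface form, no lemmatization.
            if token in words:
                counts[emotion] += 1
    return counts


if __name__ == "__main__":
    print(count_emotions("Dühös vagyok, de boldog is."))
```

Comparing such counts with the manual annotations is one way the efficiency of the dictionaries could be checked.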

How can we use emotion analysis?

During the local elections in 2014, we analyzed Hungarian tweets related to the mayoral candidates in Budapest. We found that anger is the best predictor of winning! We were surprised, since most studies (like this classic from Bollen et al.) found the number of mentions and/or positive sentiment to be the best predictors of success. The number of Hungarian Twitter users is very small, and less fine-grained approaches such as sentiment analysis or the frequency of mentions could give a misleading picture, since most of the tweets were neutral and mentions of small-party candidates were very rare. So we analyzed the tweets with our emotion dictionaries and gave each candidate an emotion score that reflects the relative proportion of each emotion in the tweets mentioning him/her. Of the six basic emotions, it was the mean square error of anger that was in accord with the results of opinion polls and, later, the final outcome of the election.
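The emotion score described above can be sketched roughly as follows. This is only one plausible reading of "relative proportion": the per-tweet emotion counts are summed over all tweets mentioning a candidate and then divided by the total number of emotion-word hits. The function name, the input format and the sample numbers are assumptions made for illustration, not the exact procedure or data of the study.

```python
from collections import Counter

EMOTIONS = ["sadness", "anger", "disgust", "fear", "surprise", "joy"]


def emotion_scores(tweet_counts):
    """Return the relative proportion of each emotion for one candidate.

    `tweet_counts` is a list of dicts (emotion -> number of dictionary
    hits), one dict per tweet that mentions the candidate.
    """
    totals = Counter()
    for counts in tweet_counts:
        totals.update(counts)  # adds the per-tweet counts to the totals
    total_hits = sum(totals.values())
    if total_hits == 0:
        # No emotion words at all (e.g. only neutral tweets).
        return {emotion: 0.0 for emotion in EMOTIONS}
    return {emotion: totals[emotion] / total_hits for emotion in EMOTIONS}


if __name__ == "__main__":
    # Made-up counts for a hypothetical candidate.
    sample = [{"anger": 2, "joy": 1}, {"anger": 1}]
    print(emotion_scores(sample))
```

Normalizing by the total number of hits makes candidates with very different tweet volumes comparable, which matters when small-party candidates are rarely mentioned.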

The Economist’s R-word index is one of the best-known indicators of the economy. It is very simple, as it tracks the frequency of the term “recession” in the Wall Street Journal and the Financial Times, yet it is mostly accurate. We created a corpus, that is, a collection of articles from various news sites and blogs. We found no correlation between GDP and the frequency of “recession” and its Hungarian synonyms. However, levels of fear and anger usually increase before GDP starts to decline.
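A frequency-based indicator of this kind can be sketched in a few lines. The version below simply counts occurrences of a set of terms per time period in a dated corpus; the period labels, the term list and the sample articles are hypothetical, and the counting rule is an assumption rather than the index's official definition. The same per-period grouping could be applied to emotion-word counts instead of a single term.

```python
import re
from collections import defaultdict


def term_frequency_by_period(articles, terms):
    """Count occurrences of `terms` per period.

    `articles` is an iterable of (period, text) pairs, where `period`
    is any hashable label such as "2008Q3". `terms` is a collection of
    lowercase words to look for (exact match, no lemmatization).
    """
    wanted = set(terms)
    counts = defaultdict(int)
    for period, text in articles:
        tokens = re.findall(r"\w+", text.lower())
        # Adding 0 still registers periods without any hit.
        counts[period] += sum(1 for token in tokens if token in wanted)
    return dict(counts)


if __name__ == "__main__":
    # Made-up mini corpus for illustration.
    corpus = [
        ("2008Q3", "Recession fears grow. Recession looms."),
        ("2008Q4", "Markets calm."),
    ]
    print(term_frequency_by_period(corpus, {"recession", "recesszió"}))
```

The resulting per-period series is what would then be compared with the GDP figures.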