Some recent studies suggest that sarcastic people tend to be smarter, and even more successful, than others. However accurate these findings turn out to be, sarcasm is an interesting and valid way for people to interact and communicate, sometimes even across countries and cultures. Sarcasm is the use of irony to mock someone or something, or to show contempt. But where does sentiment analysis fit in?
Put more simply, sarcasm is when someone writes one thing but means exactly the opposite. People use sarcasm on almost every social network, especially Twitter, which makes it harder for social media monitoring tools to perform accurate sentiment analysis. So is it possible to measure the sentiment behind sarcasm? And if so, is it neutral, positive, or negative?
Before getting into the details of measuring sarcasm and how sentiment analysis is applied, we need to take a step back. Let’s first introduce the science of Natural Language Processing (NLP), which explains how computers receive and interact with human language.
Sentiment Analysis and Natural Language Processing
To begin with, NLP is a field of Artificial Intelligence (AI) that studies the interaction between computers and natural languages, meaning the languages that humans speak and write. In other words, it is concerned with the ability of computer programs to process and make sense of human language. The field is challenging because, traditionally, humans have communicated with computers through highly structured programming languages such as Java, Ruby, and C, which are designed to be unambiguous and are therefore easy for a machine to parse. NLP, by contrast, tries to make computers understand everyday human speech. The problem is that natural language is ambiguous: the same words can carry different meanings depending on many variables, including context and regional dialect.
What is natural language processing? @rhwolniewicz sums it up. https://t.co/F6tHz3kHOF #NLP #BigData #Linguistics
— TM7.NL (@TM7NL) October 19, 2014
Most current NLP approaches and newer algorithms are based on Machine Learning (ML), another branch of Artificial Intelligence, which finds patterns in data and uses them to improve a program’s understanding. So how does Machine Learning differ from NLP, and how do the two help in analyzing human language and performing sentiment analysis? Simply put, NLP aims to build systems or programs that can understand language, while machine learning aims to build systems or programs that learn from pattern recognition and experience. Together, NLP and ML aim to create systems or programs that can learn how to process and make sense of language.
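To make that division of labor concrete, here is a minimal sketch of one classic way the two fit together, assuming plain Python with no NLP libraries: a tokenizer handles the language side, and a multinomial Naive Bayes classifier with add-one smoothing learns word patterns from labeled examples. The names `tokenize` and `NaiveBayes` and the four training sentences are hypothetical and for illustration only; this is not the method of any system described in this article.

```python
import math
from collections import Counter


def tokenize(text):
    """NLP step: lowercase the text, split on whitespace, strip punctuation."""
    words = (w.strip(".,!?;:'\"") for w in text.lower().split())
    return [w for w in words if w]


class NaiveBayes:
    """ML step: multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Learn how often each word appears under each label
        self.labels = sorted(set(labels))
        self.priors = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        # Pick the label with the highest log-probability for this text
        best_label, best_score = None, float("-inf")
        for label in self.labels:
            total = sum(self.word_counts[label].values())
            score = math.log(self.priors[label] / self.n_docs)
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes().fit(
    ["I love this phone", "what a great day", "this is awful", "I hate waiting"],
    ["pos", "pos", "neg", "neg"],
)
print(model.predict("I love this"))  # pos
print(model.predict("I hate this"))  # neg
```

Because every word counts as independent evidence, this model labels a sarcastic line such as "I just love Mondays" as positive, which is exactly the kind of weakness sarcasm exploits.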
Sarcasm and Sentiment Analysis
So how does all of this relate to sarcasm and its sentiment? Sarcasm is becoming a language in itself, causing certain words, and sometimes whole phrases, to lose their literal meaning because people rarely use them in any context other than a sarcastic one. People usually consider sarcasm a form of criticism, but a more polite one; some language experts describe it as a gentler insult. It also has two contradictory qualities that often make it hard to fully understand: it is both funny and mean. Because people perceive sarcasm differently, the same statement might seem positive to one reader and negative to another. Imagine, then, how hard it is for a computer to decide on a sentiment and perform accurate sentiment analysis. This raises another important question: can computers detect sarcasm at all when processing language?
What makes detecting sarcasm harder than other NLP applications, such as spam email detection, is that there is no specific vocabulary that people associate with sarcastic sentences. Sarcasm hides in the context of a sentence, its tone, and its contradictions. A computer cannot see facial expressions to help it determine sentiment, so it needs to understand that the person who wrote a sarcastic sentence actually means the opposite of what they wrote. This was the starting point for Mathieu Cliché, a data scientist and physics PhD researcher at Cornell University, USA, when he became curious about whether it was possible to build a sarcasm detector.
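The difficulty is easy to demonstrate. Below is a minimal sketch of a scorer that averages the polarity of the words it recognizes, assuming a tiny hand-made lexicon (the `LEXICON` values and the function name `lexicon_score` are hypothetical; real tools use far larger, weighted lexicons). Because it only sees individual words, it rates an obviously sarcastic complaint as positive.

```python
# Hypothetical polarity lexicon: +1 is strongly positive, -1 strongly negative
LEXICON = {
    "love": 1.0, "great": 1.0, "awesome": 1.0, "fun": 0.5,
    "hate": -1.0, "awful": -1.0, "bad": -0.5, "stuck": -0.5,
}


def lexicon_score(text):
    """Average polarity of the words in `text` found in LEXICON (0.0 if none)."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


sincere = "I hate being stuck in traffic."
sarcastic = "I just love being stuck in traffic for two hours. Great."
print(lexicon_score(sincere))    # -0.75
print(lexicon_score(sarcastic))  # 0.5
```

A human reads the second sentence as at least as negative as the first, while the scorer reads it as positive. Closing that gap requires features beyond individual word polarity, such as the contrast between different parts of a sentence.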
Online Sarcasm Detector
Cliché streamed more than a hundred thousand tweets, some labeled #sarcasm and some not, to build a classification algorithm that learns to distinguish sarcastic from non-sarcastic sentences, which in turn helps produce accurate sentiment analysis. In July 2014, he launched the “Online Sarcasm Detector” web application, which lets users test the detector themselves. Here is how he did it:
- Where he got the data:
  - Using the Twitter API, he tracked tweets over a set period of time, keeping some that carried the #sarcasm hashtag and others that did not. He relied on the tweets’ authors to decide what was sarcastic and what was not: a tweet counted as sarcastic if its author had tagged it #sarcasm.
  - Twitter provided as many examples as he needed.
- Pre-processing the data:
  - He eliminated tweets whose sarcasm depended on an attached photo or link, and removed all hashtags and mentions.
  - He removed duplicates.
  - After deleting all this “noise,” as Cliché calls it, he kept only tweets that were at least three words long.
- The features used:
  - N-grams: unigrams, which are single words such as “awesome,” and bigrams, which are pairs of words that usually appear together, such as “peanut butter.” To extract bigrams, each tweet was tokenized, stemmed, and lowercased, and each bigram was added to a binary feature dictionary.
  - Sentiment: Cliché hypothesized that sarcastic tweets carry contradictory sentiment; in other words, that they start very positively but end very negatively. To measure this, he divided each tweet into two parts and, separately, into three parts, and scored the sentiment of each part. The scores were calculated using two sentiment analysis tools, SentiWordNet and TextBlob.
  - Topics: some words are often used together in tweets, and these groups of words are called topics. Topics are extracted from each tweet, and each topic is assessed by how strongly it is associated with sarcasm.
- His findings:
  - Sarcastic tweets are more positive than non-sarcastic ones.
  - The first half of a sarcastic tweet tends to be more positive, while the second half is usually more negative.
  - Sarcastic tweets express more feeling than non-sarcastic tweets.
  - Topics that include words such as jersey, procrastinating, hot, complain, storm, and mom are usually associated with non-sarcastic tweets, while topics with words such as love, life, today, lol, feel, and bad are more associated with sarcastic tweets.
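The pre-processing and n-gram steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not Cliché’s code: tweets are plain strings, the label comes from the author’s own #sarcasm hashtag, any URL is treated as a sign that the meaning may depend on a link or photo, the three-word rule is read as a minimum length, and `stem` is a hypothetical crude suffix-stripper standing in for a real stemmer such as Porter’s. All function names are illustrative.

```python
import re


def stem(word):
    # Hypothetical crude stemmer: strips a few common suffixes.
    # A real pipeline would use an established stemmer (e.g. Porter).
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) > 3:
            return word[: -len(suffix)]
    return word


def label_and_clean(tweet):
    """Return (clean_text, is_sarcastic), or None if the tweet is dropped."""
    # The author's own #sarcasm hashtag serves as the label
    is_sarcastic = re.search(r"#sarcasm\b", tweet, re.IGNORECASE) is not None
    # Drop tweets with a link: the sarcasm may live in the linked page or photo
    if re.search(r"https?://\S+", tweet):
        return None
    # Remove every hashtag (including the label itself) and every @mention
    text = re.sub(r"[#@]\w+", "", tweet)
    return " ".join(text.split()), is_sarcastic


def ngram_features(text):
    """Binary unigram and bigram features: tokenize, lowercase, stem."""
    tokens = [stem(t) for t in re.findall(r"[a-z']+", text.lower())]
    features = {f"uni:{t}": 1 for t in tokens}
    for first, second in zip(tokens, tokens[1:]):
        features[f"bi:{first}_{second}"] = 1
    return features


def build_dataset(raw_tweets, min_words=3):
    """Clean, de-duplicate, drop short tweets, and extract features."""
    seen, dataset = set(), []
    for tweet in raw_tweets:
        result = label_and_clean(tweet)
        if result is None:
            continue
        text, is_sarcastic = result
        key = text.lower()
        if key in seen or len(text.split()) < min_words:
            continue
        seen.add(key)
        dataset.append((ngram_features(text), is_sarcastic))
    return dataset


raw = [
    "I love Mondays #sarcasm",
    "I love Mondays #sarcasm",  # duplicate, removed
    "Look at this http://example.com #sarcasm",  # has a link, removed
    "@bob great game tonight",
    "ok #sarcasm",  # too short after cleaning, removed
]
for features, label in build_dataset(raw):
    print(label, sorted(features))
```

Only two of the five example tweets survive. Note that the #sarcasm hashtag is stripped only after it has been read as the label; if it stayed in the text, a classifier would simply learn that hashtag instead of anything about sarcasm itself.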
Cliché engineered several features that help classify tweets as sarcastic or non-sarcastic.
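Cliché’s sentiment feature, which splits each tweet into two and three parts and scores each part, can be sketched as follows. This is a minimal illustration that assumes a tiny hypothetical lexicon in place of the scores SentiWordNet or TextBlob would provide. The `contrast` features (first part minus last part) are one hypothetical way to encode his “starts positive, ends negative” hypothesis, not necessarily the way he did it.

```python
# Hypothetical toy lexicon standing in for SentiWordNet / TextBlob scores
LEXICON = {
    "love": 1.0, "great": 1.0, "awesome": 1.0, "thanks": 0.5,
    "hate": -1.0, "awful": -1.0, "broken": -1.0, "stuck": -0.5,
}


def polarity(words):
    """Sum of the lexicon polarities of the given words."""
    return sum(LEXICON.get(w, 0.0) for w in words)


def split_parts(words, n):
    """Split a word list into n contiguous parts of near-equal length."""
    size, extra = divmod(len(words), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(words[start:end])
        start = end
    return parts


def sentiment_features(text):
    """Per-part sentiment for 2- and 3-way splits, plus a contrast score."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    features = {}
    for n in (2, 3):
        scores = [polarity(part) for part in split_parts(words, n)]
        for i, score in enumerate(scores):
            features[f"sent{n}_part{i}"] = score
        # Positive contrast: the tweet starts more positive than it ends
        features[f"contrast{n}"] = scores[0] - scores[-1]
    return features


print(sentiment_features("I love waking up to a broken heater"))
```

For the example sentence, the positive word falls in the first part and the negative word in the last, so both contrast scores come out at 2.0. A classifier can then weigh these scores alongside the n-gram features.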
Cliché gathered the tweets over a three-week period during the 2014 FIFA World Cup. The findings listed above came from the sentiment analysis produced by the classifier.
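The topic finding can be approximated with a simple count. The sketch below assumes topics are already given as hand-picked word groups (hypothetical ones built from words in the findings; the article does not say how Cliché extracted his topics, and a real system would more likely learn them with a topic model) and reports, for each topic, the share of tweets mentioning it that are labeled sarcastic. The function names, tweets, and labels are made up for illustration.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return {w for w in words if w}


def topic_sarcasm_rate(tweets, labels, topics):
    """Fraction of sarcastic tweets among those mentioning each topic."""
    rates = {}
    for name, topic_words in topics.items():
        matched = [is_sarcastic for text, is_sarcastic in zip(tweets, labels)
                   if topic_words & tokenize(text)]
        if matched:  # skip topics that no tweet mentions
            rates[name] = sum(matched) / len(matched)
    return rates


# Made-up example data; True means the tweet was tagged #sarcasm
tweets = [
    "love my life today lol",
    "feel so bad today",
    "hot storm tonight",
    "mom says I complain too much",
    "love this storm lol",
]
labels = [True, True, False, False, True]
topics = {
    "feelings": {"love", "life", "lol", "feel", "bad"},
    "weather": {"hot", "storm"},
    "family": {"mom"},
}
print(topic_sarcasm_rate(tweets, labels, topics))
```

On real data, each rate would be compared against the overall share of sarcastic tweets in the data set, since a topic is only informative if it departs from that baseline.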
Sarcasm and Sentiment Analysis in Algorithms
As natural language processing develops day by day, computers and programs will gain a better understanding of human language. And although sarcasm remains a challenge for computers trying to perform solid, accurate sentiment analysis, algorithms are improving and computers are getting smarter.