Sentiment Analysis

Learn what sentiment analysis is, how it works, and its applications in real-world scenarios like social media monitoring, product reviews, and customer feedback.
Explore and compare the working mechanisms, outputs, and strengths of four popular sentiment tools: VADER, TextBlob, SentiWordNet, and SentiStrength.
Gain hands-on experience by applying each tool to analyze sentiment in diverse textual datasets (e.g., tweets, comments, or news articles), and interpret their results.
Evaluate and contrast the performance of these tools based on language style (formal/informal), domain specificity, and polarity accuracy, identifying which tools suit specific use cases best.

Target Audience

This tutorial is for computational social scientists and computer scientists interested in analyzing subjective text.
Basic Python skills and an understanding of sentiment polarity are required.

Use Cases

Detecting sentiment in social media posts

Duration

1 hour

Environment Setup

Set up the virtual working environment by executing the following command:

# ! conda env create -f environment.yml

import nltk
nltk.download("punkt")
nltk.download("punkt_tab")
nltk.download('averaged_perceptron_tagger_eng')
nltk.download("wordnet")
nltk.download("stopwords")
nltk.download("vader_lexicon")

[nltk_data] Downloading package punkt to /home/codecheck/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /home/codecheck/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/codecheck/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.
[nltk_data] Downloading package wordnet to
[nltk_data]     /home/codecheck/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/codecheck/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/codecheck/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!

True

1. Sentiment Analysis

Sentiment analysis is a natural language processing (NLP) technique used to identify and categorize emotions or opinions expressed in text. It is commonly applied to determine whether a piece of writing—such as a review, tweet, or customer feedback—is positive, negative, or neutral. By analyzing linguistic features like word choice, tone, and context, sentiment analysis enables organizations to understand public perception, monitor brand reputation, and gain insights from large volumes of textual data.

1.1. SentiStrength

SentiStrength is a rule-based sentiment analysis tool designed to detect the strength of positive and negative emotions in short texts, particularly from social media. Unlike many traditional sentiment tools, it outputs both a positive and negative score for each input, allowing it to capture mixed sentiments within a single statement. SentiStrength is optimized for informal language, handling slang, spelling variations, emoticons, and repeated letters effectively. It has been widely used in research due to its high performance on short, noisy text, and it supports customization and adaptation to different domains, including variants like SentiStrength-SE for software engineering contexts.

SentiStrength analyzes individual words for polarity, taking negation (e.g., not, don’t), booster words (e.g., very, extremely), questions, idioms, emojis, and punctuation into account, which makes it suitable for analyzing social media posts. It is a simple, transparent, and fast method that works across various linguistic contexts.
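The dual-score idea (one positive and one negative strength per text) can be illustrated with a toy scorer. This is only a sketch: the word list, booster list, and negation handling below are made-up assumptions for illustration and are not SentiStrength's actual lexicon or rules.

```python
# Toy illustration of dual (positive/negative) scoring; NOT SentiStrength itself.
# All word lists and strengths below are hypothetical.
TOY_LEXICON = {"love": 3, "good": 2, "bad": -2, "hate": -4}
BOOSTERS = {"very": 1, "extremely": 2}
NEGATIONS = {"not", "don't", "never"}


def toy_dual_score(text):
    """Return (positive, negative) strengths in the ranges 1..5 and -1..-5."""
    pos, neg = 1, -1  # 1 and -1 mean "no positive / no negative sentiment"
    boost, negate = 0, False
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word in BOOSTERS:
            boost += BOOSTERS[word]
            continue
        if word in NEGATIONS:
            negate = True
            continue
        strength = TOY_LEXICON.get(word, 0)
        if strength:
            # A booster strengthens the word in its own direction.
            strength += boost if strength > 0 else -boost
            # In this toy version, negation simply flips the polarity.
            if negate:
                strength = -strength
            if strength > 0:
                pos = max(pos, min(strength, 5))
            else:
                neg = min(neg, max(strength, -5))
        # Boosters and negations only affect the next non-modifier word.
        boost, negate = 0, False
    return pos, neg


print(toy_dual_score("I love the music but hate the very bad seats"))  # (3, -4)
print(toy_dual_score("not very good"))                                 # (1, -3)
```

Keeping the strongest positive and the strongest negative word separately is what lets a mixed sentence report both sentiments instead of cancelling them out.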

1.2. VADER

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a rule-based sentiment analysis tool specifically designed for analyzing social media text. It requires no training: it uses a dictionary of words (a lexicon) and a set of rules to produce sentiment scores for a given text. Each word in the lexicon has a score ranging from -4 (most negative) to +4 (most positive). VADER also takes into account the intensity of the sentiment as emphasized through capitalization or punctuation.
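Word-level scores on a -4 to +4 scale have to be squashed into a single bounded score for the whole text. The sketch below assumes the normalization used in the VADER reference implementation, x / sqrt(x² + α) with α = 15; treat the constant and the function name as assumptions and check them against the version you use. The summed valences passed in are hypothetical values, not the output of VADER's rules.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded summed valence into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


# Hypothetical summed valences, for illustration only.
for total in (-4.0, 0.0, 1.9, 4.0):
    print(total, round(normalize(total), 4))
```

Because the function is odd and saturates, strongly worded texts approach -1 or +1 without ever reaching them, while a summed valence of 0 stays at exactly 0.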

1.3. TextBlob

In addition to other NLP tasks, TextBlob analyzes sentiment in text and returns subjectivity and polarity scores. A higher subjectivity score indicates that the text expresses opinion or sentiment, while the polarity score gives the orientation of that sentiment. Subjectivity scores range from 0 (most objective) to 1 (most subjective). Polarity scores range from -1 (most negative) to +1 (most positive). Polarity is generally the score used for sentiment analysis; however, it is worth checking that texts with strong polarity scores also have high subjectivity scores.
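The cross-check between polarity and subjectivity can be sketched as a small gating function. It works on plain dictionaries with `polarity` and `subjectivity` keys and does not call TextBlob itself; the function name and both thresholds are illustrative assumptions, not values recommended by TextBlob.

```python
def label_textblob(score, min_subjectivity=0.3, min_polarity=0.05):
    """Label a {'polarity', 'subjectivity'} dict; thresholds are illustrative."""
    if score["subjectivity"] < min_subjectivity:
        return "objective"  # polarity is not trusted for objective text
    if score["polarity"] >= min_polarity:
        return "positive"
    if score["polarity"] <= -min_polarity:
        return "negative"
    return "neutral"


print(label_textblob({"polarity": 0.85, "subjectivity": 0.9}))  # positive
print(label_textblob({"polarity": 1.0, "subjectivity": 0.0}))   # objective
```

Checking subjectivity first means a text with an extreme polarity but zero subjectivity is flagged for review rather than counted as strongly positive or negative.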

1.4. SentiWordNet

SentiWordNet is a lexical resource built on top of WordNet that assigns sentiment scores to synsets (sets of cognitive synonyms), capturing their positive, negative, and objective orientations. Unlike rule-based or machine learning sentiment tools, SentiWordNet operates at the word sense level, enabling more fine-grained sentiment analysis by considering the contextual meaning of words. Each synset is associated with numerical scores that represent its sentiment polarity, making the resource suitable for tasks requiring detailed semantic analysis. It is commonly used in research and academic applications where explainability and control over linguistic features are important, and it supports multiple languages through extensions or mappings.
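To make the three scores concrete: for each synset the positive, negative, and objective scores sum to 1, so the objective score is whatever remains after the other two. The sketch below uses a tiny hand-written table with made-up values in place of the real resource, and the helper names are hypothetical. It also shows one possible way to aggregate word senses into a text-level score (a plain sum), which is an assumption and not necessarily how any particular pipeline does it.

```python
# Made-up (pos, neg) scores per word sense; real values come from SentiWordNet.
TOY_SENSES = {
    "good.a.01": (0.75, 0.0),
    "slow.a.01": (0.0, 0.375),
    "table.n.01": (0.0, 0.0),
}


def objectivity(pos, neg):
    """Objective score is what remains after positive and negative mass."""
    return 1.0 - (pos + neg)


def sum_scores(sense_ids):
    """Sum pos and neg over the chosen senses; unknown senses contribute 0."""
    pos = sum(TOY_SENSES.get(s, (0.0, 0.0))[0] for s in sense_ids)
    neg = sum(TOY_SENSES.get(s, (0.0, 0.0))[1] for s in sense_ids)
    return {"pos": pos, "neg": neg}


print(objectivity(*TOY_SENSES["good.a.01"]))                    # 0.25
print(sum_scores(["good.a.01", "slow.a.01", "missing.x.01"]))   # {'pos': 0.75, 'neg': 0.375}
```

Note that with a plain sum, text-level scores are no longer bounded by 1, and a word without a matching sense silently contributes 0.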

2. Comparison

2.1. Language Support (English and German)

	SentiStrength	VADER	TextBlob	SentiWordNet
English	☑	☑	☑	☑
German	☑ (SentiStrength_de)	☑ (GerVader)	☑ (TextBlob_de)	☒

2.2. Comparison Across Features

Feature	SentiStrength	VADER	TextBlob	SentiWordNet
Type	Lexicon + Rule-based	Lexicon + Rule-based (social-media optimized)	Lexicon + Rule-based	Lexicon-based (WordNet sentiment scores)
Output Format	Two scores: +1 to +5 (positive), -1 to -5 (negative)	Compound (-1 to 1), with pos/neu/neg scores	Polarity (-1 to 1), Subjectivity (0 to 1)	Positive, Negative, Objective scores (0 to 1)
Handles Negation	Yes	Yes	Basic	No (relies on word-level sentiment)
Handles Emojis/Slang	Limited	Very good	Poor	None
Context Awareness	Limited	Limited	Very limited	None (word-level only)
Customizability	Yes (custom lexicons)	Limited	Limited	Moderate

3. Perform Sentiment Analysis

import pandas as pd
from sentiments_analysis import (get_sentiments_sentistrength, get_sentiments_vader, 
                                 get_sentiments_textblob, get_sentiments_sentiwordnet)

[nltk_data] Downloading package wordnet to
[nltk_data]     /home/codecheck/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package sentiwordnet to
[nltk_data]     /home/codecheck/nltk_data...
[nltk_data]   Unzipping corpora/sentiwordnet.zip.

output = {
    'en': {'text': [], 'sentistrength': [], 'vader': [], 'textblob': [], 'sentiwordnet': []},
    'de': {'text': [], 'sentistrength': [], 'vader': [], 'textblob': []},
}

3.1. With English Texts

input_texts_en = [
'I absolutely loved the performance — it was breathtaking from start to finish!',
'The movie was enjoyable, though a bit slow in parts.',
'That was the worst experience and is unacceptable'
]

output['en']['text'] = input_texts_en

# For bulk analysis, update the text in the file 'data/input_text_en.txt' for English texts, with one item
# (document, sentence, or social media post) per line.

# 1. Using SentiStrength

# Calling SentiStrength method. It uses trinary mode by default, providing pos, neg and neu scores 
# The other modes of operation are dual, scale and binary

scores = []
for text in input_texts_en:
    score = get_sentiments_sentistrength(text, lang='en')
    scores.append(score)
output['en']['sentistrength'] = scores
print(scores)

[{'pos': 5, 'neg': -1, 'neu': 1}, {'pos': 3, 'neg': -1, 'neu': 1}, {'pos': 1, 'neg': -3, 'neu': -1}]

# 2. Using VADER

# Calling VADER sentiment analysis method on input_texts

scores = []
for text in input_texts_en:
    score = get_sentiments_vader(text, lang='en')
    scores.append(score)
output['en']['vader'] = scores
print(scores)

[{'neg': 0.0, 'neu': 0.595, 'pos': 0.405, 'compound': 0.8169}, {'neg': 0.0, 'neu': 0.756, 'pos': 0.244, 'compound': 0.4404}, {'neg': 0.542, 'neu': 0.458, 'pos': 0.0, 'compound': -0.7964}]
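The compound score is usually turned into a label with fixed cut-offs. The sketch below uses the thresholds commonly cited in the VADER documentation (positive at 0.05 or above, negative at -0.05 or below); the helper name is hypothetical, and the scores are copied from the output above.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Scores copied from the VADER output above.
vader_scores = [
    {'neg': 0.0, 'neu': 0.595, 'pos': 0.405, 'compound': 0.8169},
    {'neg': 0.0, 'neu': 0.756, 'pos': 0.244, 'compound': 0.4404},
    {'neg': 0.542, 'neu': 0.458, 'pos': 0.0, 'compound': -0.7964},
]

print([label_compound(s['compound']) for s in vader_scores])
# ['positive', 'positive', 'negative']
```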

# 3. Using TextBlob

# Calling TextBlob sentiment analysis method on input_texts

scores = []
for text in input_texts_en:
    score = get_sentiments_textblob(text, lang='en')
    scores.append(score)
output['en']['textblob'] = scores
print(scores)

[{'polarity': 0.85, 'subjectivity': 0.9}, {'polarity': 0.09999999999999998, 'subjectivity': 0.5}, {'polarity': -1.0, 'subjectivity': 1.0}]

# 4. Using SentiWordNet

# Calling SentiWordNet sentiment analysis method on input_texts
# Please note that 0 scores do not indicate an absence of positive or negative polarity, but rather that WordNet has
# no synset for the given word, which SentiWordNet requires.

scores = []
for text in input_texts_en:
    score = get_sentiments_sentiwordnet(text, lang='en')
    scores.append(score)

output['en']['sentiwordnet'] = scores
print(scores)

[{'pos': 0.625, 'neg': 0.0}, {'pos': 0.5, 'neg': 0.375}, {'pos': 0.25, 'neg': 1.5}]

df_en = pd.DataFrame(output['en'])

df_en

	text	sentistrength	vader	textblob	sentiwordnet
0	I absolutely loved the performance — it was br...	{'pos': 5, 'neg': -1, 'neu': 1}	{'neg': 0.0, 'neu': 0.595, 'pos': 0.405, 'comp...	{'polarity': 0.85, 'subjectivity': 0.9}	{'pos': 0.625, 'neg': 0.0}
1	The movie was enjoyable, though a bit slow in ...	{'pos': 3, 'neg': -1, 'neu': 1}	{'neg': 0.0, 'neu': 0.756, 'pos': 0.244, 'comp...	{'polarity': 0.09999999999999998, 'subjectivit...	{'pos': 0.5, 'neg': 0.375}
2	That was the worst experience and is unacceptable	{'pos': 1, 'neg': -3, 'neu': -1}	{'neg': 0.542, 'neu': 0.458, 'pos': 0.0, 'comp...	{'polarity': -1.0, 'subjectivity': 1.0}	{'pos': 0.25, 'neg': 1.5}
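Because each cell holds a whole dictionary, wide values are truncated in the displayed table. One way to make every score visible is to flatten the nested dictionaries into one column per score before building the DataFrame. The helpers below are a hypothetical sketch in plain Python; the column naming scheme (`tool_score`) is an assumption.

```python
def flatten_row(row, sep="_"):
    """Turn {'vader': {'neg': 0.0, ...}} into {'vader_neg': 0.0, ...}."""
    flat = {}
    for key, value in row.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def flatten_output(columns):
    """Convert a dict of equal-length lists into a list of flat row dicts."""
    keys = list(columns)
    rows = [dict(zip(keys, values)) for values in zip(*columns.values())]
    return [flatten_row(row) for row in rows]


# Small stand-in with the same shape as output['en'].
sample = {
    "text": ["first text", "second text"],
    "vader": [{"neg": 0.0, "compound": 0.5}, {"neg": 0.1, "compound": -0.2}],
}
print(flatten_output(sample)[1])
# {'text': 'second text', 'vader_neg': 0.1, 'vader_compound': -0.2}
```

The resulting list of flat dictionaries can be passed directly to `pd.DataFrame(...)`, for example `pd.DataFrame(flatten_output(output['en']))`.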

3.2. With German Texts

input_texts_de = [
'Ich war begeistert von der Show – einfach fantastisch!',
'Das Essen war ganz gut, aber nichts Besonderes.',
'Das Treffen begann um zehn Uhr morgens.'
]

output['de']['text'] = input_texts_de

# For bulk analysis, update the text in the file 'data/input_text_de.txt' for German texts, with one item
# (document, sentence, or social media post) per line.

# 1. Using SentiStrength

# Calling SentiStrength method. It uses trinary mode by default, providing pos, neg and neu scores 
# The other modes of operation are dual, scale and binary

scores = []
for text in input_texts_de:
    score = get_sentiments_sentistrength(text, lang='de')
    scores.append(score)
output['de']['sentistrength'] = scores
print(scores)

[{'pos': 2, 'neg': -1, 'neu': 1}, {'pos': 1, 'neg': -1, 'neu': 0}, {'pos': 1, 'neg': -1, 'neu': 0}]

# 2. Using VADER

# Calling VADER sentiment analysis method on input_texts

scores = []
for text in input_texts_de:
    score = get_sentiments_vader(text, lang='de')
    scores.append(score)

output['de']['vader'] = scores
print(scores)

[{'neg': 0.0, 'neu': 0.334, 'pos': 0.666, 'compound': 0.8748}, {'neg': 0.194, 'neu': 0.532, 'pos': 0.275, 'compound': 0.2302}, {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}]

# 3. Using TextBlob

# Calling TextBlob sentiment analysis method on input_texts

scores = []
for text in input_texts_de:
    score = get_sentiments_textblob(text, lang='de')
    scores.append(score)
output['de']['textblob'] = scores
print(scores)

[{'polarity': 1.0, 'subjectivity': 0.0}, {'polarity': 1.0, 'subjectivity': 0.0}, {'polarity': 0.0, 'subjectivity': 0.0}]

df_de = pd.DataFrame(output['de'])

df_de

	text	sentistrength	vader	textblob
0	Ich war begeistert von der Show – einfach fant...	{'pos': 2, 'neg': -1, 'neu': 1}	{'neg': 0.0, 'neu': 0.334, 'pos': 0.666, 'comp...	{'polarity': 1.0, 'subjectivity': 0.0}
1	Das Essen war ganz gut, aber nichts Besonderes.	{'pos': 1, 'neg': -1, 'neu': 0}	{'neg': 0.194, 'neu': 0.532, 'pos': 0.275, 'co...	{'polarity': 1.0, 'subjectivity': 0.0}
2	Das Treffen begann um zehn Uhr morgens.	{'pos': 1, 'neg': -1, 'neu': 0}	{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...	{'polarity': 0.0, 'subjectivity': 0.0}

3.3. Comparing Results

output['en']

{'text': ['I absolutely loved the performance — it was breathtaking from start to finish!',
  'The movie was enjoyable, though a bit slow in parts.',
  'That was the worst experience and is unacceptable'],
 'sentistrength': [{'pos': 5, 'neg': -1, 'neu': 1},
  {'pos': 3, 'neg': -1, 'neu': 1},
  {'pos': 1, 'neg': -3, 'neu': -1}],
 'vader': [{'neg': 0.0, 'neu': 0.595, 'pos': 0.405, 'compound': 0.8169},
  {'neg': 0.0, 'neu': 0.756, 'pos': 0.244, 'compound': 0.4404},
  {'neg': 0.542, 'neu': 0.458, 'pos': 0.0, 'compound': -0.7964}],
 'textblob': [{'polarity': 0.85, 'subjectivity': 0.9},
  {'polarity': 0.09999999999999998, 'subjectivity': 0.5},
  {'polarity': -1.0, 'subjectivity': 1.0}],
 'sentiwordnet': [{'pos': 0.625, 'neg': 0.0},
  {'pos': 0.5, 'neg': 0.375},
  {'pos': 0.25, 'neg': 1.5}]}

df_en

	text	sentistrength	vader	textblob	sentiwordnet
0	I absolutely loved the performance — it was br...	{'pos': 5, 'neg': -1, 'neu': 1}	{'neg': 0.0, 'neu': 0.595, 'pos': 0.405, 'comp...	{'polarity': 0.85, 'subjectivity': 0.9}	{'pos': 0.625, 'neg': 0.0}
1	The movie was enjoyable, though a bit slow in ...	{'pos': 3, 'neg': -1, 'neu': 1}	{'neg': 0.0, 'neu': 0.756, 'pos': 0.244, 'comp...	{'polarity': 0.09999999999999998, 'subjectivit...	{'pos': 0.5, 'neg': 0.375}
2	That was the worst experience and is unacceptable	{'pos': 1, 'neg': -3, 'neu': -1}	{'neg': 0.542, 'neu': 0.458, 'pos': 0.0, 'comp...	{'polarity': -1.0, 'subjectivity': 1.0}	{'pos': 0.25, 'neg': 1.5}

For English Text

S1: Clearly positive, with high positive scores from SentiStrength, TextBlob, and SentiWordNet. VADER assigns a larger neutral share (0.595) than positive share (0.405), meaning that most of the sentence is made up of neutral words, but the compound score close to 1 (0.8169) shows that VADER also rates it as positive. Its subjectivity on TextBlob is also very high (0.9), indicating that the sentence is a good fit for sentiment analysis.

S2: A positive sentence with lower intensity according to SentiStrength (pos 3). VADER again gives it a higher neutral score, but the compound value (0.4404) points to positive sentiment, at a much lower intensity than S1. For TextBlob it is a borderline case in terms of subjectivity (0.5), and its polarity is only slightly positive (about 0.1). SentiWordNet gives it similar positive and negative scores (0.5 and 0.375). Taken together, these indicators suggest a mildly positive sentence, just above neutral.

S3: This sentence has a higher negative score on SentiStrength (neg -3), a high negative share and a compound score close to -1 on VADER (-0.7964), the maximum negative polarity on TextBlob (-1.0), and a high negative score on SentiWordNet (1.5). Its TextBlob subjectivity is also at the maximum (1.0).
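The same comparison can be made mechanically by reducing each tool's output to one coarse label and counting how many tools agree. This is a sketch under stated assumptions: reducing SentiStrength to pos + neg and SentiWordNet to pos - neg, the 0.05 threshold, and the helper names are all illustrative choices, not part of any of the tools.

```python
from collections import Counter


def sign_label(value, threshold=0.05):
    """Map a signed score to a coarse label; the threshold is illustrative."""
    if value >= threshold:
        return "positive"
    if value <= -threshold:
        return "negative"
    return "neutral"


def tool_labels(row):
    """Reduce each tool's output to one label (all reductions are assumptions)."""
    return {
        "sentistrength": sign_label(row["sentistrength"]["pos"] + row["sentistrength"]["neg"]),
        "vader": sign_label(row["vader"]["compound"]),
        "textblob": sign_label(row["textblob"]["polarity"]),
        "sentiwordnet": sign_label(row["sentiwordnet"]["pos"] - row["sentiwordnet"]["neg"]),
    }


def majority(labels):
    """Return the most common label and how many tools voted for it."""
    return Counter(labels.values()).most_common(1)[0]


# Scores copied from the English results above (only the fields used here).
rows = [
    {"sentistrength": {"pos": 5, "neg": -1}, "vader": {"compound": 0.8169},
     "textblob": {"polarity": 0.85}, "sentiwordnet": {"pos": 0.625, "neg": 0.0}},
    {"sentistrength": {"pos": 3, "neg": -1}, "vader": {"compound": 0.4404},
     "textblob": {"polarity": 0.09999999999999998}, "sentiwordnet": {"pos": 0.5, "neg": 0.375}},
    {"sentistrength": {"pos": 1, "neg": -3}, "vader": {"compound": -0.7964},
     "textblob": {"polarity": -1.0}, "sentiwordnet": {"pos": 0.25, "neg": 1.5}},
]

for i, row in enumerate(rows, start=1):
    print(f"S{i}", majority(tool_labels(row)))
# S1 ('positive', 4)
# S2 ('positive', 4)
# S3 ('negative', 4)
```

Coarse labels hide intensity: all four tools agree on S2 being positive here, even though its scores are much weaker than those of S1.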

df_de

	text	sentistrength	vader	textblob
0	Ich war begeistert von der Show – einfach fant...	{'pos': 2, 'neg': -1, 'neu': 1}	{'neg': 0.0, 'neu': 0.334, 'pos': 0.666, 'comp...	{'polarity': 1.0, 'subjectivity': 0.0}
1	Das Essen war ganz gut, aber nichts Besonderes.	{'pos': 1, 'neg': -1, 'neu': 0}	{'neg': 0.194, 'neu': 0.532, 'pos': 0.275, 'co...	{'polarity': 1.0, 'subjectivity': 0.0}
2	Das Treffen begann um zehn Uhr morgens.	{'pos': 1, 'neg': -1, 'neu': 0}	{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...	{'polarity': 0.0, 'subjectivity': 0.0}

output['de']

{'text': ['Ich war begeistert von der Show – einfach fantastisch!',
  'Das Essen war ganz gut, aber nichts Besonderes.',
  'Das Treffen begann um zehn Uhr morgens.'],
 'sentistrength': [{'pos': 2, 'neg': -1, 'neu': 1},
  {'pos': 1, 'neg': -1, 'neu': 0},
  {'pos': 1, 'neg': -1, 'neu': 0}],
 'vader': [{'neg': 0.0, 'neu': 0.334, 'pos': 0.666, 'compound': 0.8748},
  {'neg': 0.194, 'neu': 0.532, 'pos': 0.275, 'compound': 0.2302},
  {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}],
 'textblob': [{'polarity': 1.0, 'subjectivity': 0.0},
  {'polarity': 1.0, 'subjectivity': 0.0},
  {'polarity': 0.0, 'subjectivity': 0.0}]}

For German Text

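S1: Clearly positive, with positive scores from SentiStrength (pos 2) and VADER (compound 0.8748). TextBlob reports the maximum polarity (1.0) but a subjectivity of 0, so the sentence may either be ignored as objective or be treated as neutral.

S2: Neutral according to SentiStrength (pos 1, neg -1). VADER rates it as only slightly positive (compound 0.2302), with both positive and negative shares present. TextBlob again reports a polarity of 1.0 with a subjectivity of 0, so by the same reasoning as for S1 it can be treated as objective or neutral.

S3: Even more clearly neutral: SentiStrength reports no positive or negative strength, VADER scores it as fully neutral (neu 1.0, compound 0.0), and TextBlob gives 0 for both polarity and subjectivity.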

Taxonomy