Sentiment Analysis

Sentiment analysis estimates tone numerically. It works only when the measure matches how tone operates in your corpus.


What it is

Sentiment analysis is an umbrella term for several different tools: dictionary methods, supervised classifiers, and LLM-based ratings. The important question is what the score is supposed to measure.

Dictionary methods count terms from a curated lexicon such as LIWC, VADER, NRC, or AFINN. They are transparent and easy to rerun. They struggle with sarcasm and negation, and they degrade further when the corpus differs from the domain the lexicon was built for.
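A minimal sketch of the counting idea, and of why negation is a problem. The four-word lexicon and its weights are invented for illustration and are not taken from LIWC, VADER, NRC, or AFINN. The one-token negation rule is likewise an assumption, not how any of those tools work.

```python
import re

# Toy lexicon for illustration only; real lexicons are far larger.
TOY_LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_score(text):
    """Sum lexicon weights, ignoring context entirely."""
    return sum(TOY_LEXICON.get(tok, 0) for tok in tokenize(text))


def negation_aware_score(text):
    """Flip the weight of a lexicon term that directly follows a negator."""
    total = 0
    previous = None
    for tok in tokenize(text):
        weight = TOY_LEXICON.get(tok, 0)
        if previous in NEGATORS:
            weight = -weight
        total += weight
        previous = tok
    return total


sentence = "The reform was not good"
print(naive_score(sentence))           # 1: the negation is ignored
print(negation_aware_score(sentence))  # -1: "good" is flipped by "not"
```

The one-token window is deliberately crude. A phrase such as "not very good" still slips through, which is the kind of failure you should look for when checking a dictionary against your own texts.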

Supervised classifiers work best on texts that resemble their training data, and they require a labeling plan with validation. LLM-based ratings are quick to set up, but their scores can change with the prompt or model version. Treat that route as experimental unless your supervisor has approved it and you can evaluate it properly under the Ethics & AI policy.

The weak point depends on the material. Sarcasm-heavy social media breaks many dictionaries. Classifiers trained on movie reviews often fail on policy documents. A thesis needs to show that the chosen measure is valid for the actual texts.


What you learn in the DH course

In the DH course, the sentiment unit is mainly about validation. Students practice the following.

  • Comparing dictionary methods and checking where each one breaks
  • Building a supervised classifier from labeled examples
  • Handling negation, intensifiers, and other contextual modifiers
  • Measuring inter-annotator agreement (Cohen’s kappa, Krippendorff’s alpha) for labeled data
  • Validating sentiment scores against human judgment
  • Reporting limits without treating the score as self-explanatory
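To make the agreement step concrete, here is a sketch of Cohen's kappa for two annotators, written out from the standard definition (observed agreement corrected for chance agreement). The function name and the eight example labels are made up for illustration. In practice you would probably use a tested library implementation.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Share of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for eight documents.
annotator_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
annotator_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

print(cohens_kappa(annotator_a, annotator_b))  # 0.5
```

Here the annotators agree on 6 of 8 items (0.75), but chance alone would produce 0.5, so kappa is 0.5. Raw percent agreement would have looked more reassuring than it is.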

What you need to learn first

  • Preprocessing. Dictionary methods depend heavily on tokenization and lemmatization. See Preprocessing.
  • Basic statistics. You need agreement metrics, confidence intervals, and a working sense of reliability.
  • Python or R. Python options include vaderSentiment, nltk, and transformers. R users can start with sentimentr or quanteda.sentiment.

What you can do with it

  • Chart whether coverage of a policy turned negative after a key event
  • Compare the tone of government and opposition speeches across a legislative term
  • Track sentiment toward a country or leader in foreign-language press
  • Surface high-emotion passages for qualitative close reading
  • Build a scalar covariate for a topic model or regression
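As an illustration of the first use case, the sketch below compares mean sentiment before and after an event and attaches a percentile bootstrap confidence interval to the difference. The scores are invented, the function name is hypothetical, and the bootstrap is one reasonable choice among several, not a prescribed method.

```python
import random
from statistics import mean


def bootstrap_diff_ci(before, after, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(after) - mean(before)."""
    rng = random.Random(seed)  # fixed seed so the result can be rerun
    diffs = []
    for _ in range(n_resamples):
        # Resample each period with replacement, keeping its original size.
        resampled_before = rng.choices(before, k=len(before))
        resampled_after = rng.choices(after, k=len(after))
        diffs.append(mean(resampled_after) - mean(resampled_before))
    diffs.sort()
    low = diffs[int((alpha / 2) * n_resamples)]
    high = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Invented document-level scores on a -1..1 scale.
scores_before = [0.4, 0.1, 0.3, 0.2, 0.5, 0.0]
scores_after = [-0.3, -0.1, -0.4, 0.1, -0.2, -0.5]

difference = mean(scores_after) - mean(scores_before)
ci_low, ci_high = bootstrap_diff_ci(scores_before, scores_after)
print(f"difference in means: {difference:.2f}")
print(f"95% bootstrap CI: [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval that excludes zero supports the claim that the scores shifted. It says nothing about whether the scores measure tone correctly, so it does not replace validation against human judgment.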