Sentiment Analysis
Assigning numerical values that summarize how positive, negative, or emotionally charged a text is — useful when you need a scalar measure of tone across a large corpus.
What it is
Sentiment analysis covers three broad families, each with different assumptions about what “sentiment” is and whom it generalizes to:
- Dictionary methods: counting terms from a curated lexicon (LIWC, VADER, NRC, AFINN). Transparent and reproducible; struggles with negation, sarcasm, and domain shift.
- Supervised classifiers: training a model (logistic regression, SVM, fine-tuned transformer) on human-labelled examples. More accurate in-domain, but requires labelled training data and careful validation.
- LLM-based rating: prompting a large language model to rate each text. Fast to set up; variable across prompts and model versions; needs rigorous evaluation, supervisor guidance, and compliance with the Ethics & AI policy before it can be trusted for a thesis.
Each family has weaknesses that matter more or less depending on your texts. Sarcasm-heavy social media breaks dictionary methods. Classifiers trained on movie reviews fail on policy documents. LLMs drift across model releases. Choose with the limits in mind.
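To make the dictionary-method weakness concrete, here is a minimal sketch of a lexicon scorer with and without a simple negation rule. The four-word lexicon, the negator list, and the function names are invented for illustration; they are not taken from LIWC, VADER, NRC, or AFINN, and established tools handle negation in more elaborate ways.

```python
import re

# Toy lexicon invented for this example; real lexicons are far larger
# and assign different values.
TOY_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_score(text):
    """Sum lexicon values for every token, ignoring context entirely."""
    return sum(TOY_LEXICON.get(tok, 0.0) for tok in tokenize(text))


def negation_aware_score(text):
    """Flip the sign of a lexicon term when the previous token is a negator."""
    tokens = tokenize(text)
    total = 0.0
    for i, tok in enumerate(tokens):
        value = TOY_LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        total += value
    return total


sentence = "The policy is not good"
print(naive_score(sentence))           # 1.0: the negation is missed
print(negation_aware_score(sentence))  # -1.0: the one-token rule catches it
print(naive_score("Great, another delay"))  # 2.0: sarcasm still reads as positive
```

The one-token window is a deliberate simplification: it misses "not very good" and does nothing for sarcasm, which is why validation against human judgment remains necessary.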
What you learn in the DH course
This page draws from the course’s sentiment analysis material. Students who take it come away with:
- Dictionary methods: LIWC, VADER, NRC, AFINN — what each measures and where it breaks
- Building a supervised classifier: labelling strategy, feature extraction, train/validation/test split
- LLM-based sentiment rating: prompt design, reproducibility, version pinning
- Handling negation, intensifiers, and contextual modifiers
- Inter-annotator agreement (Cohen’s kappa, Krippendorff’s alpha) for labelled data
- Validating sentiment scores against human judgment
- Reporting sentiment methods in a methodology chapter — limitations are mandatory
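As an illustration of the agreement metrics listed above, the following sketch computes Cohen's kappa for two annotators from scratch, using kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The label lists are made up for the example, and the function name is hypothetical; in practice a library implementation would normally be used and checked against a hand calculation like this one.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label rates,
    # summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


# Invented labels for ten texts.
annotator_a = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neu", "neu"]
annotator_b = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neu", "neu", "neu"]

print(round(cohens_kappa(annotator_a, annotator_b), 3))  # 0.697
```

Here the annotators agree on 8 of 10 items (p_o = 0.8) and chance agreement is 0.34, so kappa is about 0.70, noticeably lower than the raw agreement rate.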
What you need to learn first
- Preprocessing: dictionary methods especially depend on tokenization and lemmatization. See Preprocessing.
- Basic statistics: agreement metrics, confidence intervals, reliability thinking.
- Python or R: vaderSentiment, nltk, transformers in Python; sentimentr, quanteda.sentiment in R.
What you can do with it
- Chart whether coverage of a policy in major newspapers turned negative after a key event
- Compare emotional tone of government vs. opposition speeches across a legislative term
- Track sentiment toward a country or leader over time in foreign-language press
- Surface high-emotion passages for qualitative close reading
- Build a scalar covariate you can use in a topic model or regression (e.g. STM with sentiment as a prevalence covariate)
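For the first use case in the list above, a before/after comparison can be sketched in a few lines once each article has a date and a sentiment score. The dates, scores, event date, and function name below are all invented for illustration; how the scores are produced (dictionary, classifier, or LLM) is left open.

```python
from datetime import date
from statistics import mean

# Hypothetical (publication date, sentiment score) pairs; values are invented.
scored_articles = [
    (date(2020, 1, 10), 0.30),
    (date(2020, 2, 5), 0.10),
    (date(2020, 3, 20), -0.20),
    (date(2020, 4, 2), -0.40),
]
EVENT_DATE = date(2020, 3, 1)  # hypothetical key event


def mean_before_after(records, event_date):
    """Return (mean score before the event, mean score on or after it)."""
    before = [score for day, score in records if day < event_date]
    after = [score for day, score in records if day >= event_date]
    if not before or not after:
        raise ValueError("need at least one record on each side of the event")
    return mean(before), mean(after)


before, after = mean_before_after(scored_articles, EVENT_DATE)
print(f"before: {before:.2f}, after: {after:.2f}")
```

A difference in means is only a starting point: with real data it should be reported with confidence intervals and checked against a sample of human-coded articles.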
Related methods
- Preprocessing — dictionary methods are especially sensitive to it.
- Framing Analysis — sentiment is one of several dimensions framing scholarship measures; the two often appear together.
- Topic Analysis — sentiment-within-topic is a common analytical move.