Computational & Quantitative Approaches

For work that treats a corpus programmatically — extracting patterns, measuring themes, building numerical representations that scale beyond what a single reader can manage. The pages below cover the most common computational approaches I point students to across the BA and MA programs I supervise.

The distinction from the qualitative approaches is one of analytical posture, not subject matter. You can do computational work on a small corpus of parliamentary speeches; you can do discourse analysis on tweets. What changes is what the method measures and how the researcher’s judgment enters the analysis — in computational work, judgment sits largely in design and validation; in qualitative work, it sits in interpretation itself.

Launch the wizard

If you already know you need a computational pipeline — OCR from scans, cleanup, metadata assembly, analysis-ready outputs — the standalone wizard below routes you to the right path for your compute constraints and corpus scale, and hands you a starter kit for Claude Code or Codex.

corpus-building - companion resource
$ corpus-building-wizard
Turn a folder of source files into an analysis-ready text corpus. The wizard hands you a starter kit for Claude Code or Codex.

Preparation before analysis

Preprocessing

Tokenisation, normalisation, and the cleanup steps that shape every downstream result
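A minimal sketch of what these steps look like in practice, assuming a plain Python pipeline (the stopword list and function name here are illustrative, not from any particular library):

```python
import re

# Tiny illustrative stopword list; real work would use a curated,
# language-appropriate list, since this choice shapes every downstream count.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}

def preprocess(text):
    """Lowercase, tokenise on alphabetic runs, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("The Assembly debated the budget in March."))
# → ['assembly', 'debated', 'budget', 'march']
```

Even this toy version illustrates the point the page makes: every decision (case folding, what counts as a token, which words are dropped) changes what the later methods can see.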

Three core computational methods

Topic Analysis

Discovering themes across a corpus with LDA, STM, and embedding-based methods
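LDA and STM themselves require a dedicated library (gensim or scikit-learn in Python, stm in R), but all of them consume the same input: a document-term matrix. A stdlib-only sketch of that shared first step, with a toy three-document corpus:

```python
from collections import Counter

docs = [
    "trade policy tariff trade",
    "election vote campaign vote",
    "tariff policy election",
]

# Build a document-term count matrix: rows are documents, columns are
# vocabulary terms. LDA-style models factorise this matrix into
# document-topic and topic-word distributions.
vocab = sorted({w for d in docs for w in d.split()})
matrix = [[Counter(d.split())[w] for w in vocab] for d in docs]

print(vocab)   # ['campaign', 'election', 'policy', 'tariff', 'trade', 'vote']
print(matrix)  # e.g. first row: [0, 0, 1, 1, 2, 0]
```

Embedding-based methods (e.g. BERTopic-style clustering) skip this sparse-count representation and work from dense document vectors instead.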

Sentiment Analysis

Measuring affect with dictionaries, classifiers, and LLM-based rating
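The dictionary approach reduces to a lookup-and-sum. A sketch with a toy polarity lexicon (real work would use a validated lexicon or a trained classifier; the words and scores below are invented for illustration):

```python
# Toy polarity dictionary: +1 positive, -1 negative.
LEXICON = {"good": 1, "great": 1, "progress": 1,
           "bad": -1, "crisis": -1, "failure": -1}

def polarity(text):
    """Sum lexicon scores over tokens, normalised by token count."""
    tokens = text.lower().split()
    return sum(LEXICON.get(t, 0) for t in tokens) / len(tokens)

print(polarity("great progress despite the crisis"))  # → 0.2
```

Classifier and LLM-based rating replace the fixed lexicon with a learned mapping from text to score, which handles negation and context at the cost of needing validation against human-coded examples.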

Word Embeddings

Vector representations of words and documents for similarity, drift, and classification
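Once words or documents are vectors, "similarity" means cosine similarity between them. A stdlib sketch with invented three-dimensional vectors (real embeddings have hundreds of dimensions and come from a trained model such as word2vec, fastText, or a transformer):

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors chosen so that semantically close words point the same way.
king = [0.9, 0.1, 0.3]
queen = [0.8, 0.2, 0.35]
apple = [0.1, 0.9, 0.05]

print(cosine(king, queen) > cosine(king, apple))  # → True
```

Drift studies compare such similarities across time-sliced models; classification feeds the vectors into a downstream model as features.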

Need to sort out the corpus or pipeline before choosing one of these?

Use the corpus page for planning and organisation, or the wizard above if the issue is workflow, OCR, or compute rather than method choice.

Go to Building a Corpus

In the classroom

These methods are also taught in two of my courses at Leiden. If you’re a student in one of them, the method pages above double as a reference alongside the weekly sessions.

BA2

Digital Korea

12-session course in computational text analysis with Orange Data Mining and R, primarily for Korean Studies. Covers the full preprocessing → classification → topic modelling pipeline.

BA3

Text as Data (DH strand)

Six-seminar digital-humanities strand of the BA3 Contemporary Korea and Digital Humanities course. No programming required; introduces descriptive, clustering, classification, and topic-modelling methods on pre-prepared Korean corpora.

If your thesis draws on either course, the method pages here extend what’s covered in class with the methodological scaffolding you’ll need for the methods chapter.


Combining with qualitative methods

Most strong theses combine a computational measure with a qualitative reading. See the Qualitative Approaches page for that side of the split; the end of that page lists common pairings (framing + topic analysis, discourse analysis + keyword-in-context tooling, comparative case study + descriptive statistics).


Overview and other methods

Return to the Methods overview for the broader orientation, or consult its “Other Methods to Explore” table for less commonly used approaches that aren’t covered in depth.