Word Embeddings

Numerical representations of words (and documents) that encode semantic relationships as geometric ones — the foundation of most modern NLP and a powerful tool for tracing concepts at scale.


What it is

Word embeddings represent each word (or document) as a vector in a high-dimensional space such that semantically related items land close to each other. Two broad families, both sketched in code below:

  1. Static embeddings: one vector per word, learned from co-occurrence patterns in a training corpus. Word2Vec (skip-gram, CBOW), GloVe, and fastText are the classics. Fast to train and easy to inspect, but blind to context: “bank” gets one vector whether it means a riverbank or a financial institution.
  2. Contextual embeddings: a fresh vector per word in context, produced by a pre-trained transformer (BERT, RoBERTa, sentence-transformers). Much more accurate for downstream tasks, harder to interpret directly, and computationally heavier.
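
A minimal sketch of the two families, assuming gensim 4.x and the sentence-transformers package are installed; the toy corpus and the model name all-MiniLM-L6-v2 are illustrative choices, not course requirements:

    from gensim.models import Word2Vec
    from sentence_transformers import SentenceTransformer

    # Static: one vector per word, learned from co-occurrence in a (toy) corpus.
    corpus = [
        ["the", "bank", "approved", "the", "loan"],
        ["we", "walked", "along", "the", "river", "bank"],
    ]
    w2v = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1)  # sg=1: skip-gram
    print(w2v.wv["bank"][:5])   # the single vector "bank" gets, whatever the context

    # Contextual: vectors computed per input by a pre-trained transformer.
    st = SentenceTransformer("all-MiniLM-L6-v2")
    vecs = st.encode(["The bank approved the loan.",
                      "We walked along the river bank."])
    print(vecs.shape)           # (2, 384): the two "bank" sentences get different vectors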

Embeddings aren’t an analysis on their own. They’re a representation you feed into something else: a similarity search, a classifier, a clustering algorithm, a time-trajectory measurement.


What you learn in the DH course

This page draws from the course’s word-embedding material. Students who take the course come away with:

  • Vector-space semantics and the distributional hypothesis (“a word is known by the company it keeps”)
  • Training a Word2Vec / GloVe / fastText model on your own corpus vs. using a pre-trained model
  • Contextual embeddings: BERT, multilingual BERT, sentence-transformers, and when each is worth the compute
  • Similarity operations: cosine distance, nearest neighbours, analogy tasks (sketched in code after this list)
  • Aligning embedding spaces across time (to measure semantic change) or across languages
  • Using embeddings as input features for classification, clustering, or topic analysis
  • Reporting embedding-based methods: pinning model versions, documenting training corpus, acknowledging limits
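
A minimal sketch of those similarity operations, using gensim’s downloader and the small pre-trained glove-wiki-gigaword-50 vectors (an illustrative choice, fetched on first use, not the course’s required model):

    import gensim.downloader as api

    # Small pre-trained GloVe vectors (downloaded the first time this runs).
    glove = api.load("glove-wiki-gigaword-50")

    # Cosine similarity between two words.
    print(glove.similarity("river", "bank"))

    # Nearest neighbours in the vector space.
    print(glove.most_similar("parliament", topn=5))

    # Analogy via vector arithmetic: king - man + woman ≈ ?
    print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))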

What you need to learn first

  • Preprocessing: embeddings learn from the vocabulary you feed them; decisions here propagate into the geometry. See Preprocessing.
  • Linear algebra basics: cosine similarity, vector arithmetic, dimensionality reduction. You don’t need to derive it, but you need a mental model (a small numpy sketch follows this list).
  • Python: essentially all embedding tooling is Python-first (gensim, transformers, sentence-transformers). R bindings exist but lag.
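
For the mental model: cosine similarity is just a measure of the angle between two vectors. A minimal numpy sketch with made-up vectors:

    import numpy as np

    def cosine_similarity(a, b):
        # dot product divided by the product of the vector lengths
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    a = np.array([0.2, 0.9, 0.1])   # toy vectors, not real embeddings
    b = np.array([0.1, 0.8, 0.3])
    print(cosine_similarity(a, b))  # near 1.0 = similar direction = similar meaning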

What you can do with it

  • Measure how the meaning of a political keyword shifts across decades (diachronic embeddings; see the alignment sketch after this list)
  • Surface near-synonyms and related terms you’d otherwise miss in keyword searches
  • Cluster documents by semantic similarity, even when they share no keywords
  • Build a retrieval system for a large corpus (semantic search instead of exact keyword matching; see the search sketch after this list)
  • Feed sentence- or document-level embeddings into a sentiment or classification model
  • Cross-language alignment: find the Korean equivalent of an English concept by projecting embeddings into a shared space
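
One common recipe for the diachronic case in the first bullet is to train a static model per period, rotate one space onto the other with an orthogonal Procrustes alignment, and then compare a keyword’s vectors across the aligned spaces. A toy-sized sketch under those assumptions; the two corpora, the hyperparameters, and the keyword are all illustrative (and corpora this small learn nothing real):

    import numpy as np
    from gensim.models import Word2Vec
    from scipy.linalg import orthogonal_procrustes

    # Toy stand-ins for two period-specific corpora (real work needs far more text).
    corpus_a = [["the", "broadcast", "reached", "every", "radio"],
                ["the", "radio", "played", "music"]]
    corpus_b = [["the", "stream", "reached", "every", "phone"],
                ["the", "phone", "streamed", "music"]]
    model_a = Word2Vec(corpus_a, vector_size=50, min_count=1, seed=1)
    model_b = Word2Vec(corpus_b, vector_size=50, min_count=1, seed=1)

    # Stack vectors for the shared vocabulary, then find the rotation that best
    # maps space A onto space B (rows are often length-normalised first).
    shared = [w for w in model_a.wv.index_to_key if w in model_b.wv.key_to_index]
    A = np.stack([model_a.wv[w] for w in shared])
    B = np.stack([model_b.wv[w] for w in shared])
    R, _ = orthogonal_procrustes(A, B)

    # After alignment, low cosine similarity for the same word hints at semantic change.
    w = "music"  # illustrative keyword; must occur in both corpora
    a, b = model_a.wv[w] @ R, model_b.wv[w]
    print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))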
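
For the retrieval use, a minimal semantic-search sketch with sentence-transformers; the three-document corpus, the query, and the model name are placeholders:

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # one reasonable default, not the only choice

    corpus = ["A letter about grain prices in 1848.",
              "Minutes of a parliamentary debate on suffrage.",
              "A diary entry describing a harvest festival."]
    corpus_emb = model.encode(corpus, convert_to_tensor=True)

    # Rank documents by cosine similarity to the query, no exact keyword overlap needed.
    query_emb = model.encode("voting rights", convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
    for hit in hits:
        print(round(hit["score"], 2), corpus[hit["corpus_id"]])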

Related methods

  • Preprocessing — sets the vocabulary the embedding sees.
  • Topic Analysis — embedding-based topic methods (BERTopic, Top2Vec) are built directly on contextual embeddings.
  • Sentiment Analysis — modern sentiment classifiers use contextual embeddings as their feature layer.