Overview

Course: BA2 Korean Studies, Leiden University
Instructor: Dr. Steven Denney
Time: Mondays, 15:15-17:00
Location: Huizinga 0.09 (DH Lab) & Arsenaal B0.05
Duration: 12 sessions (February 02 - May 18)

Brief Course Description

This course introduces computational text analysis as a research method in Korean and area studies. You will learn to treat text as data, transforming written sources into formats that can be analyzed computationally. The course emphasizes Korean-language primary sources, which students may supplement with materials in other languages; all sources will be organized and analyzed as text corpora. In addition, the course reviews recent research that applies digital tools and methods in the digital humanities and computational social sciences.

While this course is designed primarily for students in the Korean Studies program at Leiden University, it welcomes students from other programs and will support the use of primary source materials in languages other than Korean.

Using Orange Data Mining, a visual platform that makes computational methods accessible without advanced programming, you will work through the complete text analysis pipeline:

Preprocessing - Preparing text for analysis
Descriptive analysis - Finding patterns in word usage
Clustering - Discovering natural groupings in documents
Classification - Categorizing texts using rules and machine learning
Topic modeling - Uncovering hidden themes across collections
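
To make the idea of "text as data" concrete, the first two pipeline steps can be sketched in a few lines of code. The example below is only an illustration in Python and is not part of the course workflow, which uses Orange and R. The toy corpus, the stopword list, and the function names are all hypothetical. It splits text on word characters, which works for English but is generally not sufficient for Korean, where a morphological analyzer is normally needed.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus (not course material): three short documents.
docs = [
    "The economy grew and the economy stabilized.",
    "Voters discussed the economy before the election.",
    "The election results surprised many voters.",
]

# Illustrative stopword list only; real lists are much longer.
STOPWORDS = {"the", "and", "before", "many"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Preprocessing: raw text becomes lists of tokens.
tokenized = [preprocess(d) for d in docs]

# Descriptive analysis: word counts across the whole corpus.
word_counts = Counter(t for doc in tokenized for t in doc)

# TF-IDF: terms concentrated in few documents receive higher weights.
weights = tf_idf(tokenized)

print(word_counts.most_common(3))
print(weights[0])
```

In the first document, "grew" receives a higher TF-IDF weight than "economy" even though "economy" occurs more often there, because "economy" also appears in a second document and is therefore less distinctive.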

You will also develop foundational R programming skills through guided tutorials. No prior programming experience is required.

The course culminates in a Research Methods Project applying text analysis to Korean-language materials.

Before the first class, please complete the software installation steps in the Getting Started guide.

Learning Objectives

By the end of this course, you will be able to:

Apply text preprocessing, descriptive analysis, clustering, classification, and topic modeling
Follow best practices in data management and research transparency
Establish a foundation in the R programming language
Reflect on the strengths and limitations of computational methods in research

Weekly Schedule

Week	Date	Topic
1	Feb. 02	Introduction & Getting Started
2	Feb. 09	Foundations of Computational Text Analysis
3	Feb. 16	Text Preprocessing Basics
4	Feb. 23	From Words to Numbers: BoW, TF-IDF & Visualization
5	Mar. 02	Practice & Deepen: Hands-On Lab
6	Mar. 09	Midterm Review & Assessment
7	Mar. 16	Clustering
8	Mar. 30	Word Embeddings
9	Apr. 13	Sentiment Analysis - Dictionary & Rule-Based
10	Apr. 20	Topic Modeling (LDA)
11	May 11	Final Review & Assessment
12	May 18	Research Methods Project Workshop

See the Syllabus for detailed weekly content, readings, and assignments.

Resources

Textbook

Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press.

Tools & Platforms

Orange Data Mining - Visual data analysis platform (orangedatamining.com)
Swirl - Interactive R tutorials in RStudio (swirlstats.com)
DataCamp - Online R courses (institutional access provided)

This course is part of the Korean Studies program in the Faculty of Humanities at Leiden University.