Overview
Course: BA2 Korean Studies, Leiden University
Instructor: Dr. Steven Denney
Time: Mondays, 15:15-17:00
Location: Huizinga 0.09 (DH Lab) & Arsenaal B0.05
Duration: 12 sessions (February 02 - May 18)
Brief Course Description
This course introduces computational text analysis as a research method in Korean and area studies. You will learn to treat text as data, transforming written sources into formats that can be analyzed computationally. The course emphasizes Korean-language primary sources, which you will organize and analyze as text corpora, and you may supplement these with materials in other languages. The course also reviews recent research in the digital humanities and computational social sciences that applies these digital tools and methods.
While this course is designed primarily for students in the Korean Studies program at Leiden University, it welcomes students from other programs and will support the use of primary source materials in languages other than Korean.
Using Orange Data Mining, a visual platform that makes computational methods accessible without advanced programming, you will work through the complete text analysis pipeline:
- Preprocessing - Preparing text for analysis
- Descriptive analysis - Finding patterns in word usage
- Clustering - Discovering natural groupings in documents
- Classification - Categorizing texts using rules and machine learning
- Topic modeling - Uncovering hidden themes across collections
You will also develop foundational R programming skills through guided tutorials. No prior programming experience is required.
The course culminates in a Research Methods Project applying text analysis to Korean-language materials.
Before the first class, please complete the software installation steps in the Getting Started guide.
Learning Objectives
By the end of this course, you will be able to:
- Apply text preprocessing, descriptive analysis, clustering, classification, and topic modeling to text corpora
- Follow best practices in data management and research transparency
- Demonstrate foundational skills in the R programming language
- Reflect on the strengths and limitations of computational methods in research
Weekly Schedule
| Week | Date | Topic |
|---|---|---|
| 1 | Feb. 02 | Introduction & Getting Started |
| 2 | Feb. 09 | Foundations of Computational Text Analysis |
| 3 | Feb. 16 | Text Preprocessing Basics |
| 4 | Feb. 23 | Text Preprocessing Practice |
| 5 | Mar. 02 | Descriptive Patterns in Text |
| 6 | Mar. 09 | Midterm Review & Assessment |
| 7 | Mar. 16 | Clustering |
| 8 | Mar. 30 | Classification I - Dictionary & Rule-Based |
| 9 | Apr. 13 | Classification II - Machine Learning (SVM) |
| 10 | Apr. 20 | Topic Modeling (LDA) |
| 11 | May 11 | Final Review & Assessment |
| 12 | May 18 | Research Methods Project Workshop |
See the Syllabus for detailed weekly content, readings, and assignments.
Resources
Textbook
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press.
Tools & Platforms
- Orange Data Mining - Visual data analysis platform (orangedatamining.com)
- swirl - Interactive R tutorials in RStudio (swirlstats.com)
- DataCamp - Online R courses (institutional access provided)
This course is part of the Korean Studies program in the Humanities Faculty at Leiden University.