Topical Reading: Digital Humanities

Course: BA3 Korean Studies, Leiden University
Instructor: Dr. Steven Denney
Time & Place: Fridays, 11:15-13:00, Huizinga 0.09
Duration: 6 seminars (October 10 - November 21)


Expanded Description

This is the digital humanities (DH) strand of the BA3 course Contemporary Korea and Digital Humanities. It introduces students to DH methods, with a focus on text-as-data approaches. Using Orange Data Mining and pre-prepared Korean corpora, students will learn how to clean, analyze, and interpret textual data.

The DH strand complements the topical reading seminars. It equips students with methodological skills for their undergraduate research and introduces them to DH approaches within the humanities and social sciences. No programming is required in this course, although students will have the opportunity to explore ways to acquire such skills.

The aim is not technical mastery but an understanding of how computational methods can support thesis research in the BA Korean Studies program at Leiden University.


Tutorials

Each week in the outline lists the required Orange Data Mining tutorials.

  • These tutorials must be watched before class
  • They are short (5-10 minutes each) and introduce the widgets you will use in the hands-on sessions
  • Watching them in advance frees up class time and leaves you better prepared to apply the methods to the Korean corpora

You can download the Orange Data Mining application here

Tutorials (to watch before class) are available here: Orange Data Mining Tutorials (YouTube Playlist)


Assignments

In addition to attending the weekly sessions, you must complete weekly assignments (“deliverables”). These tasks reinforce the skills introduced in the tutorials and class exercises. Instructions for each assignment are in the folder named Assignments.

Format

Each deliverable consists of:

  1. One or more screenshots of your Orange workflow/output
  2. A short written reflection (approximately 1 paragraph)

Submission

  • Commit your deliverables to your GitHub repository in the appropriate weekly folder (e.g., assignments/week01/)
  • Deadline: 17:00 on the Monday following class, unless otherwise specified

Grading

  • 2 = fully complete and accurate
  • 1 = attempted but not fully complete/accurate
  • 0 = incomplete, late, or not attempted

Note: You do not need to upload assignments to Brightspace. The instructor will review your GitHub repo and record grades.

Weekly deliverables, together with attendance, make up 30% of your DH strand grade (see Assessment section).


Optional: R Programming Extensions

Students interested in developing foundational R programming skills alongside the DH strand are encouraged to explore the R Programming Extensions. These optional activities complement our work with Orange Data Mining and offer pathways to begin coding and analyzing text directly in R.

We will make use of two platforms:

  • Swirl - interactive, in-R tutorials for learning R at your own pace: swirlstats.com/students.html
  • DataCamp - an online learning platform with a dedicated class account

All enrolled students have access to the shared DataCamp classroom. The primary course to complete there is: Introduction to Text Analysis in R

Assessment policy for the optional R Programming track:

  • Extra credit will be awarded upon satisfactory completion of the designated lessons or modules (up to 0.25 points added to the final DH strand grade)
  • Students who opt in but do not complete required lessons may receive a minor penalty to their DH strand grade. Opt in only if you plan to finish.

GitHub Repository Requirement

See the Getting Started guide for detailed instructions on setting up your repository.

You are required to maintain a private GitHub repository for this course:

  1. Create a new private repo named: DH-TopicalReading-<Surname>
  2. Add the instructor (username: scdenney) as a collaborator
  3. Keep the repo private unless you explicitly choose to share it
  4. Organize the repo with the following structure:
DH-TopicalReading-<Surname>/
├── assignments/
│   ├── week01/
│   │   ├── week01-deliverable.md
│   │   └── screenshots/
│   ├── week02/
│   │   └── ...
│   ├── week06/
│   └── final-project/
└── README.md

Each week’s deliverable (markdown file + screenshots) must be committed to the correct subfolder.

At the start of the course, submit your repo’s URL to the instructor via this Google Sheet.

This organization mirrors best practices for research data management and is part of the course’s learning objectives.
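For students who prefer scripting, the folder layout above can also be created locally before the first commit. This is a minimal sketch, assuming Python 3.9+ is installed; the function name make_skeleton and the .gitkeep placeholder files are illustrative choices, not part of the course instructions. Creating the private repository on GitHub and adding scdenney as a collaborator still happen on GitHub itself.

```python
from pathlib import Path


def make_skeleton(parent: Path, surname: str, weeks: int = 6) -> Path:
    """Create the DH-TopicalReading-<Surname> folder layout under `parent`.

    Hypothetical helper: builds assignments/week01..weekNN/screenshots,
    assignments/final-project, and a README.md, then returns the repo path.
    """
    repo = parent / f"DH-TopicalReading-{surname}"
    assignments = repo / "assignments"

    for n in range(1, weeks + 1):
        shots = assignments / f"week{n:02d}" / "screenshots"
        shots.mkdir(parents=True, exist_ok=True)
        # Git does not track empty folders; a placeholder file keeps them in the repo.
        (shots / ".gitkeep").touch()

    final = assignments / "final-project"
    final.mkdir(parents=True, exist_ok=True)
    (final / ".gitkeep").touch()

    # Only write a README if none exists, so an existing one is never overwritten.
    readme = repo / "README.md"
    if not readme.exists():
        readme.write_text(f"# DH-TopicalReading-{surname}\n", encoding="utf-8")
    return repo
```

For example, make_skeleton(Path.cwd(), "Kim") creates DH-TopicalReading-Kim/ in the current folder. Because existing folders and an existing README.md are left untouched, the same call is safe to run in the parent folder of a freshly cloned repository; the weekly deliverable markdown files are still added by hand.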


Corpus Overview

The primary dataset for this course is the National Institute of Korean History (NIKH) history textbook corpus. This collection brings together Korean history textbooks from the late Joseon and Korean Empire period, through the Japanese colonial period and liberation, to those produced under the successive postwar national curricula up to the present.

You can browse the textbooks online through the National Institute of Korean History’s official website: contents.history.go.kr

Because textbooks are central to the formation of collective memory and national identity, this corpus is especially well suited for exploring questions of modern Korean identity.

Additional pre-prepared corpora are available for supplementary analysis:

  • Kaebyok (1920-1935): An interwar magazine reflecting cultural, intellectual, and political debates in colonial Korea
  • Kyongje Yongu (1987-2017): A North Korean economics journal, useful for examining how policy and ideology interact in the DPRK. Read more at 38 North

Other corpora will be introduced during the course to support student exploration. For the final project, students must use one of the pre-prepared corpora (other than the NIKH practice corpora) or a corpus of their own that the instructor has approved.


Weekly Outline

Week 1 (Oct. 10): Introduction to DH, GitHub & Data Management

  • Lecture: What is DH? Why text-as-data matters for Korean Studies. FAIR data principles.
  • Hands-On: GitHub setup, orientation to Orange workflows and widgets.
  • Tutorials: Welcome to Orange, Data Workflows, Widgets & Channels.
  • Deliverable: Create GitHub repo + README reflection.

Week 2 (Oct. 17): Text Preprocessing

  • Lecture: Tokenization, stopwords, normalization. Korean-specific preprocessing challenges.
  • Hands-On: Import corpora, apply preprocessing, compare raw vs. cleaned.
  • Tutorials: Text Preprocessing, Importing Text Documents.
  • Deliverable: Preprocessing workflow screenshot + reflection.
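One Korean-specific preprocessing challenge is that grammatical particles attach directly to nouns, so whitespace tokenization counts 한국의, 한국은 and 한국을 as three different words. The Python sketch below illustrates this with a toy sentence and a deliberately tiny, hypothetical particle list. It is not meant to reproduce what Orange does; real projects use a morphological analyzer rather than suffix rules like these.

```python
from collections import Counter

# Toy sentence (hypothetical; not taken from the course corpora)
TEXT = "한국의 역사는 길다. 한국은 반도 국가이다. 우리는 한국을 배운다."

# Hypothetical, deliberately incomplete list of Korean particles (josa)
PARTICLES = ("의", "는", "은", "을", "를", "이", "가")


def tokenize(text: str) -> list[str]:
    """Naive tokenization: drop full stops, then split on whitespace."""
    return text.replace(".", " ").split()


def strip_particle(token: str) -> str:
    """Remove one trailing particle, but only if something remains afterwards."""
    for particle in PARTICLES:
        if token.endswith(particle) and len(token) > len(particle):
            return token[: -len(particle)]
    return token


raw = Counter(tokenize(TEXT))
normalized = Counter(strip_particle(t) for t in tokenize(TEXT))

print(raw.most_common(3))         # every raw token occurs only once
print(normalized.most_common(1))  # [('한국', 3)]
```

The same rule also shows the pitfall: because 가 is in the list, a standalone noun such as 국가 (“state”) would be cut down to 국. Comparing raw and cleaned counts in this way mirrors the “raw vs. cleaned” comparison in the hands-on session.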

Week 3 (Oct. 24): Descriptive Patterns

  • Lecture: Frequency, keywords, word clouds. From descriptive to interpretive claims.
  • Hands-On: Group analysis of corpora, frequency/word cloud outputs, keyword contrasts, clustering/projection.
  • Tutorials: Text Clustering, Multivariate Projection (Freeviz).
  • Deliverable: Word cloud + reflections.
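Raw frequency shows which words are common; a keyword contrast asks which words are distinctive of one corpus compared with another. A minimal sketch, assuming two already preprocessed token lists (the tokens below are hypothetical): it ranks words by a smoothed ratio of relative frequencies. This is a simple stand-in for the keyness statistics (such as log-likelihood) used in corpus linguistics, not the method of any particular Orange widget.

```python
from collections import Counter


def keyword_ratios(target: list[str], reference: list[str], smoothing: float = 1.0):
    """Rank words by how much more frequent they are in `target` than in `reference`.

    Score = (count in target + smoothing) / target size
            divided by (count in reference + smoothing) / reference size.
    With the default smoothing of 1 (add-one), words absent from the
    reference corpus do not cause a division by zero.
    """
    t_counts, r_counts = Counter(target), Counter(reference)
    n_t, n_r = len(target), len(reference)
    scores = {
        word: ((t_counts[word] + smoothing) / n_t) / ((r_counts[word] + smoothing) / n_r)
        for word in set(t_counts) | set(r_counts)
    }
    # Highest score first: most distinctive of the target corpus
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical preprocessed tokens from two sub-corpora (e.g., two curricula)
earlier = ["독립", "독립", "민족", "경제"]
later = ["경제", "경제", "성장", "민족"]

print(Counter(earlier).most_common(1))    # plain frequency: [('독립', 2)]
print(keyword_ratios(earlier, later)[0])  # contrast: ('독립', 3.0)
```

Words shared equally by both corpora (here 민족) score 1.0, which is why a contrast often tells a different story than a word cloud of either corpus alone.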

Week 4 (Nov. 7): Classification & Prediction

  • Lecture: Supervised methods, labels, evaluation, and applications in thesis research.
  • Hands-On: Apply sentiment classification, evaluate accuracy, discuss limits.
  • Tutorials: Text Classification, Making Predictions, Model Evaluation.
  • Deliverable: Sentiment analysis + reflections.
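The limits of accuracy as an evaluation measure can be made concrete with a small calculation: on an imbalanced set of documents, a classifier that always predicts the most common label still scores high accuracy while missing every minority-class document. The labels below are hypothetical, and recall is only one of several complementary scores (Orange’s Test and Score widget reports several alongside classification accuracy).

```python
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Share of documents whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: list[str], y_pred: list[str], label: str) -> float:
    """Share of documents truly labelled `label` that were also predicted as `label`."""
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits)


# Hypothetical gold labels: 8 neutral passages and 2 negative ones
y_true = ["neutral"] * 8 + ["negative"] * 2
# A trivial "model" that always predicts the majority class
y_pred = ["neutral"] * 10

print(accuracy(y_true, y_pred))            # 0.8
print(recall(y_true, y_pred, "negative"))  # 0.0
```

An accuracy of 0.8 looks respectable, yet the model never identifies a single negative passage, which is exactly the kind of limit worth raising in the week’s reflection.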

Week 5 (Nov. 14): Clustering & Similarity

  • Lecture: Unsupervised methods; clustering documents. Strengths and pitfalls.
  • Hands-On: Hierarchical and k-means clustering, interpret clusters/topics and compare approaches.
  • Tutorials: Hierarchical Clustering, k-Means & Topic Modeling widget demo.
  • Deliverable: Clustering + reflections.

Week 6 (Nov. 21): Topic Modeling & Wrap-Up

  • Lecture: Review: Comparing clusters vs. topics. Designing a DH project.
  • Hands-On: Review of the previous week, final project preparation.
  • Tutorials: Review the previous week’s tutorials.
  • Deliverable: None - group preparation for final project.

Final Project (Dec. 5)

The final project will take the form of an in-person, four-hour “hackathon” held in the DH Lab. Working in small groups, students will complete a text-as-data analysis project that draws directly on the skills developed in this six-week strand of the course.

Each group will:

  • Select a pre-prepared corpus (or set of corpora) or, with the instructor’s approval, use your own.
  • Formulate a research question.
  • Design and carry out a workflow in Orange Data Mining.
  • Generate and interpret findings.
  • Write up the results, with reflections on the process and findings.
  • Submit the write-up to Brightspace.

This is a timed, in-class assignment (not a take-home project). Further details will be provided in advance.


Assessment

The DH strand of the course is worth 25% of the full course grade. That 25% is broken down as follows:

Component                          Weight
Weekly Deliverables & Attendance   30%
Final Project                      70%
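Because the strand itself counts for 25% of the full course, these weights shrink accordingly at the course level. As a worked example, let D be the weekly deliverables-and-attendance grade and F the final project grade (assumed here to be on the same grading scale), and let E be the optional R-track extra credit (zero for students who do not opt in):

```latex
\begin{aligned}
G_{\text{DH}} &= 0.30\,D + 0.70\,F + E, \qquad 0 \le E \le 0.25 \\
\text{weight of } D \text{ in the full course} &= 0.25 \times 0.30 = 0.075 \quad (7.5\%) \\
\text{weight of } F \text{ in the full course} &= 0.25 \times 0.70 = 0.175 \quad (17.5\%)
\end{aligned}
```

In other words, the final project hackathon alone accounts for 17.5% of the full course grade. The possible penalty for opting into the R track without completing it is not quantified in this syllabus, so it is left out of the formula.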

Attendance

Full attendance is expected. Missing even one session will put you behind. If you cannot attend a session, speak with the instructor in advance.