Final Assessment

Date: Monday, 4 May 2026 · In class

15:15–15:20 Announcements
15:20–16:20 Final assessment (60 minutes — quiz + application exercise)
16:20–16:30 Break
16:30–17:00 Discussion · introduction to the final paper

You have 60 minutes to complete both parts. The clock starts at 15:20.

The assessment has two parts, both completed during class on your laptop. The quiz covers the second-half methods from Weeks 7–10 (clustering, word embeddings, sentiment analysis, LDA / topic modeling). The application exercise asks you to build Orange Data Mining pipelines that answer research questions on new corpora.


Part 1 — Online Quiz (~15 min)

Ten multiple-choice questions covering Weeks 7–10. Auto-graded, one point per correct answer.

Take the Quiz

Please take this on your laptop (not your phone).


Part 2 — Application Exercise (~45 min)

Build pipelines in Orange Data Mining to complete two of the three sub-tasks below. You may do both sub-tasks on a single dataset or one sub-task on each dataset; the choice is yours.

For each sub-task, your answer should rest on a visualization. Box Plot, Violin Plot, dendrogram, LDAvis view, t-SNE scatter plot, Word Cloud, and Bar Plot are all fair game; pick whichever best supports the claim you want to make.

Start a fresh Orange session. Do not load a previously saved workflow — build the pipeline from scratch. This is part of the assessment.

Datasets

Pick one or both. Download links below.

Reference: data dictionary — Korean→English key for the petition category values, and the era reference for the KJYG era values.

The three sub-tasks

Task A — Did NK economic discourse shift tone across leader eras? Dataset: dataset1_kjyg_sample.csv. Research question: is the sentiment of Kyongje Yongu articles measurably different across the three NK leader eras? A good answer names the direction and rough size of the shift across eras and supports the claim with a visualization.

Task B — What latent topics cut across the petition categories? Dataset: dataset2_bluehouse_petitions_sample.csv. Research question: identify the latent topics in the petitions and look at how they map onto the six official categories. Some topics will line up neatly with one category; others will cross-cut several. A good answer labels two or three of the topics in plain language and identifies at least one category where one of those topics is clearly more or less prevalent than the others.

Task C — What makes each cluster distinctive? Dataset: your choice of either CSV. Research question: cluster the documents into 3–5 groups and characterize what makes each cluster distinctive in vocabulary or tone. A good answer gives each cluster a short label of your own and points to the vocabulary or sentiment evidence behind that label.

What to submit

Push the following to a week11/ folder in your GitHub repository:

File                      Description
workflow.ows              Your Orange workflow
Exported figures (.png)   Use each widget's built-in export option (right-click → Save Image, or the disk/camera icon). Label each clearly, e.g., figure1_sentiment_by_era.png
analysis.md               Short write-up (see below)

For each of your two sub-tasks, analysis.md should:

  1. State which sub-task you chose and which dataset you used
  2. Refer to your figures by their labels (embedding the figures directly in the markdown is encouraged)
  3. Answer the research question in 2–4 sentences, citing the figure(s)
  4. Reflect on the figures themselves: what they show, what stood out, and anything surprising or hard to interpret

Steps:

  1. Add all files to the week11/ folder in your repo
  2. In GitHub Desktop: write a short commit message
  3. Click Commit to main, then Push origin
  4. Confirm your files appear on github.com in your repository


Grading

Component                      Scoring                          Weight
Concepts Quiz (10 questions)   1 point each                     Weighted to 8 points: (raw / 10) × 8
Application Exercise           0, 1, or 2 points (see rubric)   2 points
Total                                                           out of 10

Application-exercise rubric:

Score   Criteria
0       Did not complete, or did not follow directions (e.g., loaded a previous workflow)
1       Attempted but incomplete: missing steps, pipeline errors, or a write-up that does not answer the question
2       Successful end-to-end pipeline; clear answer to the research question, with a labeled figure cited in the write-up
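To make the weighting concrete, here is a minimal Python sketch that restates the arithmetic from the grading table: the raw quiz score is scaled with (raw / 10) × 8, and the 0, 1, or 2 rubric score is added on top. The function name final_score is hypothetical; this is only an illustration of the formula, not the course's actual grading script.

```python
def final_score(quiz_correct: int, exercise_score: int) -> float:
    """Hypothetical helper: combine both components into a score out of 10.

    quiz_correct   -- number of quiz questions answered correctly (0 to 10)
    exercise_score -- application-exercise rubric score (0, 1, or 2)
    """
    if not 0 <= quiz_correct <= 10:
        raise ValueError("quiz_correct must be between 0 and 10")
    if exercise_score not in (0, 1, 2):
        raise ValueError("exercise_score must be 0, 1, or 2")
    # Weight the raw quiz score to 8 points: (raw / 10) x 8
    quiz_points = quiz_correct / 10 * 8
    # The rubric score is worth up to 2 points, so the total is out of 10
    return quiz_points + exercise_score


print(final_score(10, 2))  # 10.0
print(final_score(5, 2))   # 6.0
```

For example, 5 correct quiz answers become 4 weighted points, and a rubric score of 2 brings the total to 6 out of 10.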

Study Guide

The Week 11 study guide PDF is on the Presentations page: Week 11 Assessment Study Guide. It covers the four second-half methods (clustering, word embeddings, sentiment, LDA), the workflows for each, and the key terms.