Final Assessment

Date: Monday, 4 May 2026 · In class

15:15–15:20 Announcements

15:20–16:20 Final assessment (60 minutes — quiz + application exercise)

16:20–16:30 Break

16:30–17:00 Discussion · introduction to the final paper

You have 60 minutes to complete both parts. The clock starts at 15:20.

The assessment has two parts, both completed during class on your laptop. The quiz covers the second-half methods from Weeks 7–10 (clustering, word embeddings, sentiment analysis, LDA / topic modeling). The application exercise asks you to build Orange Data Mining pipelines that answer research questions on new corpora.

Part 1 — Online Quiz (~15 min)

Ten multiple-choice questions covering Weeks 7–10. Auto-graded, one point per correct answer.

Take the Quiz

Please take this on your laptop (not your phone).

10 multiple-choice questions, one per page
Work independently
Closed book, closed notes — you may not look up answers, use notes, or consult any outside resources
Do not leave the survey page until you have completed the quiz

Part 2 — Application Exercise (~45 min)

Build pipelines in Orange Data Mining to answer two of the three sub-tasks below. You may do both sub-tasks on a single dataset or one sub-task on each; your choice. Note that Task A uses dataset 1, Task B uses dataset 2, and Task C works with either.

For each sub-task, your answer should rest on a visualization. Box Plot, Violin Plot, a dendrogram (Hierarchical Clustering), LDAvis, a t-SNE scatter plot, Word Cloud, and Bar Plot are all fair game; pick whichever best supports the claim you want to make.

Start a fresh Orange session. Do not load a previously saved workflow — build the pipeline from scratch. This is part of the assessment.

Datasets

Pick one or both. Download links below.

dataset1_kjyg_sample.csv: 360 articles from Kyongje Yongu (경제연구, "Economic Research"), the DPRK economics journal, 1987–2017, balanced at 120 articles per leader era (Kim Il-sung, Kim Jong-il, Kim Jong-un). Useful columns: era, year, issue, title, text. · Download CSV (~3.1 MB)
dataset2_bluehouse_petitions_sample.csv: 360 citizen petitions from the Cheong Wa Dae (Blue House) online petition platform, 2017–2018, balanced at 60 petitions per category across six categories. Useful columns: category, year, votes, title, text. · Download CSV (~600 KB)

Reference: data dictionary — Korean→English key for the petition category values, and the era reference for the KJYG era values.

The three sub-tasks

Task A — Did NK economic discourse shift tone across leader eras? Dataset: dataset1_kjyg_sample.csv. Research question: is the sentiment of Kyongje Yongu articles measurably different across the three NK leader eras? A good answer names the direction and rough size of the shift across eras and supports the claim with a visualization.

Task B — What latent topics cut across the petition categories? Dataset: dataset2_bluehouse_petitions_sample.csv. Research question: identify the latent topics in the petitions and look at how they map onto the six official categories. Some topics will line up neatly with one category; others will cross-cut several. A good answer labels two or three of the topics in plain language and identifies at least one category where one of those topics is clearly more or less prevalent than the others.

Task C — What makes each cluster distinctive? (either dataset) Dataset: your choice — either CSV. Research question: cluster the documents into 3–5 groups and characterize what makes each cluster distinctive in vocabulary or tone. A good answer gives each cluster a short label of your own and points to the vocabulary or sentiment evidence behind the label.
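For Task A, if you export per-document sentiment scores from Orange (for example with the Save Data widget) and want to double-check the direction and rough size of the shift, a minimal Python sketch like the one below computes the mean score per era and the change between consecutive eras. The score column name (`sentiment` here) depends on which sentiment method you chose, and the era labels you pass in `order` must match the era values in the data dictionary; both are assumptions, and the helper names are hypothetical.

```python
import csv
from collections import defaultdict
from statistics import mean
from typing import Iterable, Sequence


def era_means(rows: Iterable[dict], score_column: str = "sentiment",
              era_column: str = "era") -> dict:
    """Mean sentiment score per era. Column names are assumptions:
    use whatever your exported file actually contains."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[era_column]].append(float(row[score_column]))
    return {era: mean(scores) for era, scores in groups.items()}


def era_shifts(means: dict, order: Sequence[str]) -> dict:
    """Change in mean score between consecutive eras (later minus earlier).
    A positive value means the later era scores as more positive."""
    return {f"{a} -> {b}": means[b] - means[a] for a, b in zip(order, order[1:])}


def era_shifts_from_csv(path: str, order: Sequence[str], **kwargs) -> dict:
    # Assumes a UTF-8 CSV with a header row, e.g. one written by Save Data.
    with open(path, newline="", encoding="utf-8") as f:
        return era_shifts(era_means(csv.DictReader(f), **kwargs), order)
```

A mean hides spread, so read these numbers next to your Box Plot or violin plot split by era rather than in place of it.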

What to submit

Push the following to a week11/ folder in your GitHub repository:

File	Description
`workflow.ows`	Your Orange workflow
Exported figures (`.png`)	Use each widget’s built-in export option (right-click → Save Image, or the disk/camera icon). Label each clearly, e.g., `figure1_sentiment_by_era.png`
`analysis.md`	Short write-up — see below

For each of your two sub-tasks, analysis.md should:

State which sub-task you chose and which dataset you used
Refer to your figures by their labels (embedding the figures directly in the markdown is encouraged)
Answer the research question in 2–4 sentences, citing the figure(s)
Reflect on the figures themselves: what they show, what stood out, and anything surprising or hard to interpret

Steps:

Add all files to the week11/ folder in your repo
In GitHub Desktop: write a short commit message
Click Commit to main, then Push origin
Confirm your files appear on github.com in your repository

Tips

The preprocessing scripts on the Data & Scripts page are heavily annotated. By now you should be comfortable making at least light modifications — at minimum set TEXT_COLUMN = 'text' (both Week 11 CSVs use that column).
Save your workflow as you go (File → Save As, .ows). If Orange crashes you don’t want to start over.
Look at the data once before you trust the model (Corpus Viewer or Word Cloud after preprocessing). Bad tokenization is easy to spot in 30 seconds.
Pick the sub-task you can finish, not the one that sounds most impressive. A complete Task A beats an abandoned Task B.
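If you also want a quick check outside Orange that a CSV reads the way you expect, a minimal Python sketch like the following (standard library only, assuming UTF-8 files; `summarize_corpus` and `summarize_csv` are hypothetical names, not part of the course scripts) confirms the `text` column exists, flags empty documents, and counts documents per group. For the KJYG file, group by `era` and expect 120 per era; for the petitions file, group by `category` and expect 60 per category.

```python
import csv
from collections import Counter
from typing import Iterable


def summarize_corpus(rows: Iterable[dict], text_column: str = "text",
                     group_column: str = "era") -> Counter:
    """Count documents per group, failing loudly if the text column is
    missing or a document is empty."""
    counts: Counter = Counter()
    for i, row in enumerate(rows):
        if text_column not in row:
            raise KeyError(f"column {text_column!r} not found; got {list(row)}")
        if not (row[text_column] or "").strip():
            raise ValueError(f"row {i} has an empty {text_column!r} field")
        counts[row[group_column]] += 1
    return counts


def summarize_csv(path: str, **kwargs) -> Counter:
    # utf-8-sig also tolerates a byte-order mark, common in spreadsheet exports.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return summarize_corpus(csv.DictReader(f), **kwargs)
```

For example, `summarize_csv("dataset2_bluehouse_petitions_sample.csv", group_column="category")` should report six categories of 60 each if the file loaded correctly.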

Grading

Component	Scoring	Weight
Concepts Quiz (10 questions)	1 point each	Weighted to 8 points: (raw / 10) × 8
Application Exercise	0, 1, or 2 points (see rubric)	2 points
Total		out of 10

Application-exercise rubric:

Score	Criteria
0	Did not complete, or did not follow directions (e.g., loaded a previous workflow)
1	Attempted but incomplete — missing steps, pipeline errors, or write-up does not answer the question
2	Successful end-to-end pipeline; clear answer to the research question with a labeled figure cited from the write-up
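The weighting in the grading table is simple arithmetic: the quiz's raw score out of 10 is scaled to 8 points, and the application exercise adds 0, 1, or 2 points. As a worked check, a minimal Python sketch (the function name is illustrative, not part of any course tool):

```python
def final_score(quiz_correct: int, exercise_points: int) -> float:
    """Combine the two components as the grading table describes:
    quiz raw score (0-10) scaled to 8 points, plus 0/1/2 exercise points."""
    if not 0 <= quiz_correct <= 10:
        raise ValueError("quiz_correct must be between 0 and 10")
    if exercise_points not in (0, 1, 2):
        raise ValueError("exercise_points must be 0, 1, or 2")
    quiz_points = (quiz_correct / 10) * 8  # (raw / 10) x 8
    return quiz_points + exercise_points
```

For example, 7 correct quiz answers and a full-credit exercise give (7 / 10) × 8 + 2 = 5.6 + 2 = 7.6 out of 10.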

Study Guide

The Week 11 study guide PDF is on the Presentations page: Week 11 Assessment Study Guide. It covers the four second-half methods (clustering, word embeddings, sentiment, LDA), the workflows for each, and the key terms.