Final Assessment Exemplars

Three worked examples, one per Final Assessment task.

The figures below come from a Python pipeline that uses the same building blocks Orange does (Kiwi morphological tokenization, the KNU sentiment dictionary, scikit-learn LDA, hierarchical clustering with Ward linkage). Your Orange runs will produce close but slightly different numbers because random seeds and the exact preprocessing options differ. Read these as examples of the kind of answer the rubric rewards. An ideal answer states a clear research question, describes the method before showing results, cites a labeled figure from the prose, and ends with a short interpretation that explains the result.

Task A · KJYG sentiment across leader eras

Research question

Is the sentiment of Kyongje Yongu (KJYG) articles measurably different across the three North Korean leader eras?

Method

Tokenize each of the 360 articles with Kiwi (keep nouns, verbs, adjectives). Match tokens against the KNU sentiment dictionary. Score each article as (positive hits minus negative hits) divided by the total token count. Aggregate by era and run pairwise Welch t-tests on the per-article scores.
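
The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline's actual code: the function name `sentiment_score`, the tiny `POSITIVE` and `NEGATIVE` word sets, and the sample tokens are made up for the example and stand in for Kiwi output and the real KNU dictionary.

```python
# Hypothetical stand-ins for the KNU dictionary (the real one is far larger).
POSITIVE = {"발전", "성과", "향상"}
NEGATIVE = {"위기", "피해"}


def sentiment_score(tokens, positive=POSITIVE, negative=NEGATIVE):
    """Return (positive hits - negative hits) / total tokens for one article."""
    if not tokens:
        return 0.0  # assumption: an empty article scores as neutral
    pos_hits = sum(1 for t in tokens if t in positive)
    neg_hits = sum(1 for t in tokens if t in negative)
    return (pos_hits - neg_hits) / len(tokens)


# Toy article: 10 tokens, 2 positive hits, 1 negative hit.
tokens = ["경제", "발전", "계획", "성과", "위기", "생산", "공업", "농업", "기술", "관리"]
score = sentiment_score(tokens)
print(round(score, 3))  # (2 - 1) / 10 = 0.1
```

Because the denominator is the full token count and not only the matched tokens, long articles with few dictionary hits score close to zero, which is why the era means are so small.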

Findings

Figure 1. Sentiment distribution by era. Boxes show the interquartile range, the black bar is the median, the red diamond is the mean.

Era	N	Mean	Median
Kim Il-sung (1987–1994)	120	+0.011	+0.008
Kim Jong-il (1995–2011)	120	+0.003	+0.003
Kim Jong-un (2012–2017)	120	+0.007	+0.010

On the era means, the pattern is U-shaped: Kim Il-sung scores highest, Kim Jong-il lowest, and Kim Jong-un sits between (on the medians, Kim Jong-un is marginally highest). The Il-sung vs. Jong-il gap is statistically distinguishable (Welch t = +2.35, p = 0.020); the other two pairs are not.
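
For readers who want to see what sits behind a Welch t value, here is a hedged sketch using only the standard library. The helper name `welch_t` and the two short score lists are invented for illustration; they are not the 120 per-article scores behind the table above.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up per-article scores for two eras (not the real data).
era_a = [0.012, 0.009, 0.015, 0.010, 0.008]
era_b = [0.002, 0.004, 0.001, 0.005, 0.003]
t, df = welch_t(era_a, era_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Welch's version does not assume the two eras have equal variance. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic and also returns the p-value.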

Figure 2. Mean sentiment by year. Vertical lines mark leader transitions.

The dip is in the late 1990s, around the Arduous March famine years. Kim Jong-un articles are higher on average than Kim Jong-il articles but still lower than the late Kim Il-sung baseline.

Interpretation

Era-level sentiment shifts track historical shocks more than they track regimes. The Kim Jong-il era looks less positive on average because the famine-era articles drag the mean down. By the 2010s the register has shifted toward "our-style economic management" language, which carries higher KNU positivity.

Limitations

KNU is built from contemporary South Korean usage, so some North Korean economic vocabulary will be missed. The absolute scores are small (a net surplus of roughly 3 to 11 positive tokens per 1,000), so read these as relative comparisons across eras.

Task B · Petition topics across categories

Research question

What latent topics run through the Cheong Wa Dae citizen petitions, and how cleanly do those topics map onto the six official policy categories?

Method

Tokenize each of the 360 petitions with Kiwi (keep nouns, verbs, adjectives). Fit LDA at k=8, deliberately more topics than the six official categories. Aggregate the document-topic distribution by category and take the mean topic share within each category.
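
A rough sketch of the fit-and-aggregate steps with scikit-learn follows. Everything concrete in it is an assumption made for illustration: the six toy petitions, their category labels, and k=3 (chosen only because the toy corpus is tiny; the exemplar uses k=8). The texts are assumed to be already tokenized and space-joined.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy, already-tokenized petitions (space-joined tokens) with made-up categories.
docs = [
    "병원 환자 의료 치료", "병원 의료 환자 치료 병원",
    "학교 교육 아이 선생", "아이 학교 유치원 교육",
    "일자리 근무 시간 공무원", "공무원 일자리 근무 시간",
]
categories = np.array(["Health", "Health", "Education", "Education", "Jobs", "Jobs"])

# str.split as the analyzer: each text is treated as pre-tokenized.
counts = CountVectorizer(analyzer=str.split).fit_transform(docs)

# k=3 only because the toy corpus is tiny; the exemplar uses k=8.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topic = lda.fit_transform(counts)  # one row per document, rows sum to 1

# Mean topic share within each category: the matrix a heatmap would display.
labels = sorted(set(categories))
share = np.vstack([doc_topic[categories == c].mean(axis=0) for c in labels])
for c, row in zip(labels, share):
    print(c, np.round(row, 2))
```

Fixing `random_state` makes a run repeatable, but a different seed can reorder or reshape the topics, which is one reason Orange output will not match these figures exactly.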

Findings

Topic	My label	Top words
T0	Civil-service jobs and work hours	공무원 · 의무 · 시간 · 일자리 · 근무
T1	Schools and childcare	아이 · 교육 · 학교 · 유치원 · 선생
T2	Hospitals and medicine	병원 · 환자 · 의료 · 치료 · 인권
T3	Gender and punishment	여성 · 남성 · 사회 · 청소년 · 처벌
T4	Wrongdoing, victims, foreign actors	사람 · 회사 · 일본 · 불법 · 피해자
T5	Inter-Korean and presidential politics	국민 · 북한 · 한국 · 대통령 · 정치
T6	Schoolwork and teachers	학생 · 교사 · 학교 · 채용 · 수업
T7	Pensions and the state	국민 · 국가 · 연금 · 청원 · 대통령

Figure 1. Mean LDA topic share by petition category. Darker cells mark the categories where a topic's mean share is higher.

Five of the eight topics align tightly with one official category. Three examples: T0 (civil-service jobs) has a mean share of 0.33 in Jobs. T2 (hospitals) peaks in Health and welfare at 0.23. T3 (gender) is sharpest of all, concentrating at 0.41 in Human rights / gender equality.

Two topics cross-cut. One is T5 (inter-Korean and presidential politics), which sits at 0.27 in Political reform and 0.25 in Foreign / unification / defense. The shared vocabulary (대통령, 정치, 북한) is common to petitions in both categories.

Figure 2. Where T3 concentrates. Its mean share in Human rights / gender equality is three to four times the share in any other category.

Interpretation

LDA finds most of the categorical structure the platform's editors put in place. The cross-cutting topics are the more interesting result: petitions about the executive branch and petitions about North-South relations use overlapping vocabulary, even when their official category differs.

Limitations

Topic count is a choice. At k=6 the topics collapse toward the official categories. At k=12 several fragment. Topic labels are also interpretive: "gender and punishment" is one defensible reading of T3.

Task C · Clustering KJYG into four registers

Research question

If we cluster the 360 KJYG articles by the words they use, what makes each cluster distinctive in vocabulary and tone? Do the clusters track the leader eras?

Method

Tokenize each article with Kiwi (keep nouns, verbs, adjectives). Compute TF-IDF. Run hierarchical clustering with Ward linkage. Cut at k=4. Characterize each cluster by its top distinctive terms (mean TF-IDF inside the cluster minus mean outside). Pull in per-document KNU sentiment from Task A so each cluster has a tone reading.

Findings

Figure 1. Ward-linkage dendrogram (truncated to 30 leaves). Cutting at k=4 gives the four colored branches.

Cluster	My label	Top distinctive terms	N	Sent.
C1	Anti-imperialist polemic	자본주의 · 미국 · 자본 · 독점 · 제국주의 · 위기 · 시장 · 자본가	63	−0.005
C2	Technical / managerial economics	제품 · 계산 · 정보 · 효과 · 지출 · 수입 · 지표 · 경영	193	+0.008
C3	Songun and heavy industry	국방 · 혁명 · 선군 · 조국 · 공업 · 중공업 · 자립 · 군사	38	+0.013
C4	Mass-line collectivism	주인 · 대중 · 지도 · 사상 · 집단 · 집단주의 · 의식 · 공산주의	66	+0.013

Cluster 1 is the only cluster with negative mean sentiment. The vocabulary explains why: 위기 (crisis), 제국주의 (imperialism), 미국 (United States), 독점 (monopoly). The negative scoring is about capitalism and the United States, not about North Korea itself.

Figure 2. Within-cluster era distribution. Two of the four clusters skew strongly to one era.

Cluster 4 (mass-line collectivism) is 67% Kim Il-sung. Cluster 3 (Songun) is 58% Kim Jong-il, exactly when Songun was the operative state policy. Cluster 2 (technical economics, the largest at n=193) has a Kim Jong-un plurality at 44%, consistent with the post-2012 "our-style economic management" rhetoric. Cluster 1 (anti-imperialist polemic) is roughly even across all three eras.
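
Percentages like these come from a simple cross-tabulation of cluster label against era. A minimal standard-library sketch is below; the seven (cluster, era) pairs are made up to show the arithmetic and do not reproduce the real assignments, and `era_shares` is a hypothetical helper name.

```python
from collections import Counter

# Made-up (cluster, era) pairs, one per article; not the real assignments.
assignments = [
    ("C4", "Kim Il-sung"), ("C4", "Kim Il-sung"), ("C4", "Kim Jong-il"),
    ("C3", "Kim Jong-il"), ("C3", "Kim Jong-il"), ("C3", "Kim Jong-un"),
    ("C3", "Kim Il-sung"),
]


def era_shares(assignments):
    """Return {cluster: {era: share of that cluster's articles}}."""
    pair_counts = Counter(assignments)
    cluster_sizes = Counter(cluster for cluster, _ in assignments)
    shares = {}
    for (cluster, era), n in pair_counts.items():
        shares.setdefault(cluster, {})[era] = n / cluster_sizes[cluster]
    return shares


shares = era_shares(assignments)
for cluster in sorted(shares):
    dominant = max(shares[cluster], key=shares[cluster].get)
    print(cluster, dominant, f"{shares[cluster][dominant]:.0%}")
```

The shares are normalized within each cluster, which matches the "within-cluster era distribution" reading of Figure 2. Normalizing within each era instead would answer a different question (which registers dominate a given era).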

Interpretation

Kyongje Yongu uses at least four distinct registers. Two are era-specific; two are present in every era. The Task A finding that Kim Jong-il-era articles are slightly less positive on average is partly explained here: the negative scoring is concentrated in cluster 1, which is roughly the same size in every era, so the era-level differences come from the mix of the other three registers.

Limitations

Cluster sizes are uneven. Cluster 2 contains 54% of all articles. Trying complete linkage or k-means at the same k would help check whether the imbalance is real or an artifact of Ward linkage.
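
One way to run that check is to cluster the same matrix three ways and compare the partitions with the adjusted Rand index (ARI). The sketch below is an assumption-laden illustration: the data are synthetic two-dimensional blobs standing in for the TF-IDF matrix, and ARI is one reasonable agreement measure among several, not something the exemplar prescribes.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the TF-IDF matrix: four well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

k = 4
ward = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
complete = AgglomerativeClustering(n_clusters=k, linkage="complete").fit_predict(X)
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# ARI = 1.0 means identical partitions; values near 0 mean chance-level agreement.
print("Ward vs complete:", adjusted_rand_score(ward, complete))
print("Ward vs k-means: ", adjusted_rand_score(ward, kmeans))
print("Ward cluster sizes:", np.bincount(ward))
```

On real TF-IDF data the agreement will be lower than on clean blobs. If the large cluster survives under all three methods, the imbalance is more likely a property of the corpus than of the algorithm.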

The pattern repeats across all three tasks: a clear research question, the method described before results, a labeled figure cited from the prose, a short interpretation that explains the result, and an honest note on what the method misses. That is the answer at the top of the rubric.