Final Assessment Exemplars
Task A · KJYG sentiment across leader eras
Research question
Is the sentiment of Kyongje Yongu articles measurably different across the three North Korean leader eras?
Method
Tokenize each of the 360 articles with Kiwi (keep nouns, verbs, adjectives). Match tokens against the KNU sentiment dictionary. Score each article as (positive hits minus negative hits) divided by the total token count. Aggregate by era and run pairwise Welch t-tests.
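The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the code behind the results: it assumes tokenization with Kiwi has already happened, `TOY_LEXICON` is a hypothetical four-word stand-in for the KNU dictionary, and each entry is treated as simply positive or negative (the real dictionary may carry graded polarity).

```python
from statistics import mean

# Hypothetical stand-in for the KNU dictionary: token -> +1 (positive) or -1 (negative).
TOY_LEXICON = {"발전": 1, "성과": 1, "위기": -1, "부족": -1}

def article_score(tokens, lexicon):
    """(positive hits - negative hits) / total token count."""
    if not tokens:
        return 0.0  # assumption: an empty article scores as neutral
    pos = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    neg = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    return (pos - neg) / len(tokens)

def era_means(scored):
    """scored: iterable of (era, score) pairs -> {era: mean score}."""
    by_era = {}
    for era, score in scored:
        by_era.setdefault(era, []).append(score)
    return {era: mean(vals) for era, vals in by_era.items()}

# Made-up article: two positive hits, one negative hit, five tokens in total.
tokens = ["경제", "발전", "성과", "위기", "생산"]
score = article_score(tokens, TOY_LEXICON)  # (2 - 1) / 5
```

Dividing by the total token count, rather than by the number of dictionary hits, keeps long and short articles comparable, but it also means scores shrink when the dictionary covers little of the vocabulary.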
Findings
| Era | N | Mean | Median |
|---|---|---|---|
| Kim Il-sung (1987–1994) | 120 | +0.011 | +0.008 |
| Kim Jong-il (1995–2011) | 120 | +0.003 | +0.003 |
| Kim Jong-un (2012–2017) | 120 | +0.007 | +0.010 |
The pattern is U-shaped: Kim Il-sung scores highest, Kim Jong-il lowest, and Kim Jong-un sits in between. Only the Il-sung vs. Jong-il gap is statistically distinguishable (Welch t = +2.35, p = 0.020); the other two pairs are not.
The dip falls in the late 1990s, around the Arduous March famine years. Kim Jong-un articles have a higher mean than Kim Jong-il articles but still a lower mean than the late Kim Il-sung baseline, although their median (+0.010) is the highest of the three eras.
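For readers who want to see what the Welch statistic quoted in the findings measures, here is a minimal standard-library sketch. The function name `welch_t` and the sample values are illustrative assumptions, not the KJYG data. It returns the t statistic and the Welch-Satterthwaite degrees of freedom; turning those into a p-value needs a t-distribution CDF, which a statistics package would supply.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    # Squared standard errors of each sample mean (sample variance / n).
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Made-up per-article scores for two eras.
era_a = [0.012, 0.009, 0.015, 0.010, 0.008]
era_b = [0.004, 0.001, 0.006, 0.002, 0.003]
t, df = welch_t(era_a, era_b)
```

A positive t means the first sample has the higher mean, which matches the sign convention of the Il-sung vs. Jong-il comparison.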
Interpretation
Era-level sentiment shifts track historical shocks more than they track regimes. The Kim Jong-il era looks more negative on average because the famine-era articles drag the mean down. By the 2010s the register has shifted toward "our-style economic management" language, which carries higher KNU positivity.
Limitations
KNU is built on contemporary South Korean usage, so some North Korean economic vocabulary will be missed. The absolute scores are small (a net surplus of 3 to 11 positive tokens per 1,000), so read these as relative comparisons across eras.
Task B · Petition topics across categories
Research question
What latent topics run through the Cheong Wa Dae citizen petitions, and how cleanly do those topics map onto the six official policy categories?
Method
Tokenize each of the 360 petitions with Kiwi (keep nouns, verbs, adjectives). Fit LDA at k=8, deliberately more topics than the six official categories. Aggregate the document-topic distribution by category and take the mean topic share within each category.
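The aggregation step at the end of the method is simple enough to sketch without any topic-modelling library. The snippet below assumes the LDA fit has already produced one row of topic proportions per petition; `mean_topic_share` and the toy numbers are hypothetical, chosen only to show the shape of the computation.

```python
from collections import defaultdict

def mean_topic_share(doc_topics, categories):
    """Average each topic's share over the documents in each category.

    doc_topics: one list of k topic proportions per document.
    categories: the official category label of each document, in the same order.
    """
    sums = {}
    counts = defaultdict(int)
    for row, cat in zip(doc_topics, categories):
        if cat not in sums:
            sums[cat] = [0.0] * len(row)
        sums[cat] = [s + x for s, x in zip(sums[cat], row)]
        counts[cat] += 1
    return {cat: [s / counts[cat] for s in total] for cat, total in sums.items()}

# Toy example with k=3 topics and two categories (made-up numbers).
doc_topics = [[0.6, 0.3, 0.1], [0.4, 0.5, 0.1], [0.1, 0.1, 0.8]]
categories = ["Jobs", "Jobs", "Health and welfare"]
shares = mean_topic_share(doc_topics, categories)
```

Each category's row is a mean of per-document distributions, so the figures reported in the findings (for example 0.33 or 0.41) are average shares of a topic within a category, not counts of petitions.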
Findings
| Topic | My label | Top words |
|---|---|---|
| T0 | Civil-service jobs and work hours | 공무원 · 의무 · 시간 · 일자리 · 근무 |
| T1 | Schools and childcare | 아이 · 교육 · 학교 · 유치원 · 선생 |
| T2 | Hospitals and medicine | 병원 · 환자 · 의료 · 치료 · 인권 |
| T3 | Gender and punishment | 여성 · 남성 · 사회 · 청소년 · 처벌 |
| T4 | Wrongdoing, victims, foreign actors | 사람 · 회사 · 일본 · 불법 · 피해자 |
| T5 | Inter-Korean and presidential politics | 국민 · 북한 · 한국 · 대통령 · 정치 |
| T6 | Schoolwork and teachers | 학생 · 교사 · 학교 · 채용 · 수업 |
| T7 | Pensions and the state | 국민 · 국가 · 연금 · 청원 · 대통령 |
Five of the eight topics align tightly with one official category. Three examples: T0 (civil-service jobs) concentrates in Jobs at 0.33; T2 (hospitals) peaks in Health and welfare at 0.23; and T3 (gender) is sharpest of all, concentrating at 0.41 in Human rights / gender equality.
Two topics cross-cut categories. One of them, T5 (inter-Korean and presidential politics), sits at 0.27 in Political reform and 0.25 in Foreign / unification / defense. Its top vocabulary (대통령, 정치, 북한) is common to petitions in both categories.
Interpretation
LDA recovers most of the categorical structure the platform's editors put in place. The cross-cutting topics are the more interesting result: petitions about the executive branch and petitions about North-South relations use overlapping vocabulary, even when their official categories differ.
Limitations
Topic count is a choice. At k=6 the topics collapse toward the official categories. At k=12 several fragment. Topic labels are also interpretive: "gender and punishment" is one defensible reading of T3.
Task C · Clustering KJYG into four registers
Research question
If we cluster the 360 KJYG articles by the words they use, what makes each cluster distinctive in vocabulary and tone? Do the clusters track the leader eras?
Method
Tokenize each article with Kiwi (keep nouns, verbs, adjectives). Compute TF-IDF. Run hierarchical clustering with Ward linkage. Cut at k=4. Characterize each cluster by its top distinctive terms (mean TF-IDF inside the cluster minus mean outside). Pull in per-document KNU sentiment from Task A so each cluster has a tone reading.
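The "distinctive terms" measure in the method (mean TF-IDF inside the cluster minus mean outside) can be sketched as follows. This assumes the TF-IDF matrix and the cluster labels already exist; `distinctive_terms`, the three-word vocabulary, and the weights are illustrative, not taken from the KJYG corpus.

```python
def distinctive_terms(tfidf, labels, vocab, cluster, top_n=8):
    """Rank terms by mean TF-IDF inside `cluster` minus mean TF-IDF outside it."""
    inside = [row for row, lab in zip(tfidf, labels) if lab == cluster]
    outside = [row for row, lab in zip(tfidf, labels) if lab != cluster]
    if not inside or not outside:
        raise ValueError("cluster must be non-empty and must not hold every document")

    def col_mean(rows, j):
        return sum(r[j] for r in rows) / len(rows)

    diffs = [
        (col_mean(inside, j) - col_mean(outside, j), term)
        for j, term in enumerate(vocab)
    ]
    diffs.sort(reverse=True)  # largest inside-minus-outside difference first
    return [term for _, term in diffs[:top_n]]

# Toy data: four documents, three terms, two clusters (made-up weights).
vocab = ["자본주의", "제품", "선군"]
tfidf = [
    [0.9, 0.1, 0.0],
    [0.8, 0.0, 0.1],
    [0.1, 0.7, 0.0],
    [0.0, 0.6, 0.3],
]
labels = [1, 1, 2, 2]
top_c1 = distinctive_terms(tfidf, labels, vocab, cluster=1, top_n=2)
```

Because the measure is a difference of means, a term that is frequent everywhere scores near zero, which is why generic words tend to drop out of the cluster profiles.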
Findings
| Cluster | My label | Top distinctive terms | N | Sent. |
|---|---|---|---|---|
| C1 | Anti-imperialist polemic | 자본주의 · 미국 · 자본 · 독점 · 제국주의 · 위기 · 시장 · 자본가 | 63 | −0.005 |
| C2 | Technical / managerial economics | 제품 · 계산 · 정보 · 효과 · 지출 · 수입 · 지표 · 경영 | 193 | +0.008 |
| C3 | Songun and heavy industry | 국방 · 혁명 · 선군 · 조국 · 공업 · 중공업 · 자립 · 군사 | 38 | +0.013 |
| C4 | Mass-line collectivism | 주인 · 대중 · 지도 · 사상 · 집단 · 집단주의 · 의식 · 공산주의 | 66 | +0.013 |
Cluster 1 is the only cluster with negative mean sentiment. The vocabulary explains why: 위기 (crisis), 제국주의 (imperialism), 미국 (United States), 독점 (monopoly). The negative scoring is about capitalism and the United States, not about North Korea itself.
Cluster 4 (mass-line collectivism) is 67% Kim Il-sung. Cluster 3 (Songun) is 58% Kim Jong-il, matching the period when Songun was the operative state policy. Cluster 2 (technical economics, the largest at n=193) has a Kim Jong-un plurality at 44%, consistent with the post-2012 "our-style economic management" rhetoric. Cluster 1 (anti-imperialist polemic) is roughly even across all three eras.
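The era percentages above are within-cluster shares: each cluster's documents are split by era and divided by the cluster size. A minimal sketch, with the hypothetical helper `era_shares` and made-up labels:

```python
from collections import Counter

def era_shares(labels, eras):
    """For each (cluster, era) pair, the fraction of that cluster's documents in that era."""
    pair_counts = Counter(zip(labels, eras))
    cluster_sizes = Counter(labels)
    return {(c, e): n / cluster_sizes[c] for (c, e), n in pair_counts.items()}

# Toy example: cluster 1 has three documents, cluster 2 has one.
labels = [1, 1, 1, 2]
eras = ["Kim Il-sung", "Kim Il-sung", "Kim Jong-il", "Kim Jong-il"]
shares_by_cluster = era_shares(labels, eras)
```

Reading the shares within clusters rather than within eras matters here because the clusters differ a lot in size, so the same share corresponds to very different article counts.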
Interpretation
Kyongje Yongu uses at least four distinct registers. Two are era-specific; two are present in every era. The Task A finding that Kim Jong-il-era articles are slightly more negative on average is partly explained here: the negative scoring is concentrated in cluster 1, which is roughly the same size in every era. What changes between eras is the mix of the other three registers.
Limitations
Cluster sizes are uneven: cluster 2 contains 54% of all articles (193 of 360). Trying complete linkage or k-means at the same k would help check whether the imbalance is real or an artifact of Ward linkage.