Sentiment Analysis: Moon Jae-in's Tweets
Explore how dictionary-based sentiment analysis scores 3,148 tweets from President Moon Jae-in's Twitter account (@moonriver365, 2012–2020). Each tweet is tokenized with Kiwi, then matched against the KNU sentiment dictionary.
The Corpus
The @moonriver365 corpus has 3,148 tweets from 2012 to 2020, sorted into three political periods. Before we score anything, here's what the corpus looks like year by year.
Pre-presidency (2012-01 → 2016-11, 1,973 tweets) — Moon as opposition leader: 2012 campaign, Democratic Party chairmanship, legislative politics.
Transition (2016-12 → 2017-05, 393 tweets) — Park Geun-hye impeachment crisis through Moon's early presidential campaign and inauguration (May 10, 2017).
Presidency (2017-05 → 2020-06, 782 tweets) — Moon in office: inter-Korean summits (2018), Japan trade dispute (2019), COVID-19 (2020).
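The R code in this tutorial uses a period3 column that ships with the data. If you need to rebuild that label from tweet dates, here is one way to do it. The cutoffs are an assumption read off the ranges above: transition from 2016-12-01, and presidency from inauguration day, 2017-05-10. `assign_period()` is a hypothetical helper, not part of the original pipeline.

```r
# Hypothetical helper: label tweet dates with the three political periods.
# ASSUMED cutoffs: transition starts 2016-12-01; presidency starts on
# inauguration day, 2017-05-10. Adjust if your period3 column differs.
assign_period <- function(date) {
  date <- as.Date(date)
  out <- rep("pre_presidency", length(date))
  # which() drops NA comparisons, so missing dates are handled separately
  out[which(date >= as.Date("2016-12-01"))] <- "transition"
  out[which(date >= as.Date("2017-05-10"))] <- "presidency"
  out[is.na(date)] <- NA_character_
  # Chronological factor levels keep legends and facets in period order
  factor(out, levels = c("pre_presidency", "transition", "presidency"))
}

assign_period(c("2016-11-30", "2017-05-09", "2017-05-10"))
# → pre_presidency, transition, presidency
```

Returning a factor rather than a character vector means plots and summaries list the periods chronologically instead of alphabetically.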
R code: load and count the corpus
library(tidyverse)
library(tidytext)

# Load the tweet corpus (the CSV provides text, tweet_year, tweet_date, period3)
tweets <- read_csv("moon_twitter.csv") |>
  filter(!is.na(text)) |>
  mutate(
    tweet_id = row_number(),                # stable key for later joins
    text = text |>
      str_remove_all("https?://\\S+") |>    # URLs
      str_remove_all("@\\w+") |>            # @mentions
      str_trim(),
    # Chronological order for legends, stacking, and facets
    period3 = factor(period3,
      levels = c("pre_presidency", "transition", "presidency"))
  )

# Tweets per year, split by period
tweets |>
  count(tweet_year, period3) |>
  ggplot(aes(x = tweet_year, y = n, fill = period3)) +
  geom_col() +
  scale_fill_manual(values = c(
    pre_presidency = "#6366f1",
    transition = "#f59e0b",
    presidency = "#10b981")) +
  labs(title = "Moon Jae-in tweets per year",
       x = "", y = "Tweets") +
  theme_minimal()
Add a tweet_id for joining later, strip URLs and @mentions, then count tweets per year, colored by period.
Dictionary Scoring
How one tweet becomes a sentiment score: look up each content word in the KNU positive and negative lists, then combine the matches into a single score. Pick a tweet below to see which words matched each list and how the score is computed.
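The R code in this section uses the simplest form of that arithmetic: the number of positive matches minus the number of negative matches. Here is a toy version for one already-tokenized tweet. The words are placeholders, not KNU entries, and `score_tokens()` is a hypothetical helper. Every occurrence of a matched token counts, which mirrors the count-based R pipeline.

```r
# Hypothetical helper: score = (# positive matches) - (# negative matches).
# Each token occurrence counts, so a repeated word counts repeatedly.
score_tokens <- function(tokens, positive, negative) {
  n_pos <- sum(tokens %in% positive)
  n_neg <- sum(tokens %in% negative)
  c(positive = n_pos, negative = n_neg, score = n_pos - n_neg)
}

# Placeholder lexicon and tokens (NOT real KNU entries)
pos_words <- c("hope", "thanks", "peace")
neg_words <- c("worry", "crisis")
tweet_tokens <- c("people", "hope", "peace", "crisis", "hope")

score_tokens(tweet_tokens, pos_words, neg_words)
# → positive 3, negative 1, score 2
```

A tweet with no matches scores 0 here. In the join-based pipeline, however, such a tweet disappears entirely unless it is added back.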
R code: tidytext-style sentiment scoring with KNU
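library(tidyverse)
library(tidytext)
library(elbird)  # R wrapper for Kiwi, a Korean morphological analyzer

# 1. Build the KNU sentiment lexicon as a tibble
knu <- bind_rows(
  tibble(word = read_lines("positive.txt"), sentiment = "positive"),
  tibble(word = read_lines("negative.txt"), sentiment = "negative")
) |>
  mutate(word = str_trim(word)) |>
  filter(word != "") |>
  distinct(word, sentiment)   # avoid double-counting duplicate entries

# 2. Tokenize tweets with Kiwi, keep content words
#    (NNG/NNP = common/proper nouns, VA = adjectives, VV = verbs)
tokens <- tweets |>
  mutate(toks = map(text, tokenize_tbl)) |>   # one tibble (form, tag, ...) per tweet
  unnest(toks) |>
  filter(
    tag %in% c("NNG", "NNP", "VA", "VV"),
    str_length(form) >= 2
  ) |>
  select(tweet_id, period3, tweet_date, word = form)

# 3. tidytext-style sentiment: inner_join + count + pivot
sentiment_scores <- tokens |>
  inner_join(knu, by = "word") |>
  count(tweet_id, period3, tweet_date, sentiment) |>
  pivot_wider(names_from = sentiment, values_from = n,
              values_fill = 0) |>
  mutate(score = positive - negative)

# 4. inner_join drops tweets with no dictionary match; add them back
#    with score 0 (the plots below use this `scored` table)
scored <- tweets |>
  select(tweet_id, period3, tweet_date) |>
  left_join(select(sentiment_scores, tweet_id, positive, negative, score),
            by = "tweet_id") |>
  mutate(across(c(positive, negative, score), \(x) replace_na(x, 0)))

head(sentiment_scores)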
…positive.txt and negative.txt into Sentiment Analysis → Custom Dictionary. In R, the code above uses the tidytext-style inner_join pattern with a simpler positive - negative count. Exact numbers differ, but the rankings agree.
Score Distribution
Distribution of sentiment scores across all tweets. Many tweets have no dictionary matches and score zero; the rest lean positive. Toggle periods to see how presidential communication differs from pre-presidency.
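Those zero scores appear only if tweets without any match are kept. An inner join against the lexicon drops them. One way to restore them is to left-join the per-tweet counts back onto the full tweet list and treat the missing counts as 0. Here is a minimal base-R sketch on made-up data; the column names follow the tutorial, and the values are invented.

```r
# Made-up data: three tweets, but only tweets 1 and 3 matched the lexicon
tweets <- data.frame(tweet_id = 1:3,
                     period3 = c("pre_presidency", "transition", "presidency"))
counts <- data.frame(tweet_id = c(1L, 3L),
                     positive = c(2L, 0L),
                     negative = c(1L, 1L))

# Left join (all.x = TRUE) keeps every tweet; unmatched tweets get NA counts
scored <- merge(tweets, counts, by = "tweet_id", all.x = TRUE)

# "No match" means zero positive and zero negative words
scored$positive[is.na(scored$positive)] <- 0L
scored$negative[is.na(scored$negative)] <- 0L
scored$score <- scored$positive - scored$negative

scored
# score column: 1, 0, -1
```

Without this step, a histogram would show no zero-match tweets at all. Every remaining zero would then come from tweets whose positive and negative matches happen to cancel.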
R code: plot score distribution
# Histogram of sentiment scores (binwidth = 1: one bar per integer score)
ggplot(scored, aes(x = score)) +
  geom_histogram(binwidth = 1, fill = "#6366f1", alpha = 0.7,
                 color = "white") +
  labs(title = "Sentiment Score Distribution",
       x = "Score (positive - negative)", y = "Count") +
  theme_minimal()

# Faceted by period, in chronological order
scored |>
  mutate(period3 = factor(period3,
    levels = c("pre_presidency", "transition", "presidency"))) |>
  ggplot(aes(x = score, fill = period3)) +
  geom_histogram(binwidth = 1, alpha = 0.7, color = "white") +
  facet_wrap(~ period3, ncol = 1) +
  scale_fill_manual(values = c(
    pre_presidency = "#6366f1",
    transition = "#f59e0b",
    presidency = "#10b981")) +
  labs(title = "Sentiment Score Distribution by Period",
       x = "Score (positive - negative)", y = "Count") +
  theme_minimal() +
  theme(legend.position = "none")
By Period
Comparing sentiment across Moon Jae-in's three political periods. The box shows the middle 50% of scores, the line inside the box is the median, the diamond marks the mean, and whiskers show the score range.
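To make the box-plot reading concrete, here are the same summary numbers computed for a made-up set of scores. The helper follows ggplot2's `geom_boxplot` conventions, which the R code uses. Quartiles are type-7 quantiles. Whiskers stop at the most extreme points within 1.5 × IQR of the box, and anything beyond is drawn as an outlier, so whiskers can be shorter than the full score range. `box_stats()` is a hypothetical helper.

```r
# Hypothetical helper: the numbers a box plot summarizes, following
# ggplot2's geom_boxplot conventions (type-7 quartiles, 1.5 x IQR whiskers).
# Assumes x has no NA values.
box_stats <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)  # Q1, median, Q3
  iqr <- q[3] - q[1]
  lo_fence <- q[1] - coef * iqr
  hi_fence <- q[3] + coef * iqr
  is_out <- x < lo_fence | x > hi_fence
  # Whiskers reach the most extreme non-outlying points (quartiles included)
  whiskers <- range(c(q, x[!is_out]))
  list(q1 = q[1], median = q[2], q3 = q[3],
       mean = mean(x),                     # drawn as the diamond
       whisker_low = whiskers[1], whisker_high = whiskers[2],
       outliers = x[is_out])
}

scores <- c(-2, 0, 0, 0, 1, 1, 2, 3, 8)   # made-up sentiment scores
str(box_stats(scores))
# q1 = 0, median = 1, q3 = 2, mean ≈ 1.44, whiskers -2 to 3, outlier 8
```

Note how the single score of 8 pulls the mean above the median without moving the box. With skewed dictionary scores, the diamond and the median line can therefore tell different stories.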
R code: box plot by period
# Box plot comparing periods; the white diamond marks each period's mean
scored |>
  mutate(period3 = factor(period3,
    levels = c("pre_presidency", "transition", "presidency"))) |>
  ggplot(aes(x = period3, y = score, fill = period3)) +
  # Whiskers extend to 1.5 x IQR; points beyond are drawn as outliers
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.3) +
  stat_summary(fun = mean, geom = "point", shape = 23,
               size = 3, fill = "white") +
  scale_fill_manual(values = c(
    pre_presidency = "#6366f1",
    transition = "#f59e0b",
    presidency = "#10b981")) +
  labs(title = "Sentiment by Political Period",
       x = "", y = "Sentiment Score") +
  theme_minimal() +
  theme(legend.position = "none")

# Summary statistics
scored |>
  group_by(period3) |>
  summarise(n = n(), mean = mean(score),
            median = median(score), sd = sd(score))
Over Time
Sentiment trends from 2012 to 2020. The dark trend line is a 120-day rolling average. Hover over individual dots to read tweets; dashed lines mark key events. The R version below approximates the trend with a LOESS smoother (span = 0.15) rather than a rolling average.
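A calendar-based rolling average is not the same as a rolling mean over a fixed number of tweets. Posting frequency varies a lot: 393 tweets fall in the six-month transition, against 782 across three years of the presidency. Here is a sketch of a date-based version. It assumes a trailing window, in which each point averages all tweets from the 120 days ending on its date. The chart's exact window alignment isn't stated, and `rolling_mean_days()` is a hypothetical helper.

```r
# Hypothetical helper: trailing, calendar-based rolling mean.
# ASSUMPTION: the window is trailing (the `window` days ending on each
# tweet's date, inclusive), not centered. O(n^2), fine for a few thousand tweets.
rolling_mean_days <- function(dates, scores, window = 120) {
  dates <- as.Date(dates)
  vapply(seq_along(dates), function(i) {
    in_window <- dates <= dates[i] & dates > dates[i] - window
    mean(scores[in_window])
  }, numeric(1))
}

rolling_mean_days(c("2020-01-01", "2020-02-01", "2020-06-01"), c(1, 3, 5))
# → 1 2 5
```

The window is defined by comparing dates rather than row positions, so the input does not need to be sorted. When plotted with `geom_line()`, ggplot2 connects the points in x order anyway.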
R code: sentiment over time with trend line
# Sentiment over time with a LOESS trend line
scored |>
  mutate(tweet_date = as.Date(tweet_date)) |>
  ggplot(aes(x = tweet_date, y = score, color = period3)) +
  geom_point(alpha = 0.15, size = 1) +
  geom_smooth(aes(group = 1), method = "loess", formula = y ~ x,
              span = 0.15, color = "#001158",
              se = FALSE, linewidth = 1) +
  scale_color_manual(values = c(
    pre_presidency = "#6366f1",
    transition = "#f59e0b",
    presidency = "#10b981")) +
  # Mark key events
  geom_vline(xintercept = as.Date("2017-05-09"),   # presidential election
             linetype = "dashed", alpha = 0.5) +
  geom_vline(xintercept = as.Date("2019-07-04"),   # Japan's export restrictions take effect
             linetype = "dashed", alpha = 0.5) +
  labs(title = "Sentiment Over Time",
       x = "", y = "Score") +
  theme_minimal()
Explore Tweets
Top matched words, plus the most positive and most negative tweets. Browse by sentiment score or engagement.
R code: explore top words and extreme tweets
# Top 10 matched sentiment words per polarity (ties can add extra rows)
tokens |>
  inner_join(knu, by = "word") |>
  count(sentiment, word, sort = TRUE) |>
  group_by(sentiment) |>
  slice_max(n, n = 10)

# Most positive and most negative tweets
extreme_tweets <- sentiment_scores |>
  inner_join(tweets |> select(tweet_id, text),
             by = "tweet_id")

extreme_tweets |> slice_max(score, n = 5)
extreme_tweets |> slice_min(score, n = 5)