Sentiment Analysis: Moon Jae-in's Tweets

Explore how dictionary-based sentiment analysis scores 3,148 tweets from President Moon Jae-in's Twitter account (@moonriver365, 2012–2020). Each tweet is tokenized with Kiwi, then matched against the KNU sentiment dictionary.

Week 9 · Kiwi + KNU sentiment dictionary · 3,148 tweets, 3 periods

The Corpus

The @moonriver365 corpus has 3,148 tweets from 2012 to 2020, sorted into three political periods. Before we score anything, here's what the corpus looks like year by year.

The three periods are Dr. Denney's editorial groupings, not an official classification. Another researcher could cut the timeline differently.

Pre-presidency (2012-01 → 2016-11, 1,973 tweets) — Moon as opposition leader: 2012 campaign, Democratic Party chairmanship, legislative politics.
Transition (2016-12 → 2017-05, 393 tweets) — Park Geun-hye impeachment crisis through Moon's early presidential campaign and inauguration (May 10, 2017).
Presidency (2017-05 → 2020-06, 782 tweets) — Moon in office: inter-Korean summits (2018), Japan trade dispute (2019), COVID-19 (2020).
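
The period boundaries above can be expressed as date cutoffs. The following is a minimal sketch, not the labeling code actually used for the corpus: `assign_period` is a hypothetical helper, and the exact cut dates (December 1, 2016 and the May 10, 2017 inauguration) are assumptions inferred from the ranges listed above.

```r
# Hypothetical helper: map tweet dates to the three editorial periods.
# The cut dates are assumptions inferred from the period ranges in the text.
assign_period <- function(dates) {
  cutoffs <- as.Date(c("2016-12-01", "2017-05-10"))
  labels  <- c("pre_presidency", "transition", "presidency")
  # findInterval() returns 0, 1, or 2: how many cutoffs each date has reached
  idx <- findInterval(as.numeric(as.Date(dates)), as.numeric(cutoffs))
  labels[idx + 1]
}

example_dates <- c("2012-12-19", "2016-12-09", "2017-05-09", "2018-04-27")
periods <- assign_period(example_dates)
print(data.frame(date = example_dates, period = periods))
```

Moving either cutoff reassigns the tweets near that boundary, which is one concrete way another researcher could cut the timeline differently.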

Moon was most active on Twitter during his 2012 presidential campaign. After taking office in May 2017, his tweeting dropped sharply — the official Cheong Wa Dae account took over most communication.

R code: load and count the corpus

library(tidyverse)
library(tidytext)

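# Load the tweet corpus
# (the code below expects columns: text, tweet_date, tweet_year, period3)
tweets <- read_csv("moon_twitter.csv") |>
  filter(!is.na(text)) |>
  mutate(
    tweet_id = row_number(),          # stable key for joining scores back later
    text = text |>
      str_remove_all("https?://\\S+") |>  # strip URLs
      str_remove_all("@\\w+") |>          # strip @mentions
      str_trim()
  )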

# Tweets per year, split by period
tweets |>
  count(tweet_year, period3) |>
  ggplot(aes(x = tweet_year, y = n, fill = period3)) +
  geom_col() +
  scale_fill_manual(values = c(
    pre_presidency = "#6366f1",
    transition = "#f59e0b",
    presidency = "#10b981")) +
  labs(title = "Moon Jae-in tweets per year",
       x = "", y = "Tweets") +
  theme_minimal()

This is the R equivalent of loading the CSV with Orange's Corpus widget. The code adds a tweet_id for joining later, strips URLs and @mentions from the text, then counts tweets per year, colored by period.

Dictionary Scoring

How one tweet becomes a sentiment score: look up each word in the positive and negative lists. Pick a tweet below to see the words that matched the KNU positive or negative list, then compute the score.
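
As a rough illustration of the lookup described above, here is a minimal base-R sketch for a single tweet. The word lists and tokens are made up for the example; they stand in for positive.txt, negative.txt, and real Kiwi output, and are not verified KNU entries.

```r
# Toy word lists standing in for positive.txt and negative.txt
# (illustrative only, not verified KNU dictionary entries)
positive_words <- c("희망", "평화", "감사")
negative_words <- c("위기", "분노")

# Hypothetical tokens for one tweet, as a tokenizer might return them
tweet_tokens <- c("국민", "희망", "평화", "위기", "극복")

# Count every token found in each list (a repeated word counts each time)
n_positive <- sum(tweet_tokens %in% positive_words)
n_negative <- sum(tweet_tokens %in% negative_words)

# The tweet's score is positive matches minus negative matches
score <- n_positive - n_negative
cat("positive:", n_positive, "negative:", n_negative, "score:", score, "\n")
```

Tokens that appear in neither list, such as 국민 here, simply contribute nothing, which is why a tweet with no matches ends up at zero.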

R code: tidytext-style sentiment scoring with KNU

library(tidyverse)
library(tidytext)
library(elbird)  # Kiwi wrapper for R (Korean morphological analyzer)

# 1. Build the KNU sentiment lexicon as a tibble
knu <- bind_rows(
  tibble(word = read_lines("positive.txt"), sentiment = "positive"),
  tibble(word = read_lines("negative.txt"), sentiment = "negative")
) |>
  mutate(word = str_trim(word)) |>
  filter(word != "")

# 2. Kiwi tokenize tweets, keep content words
tokens <- tweets |>
  mutate(toks = map(text, ~tokenize(.x, flatten = TRUE))) |>
  unnest(toks) |>
  filter(
    tag %in% c("NNG", "NNP", "VA", "VV"),
    str_length(form) >= 2
  ) |>
  select(tweet_id, period3, tweet_date, word = form)

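# 3. tidytext-style sentiment: inner_join + count + pivot
# (inner_join keeps only tweets with at least one dictionary match)
sentiment_scores <- tokens |>
  inner_join(knu, by = "word") |>
  count(tweet_id, period3, tweet_date, sentiment) |>
  pivot_wider(names_from = sentiment, values_from = n,
              values_fill = 0) |>
  mutate(score = positive - negative)

# 4. Add back tweets with no matches as score 0; later plots use `scored`
scored <- tweets |>
  select(tweet_id, period3, tweet_date) |>
  left_join(sentiment_scores |> select(tweet_id, positive, negative, score),
            by = "tweet_id") |>
  mutate(across(c(positive, negative, score), ~ replace_na(.x, 0)))

head(scored)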

In Orange (what you'll build): load positive.txt and negative.txt into Sentiment Analysis → Custom Dictionary. In R: the code above uses tidytext's inner_join pattern and a simpler score, the count of positive matches minus the count of negative matches. Exact numbers may differ between the two tools, but the ranking of tweets should broadly agree.

Score Distribution

Distribution of sentiment scores across all tweets. Many tweets have no dictionary matches and score zero; the rest lean positive. Toggle periods to see how presidential communication differs from pre-presidency.

R code: plot score distribution

# Histogram of sentiment scores
ggplot(scored, aes(x = score)) +
  geom_histogram(binwidth = 1, fill = "#6366f1", alpha = 0.7,
                 color = "white") +
  labs(title = "Sentiment Score Distribution",
       x = "Score (positive - negative)", y = "Count") +
  theme_minimal()

# Faceted by period
ggplot(scored, aes(x = score, fill = period3)) +
  geom_histogram(binwidth = 1, alpha = 0.7, color = "white") +
  facet_wrap(~ period3, ncol = 1) +
  scale_fill_manual(values = c(
    pre_presidency = "#6366f1",
    transition = "#f59e0b",
    presidency = "#10b981")) +
  theme_minimal() +
  theme(legend.position = "none")

In Orange: connect your scored data to a Distributions widget and select the score column.

By Period

Comparing sentiment across Moon Jae-in's three political periods. The box shows the middle 50% of scores, the line inside the box is the median, the diamond marks the mean, and whiskers show the score range.
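
To make the parts of a box plot concrete, here is a small sketch on made-up scores (not values from the corpus). One assumption to flag: it follows the common 1.5 × IQR convention that R's box plots use by default, where whiskers stop at the most extreme values inside the fences and anything beyond is treated as an outlier, so the whiskers can be shorter than the full range.

```r
# Made-up scores for illustration only
toy_scores <- c(-3, -1, 0, 0, 0, 1, 1, 2, 2, 9)

# Box edges (25th and 75th percentiles) and the median line
q <- quantile(toy_scores, probs = c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Fences under the 1.5 * IQR convention
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr

# Whiskers reach the most extreme values that are still inside the fences
inside   <- toy_scores[toy_scores >= lower_fence & toy_scores <= upper_fence]
whiskers <- range(inside)
outliers <- toy_scores[toy_scores < lower_fence | toy_scores > upper_fence]

cat("box:", q[1], "to", q[3], "| median:", q[2],
    "| mean:", mean(toy_scores), "\n")
cat("whiskers:", whiskers, "| outliers:", outliers, "\n")
```

A single extreme tweet pulls the mean well away from the median here, which is why it helps to read the diamond and the median line together.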

R code: box plot by period

# Box plot comparing periods
# (geom_boxplot() whiskers reach at most 1.5 x IQR; points beyond are outliers)
scored |>
  mutate(period3 = factor(period3,
    levels = c("pre_presidency", "transition", "presidency"))) |>
  ggplot(aes(x = period3, y = score, fill = period3)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.3) +
  # Diamond marks the mean, matching the description above
  stat_summary(fun = mean, geom = "point", shape = 23,
               size = 3, fill = "white") +
  scale_fill_manual(values = c(
    pre_presidency = "#6366f1",
    transition = "#f59e0b",
    presidency = "#10b981")) +
  labs(title = "Sentiment by Political Period",
       x = "", y = "Sentiment Score") +
  theme_minimal() +
  theme(legend.position = "none")

# Summary statistics
scored |>
  group_by(period3) |>
  summarise(n = n(), mean = mean(score),
            median = median(score), sd = sd(score))

In Orange: connect your scored data to a Box Plot widget and set the subgroup to period3.

Over Time

Sentiment trends across the full 2012–2020 span. The dark trend line is a 120-day rolling average. Hover over individual dots to read the tweets. Dashed lines mark key events.

The visible rise around the inauguration (May 2017) coincides with the shift to presidential communication. The dip in mid-2019 aligns with the Japan trade dispute. Note: Moon barely tweeted in 2013–2015, so the line in that stretch averages far fewer tweets; the jitter there reflects the thin sample, not a real mood swing.
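
For readers who want to see what a 120-day rolling average involves, here is a minimal base-R sketch on made-up data. `rolling_mean_by_days` is a hypothetical helper, and the trailing window (the 120 days ending at each tweet) is an assumption; the interactive chart may center its window or handle gaps differently.

```r
# Hypothetical helper: mean score over the `window_days` days ending at each date
rolling_mean_by_days <- function(dates, scores, window_days = 120) {
  dates <- as.Date(dates)
  vapply(seq_along(dates), function(i) {
    # Tweets dated within the trailing window, including the current tweet
    in_window <- dates <= dates[i] & dates > dates[i] - window_days
    mean(scores[in_window])
  }, numeric(1))
}

# Made-up tweets for illustration only
toy <- data.frame(
  tweet_date = as.Date(c("2017-01-01", "2017-02-15", "2017-04-20", "2017-09-01")),
  score      = c(2, 0, 1, -1)
)
toy$trend <- rolling_mean_by_days(toy$tweet_date, toy$score)
print(toy)
```

The last toy tweet has no neighbors within 120 days, so its trend value is just its own score. That is the thin-sample effect in miniature: when few tweets fall in the window, each one moves the line a lot.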

R code: sentiment over time with trend line

# Sentiment over time with a LOESS trend
# (a smooth stand-in for the 120-day rolling average in the interactive chart)
scored |>
  mutate(tweet_date = as.Date(tweet_date)) |>
  ggplot(aes(x = tweet_date, y = score,
             color = period3)) +
  geom_point(alpha = 0.15, size = 1) +
  # group = 1 fits one trend across all periods instead of one per color
  geom_smooth(aes(group = 1), method = "loess",
              span = 0.15, color = "#001158",
              se = FALSE, linewidth = 1) +
  scale_color_manual(values = c(
    pre_presidency = "#6366f1",
    transition = "#f59e0b",
    presidency = "#10b981")) +
  # Mark key events
  # 2017-05-09: presidential election (inauguration followed on May 10)
  geom_vline(xintercept = as.Date("2017-05-09"),
             linetype = "dashed", alpha = 0.5) +
  # 2019-07-04: Japan's export restrictions take effect
  geom_vline(xintercept = as.Date("2019-07-04"),
             linetype = "dashed", alpha = 0.5) +
  labs(title = "Sentiment Over Time",
       x = "", y = "Score") +
  theme_minimal()

Explore Tweets

Top matched words, plus the most positive and most negative tweets. Browse by sentiment score or engagement.

R code: explore top words and extreme tweets

# Top matched sentiment words, by polarity
tokens |>
  inner_join(knu, by = "word") |>
  count(sentiment, word, sort = TRUE) |>
  group_by(sentiment) |>
  slice_max(n, n = 10)

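# Most positive and most negative tweets
# (sentiment_scores only contains tweets with at least one dictionary match)
extreme_tweets <- sentiment_scores |>
  inner_join(tweets |> select(tweet_id, text),
             by = "tweet_id")

# with_ties = FALSE returns exactly 5 rows even when many tweets share a score
extreme_tweets |> slice_max(score, n = 5, with_ties = FALSE)
extreme_tweets |> slice_min(score, n = 5, with_ties = FALSE)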

# Box plot: sentiment by period (matched tweets only, so zero-match tweets are excluded)
sentiment_scores |>
  ggplot(aes(x = period3, y = score, fill = period3)) +
  geom_boxplot(alpha = 0.7) +
  theme_minimal()

In Orange: connect scored data to Corpus Viewer and sort by the score column. Click any tweet to read the full text.