Word Clouds: TF–IDF vs. Word Embeddings

Files: cluster_wordclouds_tfidf.png and cluster_wordclouds_wordembeddings.png
Due: Thursday, November 20, 2025, by 17:00

For this task, examine the two provided images:

  1. cluster_wordclouds_tfidf.png — word cloud based on TF–IDF weights.
  2. cluster_wordclouds_wordembeddings.png — word cloud based on word-embedding (vector) similarity.

Your Task

Write a short response that clearly addresses:

  1. What each image is showing.
    Describe what you see in each cloud and what it suggests about the “modern (South) Korea” cluster.

  2. Why the metrics themselves are different.
    Explain, as technically and specifically as you can,

    • what TF–IDF measures,
    • what word embeddings measure,
    • and why these represent fundamentally different mathematical approaches to text (e.g., frequency-based vs. distributional/semantic vector representations).
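
To make the contrast in point 2 concrete, here is a minimal Python sketch, not a model answer. It assumes a made-up three-document corpus, the common tf × log(N/df) weighting (libraries differ in smoothing and normalization), and hand-written 3-dimensional vectors standing in for learned embeddings. None of these come from the assignment data or from the pipeline that produced the images.

```python
import math
from collections import Counter

# Toy corpus (hypothetical; not the assignment's data).
docs = [
    "seoul korea economy growth".split(),
    "seoul korea kpop music".split(),
    "paris france economy growth".split(),
]

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-made vectors purely for illustration; a real embedding model
# learns these coordinates from the contexts in which words occur.
embeddings = {
    "seoul": [0.9, 0.1, 0.0],
    "korea": [0.8, 0.2, 0.1],
    "economy": [0.1, 0.9, 0.2],
}

weights = tf_idf(docs)
# TF-IDF: a count-based score per (term, document) pair.
print(weights[1])
# Embeddings: a geometric relation between two word vectors.
print(cosine(embeddings["seoul"], embeddings["korea"]))
print(cosine(embeddings["seoul"], embeddings["economy"]))
```

In this sketch the TF-IDF weight is computed from counts alone, so a term that appears in only one document ("kpop") outscores one shared across documents ("seoul"). The cosine score never consults counts at all: it depends only on the angle between two vectors.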

Keep it concise but technically accurate. The goal is to show that you understand the difference in what the metrics quantify, not just the visual outcome.