Quick Start: Tidyverse & the Pipe Operator

A short primer on the R packages and syntax used in this course

Pre-requisite | R + RStudio | Read before the weekly exercises

The interactive exercises on this site use tidyverse — a collection of R packages designed for data science. If you have been learning base R through Swirl and DataCamp, the tidyverse code might look a little different. This page covers the essentials: what tidyverse is, how to install it, and the one piece of syntax you really need to know — the pipe operator.

What Is Tidyverse?

Tidyverse is a bundle of R packages that work together. When you run library(tidyverse), you load its core packages at once. Here are the ones we use most:

dplyr

Filter rows, select columns, count, group, summarize

tidyr

Reshape data — pivot wider/longer, nest/unnest

readr

Fast CSV reading with read_csv()

ggplot2

Plots and charts: bar plots, word clouds (via an add-on package such as ggwordcloud), etc.

stringr

String manipulation — detect, replace, trim text

purrr

Apply functions across lists and columns with map()

# Install tidyverse (only need to do this once)
install.packages("tidyverse")

# Load it at the start of every script
library(tidyverse)

Tidyverse vs. base R: You are not replacing what you learned in Swirl — base R functions like hclust(), dist(), and as.matrix() work exactly the same. Tidyverse just adds convenient tools for reading data, wrangling tables, and making plots.
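To make the point concrete, here is a minimal sketch that mixes both styles: tidyverse verbs prepare a table, then base R functions cluster it. The freqs table and its numbers are made up for illustration and are not course data.

```r
library(dplyr)  # also attached by library(tidyverse)

# Hypothetical word-frequency table: one row per book (made-up numbers).
freqs <- tibble(
  title = c("Book A", "Book B", "Book C"),
  love  = c(10, 12, 1),
  war   = c(2, 3, 15)
)

# Tidyverse step: keep only the numeric columns.
# Base R step: convert the result to a plain matrix.
m <- freqs |>
  select(love, war) |>
  as.matrix()
rownames(m) <- freqs$title

# Base R functions work in a pipe too: distances, then clustering.
tree <- m |>
  dist() |>
  hclust()

tree$labels
```

The pipe does not care which package a function comes from, so dist() and hclust() slot into the chain the same way select() does.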

The Pipe Operator

The pipe takes the result on the left and feeds it as the first argument to the function on the right. Instead of nesting functions inside each other, you read the code left to right, top to bottom — like a recipe.

data |> step_1() |> step_2() |> step_3() → result
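As a small illustration of "first argument on the right", the sketch below writes the same calculation twice, once nested and once piped. It uses only base R functions and an invented vector, so it runs without any packages (R 4.1 or newer is assumed for |>).

```r
# Made-up input values for illustration.
values <- c(4, 9, 16, 25)

# Nested: read from the inside out.
nested <- round(mean(sqrt(values)), 1)

# Piped: read top to bottom. Each result becomes the first argument
# of the next call, so round(x, 1) is written as x |> round(1).
piped <- values |>
  sqrt() |>
  mean() |>
  round(1)

identical(nested, piped)
```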

R has two pipe operators. For simple chains like the ones in this course they do the same thing, and you will see both in examples online:

|> (base R pipe, R 4.1+)

corpus |>
  filter(era == "Colonial") |>
  select(title, year)

%>% (tidyverse/magrittr pipe)

corpus %>%
  filter(era == "Colonial") %>%
  select(title, year)

Which one to use? We use |> (the base R pipe) throughout this course. It is built into R 4.1 and later, so no extra packages are needed. If you see %>% in online examples or tutorials, it works the same way in chains like these, but it requires the tidyverse (or magrittr) to be loaded.
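If you want to convince yourself that the two pipes agree for this kind of chain, a quick check along these lines should do it. The three-row corpus is invented for the example; the column names simply mirror the snippets above.

```r
library(dplyr)  # also makes %>% available

# Hypothetical three-row corpus, made up for illustration.
corpus <- tibble(
  title = c("Book A", "Book B", "Book C"),
  year  = c(1850, 1920, 1871),
  era   = c("Colonial", "Modern", "Colonial")
)

# Same chain, written with the base R pipe.
with_native <- corpus |>
  filter(era == "Colonial") |>
  select(title, year)

# Same chain, written with the magrittr pipe.
with_magrittr <- corpus %>%
  filter(era == "Colonial") %>%
  select(title, year)

# Compare the two results.
identical(with_native, with_magrittr)
```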

Base R vs. Tidyverse: Side by Side

Here are three common tasks written both ways. Neither is wrong — tidyverse just reads more like plain English when you chain multiple steps together.

Read a CSV and look at it:

Base R

corpus <- read.csv("data.csv")
head(corpus)

Tidyverse

corpus <- read_csv("data.csv")
corpus |> glimpse()

Filter rows and select columns:

Base R

sub <- corpus[corpus$era == "Colonial", ]
sub <- sub[, c("title", "year")]

Tidyverse

corpus |>
  filter(era == "Colonial") |>
  select(title, year)

Count words and get the top 10:

Base R

counts <- table(tokens$word)
counts <- sort(counts, decreasing = TRUE)
head(counts, 10)

Tidyverse

tokens |>
  count(word, sort = TRUE) |>
  slice_head(n = 10)
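One practical difference between the two versions is the shape of the result: table() returns a named table object, while count() returns a tibble with columns word and n that you can keep piping. The sketch below compares them on a tiny invented tokens table.

```r
library(dplyr)

# Hypothetical tokens table: one row per word occurrence (made-up words).
tokens <- tibble(word = c("the", "sea", "the", "ship", "sea", "the"))

# Base R: a named table of counts, sorted from most to least frequent.
base_counts <- sort(table(tokens$word), decreasing = TRUE)

# Tidyverse: a tibble with columns `word` and `n`, already sorted.
tidy_counts <- tokens |>
  count(word, sort = TRUE)

names(base_counts)   # the words, in base R
tidy_counts$word     # the words, in the tidyverse result
```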

The pattern: Start with your data, then pipe it through a chain of verbs — filter(), select(), count(), mutate(), group_by(), summarize(). Each verb does one thing. The pipe connects them. That is most of what you need.
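Here is a minimal sketch of that pattern from start to finish, using a small invented corpus. The decade column and the 1860 cutoff are arbitrary choices made only to give each verb something to do.

```r
library(dplyr)

# Hypothetical corpus, made up for illustration.
corpus <- tibble(
  title = c("Book A", "Book B", "Book C", "Book D"),
  year  = c(1850, 1920, 1871, 1935),
  era   = c("Colonial", "Modern", "Colonial", "Modern")
)

era_summary <- corpus |>
  filter(year >= 1860) |>                     # keep some rows
  mutate(decade = floor(year / 10) * 10) |>   # add a column
  group_by(era) |>                            # define groups
  summarize(
    books = n(),                              # rows per group
    earliest_decade = min(decade)             # smallest decade per group
  )

era_summary
```

Reading top to bottom, each line answers one question: which rows, which new column, which groups, which summary.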

Key Verbs Cheat Sheet

These are the functions that appear most often in our exercises. Most come from the dplyr package (loaded automatically with library(tidyverse)); str_trunc() comes from stringr, and print() is base R.

R — Quick Reference

# ── Select columns ────────────────────────────────────────────────
corpus |> select(book_id, title, era)

# ── Filter rows ───────────────────────────────────────────────────
corpus |> filter(era == "Colonial")

# ── Add or modify a column ────────────────────────────────────────
corpus |> mutate(title_short = str_trunc(title, 20))

# ── Count occurrences ─────────────────────────────────────────────
tokens |> count(word, sort = TRUE)

# ── Group and summarize ───────────────────────────────────────────
tokens |>
  group_by(era) |>
  summarize(total = n())

# ── Sort rows ─────────────────────────────────────────────────────
word_counts |> arrange(desc(n))

# ── Join two tables ───────────────────────────────────────────────
tokens |> left_join(corpus, by = "book_id")

# ── Print more rows ───────────────────────────────────────────────
result |> print(n = 20)
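Several of these verbs are usually combined. The sketch below joins a tokens table to a corpus table and then counts tokens per era. Both tables are invented for illustration, and it assumes, as the join line above suggests, that they share a book_id column and that era lives in corpus.

```r
library(dplyr)

# Hypothetical data: `tokens` has one row per word, `corpus` one row per book.
corpus <- tibble(
  book_id = c(1, 2),
  title   = c("Book A", "Book B"),
  era     = c("Colonial", "Modern")
)
tokens <- tibble(
  book_id = c(1, 1, 1, 2, 2),
  word    = c("sea", "ship", "sea", "city", "train")
)

words_per_era <- tokens |>
  left_join(corpus, by = "book_id") |>  # adds title and era to every token
  group_by(era) |>
  summarize(total = n()) |>             # number of tokens in each era
  arrange(desc(total))                  # largest group first

words_per_era |> print(n = 20)
```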