Quick Start: Tidyverse & the Pipe Operator
A short primer on the R packages and syntax used in this course
The interactive exercises on this site use tidyverse — a collection of R packages designed for data science. If you have been learning base R through Swirl and DataCamp, the tidyverse code might look a little different. This page covers the essentials: what tidyverse is, how to install it, and the one piece of syntax you really need to know — the pipe operator.
What Is Tidyverse?
Tidyverse is a bundle of R packages that work together. When you run library(tidyverse), you load the core packages at once. Among the functions we use most are read_csv() (from readr) and map() (from purrr).
# Install tidyverse (only need to do this once)
install.packages("tidyverse")
# Load it at the start of every script
library(tidyverse)
Base R functions such as hclust(), dist(), and as.matrix() work exactly the same after you load tidyverse. Tidyverse just adds convenient tools for reading data, wrangling tables, and making plots.
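As a rough illustration of mixing the two styles, the sketch below wrangles a tiny table with dplyr and then hands it to base R's as.matrix(), dist(), and hclust(). The freq table and its values are made up for this example and are not part of the course data.

```r
library(dplyr)

# Hypothetical toy data: word frequencies for three made-up texts
freq <- tibble(
  text = c("text_a", "text_b", "text_c"),
  the  = c(10, 12, 3),
  sea  = c(4, 5, 0),
  king = c(0, 1, 7)
)

# Tidyverse step: drop the label column, then convert to a numeric matrix
m <- freq |>
  select(-text) |>
  as.matrix()
rownames(m) <- freq$text

# Base R steps, unchanged by loading tidyverse
d  <- dist(m)      # Euclidean distances between rows (the default)
hc <- hclust(d)    # hierarchical clustering (complete linkage by default)

clusters <- cutree(hc, k = 2)   # cut the tree into two groups
print(clusters)
```

In this toy data, text_a and text_b have similar counts, so they should land in the same cluster while text_c sits apart.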
The Pipe Operator
The pipe takes the result on the left and feeds it as the first argument to the function on the right. Instead of nesting functions inside each other, you read the code left to right, top to bottom — like a recipe.
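A minimal sketch of that idea, using only base R functions (it assumes R 4.1 or later, where |> is available). The numbers are arbitrary; the point is that the nested and piped versions compute the same thing.

```r
# Nested: you have to read from the inside out
nested <- round(sqrt(sum(c(1, 4, 9, 16))), 1)

# Piped: each result becomes the first argument of the next function
piped <- c(1, 4, 9, 16) |>
  sum() |>      # 30
  sqrt() |>     # about 5.477
  round(1)      # 5.5

identical(nested, piped)   # TRUE
```

Note that round(1) in the piped version means round(x, 1): the piped value fills the first argument, and anything you write in the parentheses fills the remaining ones.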
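R has two pipe operators: |>, built into R since version 4.1, and %>%, from the magrittr package (loaded with tidyverse). For the code in this course they behave the same, and you will see both in examples online: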
# Base R pipe
corpus |>
  filter(era == "Colonial") |>
  select(title, year)

# magrittr pipe
corpus %>%
  filter(era == "Colonial") %>%
  select(title, year)
We use |> (the base R pipe) throughout this course. It is built into R (version 4.1 and later), so no extra packages are needed. If you see %>% in online examples or tutorials, it works the same way for everything covered here.
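To check this for yourself, here is a small sketch that runs the same steps with each pipe and compares the results. The corpus table below is a made-up stand-in with hypothetical titles, years, and era labels. Loading dplyr is enough here because it re-exports %>% from magrittr.

```r
library(dplyr)

# Hypothetical stand-in for the course corpus
corpus <- tibble(
  title = c("Book A", "Book B", "Book C"),
  year  = c(1702, 1855, 1741),
  era   = c("Colonial", "Victorian", "Colonial")
)

# Base R pipe
with_base <- corpus |>
  filter(era == "Colonial") |>
  select(title, year)

# magrittr pipe
with_magrittr <- corpus %>%
  filter(era == "Colonial") %>%
  select(title, year)

identical(with_base, with_magrittr)   # TRUE
```

The two operators do differ in some advanced features (for example, how placeholders work), but none of that comes up in simple chains like this one.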
Base R vs. Tidyverse: Side by Side
Here are three common tasks written both ways. Neither is wrong — tidyverse just reads more like plain English when you chain multiple steps together.
Read a CSV and look at it:
# Base R
corpus <- read.csv("data.csv")
head(corpus)

# Tidyverse
corpus <- read_csv("data.csv")
corpus |> glimpse()
Filter rows and select columns:
# Base R
sub <- corpus[corpus$era == "Colonial", ]
sub <- sub[, c("title", "year")]

# Tidyverse
corpus |>
  filter(era == "Colonial") |>
  select(title, year)
Count words and get the top 10:
# Base R
counts <- table(tokens$word)
counts <- sort(counts, decreasing = TRUE)
head(counts, 10)

# Tidyverse
tokens |>
  count(word, sort = TRUE) |>
  slice_head(n = 10)
The core verbs are filter(), select(), count(), mutate(), group_by(), and summarize(). Each verb does one thing. The pipe connects them. That is most of what you need.
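If you want to try the word-count comparison without the course files, the sketch below runs both versions on a tiny made-up tokens table (six hypothetical words, not course data) and shows what kind of object each approach returns.

```r
library(dplyr)

# Hypothetical toy tokens table: one row per word occurrence
tokens <- tibble(word = c("the", "sea", "the", "king", "sea", "the"))

# Base R: a named table object, sorted by frequency
base_counts <- sort(table(tokens$word), decreasing = TRUE)

# Tidyverse: a tibble with two columns, word and n
tidy_counts <- tokens |>
  count(word, sort = TRUE)

names(base_counts)[1]   # "the"
tidy_counts$word[1]     # "the"
tidy_counts$n[1]        # 3
```

Because count() returns an ordinary table with a column named n, you can keep piping it into other verbs, for example filter(n > 1).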
Key Verbs Cheat Sheet
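These are the tidyverse functions that appear most often in our exercises. Most come from the dplyr package (loaded automatically with library(tidyverse)). The exceptions are str_trunc(), which comes from stringr (also loaded by tidyverse), and print(), which is base R.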
# ── Select columns ────────────────────────────────────────────────
corpus |> select(book_id, title, era)
# ── Filter rows ───────────────────────────────────────────────────
corpus |> filter(era == "Colonial")
# ── Add or modify a column ────────────────────────────────────────
corpus |> mutate(title_short = str_trunc(title, 20))
# ── Count occurrences ─────────────────────────────────────────────
tokens |> count(word, sort = TRUE)
# ── Group and summarize ───────────────────────────────────────────
tokens |>
  group_by(era) |>
  summarize(total = n())
# ── Sort rows ─────────────────────────────────────────────────────
word_counts |> arrange(desc(n))
# ── Join two tables ───────────────────────────────────────────────
tokens |> left_join(corpus, by = "book_id")
# ── Print more rows ───────────────────────────────────────────────
result |> print(n = 20)
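To see several of these verbs working together, here is a minimal sketch that chains a join, a grouped summary, and a sort. The corpus and tokens tables are tiny made-up stand-ins with hypothetical values. The sketch assumes tokens has no era column of its own, so era is brought in from corpus via the join.

```r
library(dplyr)

# Hypothetical stand-ins for the course tables
corpus <- tibble(
  book_id = c(1, 2, 3),
  title   = c("Book A", "Book B", "Book C"),
  era     = c("Colonial", "Colonial", "Victorian")
)

tokens <- tibble(
  book_id = c(1, 1, 1, 2, 3, 3),
  word    = c("the", "sea", "the", "king", "the", "sea")
)

era_totals <- tokens |>
  left_join(corpus, by = "book_id") |>   # attach title and era to each token
  group_by(era) |>                       # one group per era
  summarize(total = n()) |>              # count the tokens in each group
  arrange(desc(total))                   # largest total first

era_totals |> print(n = 20)
```

With this toy data the result has two rows: Colonial with a total of 4 and Victorian with a total of 2.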