Opinion articles in The Chronicle
Part 1 - Data scraping
This will be done in the chronicle.R R script. Save the resulting data frame in the data folder.
Suggested scraping code can be found here.
Part 2 - Data analysis
Let’s start by loading the packages we will need:
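The code in this part uses read_csv() from readr, count(), mutate(), and if_else() from dplyr, and str_to_lower() and str_detect() from stringr. All of these are attached by the tidyverse meta-package. A minimal setup, assuming the tidyverse is installed, is:

```r
# Attach the core tidyverse packages: readr (read_csv), dplyr (count, mutate,
# if_else), stringr (str_to_lower, str_detect), and ggplot2 (plotting)
library(tidyverse)
```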
- Your turn (1 minute): Load the data you saved into the data folder and name it chronicle.
chronicle <- read_csv("data/chronicle.csv")
- Your turn (3 minutes): Who are the most prolific authors of the 100 most recent opinion articles in The Chronicle?
chronicle |>
  # Number of articles per author, largest counts first
  count(author, sort = TRUE)
# A tibble: 69 × 2
author n
<chr> <int>
1 Anthony Salgado 3
2 Billy Cao 3
3 Community Editorial Board 3
4 Heidi Smith 3
5 Linda Cao 3
6 Luke A. Powery 3
7 Monday Monday 3
8 Sonia Green 3
9 Viktoria Wulff-Andersen 3
10 Abdel Shehata 2
# … with 59 more rows
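The sorted output above has nine authors tied at three articles, so reading off only the first row would miss the tie. A minimal sketch for listing every author tied for the top count keeps the rows where n equals its maximum. The toy tibble is hypothetical and stands in for chronicle; applied to the real data, the same pipeline would return the nine authors with three articles each.

```r
library(tidyverse)

# Hypothetical toy data standing in for chronicle; these are not real authors
toy <- tibble(author = c("A", "B", "A", "C", "B", "D"))

# Count articles per author, then keep every author tied for the top count
top_authors <- toy |>
  count(author, sort = TRUE) |>
  filter(n == max(n))

top_authors
```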
- Demo: Draw a line plot of the number of opinion articles published per day in The Chronicle.
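One way to approach this demo is to count articles per date and draw the counts with geom_line(). This sketch assumes chronicle has a column named date that is already of class Date; if the scraped date is stored as text such as "March 1, 2023", it would first need converting, for example with lubridate's mdy(). The toy tibble is hypothetical. Days with no articles are filled in with zero using tidyr's complete(), so the line drops to zero on those days instead of skipping over them.

```r
library(tidyverse)

# Hypothetical toy data; assumes chronicle has a `date` column of class Date
toy <- tibble(
  date = as.Date(c("2023-03-01", "2023-03-01", "2023-03-02", "2023-03-04"))
)

# Count articles per day, adding days with no articles as zero counts
per_day <- toy |>
  count(date) |>
  complete(date = seq(min(date), max(date), by = "day"), fill = list(n = 0))

# One line connecting the daily counts in date order
p <- ggplot(per_day, aes(x = date, y = n)) +
  geom_line() +
  labs(
    x = "Date",
    y = "Number of opinion articles",
    title = "Opinion articles published per day in The Chronicle"
  )

p
```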
- Demo: What percent of the most recent 100 opinion articles in The Chronicle mention “climate” in their title?
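chronicle |>
  mutate(
    # Lowercase titles so "Climate" and "CLIMATE" also match "climate"
    title = str_to_lower(title),
    # Label each article by whether its title contains "climate"
    climate = if_else(str_detect(title, "climate"), "mentioned", "not mentioned")
  ) |>
  # Tally articles in each group, then convert counts to proportions
  count(climate) |>
  mutate(prop = n / sum(n))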
# A tibble: 2 × 3
climate n prop
<chr> <int> <dbl>
1 mentioned 3 0.03
2 not mentioned 97 0.97
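The same proportion can be computed more compactly. In this sketch, stringr's regex(ignore_case = TRUE) makes the match case-insensitive without lowercasing the title first, and mean() of a logical vector gives the share of TRUE values. The toy titles are hypothetical; on chronicle, the result should match the prop of 0.03 for "mentioned" above.

```r
library(tidyverse)

# Hypothetical toy titles; these are not real Chronicle headlines
toy <- tibble(
  title = c("Climate action now", "Basketball recap", "Why CLIMATE matters", "Campus dining")
)

# Case-insensitive detection; mean() of TRUE/FALSE is the proportion of TRUEs
climate_prop <- toy |>
  summarise(prop_climate = mean(str_detect(title, regex("climate", ignore_case = TRUE))))

climate_prop
```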
- Your turn (5 minutes): What percent of the most recent 100 opinion articles in The Chronicle mention “climate” in their title or abstract?
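chronicle |>
  mutate(
    # Lowercase both fields so the match ignores capitalization
    title = str_to_lower(title),
    abstract = str_to_lower(abstract),
    # "mentioned" if either the title or the abstract contains "climate"
    climate = if_else(
      str_detect(title, "climate") | str_detect(abstract, "climate"),
      "mentioned",
      "not mentioned"
    )
  ) |>
  # Tally articles in each group, then convert counts to proportions
  count(climate) |>
  mutate(prop = n / sum(n))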
# A tibble: 2 × 3
climate n prop
<chr> <int> <dbl>
1 mentioned 4 0.04
2 not mentioned 96 0.96
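One thing to watch with this approach is missing abstracts. str_detect() returns NA for a missing string, and FALSE | NA is NA, so an article with no abstract and no match in its title would land in a third, NA group rather than in "not mentioned". A minimal sketch, assuming missing abstracts should count as not mentioning climate, replaces them with an empty string before searching. The toy data is hypothetical.

```r
library(tidyverse)

# Hypothetical toy data; the third article has no abstract (NA)
toy <- tibble(
  title = c("Climate action now", "Basketball recap", "Campus dining"),
  abstract = c("A call to act.", "A climate of rivalry.", NA)
)

mentions <- toy |>
  mutate(
    title = str_to_lower(title),
    # Missing abstracts become "" so str_detect() returns FALSE, not NA
    abstract = str_to_lower(replace_na(abstract, "")),
    climate = if_else(
      str_detect(title, "climate") | str_detect(abstract, "climate"),
      "mentioned",
      "not mentioned"
    )
  ) |>
  count(climate) |>
  mutate(prop = n / sum(n))

mentions
```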
- Time permitting: Come up with another question and try to answer it using the data.
# add code here
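As one possible question, offered only as an illustration: how many words do opinion article titles have? This sketch counts words with str_count() and the regular expression "\\S+", which matches each run of non-space characters, then summarises the counts. The toy titles are hypothetical; replacing toy with chronicle would run the same steps on the scraped data.

```r
library(tidyverse)

# Hypothetical toy titles; swap in chronicle to run on the real data
toy <- tibble(
  title = c("Climate action now", "Basketball recap", "Why climate matters on campus")
)

# Count words per title, then summarise the typical and longest title
title_words <- toy |>
  mutate(n_words = str_count(title, "\\S+")) |>
  summarise(mean_words = mean(n_words), max_words = max(n_words))

title_words
```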