Grammar of data wrangling

Lecture 5

Dr. Elijah Meyer + Konnie Huang

Duke University
STA 199 - Fall 2022

September 12, 2022

Checklist

– Clone your ae-04 repo.

– Turn in Lab 1 via Gradescope

– Reach out (OH should all be updated)

– Reminder: AEs due Thursday and Saturday 11:59; Labs due Monday; HWs due in 1 week from assigned

Goals

  • Understand why we need to manipulate data

  • Calculate summary measures for data sets

  • Manipulate the format of data

  • Practice with tidyverse functions

Can’t Commit?

Margins

In addition, the code should not exceed the 80 character limit, so that all the code can be read when you render to PDF. To help with this, you can add a vertical line at 80 characters by clicking “Tools” “Global Options” “Code” “Display”, then set “Margin Column” to 80, and click “Apply”.

Code Chunk Labels

– Informative names can help when navigating code.

– Informative names do not show up in Rendered documents (and that’s okay!)

R4DS: Chp 4 - Data transformation

https://app.sli.do/event/56i17rXu3VTsLVtwRZCX9w

Warm up

What is the difference between long and wide data?

Application exercise

ae-04

  • Go to the course GitHub org and find your ae-04 (repo name will be suffixed with your GitHub name).

  • Clone the repo in your container, open the Quarto document in the repo, and follow along and complete the exercises.

  • Render, commit, and push your edits by the AE deadline – 3 days from today.

Recap of AE

  • We can transform data to learn more about what’s going on

  • Pipe operator allows us to step through the process and combine multiple functions together

  • Data are messy. This are valuable tools to tell the story you want