Welcome to STA 199

Lecture 1

Dr. Elijah Meyer + Konnie Huang

Duke University
STA 199 - Fall 2022

August 29, 2022

Welcome

Syllabus

sta199-f22-2.github.io/

Goals for Day 1

  • Get to know the professor & ta
  • Get to know each other
  • Course overview
  • Register with GitHub & Slack
  • Introduce R + start thinking about data

Who Am I?

 

Who Are You?

Please share with your neighbors:

  • Major
  • Year
  • Why you taking the course
  • What other courses are you taking
  • Anything else

Our Classroom

  • Community
  • Communication
  • Respect

What is data science?

“Data science is a concept to unify statistics, data analysis, machine learning and their related methods in order to understand and analyze actual phenomena with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science.”

-Wikipedia

Examples of data science

  • Identification and prediction of disease
  • Targeted advertising
  • Supply chain optimization
  • Sports recruitment + strategist
  • The list goes on and on…..

Jobs

Jobs

Data literacy

Course objectives

  • Learn to explore, visualize, and analyze data in a reproducible and shareable manner

  • Gain experience in data wrangling, exploratory data analysis, predictive modeling, and data visualization

  • Work on problems and case studies inspired by and based on real-world questions and data

  • Learn to effectively communicate results through written assignments and final project presentation

Some of what you will learn

– Fundamentals of R

– Data visualization

– Web scraping

– Version control with GitHub

– Reproducible reports with Quarto

– Regression and classification

– Statistical inference

R - figures

Figure 1: Example R Figures

R - apps

R

Note

This is a new language

Workflow

Before Class

  • Watch lecture content videos

During Class

  • Warm up question

  • Mix of lecture and interaction

Please bring your laptops if able

Activities and assessments

  • Homework: Individual assignments combining conceptual and computational skills.

  • Labs: Individual or team assignments focusing on computational skills.

  • Exams: Two take-home exams.

  • Final Project: Team project presented during the final exam period.

  • Application Exercises: Exercises worked on during the live lecture session.

  • Statistics Experiences: Engage with statistics outside of the classroom and reflect on your experience.

Lab

  • Focus on computing using R tidyverse syntax

  • Apply concepts from lecture to case study scenarios

  • Work on labs individually or in teams of 3 - 4

Textbooks and readings

  • R for Data Science by Grolemund & Wickham (2nd ed. O’Reilly)

  • Introduction to Modern Statistics by Cetinkaya-Rundel & Hardin (1st ed. OpenIntro)

Where to find help in the course

  • Attend Office hours to meet with a member of the teaching team

  • Email me to set up a time: esm70@duke.edu

Academic Resource Center

The Academic Resource Center (ARC) offers free services to all students during their undergraduate careers at Duke.

Services include:

– Learning Consultations

– Peer Tutoring and Study Groups

– ADHD/LD Coaching, Outreach Workshops and more

Contact the ARC at ARC@duke.edu or call 919-684-5917 to schedule an appointment.

Create a GitHub account (Why?)

GitHub, Inc., is an Internet hosting service for software development and version control.

Create a GitHub account

Please do this before the Getting to know you survey

Go to https://github.com/, and create an account (unless you already have one).

Some tips from Happy Git with R.

– Incorporate your actual name! – Reuse your username from other contexts if you can, e. g., Twitter or Slack. – Pick a username you will be comfortable revealing to your future boss. – Be as unique as possible in as few characters as possible. Shorter is better than longer. – Avoid words with special meaning in programming (e.g. NA).

Slack

Please click the Slack link located here or in Sakai announcements to be a part of the discussions!

R-Studio

– Reserve a STA198-1991 RStudio container – Go to https://vm-manage.oit.duke.edu/containers – Click Reserve Container for the STA198-199 container

Our Turn

Clone a repo so you have access to the qmd

For Wednesday

– We’ll start talking about the computing toolkit

– Watch videos for Wednesday

– Complete Getting to Know You Survey

Please bring laptop to class if able for next time!