Lecture 18
Dr. Elijah Meyer + Konnie Huang
Duke University
STA 199 - Fall 2022
October 31st, 2022
– Clone ae-17
– If you do not have ae-17, go here: https://github.com/sta199-f22-2/ae-17
– Lab Feedback (Check it regardless of your grade)
– HW4 Question 2
– Overfitting
– “New” function in R
Summary of Regression
– Why do we model data?
– Can I write out models?
– Can I interpret model output?
– Do I understand the difference between models?
Discuss the difference between the two models below. Which model would you prefer for these data, and why?
– Overfitting occurs when a statistical model fits its data too closely (in the extreme, exactly), so it captures random noise along with the underlying pattern.
– This is a problem if our goal is to predict! An overfit model tends to predict poorly for new observations it has not seen.
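To make overfitting concrete, here is a small sketch in Python (used only for illustration). The data are made up: the true relationship is assumed to be y = 2x + 1 plus a little noise. A straight line is compared with a degree-5 polynomial that passes through all six training points exactly. The helpers `fit_line`, `interpolate`, and `sse` are hypothetical, written just for this example.

```python
# Hypothetical data: the true relationship is y = 2x + 1, plus made-up noise.
xs = [0, 1, 2, 3, 4, 5]
noise = [0.3, -0.4, 0.2, -0.1, 0.5, -0.3]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]


def fit_line(xs, ys):
    """Least-squares simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def interpolate(xs, ys):
    """Degree n-1 polynomial (Lagrange form) that passes through every point exactly."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def sse(model, xs, ys):
    """Sum of squared errors of a model on the given points."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


b0, b1 = fit_line(xs, ys)
line = lambda x: b0 + b1 * x
poly = interpolate(xs, ys)

train_sse_line = sse(line, xs, ys)
train_sse_poly = sse(poly, xs, ys)

# A new observation at x = 6, using the noise-free true value 2 * 6 + 1 = 13.
new_x, new_y = 6, 13
error_line = abs(new_y - line(new_x))
error_poly = abs(new_y - poly(new_x))

print(f"training SSE: line = {train_sse_line:.3f}, polynomial = {train_sse_poly:.3f}")
print(f"error at x = 6: line = {error_line:.3f}, polynomial = {error_poly:.3f}")
```

The polynomial has zero training error, yet at the new point x = 6 it misses the true value of 13 by about 17, while the line misses by less than 0.1.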
– Predict the value of our response
– Estimate the effect of some explanatory variable on the response (examine the relationship)
– Make inference about some larger population (coming later)
Quantitative Response
Does it make sense?
To test hypotheses and draw conclusions, certain assumptions need to be met:
– Normality of Residuals
– Linearity
– Independence
– Constant Variance
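These assumptions are usually checked with plots. As a rough sketch (made-up data, Python used only for illustration), the code below computes the fitted values and residuals that a residuals-vs-fitted plot displays, then compares the residual spread for small and large fitted values as a crude numeric check on constant variance. It does not check normality (a histogram or Q-Q plot of the residuals is the usual tool) or independence, which depends on how the data were collected.

```python
from statistics import fmean, stdev

# Hypothetical data, made up for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# Least-squares fit of y = b0 + b1 * x.
x_bar, y_bar = fmean(x), fmean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# These pairs are what a residuals-vs-fitted plot shows: look for no curve
# (linearity) and a roughly even vertical spread (constant variance).
for fi, ri in zip(fitted, residuals):
    print(f"fitted = {fi:6.2f}   residual = {ri:6.2f}")

# Crude numeric stand-in for the plot: compare residual spread in the lower and
# upper halves of the fitted values (similar values suggest constant variance).
order = sorted(range(len(x)), key=lambda i: fitted[i])
half = len(order) // 2
low_spread = stdev([residuals[i] for i in order[:half]])
high_spread = stdev([residuals[i] for i in order[half:]])
print(f"residual SD, low fitted: {low_spread:.3f}; high fitted: {high_spread:.3f}")
```

With an intercept in the model, least-squares residuals always sum to (essentially) zero, so the plot is judged by its shape and spread rather than its average.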
– Multiple linear regression is used to estimate the relationship between two or more explanatory variables and one response variable.
– We use it when we want to predict the value of one variable based on the values of two or more other variables
– Main effects: “account for X1 and assess the relationship between X2 and Y”
– Interaction effects: “does X1’s relationship with Y change based on X2?”
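A sketch of the two kinds of multiple regression models in Python, assuming made-up, noise-free data with a numeric X1 and a 0/1 indicator X2 (for example, two groups). The main effects model is ŷ = b0 + b1·X1 + b2·X2, which forces the same X1 slope in both groups; the interaction model adds b3·X1·X2, so the X1 slope becomes b1 + b3·X2. The helpers `solve` and `least_squares` are hypothetical, written here for illustration rather than taken from a modeling library.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest entry in this column for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from the rows below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, y):
    """Ordinary least squares coefficients via the normal equations (X^T X) b = X^T y."""
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical, noise-free data: in group X2 = 1 the X1 slope is steeper.
x1 = [1, 2, 3, 4, 1, 2, 3, 4]
x2 = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1 + 2 * a + 3 * b + 1.5 * a * b for a, b in zip(x1, x2)]

# Main effects: one X1 slope shared by both groups.
main = least_squares([[1, a, b] for a, b in zip(x1, x2)], y)
# Interaction: the X1 slope is allowed to depend on X2.
inter = least_squares([[1, a, b, a * b] for a, b in zip(x1, x2)], y)

print("main effects X1 slope (both groups):", round(main[1], 3))
print("interaction X1 slope when X2 = 0:", round(inter[1], 3))
print("interaction X1 slope when X2 = 1:", round(inter[1] + inter[3], 3))
```

The interaction fit recovers the X1 slopes 2 (when X2 = 0) and 3.5 (when X2 = 1), while the main effects fit reports a single compromise slope of 2.75 for both groups.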
– What if the response variable is categorical?