Logistic Regression

Lecture 19

Dr. Elijah Meyer + Konnie Huang

Duke University
STA 199 - Fall 2022

November 2nd, 2022

Checklist

– Clone ae-18

Announcements

– You have a final-project repo. Clone it before lab tomorrow.

– HW4 extended to Thursday - Check Sakai

Goals

– Testing and Training data

– The What, Why, and How of Logistic Regression

Warm Up

– What is a testing data set?

– What is a training data set?

Warm Up

– What is a testing data set?

Held in reserve to test one or two chosen models and evaluate their performance.

– What is a training data set?

The “sandbox” for model building. Build the model on these data.
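A minimal sketch of creating the two data sets in Python (scikit-learn's train_test_split; the simulated data and the 80/20 split are illustrative assumptions, not from the slides):

    # Split one data set into training (model building) and testing (evaluation) sets.
    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(199)
    X = rng.normal(size=(100, 2))       # hypothetical explanatory variables
    y = rng.integers(0, 2, size=100)    # hypothetical binary response

    # Hold out 20% of the rows as the testing set; build models on the other 80%.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=199
    )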

What is Logistic Regression?

  • Similar to linear regression… but

  • A modeling tool for when our response variable is categorical

What we will do today

Start from the beginning

Terms

– Bernoulli Distribution

  • 2 outcomes: Success, with probability \(p\), or Failure, with probability \(1 - p\)

  • \(y_i \sim \text{Bern}(p)\)

  • We can use our explanatory variable(s) to model \(p\), as sketched below
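A minimal sketch of drawing Bernoulli outcomes in Python (the value of \(p\) and the use of numpy are illustrative assumptions, not part of the slides):

    # Draw Bernoulli(p) outcomes; Bern(p) is the same as Binomial(1, p).
    import numpy as np

    rng = np.random.default_rng(42)
    p = 0.3                              # example success probability
    y = rng.binomial(n=1, p=p, size=10)  # ten 0/1 outcomes
    print(y)                             # mix of 0s (failures) and 1s (successes)
    print(y.mean())                      # sample proportion; near p for large samples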

2 Steps

– 1: Define a linear model

– 2: Define a link function

A linear model

\(p_i = \beta_0 + \beta_1 X_{1,i} + \cdots\)

  • But we can’t stop here

  • Next, we need a link function that relates the linear model to the parameter of the outcome distribution, i.e., one that transforms the linear model so its output has an appropriate range
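To see the problem concretely, plug in hypothetical numbers (chosen purely for illustration): with \(\beta_0 = 0.2\) and \(\beta_1 = 0.1\), an observation with \(X_{1,i} = 15\) gives \(p_i = 0.2 + 0.1 \times 15 = 1.7\), which is not a valid probability.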

Generalized linear model

Goal

– Take values between negative and positive infinity and map them to probabilities in \([0, 1]\)

What does this look like?

The link function takes a probability in \([0, 1]\) and maps it to the log-odds scale, \(-\infty\) to \(\infty\).
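A couple of worked values (arbitrary probabilities, chosen only to illustrate the mapping): \(\log\left(\frac{0.5}{1-0.5}\right) = \log(1) = 0\); \(\log\left(\frac{0.9}{0.1}\right) = \log(9) \approx 2.20\); \(\log\left(\frac{0.1}{0.9}\right) \approx -2.20\). Probabilities near 0 or 1 map to extreme log-odds.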

This isn’t exactly what we need, though…

– Recall, the goal is to take values between \(-\infty\) and \(\infty\) and map them to probabilities. We need the opposite of the link function, i.e., its inverse.

– How do we take the inverse of a natural log? We exponentiate, since \(e^{\log(x)} = x\).

  • Taking the inverse of the logit function maps arbitrary real values back to the range \([0, 1]\)

Generalized linear model

  • \(\text{logit}(p)\) is also known as the log-odds

  • \(\text{logit}(p) = \log\left(\frac{p}{1-p}\right)\)

  • \(\text{logit}(p_i) = \beta_0 + \beta_1 X_{1,i} + \cdots\)

So

  • \(\text{logit}(p_i) = \beta_0 + \beta_1 X_{1,i} + \cdots\)

  • \(\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 X_{1,i} + \cdots\)

  • Let’s take the inverse of the logit function

\(p_i = \frac{e^{\beta_0 + \beta_1 X_{1,i} + \cdots}}{1 + e^{\beta_0 + \beta_1 X_{1,i} + \cdots}}\)
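Filling in the algebra, with \(\eta_i\) introduced here as shorthand for the linear predictor \(\beta_0 + \beta_1 X_{1,i} + \cdots\):

\(\log\left(\frac{p_i}{1-p_i}\right) = \eta_i \;\Rightarrow\; \frac{p_i}{1-p_i} = e^{\eta_i} \;\Rightarrow\; p_i = e^{\eta_i}(1 - p_i) \;\Rightarrow\; p_i\,(1 + e^{\eta_i}) = e^{\eta_i} \;\Rightarrow\; p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}}\)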


Takeaways

– We cannot model binary response data with the linear regression tools we already have

– We can overcome some of the shortcomings of linear regression by fitting a generalized linear model

– We can model binary data using the inverse logit function to model probabilities of success; a sketch follows
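Putting today's pieces together, here is a minimal end-to-end sketch in Python (scikit-learn; the simulated data and all settings are illustrative assumptions rather than the course's workflow):

    # Simulate binary data from the logistic model, split, fit, and evaluate.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(199)
    X = rng.normal(size=(500, 1))            # one explanatory variable
    eta = -0.5 + 1.2 * X[:, 0]               # linear predictor (true betas chosen for illustration)
    p = np.exp(eta) / (1 + np.exp(eta))      # inverse logit: map eta to probabilities in [0, 1]
    y = rng.binomial(n=1, p=p)               # Bernoulli(p) responses

    # Build the model on the training data; evaluate on the held-out testing data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=199
    )
    model = LogisticRegression().fit(X_train, y_train)
    print(model.intercept_, model.coef_)     # estimates of beta_0 and beta_1
    print(model.score(X_test, y_test))       # accuracy on the testing set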