Lecture 12
Dr. Elijah Meyer + Konnie Huang
Duke University
STA 199 - Fall 2022
October 5, 2022
– Clone ae-11-ethics-privacy
Lab Group Instructions:
– You can find your groups in the teams repo after class.
– You may only switch groups under extreme circumstances (working with friends does not count)
– View your group number before Lab 04. This will make it easier for TAs to seat / group you.
Think
Data Ethics
Data privacy
Bias
Every time we use apps, websites, and devices, our data is collected and used or sold to others. More importantly, law enforcement, financial institutions, and governments make decisions based on data, and those decisions directly affect people’s lives.
– In 2016, researchers published data on 70,000 OkCupid users, including usernames, political leanings, drug usage, and intimate sexual details
– Researchers didn’t release the real names and pictures of OkCupid users, but their identities could easily be uncovered from the details provided, e.g., usernames
“Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.”
– Researchers Emil Kirkegaard and Julius Daugbjerg Bjerrekær
In analysis of data that individuals willingly shared publicly on a given platform (e.g. social media), how do you make sure you don’t violate reasonable expectations of privacy?
Should you scrape these data?
How do you not violate reasonable expectations of privacy?
– Name
– Age
– Phone Number
– How long you spend on different content
– List of all your private messages (date, time, person sent to)
– Info about your photos (how they were taken, where they were taken (GPS), when they were taken)
– Browsing history
What is the typical word length in the Gettysburg Address?
– Using R, calculate the mean word length of your 10 words.
Write down the population mean word length
Write down the mean of your 10 words
Were you close? How about the rest of the class?
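One way the calculation could look in R is sketched below. The ten words here are only an illustration (the first ten words of the Address), not a recommended sample; swap in the ten words you chose. The names my_words, word_lengths, and sample_mean are made up for this sketch.

```r
# Illustrative sample only: the first ten words of the Gettysburg Address.
# In the activity, replace these with the ten words you picked yourself.
my_words <- c("Four", "score", "and", "seven", "years",
              "ago", "our", "fathers", "brought", "forth")

# nchar() returns the number of characters in each word
word_lengths <- nchar(my_words)

# Sample mean word length for these ten words
sample_mean <- mean(word_lengths)
sample_mean
#> [1] 4.7
```

Compare your own sample_mean with the population mean you wrote down, and with the values your classmates got.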
Are humans biased? How does this activity relate to bias in algorithms?
Bias is a disproportionate weight in favor of or against an idea or thing
We all have bias
Bias can be a part of science and research
Ask questions
Slow down
Think critically
