Lab - Project Proposal
Learning Goals
- Collaborate with your team to outline your project
- Think critically and craft questions that can be answered from your data
Introduction
Pick a data set and do something with it. That is your final project.
The final project for this class will consist of analysis on a data set of your own choosing. The data set may already exist, or you may collect your own data by scraping the web.
Choose the data based on your group’s interests. The goal of this project is for you to demonstrate proficiency in the techniques we have covered, and will be covering in this class (and beyond, if you like!), and apply them to a data set to analyze it in a meaningful way
Data Sources
In order for you to have the greatest chance of success with this project it is important that you choose a manageable data set. This means that the data should be readily accessible and large enough that multiple relationships can be explored. Your data set must have at least 500 observations and at least eight variables (or has been approved by your instructor). The data set should include a rich mix of categorical, discrete numeric, and continuous numeric data.
If you are using a data set that comes in a format that we haven’t encountered in class (for instance, a .DAT file), make sure that you are able to load it into RStudio as this can be tricky depending on the source. If you are having trouble, ask for help before it is too late.
Data sets that can not be used
– Data sets that have been used for class examples or assignments.
– Data sets from Kaggle.
– Data sets analyzed in another course.
There will be limits on the number of groups that can use a given data set, so I encourage you to be creative!
Some resources that may be helpful:
– R Data Sources for Regression Analysis
Additions:
– The National Bureau of Economic Research
– United Nations Statistics Division
– UNICEF
– CDC
All analyses must be done in RStudio, and your final written report and analysis must be reproducible. This means that you must create a Quarto document attached to a GitHub repository that will create your written report exactly upon rendering.
Project proposal
There are two main purposes of the project proposal:
– To help you think about the project early, so you can get a head start on finding data, reading relevant literature, thinking about the questions you wish to answer, etc.
– To ensure that the data you wish to analyze, methods you plan to use, and the scope of your analysis are feasible and will help you be successful for this project.
Include the following in the proposal:
Introduction and Data
(a) Choose three substantially different data sets you are interested in analyzing. You must scrape at least 1 data set for your project proposal. For each, identify the components below.
For each data set, include the following:
– Identify the source of the data,
– When and how it was originally collected (by the original data curator, not necessarily how you found the data)
– A brief description of the observations
Research question
(b) Your research question should contain at least three variables, and should be a mix of categorical and quantitative variables. When writing a research question, please think about the following:
Target population.
Is the question original?
Can the question be answered?
For each data set, include the following
– A well thought out formulated research question. You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.
– Describe the research topic along with a concise statement of your hypotheses.
– Identify the types of variables in your research question. Categorical? Quantitative?
Data set
(c) For each data set, include the following
– Use the glimpse
function to provide a glimpse of the data set.
– Place the file containing your data in the data folder of the project repo
Notes
– It is critical to check feedback on your project proposal. Even if you earn a 10, it may not mean that your proposal is perfect.
– You must use one of the data sets in the proposal for the final project, unless instructed otherwise when given feedback.
Submission
Once you are finished with the lab, you will your final PDF document to Gradescope.
To submit your assignment:
- Go to http://www.gradescope.com and click Log in in the top right corner.
- Click School Credentials \(\rightarrow\) Duke NetID and log in using your NetID credentials.
- Click on your STA 199 course.
- Click on the assignment, and you’ll be prompted to submit it.
- Mark all the pages associated with exercise. All the pages of your lab should be associated with at least one question (i.e., should be “checked”). If you do not do this, you will be subject to lose points on the assignment.
Grading
Component | Points |
---|---|
Introduction/data | 3 |
Research question | 3 |
Data sets | 3 |
Workflow and Formatting | 1 |
Total | 10 |