For the project, you may use any dataset that is publicly available on the internet. Gathering your data by web scraping is also fine, but in that case you must provide the code that you used to collect and read the data. The following sources are some suggestions for finding datasets to work on:
1) UC Irvine Machine Learning Repository
2) Kaggle website
3) R packages that include data: faraway, Stat2Data, datasets
4) World Bank datasets
5) Google Dataset Search
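For the R packages in item 3, the following is a minimal sketch of how to browse and load a packaged dataset. The choice of mtcars (from datasets) and gala (from faraway) is only for illustration and is not a recommendation for your project; the faraway part runs only if that package is already installed.

```r
# List a few of the data sets shipped with the built-in 'datasets' package
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# mtcars is one of them; it is used here only as an example
data(mtcars)
str(mtcars)           # variable names and types
summary(mtcars$mpg)   # quick summary of one variable

# Contributed packages work the same way once installed,
# e.g. install.packages("faraway")
if (requireNamespace("faraway", quietly = TRUE)) {
  data(gala, package = "faraway")
  head(gala)
}
```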
Note: These are only suggestions. Feel free to use any publicly available dataset that you find interesting, including data on a particular topic that you care about. I am happy to help you select an appropriate dataset, so feel free to drop in during office hours to chat about this.
The project has two components, as follows.
1) Presentation
The presentation dates are December 2 and December 6, during class time. Each group has 5 minutes to present its work. The idea is to give us an overview of your research question, the variables in the dataset, and the type of model you intend to apply to your dataset (SLR, MLR, or any of the selected topics); if you already have some results, share them with us. Given the time constraint, a maximum of 5 slides should be enough to convey the message. Note that the presentation should not include every detail (code, all outputs, …). Those belong in your written report.
2) Written report
Please submit no more than 8 pages of written work. Figures, tables, and outputs are NOT counted toward this 8-page limit; the written text itself (including your explanations) should be about 8 pages. The written project should use single line spacing, Times New Roman, and font size 12. If you need more space, you may reduce the font size to 11. Submit the written project online via Crowdmark (PDF and Word files are both acceptable). I also ask you to submit the R code that you used for the project. I see two scenarios here (let me know if you have another in mind):
one: You write everything in an Rmd file, using R Markdown in the same style as your tutorial sessions. You can find the Rmd files on the Canvas page to get an idea. In this case, your code is included in the R Markdown file, and there is no need to submit it in a separate document. Simply compile/knit your Rmd file (to PDF or Word) and upload it to Crowdmark.
two: You run your code in an R script, then copy and paste your results into a Word file and organize things there. This is also acceptable (although I prefer the first scenario). In this case, you must upload your script as a separate file on Crowdmark.
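For the first scenario, knitting can be done with the Knit button in RStudio or from the R console. The lines below are a minimal sketch of the console route, assuming the rmarkdown package is installed; the file name project_report.Rmd is hypothetical and should be replaced with your own.

```r
# Hypothetical file name; replace with the name of your own Rmd file
rmd_file <- "project_report.Rmd"

if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(rmd_file)) {
  # Knit to Word; use "pdf_document" instead if a LaTeX installation is available
  rmarkdown::render(rmd_file, output_format = "word_document")
}
```

The output file is written next to the Rmd file by default and is what you upload to Crowdmark.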