10/22/2020 13. Project Deliverable 2
Project
A key feature of this class is applying what you learn to a real-world dataset. The project requires you to work with a partner (or in a team of 3 or 4 members for larger classes) to perform all the steps of a typical data analysis project over the course of the term. There are two deliverables for this assignment. Deliverables will include a write-up (in a text editor like Microsoft Word), relevant R code (as an .R script, an .rmd file, or a knit R Markdown file), dataset(s), and presentation slides.
DISCLAIMER
STUDENTS TAKING PART IN THIS PROJECT MUST REVIEW AND UNDERSTAND THE
GUIDELINES AND LIMITATIONS ASSOCIATED WITH THE TERMS AND CONDITIONS OF THE COMPANY.
Helpful Tip: Working in Your Group
Before beginning this assignment, you may wish to meet informally in your group homepage to divide up the responsibilities for this assignment. For more information on how to do this, please visit the page called “Working in Your Group”.
Project Deliverable 2
Select one or more of the analytical techniques covered in this class and apply them to address the questions outlined in Deliverable 1. Interpret the results of the analysis and express your conclusions in a form that is understandable to decision makers.
Detailed Description
1. Select Analytical Technique: This course covers a number of analytical techniques ranging from clustering to text mining to spatial analysis. You must select at least one technique from this class to address the problem outlined in Deliverable 1. Of course, you may use more than one technique from this class and you may also combine it with techniques covered in Frameworks and Methods – 1.
2. Additional Data Processing: Analyzing data is a non-linear process, so additional data cleaning and preparation may need to be performed on the cleaned data from Deliverable 1. Any additional data processing should be described and relevant R code shared.
3. Results: When running multiple analyses on the data, it is not uncommon to get different results. In fact, with unsupervised machine learning techniques, there is seldom a unique solution. Furthermore, even within the same analysis, the indices generated may vary in the information they convey. It is up to you to distill these results into a simple, actionable conclusion. In reporting the results, you must document the models run and summarize the results obtained from each one.
4. Conclusion: The conclusion represents the ultimate finding of the analysis. This must be expressed in a form that is simple and understandable to decision makers and supported by charts.
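As a rough illustration of documenting the models run and summarizing each one, the sketch below fits k-means with several cluster counts and collects one summary row per solution. It is only an example under stated assumptions: the built-in iris data stands in for a project dataset, and the choice of k-means, the range of cluster counts, the seed, and the object names (x, ks, fits, results) are hypothetical rather than requirements of this assignment.

```r
# Illustrative sketch only: the built-in iris data stands in for a project dataset.
x <- scale(iris[, 1:4])            # standardize the four numeric columns

# Run k-means for several candidate numbers of clusters.
ks <- 2:6
set.seed(617)                      # arbitrary seed so the run is reproducible
fits <- lapply(ks, function(k) kmeans(x, centers = k, nstart = 25))

# Document every model run with one summary row per solution.
results <- data.frame(
  k = ks,
  total_within_ss = sapply(fits, function(f) f$tot.withinss),
  ratio_between_total = sapply(fits, function(f) f$betweenss / f$totss)
)
print(results)
```

A table like this records every solution attempted, so the write-up can explain why one solution was chosen over the others.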
Alignment
This assignment aligns with the following objectives:
Design an impactful presentation
Deliver and explain analytical outputs to a general audience
Assessment
Rubric
Criteria: Points
Selected an appropriate analytical technique: 40
Demonstrated knowledge of R tools to implement the analytical techniques used: 40
Properly interpreted results from analysis: 40
Write-up is concise and persuasive, and suitable for an audience of non-technical decision makers: 40
Conclusions for analysis and recommendations address the question: 40
Submission
What to Submit:
Submit the following as part of deliverable 2:
1. Cleaned dataset: Resubmit the cleaned dataset that you submitted with Deliverable 1. If the dataset is too large, post it to a cloud drive and share a persistent link.
2. R Code (if relevant): Share R code that can be used to obtain the processed dataset from the cleaned dataset. The R code may be submitted as an R script (.R), R Markdown (.rmd), or a knit R Markdown (.html) file. (If you did not modify the cleaned dataset, skip items 2 and 3.)
3. Processed dataset (if relevant): Submit the processed dataset. If the dataset is too large, share a persistent link to a cloud source.
4. Working R code for Analysis: Share the R code used to run the various analyses attempted. This block of R code should contain all the different analyses conducted, both those used for the final conclusions and those that were not. The R code may be submitted as an R script (.R), R Markdown (.rmd), or a knit R Markdown (.html) file.
5. Final R Code for Analysis and Conclusions: Share R code that contains only the final analysis. This should also include the code for any charts created to corroborate the conclusions. The R code may be submitted as an R script (.R), R Markdown (.rmd), or a knit R Markdown (.html) file.
6. Write-up: Use a text editor (like Microsoft Word) for the write-up. The expected length is 5-10 pages. The write-up should include:
1. Statement of the problem or question(s) being addressed
2. Reasons behind the choice of analytical technique(s). Explain why the technique is suitable for the question(s) being addressed.
3. Discuss the results from the analyses run. Wherever possible use charts to convey results succinctly.
4. Discuss the conclusions from the analysis and offer recommendations in a form that is simple and understandable to decision makers. Support conclusions with relevant charts.
How to Submit:
Only one member of your group needs to submit this assignment on behalf of the group. To complete your submission:
1. Click the blue Submit Assignment button at the top of this page.
2. Click the Choose File button, and locate your submission. Repeat for each file you submit.
3. Feel free to include a comment with your submission.
4. Finally, click the blue Submit Assignment button.
https://courseworks2.columbia.edu/courses/107618/assignments/488977?module_item_id=1005768