Big Data Analytics:
Assignment 3 – Your data science project
Neha Gupta neha.gupta@wbs.ac.uk
Adapted from original assignment created by Prof. Suzy Moat and Prof. Tobias Preis.
Data Science Lab, Behavioural Science, Warwick Business School, The University of Warwick
1. Overview
This project is worth 80% of your final marks for this course.
It is due in on Thursday 19th March 2020 by 12 noon.
In this course, you have been learning how our everyday interactions with technology are creating huge amounts of data capturing human behaviour worldwide. You have learned how this sort of data can help data scientists measure what is going on in the world, and even make better predictions about how people might behave in the future.
In this final assignment, you are asked to pose an interesting question that can be answered using these new datasets and the data science skills you now possess. You then need to acquire the relevant data, process it into a form that you can analyse, carry out the statistical analysis, and produce relevant visualisations to illustrate your results. You also need to write up your results in a clear and engaging style.
The aim of this project is for you to have an opportunity to apply your skills to a question that you are interested in, and at the same time, produce a document that you can use to demonstrate your skills to future employers. Good luck and have fun!
2. What to submit
i. Please submit your final write-up as a PDF.
ii. Please also submit any datasets you have used in your analysis, and the R code which you have written to process this data. You should provide clear comments in your code so that it is easy to understand what it does. Save your code in a script. Do not submit your R workspace, your command history or your RStudio project. You should also provide a PDF document explaining what data is contained within the dataset files. Combine all your files (R script, the supporting PDF document, data files) into a zip file for upload to my.wbs. If you expect your zip file to be greater than 20MB, please speak to us about this at least one week in advance.

3. Further guidance
Your question
This assignment builds on the project design you carried out for Assignment 2. As such, your question should involve one of the following kinds of online data: Google Trends data, data on Wikipedia page views or data retrieved from the Flickr API. Your question needs to link this online data with another source of data which reflects human behaviour in the offline world: for example, financial data, national statistics, population data or any other data source you find interesting.
You are strongly recommended to use the question you identified in Assignment 2, which you will have received feedback about, with any changes which have been suggested. This will help you avoid encountering difficulty in acquiring or analysing the data, or in motivating the relevance of your question.
If you wish to investigate a different question, please speak to your course tutor first. The course team may be less able to offer you support with new questions.
Your analysis
Your aim is to carry out your analysis in a way that third parties could easily replicate it and verify your findings. You should therefore write your code in as clear a style as possible, with comments to help explain what your code does where necessary. You should also provide clear documentation of the data sets which you have used and which you submit: for example, what data is contained in each file and where this data was acquired or downloaded from.
To develop and demonstrate the skills you have acquired during this course, you should carry out your analysis in R.
Your write-up
Remember that the goal of this assignment is to carry out the analysis required to evaluate an interesting question. As long as your question is well motivated, do not worry if your results do not turn out as you hoped. Just make sure that, in your write-up, you provide a clear motivation for your question; a clear argument for why one might expect the result you hypothesised to hold; a clear description of the analysis you carried out; and a clear evaluation of your findings, including why you may not have found what you expected. If there was a good reason to suppose you might find evidence for your hypothesis, it is useful to discover no evidence for the hypothesis too.
Your write-up should be no longer than 3,000 words, and should be structured as follows:
February 2020 2

• Title
o Your title should convey the main thrust of your analysis and results, but crucially should also catch the reader’s attention.
o Your title should have a maximum of 15 words (although good titles are normally shorter).
• Abstract
o In your abstract, you should briefly explain the problem your question is addressing, and the opportunity you have identified to address this problem.
o You should then clearly state what your question therefore is.
o You should give an overview of the analysis you are carrying out to address this question, and you should then explain the results of your analysis.
o Finally, you should describe the conclusions of your analysis. In other words, what do your results mean? What is the takeaway message from your analysis?
o Your abstract should be no longer than 150 words.
• Introduction
o The main goal of your introduction is to motivate your question and introduce your analysis.
o You should therefore provide enough background to make the value of your analysis clear. Who does the problem you are addressing affect?
o You should cite between 5 and 10 scientific papers that are related to your analysis. (For example, there are a number of papers on the course reading list that explore the relationship between online data and offline behaviour, or that give a broader background to analyses of human behaviour with big data. We have discussed a number of them in the lectures.)
o You should then clearly explain what your analysis sets out to do. What is your question? What do you expect to find? Why do you expect to find this?
o You may wish to give an initial indication of the results you uncover, but this is a stylistic decision.
o There is no word limit for your introduction, but make sure your writing style is concise.
• Methods and results
o In the methods and results section, you should very clearly explain what analysis steps you carried out, and what the results were.
o As a guide to the level of detail required, you should include enough information in this section to enable someone else to reproduce your analysis without access to your code or the data you downloaded.
o To achieve this, you should make the source of your data clear, including providing references for websites from which you have downloaded the data. You should also clearly describe any calculations you carried out on the raw data you downloaded to reach your final results. You do not need to refer to the specific R functions that you used to do this, however.
o All statistical tests should be reported appropriately, including at least details of the sample size (or degrees of freedom), the value of the test statistic calculated and, where calculated, the p-value.
o You should also describe any assumptions of the analyses you carried out (e.g., should your data be normally distributed?) and show how you checked that these assumptions hold.
o You should provide at least two figures that visualise your findings. We will give you 20% of your marks for visualisation as detailed below.
o Figures should always have appropriately labelled axes, with the units of measurement specified. Legends should be provided to explain different colours or line types used, and font sizes should not be too small. As a guide, ensure that any text in your figures is at least as big as text used in the body of your assignment. Check that this is still the case when you have included the figure in your assignment. Make sure that your figure does not get stretched horizontally or vertically when you add it to your assignment.
o If appropriate, you can provide up to four figures. (Do not provide more than four figures.) You can also construct figures which contain multiple subfigures. However, only include important figures which help you tell your story. You need to be as concise with your figures as you are with your words.
o Under each figure, provide a caption which clearly outlines to the reader what data the figure shows, and what patterns the reader should note in the data. Each caption should be no longer than 350 words.
o To capture the attention of busy readers and to help them understand your analysis, you should produce figures and figure captions that convey the basic story of your analysis on their own.
o There is no word limit for your methods and results, but make sure your writing style is concise.
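As a rough illustration of the assumption-checking and reporting points above, the following R sketch uses simulated data. The variable names `searches` and `sales` are hypothetical and stand in for an online and an offline series. The sketch checks normality with a Shapiro-Wilk test, chooses Pearson or Spearman correlation accordingly, and prints the sample size, test statistic and p-value together. The 0.05 threshold is a common convention assumed here, and this is only one possible approach, not a prescribed method.

```r
# Simulated example data: both series are invented for illustration only.
set.seed(42)
n <- 60
searches <- rnorm(n, mean = 50, sd = 10)    # hypothetical online series
sales <- 0.8 * searches + rnorm(n, sd = 8)  # hypothetical offline series

# Check the normality assumption for each variable (Shapiro-Wilk test).
# The 0.05 cut-off is a conventional choice, assumed for this sketch.
normal_searches <- shapiro.test(searches)$p.value > 0.05
normal_sales <- shapiro.test(sales)$p.value > 0.05

# Use Pearson's r if both look normal, otherwise fall back to Spearman's rho.
method <- if (normal_searches && normal_sales) "pearson" else "spearman"
result <- cor.test(searches, sales, method = method)

# Report the sample size, test statistic and p-value together.
report <- sprintf(
  "%s correlation: estimate = %.2f, statistic = %.2f, N = %d, p = %.3g",
  method, result$estimate, result$statistic, n, result$p.value
)
print(report)
```

In a write-up, the same numbers would be reported in a sentence rather than as raw R output, alongside a short statement of how the normality check turned out.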
• Discussion
o The discussion should briefly summarise what you have done, and discuss what your findings mean.
o To make your document as accessible as possible to busy readers, it is a good idea to ensure that your discussion would make sense if the reader had not read the rest of the document.
o You may wish to begin by briefly summarising the motivation for your study once again. What is the problem you are addressing and what is the opportunity you have identified to address it? You can then restate your research question.
o Next, give a brief indication of the nature of your analyses and summarise what your analyses found.
o Indicate which answer to your research question your findings provide support for. Is this what you expected?
o Try to offer a potential explanation for your findings. If you have found the pattern you expected, you may have already hinted towards this explanation in your introduction. If you did not find what you expected, why do you think this is?
o It is not a problem if you are not sure why you found a particular pattern – simply suggest some possible ideas. It is very important that you are careful not to overstate your case. In particular, be aware that most investigations do not “prove” anything on their own, but you may have found new strong or weak support for a given idea.
o Indicate what the implications of your investigation are. For example, have you highlighted a new opportunity to use a certain dataset to measure or forecast a certain type of behaviour? Have you provided evidence of an interesting behavioural pattern? Have you helped explain a previously observed behavioural pattern? Have you provided evidence that a particular line of enquiry may not be worth following further? What might people be able to do once they have read your results that they might not have been able to do before?
• References
o You should provide full references for all papers you have cited. Please use the Harvard style of referencing for this assignment. You can find more guidance here:
https://www2.warwick.ac.uk/services/library/students/referencing/referencing-styles
4. How marks will be allocated
You will receive marks for the following:
• Quality of question
o This area is worth 20% of your final mark for the module.
o You will be awarded marks for choosing a question which was interesting and feasible to answer.
o You can emphasise how interesting your question is by stating your question clearly and motivating it well in the abstract and introduction. Who would be interested in the answer, and why? You may be able to provide more evidence of the value of your question in the discussion as well.
o Again, if you have provided a good motivation for why your question was worth investigating and why you believed you might find an interesting answer, do not worry if your results do not turn out as you hoped.
o You can emphasise how feasible your question was to answer by completing an appropriate analysis in the methods and results, and crucially, not overstating your findings in the discussion. Your assignment as a whole needs to provide clear evidence that the question you proposed could be answered from the data you identified and the analysis methods you chose, without a leap of faith.
• Quality of analysis
o This area is worth 20% of your final mark for the module.
o You will be awarded marks for choosing an analysis method appropriate for answering your question; verifying that assumptions made by this analysis method hold (e.g., should your data be normally distributed?); carrying out the analysis correctly; and correctly interpreting the results of the analysis.
o You will also be assessed on whether you have motivated any pre-processing steps well (e.g., you have not left out half of your dataset without explaining why).
o Finally, you will be awarded marks for clearly documenting your code, and providing clear pointers to where the data you analyse can be obtained, in order to support replication of your study.
o You can make it easier for your analysis to be correctly assessed by providing a clear and concise description of your analysis in the methods and results.
• Quality of visualisation
o This area is worth 20% of your final mark for the module.
o Crucially, you should provide visualisations which tell the story of your analysis in a clear, concise and engaging fashion.
o You will be awarded marks for choosing appropriate visualisations for your data and analysis. Remember, you should only include the visualisations which help tell your story. Do not simply include every possible visualisation you can think of. Make sure you include at least two figures and no more than four.
o You will be awarded marks for providing legible visualisations, and labelling your visualisations well (e.g., all axes are labelled, including units of measurement, legends are provided to explain different colours or line types used, and font sizes are not too small).
o You will be awarded marks for creating an attractive visualisation. The base level of plots generated by the ggplot2 library is good, but it will also allow you to change many different aspects of your visualisation where you feel this is appropriate, from colours, to line thickness, to font used, and more.
o For the purposes of this assignment, please make all changes to your figures by writing code in R, apart from assembly of multi-panel figures which you can do in an external program (e.g., Word). You should not postprocess your figures in Adobe Illustrator or similar programs.
o You will also be awarded marks for good figure captions. Do your figure captions meet the specification detailed in the structure above, describing the data shown in the figure and highlighting the key patterns that readers should note in the data? Do your figures and figure captions together successfully tell the main story of your analysis?
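The labelling and legibility points above can be sketched with ggplot2 as follows. The data, series names, axis labels and output file name are all invented for illustration, and the 12-point base font size assumes body text of a similar size. This is a minimal sketch of one way to set labels, a legend title and fixed output dimensions in code, not a required template.

```r
library(ggplot2)

# Invented weekly data for two hypothetical series; replace with your own.
set.seed(1)
weeks <- seq(as.Date("2019-01-06"), by = "week", length.out = 52)
plot_data <- data.frame(
  week = rep(weeks, times = 2),
  value = c(cumsum(rnorm(52)), cumsum(rnorm(52))),
  series = rep(c("Search volume", "Sales"), each = 52)
)

# Label both axes (with units) and give the legend a descriptive title.
fig <- ggplot(plot_data, aes(x = week, y = value, colour = series)) +
  geom_line() +
  labs(
    x = "Week (2019)",
    y = "Index (arbitrary units)",
    colour = "Data series"
  ) +
  theme_minimal(base_size = 12)  # assumed to match the body text size

# Fix the width and height when saving so the figure is not stretched later.
out_file <- file.path(tempdir(), "figure1.png")  # hypothetical file name
ggsave(out_file, fig, width = 16, height = 9, units = "cm", dpi = 300)
```

Saving at the size the figure will occupy in the document helps keep the font size consistent with the surrounding text once the figure is inserted.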
• Quality of written description
o This area is worth 20% of your final mark for the module.
o You should provide a clear, concise and engaging written description of your investigation.
o You will be awarded marks for using the structure described above and covering all the points highlighted in the structure description.
o Within individual sections, you will be awarded marks for structuring your writing well, to make your arguments and descriptions easy to follow.
o You will be awarded marks for the style of your writing. Is it clear, concise, and engaging? Have you kept your sentences short where possible? Have you used correct grammar and appropriate vocabulary? (Simple vocabulary is often easier to understand – do not use complicated words for the sake of it.)
o You will be assessed on whether you have correctly observed conventions for reporting statistical results, including formatting.
o Finally, you will be assessed on whether you have correctly integrated references into your writing, and listed all references correctly at the end of your assignment. This will again include the formatting of your references.
5. Final note
Please make sure you observe the WBS plagiarism guidelines to ensure you do not needlessly lose marks. You can see these in full on the next page.
In particular, it is extremely important that you do not copy text from existing sources or your classmates. For this assignment, you are also strongly recommended to avoid including any quotes – this should not be necessary. Write everything in your own words and provide clear references where you refer to ideas and results you have read about elsewhere.
We have seen some great work and great questions on this course. We are looking forward to you submitting some excellent data science projects!
WBS Plagiarism Policy
Please ensure that any work submitted by you for assessment has been correctly referenced as WBS expects all students to demonstrate the highest standards of academic integrity at all times and treats all cases of poor academic practice and suspected plagiarism very seriously. You can find information on these matters on my.wbs, in your student handbook and on the University’s library web pages:
https://warwick.ac.uk/services/library/students/referencing
The University’s Regulation 11 (see link below) clarifies that “…’cheating’ means an attempt to benefit oneself or another by deceit or fraud. This includes reproducing one’s own work…” It is important to note that it is not permissible to reuse work which has already been submitted by you for credit either at WBS or at another institution (unless you have been explicitly told that you can do so). This is considered self-plagiarism and could result in significant mark reductions.
Upon submission of assignments, students will be asked to agree to one of the following declarations:
Individual work submissions:
“I declare that this work is entirely my own in accordance with the University’s Regulation 11 and the WBS guidelines on plagiarism and collusion. All external references and sources are clearly acknowledged and identified within the contents. No substantial part(s) of the work submitted here has also been submitted by me in other assessments for accredited courses of study, and I acknowledge that if this has been done it may result in me being reported for self-plagiarism and an appropriate reduction in marks may be made when marking this piece of work.”
Group work submissions:
“I declare that this work is being submitted on behalf of my group, in accordance with the University’s Regulation 11 and the WBS guidelines on plagiarism and collusion. All external references and sources are clearly acknowledged and identified within the contents. No substantial part(s) of the work submitted here has also been submitted in other assessments for accredited courses of study and if this has been done it may result in us being reported for self-plagiarism and an appropriate reduction in marks may be made when marking this piece of work.”
By agreeing to these declarations, you are acknowledging that you have understood the rules about plagiarism and self-plagiarism and have taken all possible steps to ensure that your work complies with the requirements of WBS and the University.
You should only indicate your agreement with the relevant statement, once you have satisfied yourself that you have fully understood its implications. If you are in any doubt, you must consult with the NIE of the relevant module, because once you have indicated your agreement it will not be possible to later claim that you were unaware of these requirements in the event that your work is subsequently found to be problematic in respect to suspected plagiarism or self-plagiarism.
Regulation 11: http://www2.warwick.ac.uk/services/gov/calendar/section2/regulations/cheating