Big Data Analytics:
Assignment 2 – Asking good questions
Data Science Lab, Behavioural Science, Warwick Business School, The University of Warwick
https://www.wbs.ac.uk/about/person/suzy-moat/
https://www.wbs.ac.uk/about/person/tobias-preis/
This coursework is due in on Thursday 24th February 2022 by 12:00 UK time, along with Assignment 1. The marks available in this assignment add up to 10% of your final marks.
You will submit this assignment with Assignment 1. Assignment 1 will help you improve your programming skills and make the most of the learning support we are offering on this course. If all of your functions behave correctly, you will score full marks on Assignment 1.
However, to give you some reassurance if you encounter any difficulties with Assignment 1, your overall mark for this pair of assignments will be the higher of the two following calculations:
• Your mark for Assignment 1 (10%) + your mark for Assignment 2 (10%)
• Your mark for Assignment 2 (10%) + your mark for Assignment 2 (10%)
In other words, your overall mark will not be lower than your mark for Assignment 2 multiplied by 2.
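As a quick illustration of the rule above, the calculation can be sketched in R. The function name `overall_mark` and the example marks are hypothetical, and each mark is assumed to be out of 10.

```r
# Hypothetical helper: combine the two assignment marks (each out of 10).
# The overall mark is the higher of (A1 + A2) and (A2 + A2).
overall_mark <- function(a1, a2) {
  max(a1 + a2, 2 * a2)
}

overall_mark(a1 = 4, a2 = 8)  # 16: Assignment 2 counted twice
overall_mark(a1 = 9, a2 = 7)  # 16: Assignment 1 + Assignment 2
```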
In this course, we have been showing you how our everyday interactions with technology are creating huge amounts of data capturing human behaviour worldwide. We have begun to outline how this sort of data can help us measure what is happening in the world, and even make better predictions about how people might behave in the future.
In this exercise, you will draw on your business school education and the insights you have gained from this course to identify a good question for your final Big Data Analytics project, one which illustrates how online data can provide insights into human behaviour in the offline world.
You should start by considering what data is available. Crucially, you should try to develop a question that strikes a good compromise between being interesting and being feasible to answer. An interesting question will have the potential to uncover novel findings that you can argue would be of value to the world. Who would be excited about your dream result, and why? A feasible question will be possible to answer with data you have access to and statistical methods you are able to use. Make sure you are aware of any limitations of the data or the statistical methods you choose.
Your question should involve one of the following kinds of online data: Google Trends data, data on Wikipedia page views or data retrieved from the Flickr API, so that you can think about what can be done with these new sources of data, and apply and develop the knowledge that you have acquired in the R seminars.
Your question needs to link this online data with another source of data which reflects human behaviour in the offline world: for example, financial data, national statistics, or any other data source you find interesting.
For assessment, please submit a PDF providing answers to the questions on the next page.
You should keep your answers to under 2 pages of A4, with borders of 2.54cm and a font size of 11pt. Read through all the questions first, as answering some of the later questions might make you realise you should modify your answers to earlier questions.
Big Data Analytics: Assignment 2 and
1) What is the question you wish to ask? (1%)
In one sentence, state the question you wish to ask.
We will give you marks for stating an interesting question clearly.
2) What would your dream result be? (1%)
Imagine you have finished your analysis, and you have found the best result you could dream of. Describe your dream result in two simple sentences that a member of the general public would understand.
Often, if your question is not interesting enough, it becomes difficult to summarise the findings in so few words – so we want to make sure you can!
3) What reason do you have to believe you might find this result? (1%)
In two to three sentences, explain why you might expect to find this result.
In your final project, you will not lose marks if you do not find a significant result (nor gain marks if you do). However, you do need to provide a good explanation as to why you might expect to find the result you describe – otherwise it wouldn’t be worth your time looking into this idea.
4) What data will you use? (1%)
Explain what Google Trends, Wikipedia page view or Flickr data you will use.
Provide a link to the source of the data that is not from Google Trends, Wikipedia or Flickr. Please ensure that this link will take us directly to the data when we click on it. If there is a very good technical reason for which you cannot do this, explain what this reason is and clearly describe the source of the data.
5) How will you read the data into R for analysis? (1%)
Outline what steps would be required to read both your online data and other data into R and carry out any pre-processing required before your statistical analysis. We are looking for a good understanding of the basic steps you would need to carry out – you do not need to write code at this stage.
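You do not need to submit code for this question, but a rough sketch may help you think through the steps. The example below is only an illustration: the column names, the values and the inline CSV text are all made up, and in practice you would read files you had downloaded yourself (for example with `read.csv("my_trends_export.csv")`, a hypothetical file name).

```r
# Illustrative data only: stands in for files you would download yourself.
online_csv <- "week,searches
2021-01-03,54
2021-01-10,61
2021-01-17,58
2021-01-24,70"

offline_csv <- "date,claims
2021-01-03,1200
2021-01-10,1350
2021-01-24,1500"

# Step 1: read both sources into data frames
online  <- read.csv(text = online_csv,  stringsAsFactors = FALSE)
offline <- read.csv(text = offline_csv, stringsAsFactors = FALSE)

# Step 2: convert text dates to Date objects so the sources can be aligned
online$week  <- as.Date(online$week)
offline$date <- as.Date(offline$date)

# Step 3: keep only the time points present in both sources
merged <- merge(online, offline, by.x = "week", by.y = "date")

# Step 4: check for missing values before any analysis
stopifnot(!anyNA(merged))
print(merged)
```

Your own data may need further steps, such as aggregating daily values to weekly or monthly values so that both sources share the same time resolution.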
6) What statistical method will you use to analyse the data? (1%)
Describe the statistical approach you will use to answer your question. Describe any assumptions of this analysis.
Make sure that the approach you describe is capable of delivering an answer to your question in line with the dream result you described in question 2.
7) Which R functions will you use to carry out the statistical analysis? (1%)
Name the R functions that will allow you to carry out the statistical analysis (not the data pre-processing), and that will allow you to check any assumptions of your statistical analysis. If these R functions have not been used in the module, specify the R package that they are in.
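To illustrate the kind of answer we have in mind, here is a minimal sketch, assuming a simple linear regression on simulated data. The variable names and numbers are invented, and your own project may call for a different method. All functions shown are in the `stats` package that is loaded with R by default.

```r
# Simulated data for illustration only
set.seed(1)
n <- 60
searches <- rnorm(n, mean = 50, sd = 10)
offline  <- 100 + 2 * searches + rnorm(n, sd = 15)

# Statistical analysis: linear regression
fit <- lm(offline ~ searches)
print(summary(fit))

# Assumption checks on the residuals
print(shapiro.test(residuals(fit)))          # normality of residuals
res_acf <- acf(residuals(fit), plot = FALSE) # autocorrelation of residuals
print(res_acf)
```

In this sketch, the answer to the question would name `lm()` and `summary()` for the analysis, and `shapiro.test()` and `acf()` for checking assumptions.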
8) Describe the form of the data. Do you have enough data? (1%)
Is the data daily, weekly, monthly, something else? How many data points will you be able to analyse? Given the statistical approach you have described, is this sample size large enough to give you a chance of uncovering a significant result? (If not, you need to rethink your question!)
9) How would you describe your dream result to a professional audience? (2%)
Imagine you have finished your analysis and you have found your dream result. You have been asked to write an executive summary of your results for a professional audience. What would you write? Give some background motivation to your question, briefly describe your finding, and then indicate what this finding might mean. Keep your summary under 125 words.
February 2022 2
Further guidance on developing your question
In this assignment, we want to give you a chance to demonstrate your awareness of both what can be done and what is valuable, and use this to design an interesting project with which you can demonstrate your newly developed data science skills.
Try brainstorming some ideas, and ask for each idea – can I find a question that is just as easy to answer, but even more interesting? Can I find a question which is just as interesting, but even easier to answer?
It is likely that your question will fall into one of three categories. These are as follows:
• Nowcasting offline behaviour with online data
Tip: See “Faster measurements of human behaviour with Big Data” in the expert insights section in Week 3, as well as the skills walkthroughs in Weeks 3 and 4. Make sure that you have understood what nowcasting is, and when nowcasts are useful.
• Predicting offline behaviour with online data
Tip: Be careful to ensure that the statistics you are proposing will genuinely allow you to make predictions. For example, remember that the skills walkthroughs in Weeks 3 and 4 are about nowcasting. You would need to modify the approach from these walkthroughs to make predictions.
• Measuring offline behaviour with online data, where the offline behaviour was previously difficult or impossible to measure
Tip: These can be very tricky questions to set up correctly. Be careful to ensure that it really makes sense to use Google Trends, Wikipedia page views or Flickr data to measure the offline behaviour you are interested in.
First, check that there is not a more obvious offline alternative that you could use. (If there is, you probably need to think of another question, as you do need to propose a question using Google Trends, Wikipedia page view or Flickr data.)
Second, check that you can make a convincing argument that the measurements produced with online data are likely to be valuable, and not either too noisy or too biased in some way.
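The difference between nowcasting and prediction noted in the tips above can be sketched as follows. This is not the code from the skills walkthroughs. It is a minimal illustration on simulated weekly data, assuming a linear model, with invented variable names. The nowcast uses online data from the same week as the offline value, whereas the prediction uses only information available before that week.

```r
# Simulated weekly data for illustration only
set.seed(42)
n <- 52
searches <- 50 + cumsum(rnorm(n))
offline  <- 10 + 0.5 * searches + rnorm(n)

# Align each week's offline value with current and previous values
d <- data.frame(
  offline       = offline[-1],   # weeks 2..n
  offline_prev  = offline[-n],   # previous week's offline value
  searches_now  = searches[-1],  # same-week searches
  searches_prev = searches[-n]   # previous week's searches
)

# Nowcast: explain this week's offline value using this week's searches
nowcast_fit <- lm(offline ~ offline_prev + searches_now, data = d)

# Prediction: use only data that was available before this week
predict_fit <- lm(offline ~ offline_prev + searches_prev, data = d)

print(coef(nowcast_fit))
print(coef(predict_fit))
```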
What if my analysis doesn’t lead to my dream result? Will I lose marks in my final project?
No, you won’t lose a single mark for this, and we won’t consider your question less interesting. What matters is that you propose a question where there is a good reason to suppose that you might find your dream result. This involves thinking about how the data is generated, what it represents, and any limitations of the data or the statistical methodology you are proposing.
Make sure you do not propose a strange analysis simply because you have already taken a look at the data and have found a statistically significant result. You are much more likely to lose marks for this, both in this assignment and in your final project, because you will have more difficulty in explaining why it made sense to carry out this analysis in the first place.
Finding inspiration for your question
In the videos and discussions, we have been looking at a number of examples of analyses using online data to provide insight into human behaviour in the real world.
Here’s another example from the Bank of England, where they use data on Google searches to nowcast unemployment rates and house prices:
Using internet search data as economic indicators
https://www.bankofengland.co.uk/-/media/boe/files/quarterly-bulletin/2011/using-internet-search-data-as-economic-indicators.pdf
You might also be interested in this paper looking into the relationship between Bitcoin prices and Google Trends and Wikipedia page view data:
BitCoin meets Google Trends and Wikipedia: Quantifying the relationship between phenomena of the Internet era
https://www.nature.com/articles/srep03415
You can find more examples on the course reading list:
https://rl.talis.com/3/warwick/lists/16C9B425-F6E7-D057-08EB-1DDF31DCD446.html
Taking existing analyses into account
To ensure that the question you propose is interesting, note that it should not simply replicate an analysis that you are aware of, or it will be difficult to argue for the value of your analysis.
Do take a look through the titles of papers on the reading list so that you can check any papers that look like they might be close to your idea. This will help make sure that you are well placed to argue for the novelty of your suggestion.
If you are keen to propose an analysis that is similar to a previous analysis, make sure your answer to question 2 makes it clear what insights your dream result would provide, beyond the knowledge we already have from the previous analysis.
If your project is in any way inspired by an analysis that you have read about – whether it’s in a paper listed on the reading list or not – please make sure you pay very careful attention to WBS plagiarism guidance in completing your assignment. In addition, follow our advice above to ensure that your proposed analysis differs from the analysis that you have read about.
If your project is not inspired by an analysis that you have read about, then you only need to check the reading list for any potential overlap with papers covered in our course, so that you can position your idea well. We don’t expect you to be familiar with the literature beyond the reading list.
Data source suggestions
One of the challenges you have to address in this assignment is to find a source of offline data you can use in your project. There are many sources of fascinating real-world data available online. Here are a few suggestions – but feel free to find others!
Open administrative data – UK
• data.gov.uk
o UK Government project to make non-personal UK government data available as open data
https://data.gov.uk/
• London Datastore
o Official site providing free access to a number of data-sets from the Greater London Authority
• Office for National Statistics
o Access to economic and social data for the UK, including Census data
https://www.ons.gov.uk/
Open administrative data – USA
• US Census data
o Data on economy, population and society at national and local level. Summaries and detailed data releases are published free of charge.
https://www.census.gov/
• data.gov
o Official U.S. government site providing increased public access to federal government data
• NYC Open Data
o NYC Open Data makes the wealth of public data generated by various New York City agencies and other City organizations available for public use
Open administrative data – world and Europe
• World Bank Open Data
o Free and open access to data about development in countries around the globe
https://data.worldbank.org/
• CIA World Factbook
o U.S. government profiles of countries and territories around the world. Information on geography, people, government, transportation, economy, communications…
https://www.cia.gov/the-world-factbook/
• Eurostat
o Detailed statistics on the EU and candidate countries
https://ec.europa.eu/eurostat/
WBS Plagiarism Policy
Please ensure that any work submitted by you for assessment has been correctly referenced, as WBS expects all students to demonstrate the highest standards of academic integrity at all times and treats all cases of poor academic practice and suspected plagiarism very seriously. You can find information on these matters on my.wbs, in your student handbook and on the University’s library web pages:
https://warwick.ac.uk/services/library/students/referencing
The University’s Regulation 11 (see link below) clarifies that “…’cheating’ means an attempt to benefit oneself or another by deceit or fraud. This shall include reproducing one’s own work…” It is important to note that it is not permissible to reuse work which has already been submitted by you for credit either at WBS or at another institution (unless you have been explicitly told that you can do so). This is considered self-plagiarism and could result in significant mark reductions.
Upon submission of assignments, students will be asked to agree to one of the following declarations:
Individual work submissions:
“I declare that this work is entirely my own in accordance with the University’s Regulation 11 and the WBS guidelines on plagiarism and collusion. All external references and sources are clearly acknowledged and identified within the contents. No substantial part(s) of the work submitted here has also been submitted by me in other assessments for accredited courses of study, and I acknowledge that if this has been done it may result in me being reported for self-plagiarism and an appropriate reduction in marks may be made when marking this piece of work.”
Group work submissions:
“I declare that this work is being submitted on behalf of my group, in accordance with the University’s Regulation 11 and the WBS guidelines on plagiarism and collusion. All external references and sources are clearly acknowledged and identified within the contents. No substantial part(s) of the work submitted here has also been submitted in other assessments for accredited courses of study and if this has been done it may result in us being reported for self-plagiarism and an appropriate reduction in marks may be made when marking this piece of work.”
By agreeing to these declarations you are acknowledging that you have understood the rules about plagiarism and self-plagiarism and have taken all possible steps to ensure that your work complies with the requirements of WBS and the University.
You should only indicate your agreement with the relevant statement, once you have satisfied yourself that you have fully understood its implications. If you are in any doubt, you must consult with the NIE of the relevant module, because once you have indicated your agreement it will not be possible to later claim that you were unaware of these requirements in the event that your work is subsequently found to be problematic in respect to suspected plagiarism or self-plagiarism.
Regulation 11: https://warwick.ac.uk/services/gov/calendar/section2/regulations/academic_integrity/