Assignment 1
Introduction
This assignment is composed of three parts:
1. Resale flat prices analysis: a data analysis in line with what we did during the labs.
2. Recreating a new plot type: understanding a new package and method on your own, slightly outside the scope of the sessions.
3. Freeform task: choosing a topic and dataset on your own, and carrying out a data analysis focusing on an aspect you personally find of interest, using the knowledge attained so far, and at the same time giving an opportunity to explore further topics on your own.
All three parts must be completed.
Learning goals of this assignment
This assignment is intended to demonstrate problem-solving skills and knowledge of what we have covered in class up to the deadline (that is, up to and including Week 6). It assesses the capability to apply the learned techniques to new datasets. Further, it gauges the foundation and capability to explore a concept not covered in the class, e.g. deriving new types of insights, new types of plots, and using new R packages we have not used in the class. More specifically, you should be able to demonstrate the ability to do the following:
• Read and preprocess a new dataset.
• Perform data transformation.
• Produce basic and intermediate data visualisations.
• Come up with an idea for analysis and demonstrate that you can carry it out.
• Understand and handle a large dataset, and reduce it to the relevant aspects.
• Explore a new package and new topics.
• Describe the analysis and results well in the form of a report.
The assignment seeks the creativity, independence, and curiosity expected at the graduate level.
Much of the assignment is open-ended, giving you some freedom to explore things you may find personally interesting. The grading is based on the goals listed above.
The deliverable of the assignment is a report made using R Markdown, containing code, descriptions, plots, and your conclusions, all in a single report.
The lectures and labs in the first six weeks should give a solid and sufficient understanding for solving this assignment. If needed, for further reading, please refer to the references in the slides.
• Each R package is well documented online.
• The powerful and versatile capabilities of R make it impossible to cover everything in one course. Feel free to search online for solutions to details we might not have covered. It’s okay to be inspired by online resources and seek solutions online, but don’t just copy and paste code without understanding it.
• Write a proper report. Write about the data and your analysis, don’t just create a bunch of plots without an explanation and accompanying text. Explain, lead the reader. Be structured.
• Write critically about data and your analysis. This is a learning process and data is rarely perfect, so do not be shy about the work not being perfect. Not to mention that we are only halfway through the semester, so you are not expected to know everything.
• Feel free to search online for similar analyses, and be inspired with how others analysed and visualised a similar dataset.
• Stuck on what exactly to do? Start with the exploratory data analysis, and an idea will come to you.
• If you run into a particular error, you are advised to search about it online.

The three parts of the assignment are as follows.
1. HDB RESALE FLAT PRICES
Download the Resale Flat Prices dataset from https://data.gov.sg/dataset/resale-flat-prices
The task is to provide a data analysis of resale flat prices in Singapore, through two subtasks (one is short and specified, the other one is longer and flexible). Do make sure that you understand the structure of the dataset well, as it is also used in Task 2.
(a) Replicating findings from an ST article published in January 2022 (i.e. fact-checking)
The first subtask will get you familiarised with this dataset through a set of specific things to do, warming you up for the other subtask.
On 6 January 2022, The Straits Times published the article “Record 261 million-dollar HDB flats in 2021; resale prices rise in December as volume dips” (link: https://www.straitstimes.com/singapore/housing/record-261-million-dollar-hdb-flats-in-2021-resale-prices-rise-in-december-as-volume-dips). This article is one of many similar articles published periodically in Singaporean media about real estate price trends, and so on. Your job is to use the latest available dataset on HDB resale flat prices, and verify the claims in the article.
Using R and the dataset, verify the following claims in the article:

1. In 2021, 261 HDB flats were transacted at $1m or more.
2. HDB resale prices rose 0.8 per cent in December 2021 from the previous month.
3. HDB resale prices in December 2021 were 13.6 per cent higher than a year ago.
4. There were 2,429 HDB flats sold in December 2021.
5. Replicate the first plot in the article (“HDB resale volume”, the one with the blue columns), including the approximate style (it does not have to be exactly the same).
In some cases, you will not get the exact numbers (in fact, I could not replicate all of them exactly on my own; some have a discrepancy that is larger than a few per cent). What is important here is the approach you come up with and that you understand the structure of this dataset.
If you have developed the correct procedure but get different results from the article, you will still score full points.
The code and the number/figure you get must be visible in the report you submit.
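The counting and percentage-change steps behind claims 1 to 4 can be sketched in base R on a tiny made-up table. This is only an illustrative sketch: the column names `month` (text in "YYYY-MM" form) and `resale_price` are assumptions that should be checked against the downloaded file, the numbers are invented, and comparing monthly medians is just one possible proxy for "resale prices", not necessarily the measure used in the article.

```r
# Made-up stand-in for the resale data. The column names `month`
# and `resale_price` are assumptions; check them in the real file.
flats <- data.frame(
  month = c("2021-11", "2021-11", "2021-12", "2021-12", "2021-12", "2020-12"),
  resale_price = c(500000, 1000000, 520000, 1200000, 480000, 450000)
)

# Claim 1 style: transactions at or above $1m in 2021.
in_2021 <- substr(flats$month, 1, 4) == "2021"
n_million <- sum(in_2021 & flats$resale_price >= 1e6)

# Claim 4 style: number of transactions in December 2021.
n_dec <- sum(flats$month == "2021-12")

# Claim 5 style: transaction volume per month (the basis of a column chart).
volume_by_month <- table(flats$month)

# Claims 2 and 3 style: percentage change in the monthly median price
# (an assumed proxy; the article may rely on a different measure).
median_by_month <- tapply(flats$resale_price, flats$month, median)
pct_change <- function(new, old) 100 * (new - old) / old
mom <- pct_change(median_by_month[["2021-12"]], median_by_month[["2021-11"]])
yoy <- pct_change(median_by_month[["2021-12"]], median_by_month[["2020-12"]])
```

Whichever price measure is chosen, stating it explicitly in the report makes any discrepancy with the article easier to discuss.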
(b) Further data analysis based on your interest
Now that you have loaded, preprocessed, and acquainted yourself with the dataset, the next step is to conduct a basic data analysis with no particular questions, i.e. based on your interests.
The assignment does not prescribe specific questions to answer or a particular aspect to focus on. After a general, exploratory data analysis of the dataset, feel free to focus on an aspect you find interesting, e.g. how does the remaining lease impact the price, how does the price of the same type of flat differ among different towns, did the price in all locations and for all types of flat change in the same manner through time… To give you some guidance and an indication about the expectation of the size and scope of the analysis:
1. You must include the description of the data and an exploratory data analysis.
2. You must give an impression that your analysis gave you a good understanding of this topic and the data.
3. Your analysis has to result in at least 5 meaningful insights. Examples of such are: “What is the trend of resale prices?”, “Are similar flats sold across Singapore at the same time?”, “Does the storey level have much influence on the price?”, “Is there a difference in the flats sold years ago and today?”, “What is the influence of the lease time? How was it before?”. Describing each of these in one paragraph is sufficient.
4. You must include at least 5 meaningful plots or tables, to accompany the aforementioned insights/conclusions. The plots that you generated in the exploratory data analysis count towards these 5 plots as long as they are meaningful and look sensible. You can go beyond 5 plots, but I prefer quality over quantity, together with meaningful insights you can deduce.
– Check what we did in the labs for some general inspiration.
– The order of the two subtasks does not have to be sequential. You can first conduct the exploratory data analysis, then include subtask (b), and finish with subtask (a); it is up to you.
– Real estate is a popular topic in Singapore and in many other places around the world, and therefore it is often a topic in the media. As you see from subtask (a), articles about real estate often contain a data analysis and a couple of visuals. Feel free to get inspired by such resources, and redo such analysis.
– Depending on what you decide to do, you may need to deal with the date data type, because one of the columns contains the year and month of the transactions. In that case you may want to check the lubridate package, which is also covered by one of the provided cheatsheets. Optionally, you can also read Chapter 16 of the book R for Data Science (https://r4ds.had.co.nz/dates-and-times.html).
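A minimal base R sketch of the date handling mentioned above, assuming the transaction month is stored as text such as "2021-12" (an assumption to verify in the data). The lubridate package offers helpers for the same job; base R is used here only to keep the sketch free of dependencies.

```r
# Example values in the assumed "YYYY-MM" text form.
month_chr <- c("2020-12", "2021-11", "2021-12")

# Append a day so that as.Date() can parse each value as a full date.
month_date <- as.Date(paste0(month_chr, "-01"))

# Extract the year and the month number as integers.
txn_year <- as.integer(format(month_date, "%Y"))
txn_month <- as.integer(format(month_date, "%m"))

# Proper dates can be compared and filtered directly.
in_2021 <- month_date >= as.Date("2021-01-01") &
  month_date <= as.Date("2021-12-31")
```

Once the column is a real date, plots get a continuous time axis instead of one category per month.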
2. RECREATING A NEW PLOT TYPE
Your task is to recreate the plot included below, using the HDB Resale Flat Prices dataset (the same as in the previous task). This type of plot is called the ridgeline plot (you may also find it in the literature under the alternative name joyplot), and it is becoming increasingly popular. A ridgeline plot shows the distribution of a numeric value for multiple groups in the data. Ridgelines in R are made with the package ggridges, which has not been part of the class. For that you need to explore the package and figure out how it works. This task also requires a degree of data transformation to get the data into the form needed to accomplish such a plot. In addition, if you come up with something that communicates the same data in a better way, you’re more than welcome to submit that as well, on top of this plot. Please write one paragraph of conclusions/observations.
You may struggle a bit with some details (e.g. sorting the towns by average price), but that’s normal and part of the learning process. Keep in mind that some of the code you develop in this assignment is yours and can be reused later (e.g. you can use it to visualise this particular plot type in Assignment 2 and the group project if applicable).
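As a starting point, the general shape of a ridgeline plot with towns ordered by average price can be sketched as follows. This is a hedged sketch on made-up data, not a reproduction of the target plot: the town names and prices are invented, the column names `town` and `resale_price` are assumptions, and the styling of the real plot is left out. The plotting step runs only if ggplot2 and ggridges are installed.

```r
# Made-up prices for three hypothetical towns; `town` and
# `resale_price` are assumed column names.
flats <- data.frame(
  town = rep(c("TOWN A", "TOWN B", "TOWN C"), each = 50),
  resale_price = c(
    seq(500000, 700000, length.out = 50),
    seq(300000, 500000, length.out = 50),
    seq(400000, 600000, length.out = 50)
  )
)

# Turn town into a factor whose levels are ordered by mean price,
# so that the ridges appear in price order along the y axis.
flats$town <- reorder(flats$town, flats$resale_price, FUN = mean)

# Build the plot only if both packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggridges", quietly = TRUE)) {
  p <- ggplot2::ggplot(flats, ggplot2::aes(x = resale_price, y = town)) +
    ggridges::geom_density_ridges() +
    ggplot2::labs(x = "Resale price (S$)", y = "Town")
  # Call print(p) to draw the plot.
}
```

Reordering the factor before plotting, rather than sorting the rows, is what controls the order of the ridges, because ggplot2 places discrete groups by factor level.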
3. FREEFORM TASK
The final task is to find a topic and a dataset on your own, and provide an analysis of about half the scope of the one in Task 1(b) (i.e. resulting in 2-3 insights and about 2-3 plots).
This is an open-ended task. Be creative and do something interesting. Use this opportunity to focus on a topic you personally find compelling and want to explore.
The data does not necessarily have to be from Singapore. It can be from anywhere. Preferably it should be related to the urban context, but that is not a must (e.g. it can be a dataset about movies, politics, music, economics, food, covid-19, …), as long as you can demonstrate a solid grasp of what we have done so far during the sessions and that you are capable of using the attained transferable skills to explore new topics on your own.
Don’t know where to get interesting data? There are numerous options besides data.gov.sg, e.g.
– Type data() in the console, R comes with some datasets that you can use right away.
– Check out this collection: https://data.fivethirtyeight.com from the popular FiveThirtyEight news site (it is also available as an R package called fivethirtyeight).
– Explore questions and answers on the Open Data StackExchange (https://opendata.stackexchange.com), which contains some interesting surprises.
– If you have lived somewhere else besides Singapore, how about checking the open data portal of the government of that city or country?
While you have substantial freedom in this task, don’t do something overly simple. Do something you personally find interesting. Find an interesting dataset that we have not used. Feel free to use this opportunity to try out new types of plots and new packages.
I noticed some common shortcomings and points for improvement in the last instances of this module, so I prepared a list of things that you may want to ask yourself before the submission, to help you avoid issues that I have observed so far, and score a high mark:
• Did you structure the document well?
• Did you make sure that the plots and maps are well styled and annotated, and easy to understand?
• Did you describe the conclusions from the visuals, helping the reader to interpret them?
• Did you check the licence and quality of the data, and sufficiently describe any additional datasets that you may have used?
• Did you check the metadata of the datasets you used?
• Did you remove unnecessary warnings and messages that clutter the report in R Markdown?
• Is your report free of grammatical errors and written in clear English?
• Does your report have a clear conclusion?
• Did you refer to all external materials you used?
• Did you attribute properly all the data?
• Does your report contain sufficient text to lead the reader?
Please submit the assignment by Monday 28 February at 09:00 (morning). 10% is deducted from the mark for each 24 hours late.
Submission
Please submit your report on LumiNUS; a folder has been created for the submissions.
Do not submit by email.
Use R Markdown in preparing the report (your report needs to be produced out of RStudio by knitting the Rmd notebook; don’t use Word and similar software). Please submit your report in PDF or HTML. The Rmd file does not have to be submitted, nor do you have to submit the data you used or generated. The report in PDF or HTML suffices.
There are no strict rules about the report, except that the code you wrote has to be visible in the report. There is no minimum or maximum number of pages. However, don’t make the report excessively long, and don’t submit a report that contains only plots without any explanation.
When submitting the report please make sure that:
– When you generate the PDF or HTML, make sure that it is nicely formatted (e.g. there is text with headings and a structure), and there are no dangling lines (long code that does not fit in the
– Your name is somewhere at the top of the report next to the title.
– Do not include errors/warnings/lots of data; you can clean these up easily. Check out the cheatsheet for R Markdown, or my Rmd notebooks used in the class.
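One common way to keep warnings and messages out of the knitted report while leaving the code visible is to set chunk defaults in the first chunk of the Rmd file. A sketch, assuming the report is knitted with knitr (the engine RStudio uses for R Markdown):

```r
# Place in the first ("setup") chunk of the Rmd file.
knitr::opts_chunk$set(
  echo = TRUE,      # show the code, as the assignment requires
  warning = FALSE,  # hide warnings in the knitted output
  message = FALSE   # hide package start-up and other messages
)
```

Individual chunks can still override these defaults in their own chunk header where a warning is worth showing and discussing.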
You’re graded based on your understanding of the topics so far, quality of the report, creative thinking, criticism and honesty, ideas, depth of the analysis, and interpretation of the data and results. You are free to focus on any aspect you find interesting, and your choice of the topic won’t have an influence on your mark as long as it makes sense. The quality of the work is what matters.
Brownie points for:
– Thinking outside the box. 🧠
– Going beyond the scope of the class, e.g. finding new functions and using plot types not covered in the class, i.e. demonstrating the capability to find new solutions.
– Coming up with interesting conclusions and producing interesting plots.
Questions?
If you have questions, please use MS Teams (private message or group chat). I’ll be happy to help on weekdays during usual working hours, but please spend some time first on finding a solution on your own. It’s perfectly fine to get stuck, and debugging is part of the learning experience.
If you have a personal matter, feel free to get in touch by email. Please do not use email if you have technical questions; it’ll be easier and more efficient for both of us to use Teams.
Question: Why is the deadline on XXX? 

Answer: There will never be a deadline that works for everyone. Some students work late hours and are okay with deadlines that are in the morning, some students have the opposite habits and schedule, etc. This issue is compounded by the fact that this module is open to students from multiple study programmes, which have different schedules.
In this case, the deadline is set on Monday morning. If you prefer not to work on weekends, you can work on it during the week and submit it on Friday. The deadline is given well in advance to enable everyone to plan their schedule as they wish.
Question: Will there be extensions?
Answer: To be fair to the entire class, individual extensions are in principle not granted, except for medical or mental health reasons, emergencies, and similar circumstances. Again, the assignment is given well in advance to help you plan your schedule. If the deadline clashes with another module, do plan ahead.
Rules and policy
This course operates under the NUS Code of Student Conduct.
The assignments are strictly individual. Your work must be solely your own.
Consultation with others is of course allowed and encouraged, but it must be limited to discussion, not joint solving of assignments. Sharing assignments and other materials is a violation of academic honesty policies.
You may be aided by online resources (and this is encouraged, widening your knowledge, not to mention that most programmers spend half their time googling), and you may even copy code fragments from online (e.g. from StackExchange) as long as you understand what you are doing, that you do so to a reasonable extent, and that you acknowledge these sources.
– HDB for making the resale flat transaction data available openly.
– Photo in the header by on Unsplash
What’s the next assignment about?
In the next assignment, you will be using machine learning and other techniques to explore a dataset on grocery purchases in London, UK. The next assignment will be a tad more advanced in terms of structure and presentation — e.g. a research question must be formulated.
Happy coding! Wishing you a nice recess week.