Case Study – Airbnb and Inside Airbnb
Airbnb – Holiday Lets, Homes, Experiences & Places (airbnb.co.uk)
Airbnb is an online marketplace for arranging or offering lodging (i.e. temporary accommodation, primarily homestays) or tourism experiences. It was founded in August 2008 and had 12,736 employees as of 2019.
Service overview: Airbnb provides a platform for hosts to accommodate guests with short-term lodging and tourism-related activities. Guests can search for accommodation using filters such as location, price, and specific types of homes. Before booking, users must provide personal and payment information. Some hosts also require a scan of government-issued identification before accepting a reservation. Hosts provide prices and other details for their rental or listing, e.g. number of guests included in the price, type of property, type of room, number of bathrooms, number of bedrooms, number of beds and type of bed, minimum number of nights for a reservation, and amenities. In addition, Airbnb provides a review system where hosts and guests can leave reviews about their experience and rate each other after a stay. By October 2019, two million people were staying with Airbnb each night.
Cancellation policy: Airbnb allows hosts to choose between five types of cancellation policy, designed to protect both hosts and guests. The options are strict_14_with_grace_period, moderate, flexible, super_strict_30 and super_strict_60 (see https://www.airbnb.co.uk/home/cancellation_policies for the definition of each category).
Security deposits: Some reservations include a security deposit, which can be required by either Airbnb or the host. This helps build trust for both guests and hosts. If you are a guest booking a listing with a host-required security deposit, you will be shown the amount before you make your reservation. The amount is set by the host, not Airbnb. In this case, no authorisation hold will be placed, and you will only be charged if the host makes a claim on the security deposit (see https://www.airbnb.co.uk/help/article/140/how-does-airbnb-handle-security-deposits).
Sources: Wikipedia, Airbnb.co.uk
For further information on Airbnb, please visit: https://www.airbnb.co.uk/
Inside Airbnb – adding data to the debate (http://insideairbnb.com/index.html)
Inside Airbnb is an independent, non-commercial set of tools and data that allows an individual to explore how Airbnb is really used in cities around the world. It was set up by Murray Cox and John Morries in 2016.
Airbnb claims to be part of the “sharing economy” and to be disrupting the hotel industry. However, data shows that the majority of Airbnb listings in most cities are entire homes, many of which are rented all year round, disrupting housing and communities. For example, local residents and governments are more concerned with hosts who are not present when the rental takes place and those who have multiple listings on the site than with a user who is renting out a spare room.
By analysing publicly available information about a city’s Airbnb listings, Inside Airbnb provides filters and key metrics so that users can see how Airbnb is being used to compete with the residential housing market. With Inside Airbnb, users can ask fundamental questions about

• whether the listing is licensed
The Inside Airbnb tool or data can be used to answer some of these questions. Some understanding of how the Airbnb platform is being used will help clear up the laws as they change.
Source: insideairbnb.com
For further information on Inside Airbnb, please visit: http://insideairbnb.com/index.html
Airbnb in Greater Manchester, UK
Dataset: Airbnb_man_reduced.csv (available to download on Blackboard). Two additional datasets, man_reviews.csv and man_calander.csv, are also provided for information only.
Description of the dataset: The Airbnb data for Greater Manchester is made available by Inside Airbnb. The original data set was downloaded from the website in November 2019. The number of variables, however, has been reduced from the original data set. There are 4,848 listings in the data set with a total of 57 variables. Each row represents a single listing and contains information about the host of the property, the property’s characteristics, and guests’ overall rating of the property and its associated features. Table 1 shows the name, description, and type of the 57 variables.
Table 1: variable name and description of the variable for the dataset.
| # | Variable Name | Description | Variable Type |
|---|---|---|---|
| 1 | listing_id | Unique identifier for each Airbnb listing | Numeric |
| 2 | listing_url | URL of the listing | Text |
| 3 | description | Description of the listing | Text |
| 4 | house_rule | Description of house rules | Text |
| 5 | host_id | Unique identifier of the host | Numeric |
| 6 | host_url | URL of the host | Text |
| 7 | host_name | Name of the host | Text |
| 8 | host_since | Date since which the host has been a member | Date |
| 9 | host_about | Description of the host | Text |
| 10 | host_response_time | How quickly the host responds to inquiries. 5 categories: within a day, within an hour, a few days or more, within a few hours, N/A | Categorical |
| 11 | host_response_rate | Rate at which the host responded to inquiries (percentage value) | Numeric |
| 12 | host_is_superhost | Whether the host is a superhost (1 = Yes, 0 = No) | Binary |
| 13 | host_identity_verified | Whether the host is verified or not (1 = Yes, 0 = No) | Binary |
| 14 | neighbourhood_cleased | Name of the neighbourhood (41 categories) | Categorical |
| 15 | borough | Name of the borough (10 categories) | Categorical |
| 16 | property_type | Type of the property (30 categories) | Categorical |
| 17 | room_type | Type of the room. 4 categories: Entire home/apt, Private room, Shared room, Hotel room | Categorical |
| 18 | accomodates | Number of people that can be accommodated | Numeric |
| 19 | bathrooms | Number of bathrooms | Numeric |
| 20 | bedrooms | Number of bedrooms | Numeric |
| 21 | beds | Number of beds | Numeric |
| 22 | bed_type | Type of bed. 6 categories: Real Bed, Pull-out Sofa, Futon, Couch, Airbed | Categorical |
| 23 | amenities | List of amenities included | Text |
| 24 | price | Price per night (in GBP) | Numeric |
| 25 | weekly_price | Price per week (in GBP) | Numeric |
| 26 | monthly_price | Price per month (in GBP) | Numeric |
| 27 | Security_deposit | Amount of host-required security deposit | Numeric |
| 28 | cleaning_fee | One-time fee charged by the host to cover the cost of cleaning their space | Numeric |
| 29 | guest_included | Number of guests included in the price | Numeric |
| 30 | extra_people | Additional charge per person (GBP) | Numeric |
| 31 | minimum_nights | Minimum number of nights for a reservation | Numeric |
| 32 | maximum_nights | Maximum number of nights for a reservation | Numeric |
| 33 | calendar_updated | Calendar last updated by the host (70 categories) | Categorical |
| 34 | has_availability | Whether the host has availability or not (1 = Yes, 0 = No) | Binary |
| 35 | availability_30 | Number of days available in the next 30 days | Numeric |
| 36 | availability_60 | Number of days available in the next 60 days | Numeric |
| 37 | availability_90 | Number of days available in the next 90 days | Numeric |
| 38 | availability_365 | Number of days available in the next 365 days | Numeric |
| 39 | number_reviews | Number of reviews in total | Numeric |
| 40 | first_review | Date of first review | Date/Time |
| 41 | last_review | Date of last review | Date/Time |
| 42 | review_scores_rating | Overall rating of the property (percentage value) | Numeric |
| 43 | review_scores_accuracy | Rating for the accuracy of the description | Numeric |
| 44 | review_scores_cleanliness | Rating for the cleanliness of the property | Numeric |
| 45 | review_scores_checkin | Rating for the check-in experience | Numeric |
| 46 | review_scores_communication | Rating for the host's communication with guests | Numeric |
| 47 | review_scores_location | Rating for the location of the property | Numeric |
| 48 | review_scores_value | Rating for the value of the property | Numeric |
| 49 | instant_bookable | Whether the property can be booked instantly (1 = Yes, 0 = No) | Binary |
| 50 | cancellation_policy | The cancellation policy of the host. 5 categories: strict_14_with_grace_period, moderate, flexible, super_strict_30, super_strict_60 | Categorical |
| 51 | require_guest_profile_picture | Whether a guest profile picture is required or not (1 = Yes, 0 = No) | Binary |
| 52 | require_guest_phone_verification | Whether guest phone verification is required or not (1 = Yes, 0 = No) | Binary |
| 53 | host_listings_count | The number of listings of the host | Numeric |
| 54 | host_listings_count_entire_homes | The number of the host's listings that are entire homes | Numeric |
| 55 | host_listings_count_private_rooms | The number of the host's listings that are private rooms | Numeric |
| 56 | host_listings_count_shared_rooms | The number of the host's listings that are shared rooms | Numeric |
| 57 | reviews_per_month | Number of reviews per month for the property | Numeric |
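Before any modelling, it can help to confirm that the file matches the description above (4,848 listings, 57 variables) and to see which variables contain missing values. The Python sketch below is illustrative only and does not replace the data exploration carried out in SAS Enterprise Miner. The function name profile_listings, the made-up sample rows, and the choice to treat blank cells and "N/A" as missing are all assumptions.

```python
import csv
import io
from collections import Counter

# Assumption: blank cells and the literal "N/A" both mean "missing".
MISSING_TOKENS = {"", "N/A"}


def profile_listings(handle):
    """Return (row_count, column_names, missing_counts) for a listings CSV."""
    reader = csv.DictReader(handle)
    missing = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for name, value in row.items():
            if name is None:
                continue  # surplus fields in a malformed row
            if value is None or value.strip() in MISSING_TOKENS:
                missing[name] += 1
    return rows, list(reader.fieldnames or []), dict(missing)


# Tiny made-up sample standing in for Airbnb_man_reduced.csv.
sample = io.StringIO(
    "listing_id,host_id,price,reviews_per_month\n"
    "1,10,45,1.2\n"
    "2,10,,0.4\n"
    "3,11,80,\n"
)
rows, columns, missing = profile_listings(sample)
print(rows, len(columns), missing)
```

For the real file, the same function could be called on a handle such as open("Airbnb_man_reduced.csv", newline="", encoding="utf-8"), assuming the file is UTF-8 encoded. The row and column counts can then be compared with the figures stated in the dataset description.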
The local government and residents would like to know how Airbnb is used in the region and seek your help with this. They would particularly like to know how many of the listings/hosts are offering genuine lodging (i.e. temporary accommodation, primarily homestays, or tourism experiences) rather than running a business, as opposed to hosts offering long-term lets across multiple listings with no owner present (likely to be running a business), which could be illegal. Your goals are to:
a) identify clusters of listings based on different sets (or a combination) of variables, e.g. hosts’ characteristics, listing/property characteristics and availability, and reviews from guests, so as to provide insights to the local government and residents.
Note: There are many measurements that could be used to differentiate the two, e.g. single listing vs multiple listings, although a host may list separate rooms in the same apartment, or multiple apartments or entire homes. Availability is another measure, as is occupancy. You are asked to justify the variables/measurements used for your clustering tasks. Greater Manchester uses the following parameters for the measurements:
• a high-availability metric and filter of 60 days per year
• a frequently rented filter of 60 days per year
• a review rate of 50% for the number of guests making a booking who leave a review
• an average booking of 3 nights unless a higher minimum nights is configured for a listing
• a maximum occupancy rate of 70% to ensure the occupancy model does not produce artificially high results based on the available data (see http://insideairbnb.com/greater-manchester/?neighbourhood=&filterEntireHomes=false&filterHighlyAvailable=false&filterRecentReviews=false&filterMultiListings=false)
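The parameters listed above can be combined into a simple occupancy estimate for each listing. The Python sketch below shows one plausible reading of them: reviews are converted to estimated bookings using the review rate, bookings are multiplied by the length of stay, and the result is capped at the maximum occupancy rate. It is an illustration, not Inside Airbnb's actual implementation. The function name, the use of strict "greater than" comparisons for the two 60-day filters, and the example input values are assumptions.

```python
REVIEW_RATE = 0.50              # share of bookings assumed to leave a review
AVERAGE_STAY_NIGHTS = 3         # assumed nights per booking
MAX_OCCUPANCY = 0.70            # cap on the estimated occupancy rate
FREQUENTLY_RENTED_NIGHTS = 60   # "frequently rented" threshold (nights/year)
HIGHLY_AVAILABLE_DAYS = 60      # "highly available" threshold (days/year)
DAYS_PER_YEAR = 365


def estimate_listing_activity(reviews_per_month, minimum_nights, availability_365):
    """Estimate yearly occupancy for one listing from its review activity."""
    # Each review is assumed to stand for 1 / REVIEW_RATE bookings.
    bookings_per_year = reviews_per_month * 12 / REVIEW_RATE
    # Use the average stay unless the listing enforces a longer minimum.
    nights_per_booking = max(AVERAGE_STAY_NIGHTS, minimum_nights)
    # Cap the estimate so it never exceeds the maximum occupancy rate.
    nights_per_year = min(
        bookings_per_year * nights_per_booking,
        MAX_OCCUPANCY * DAYS_PER_YEAR,
    )
    return {
        "nights_per_year": nights_per_year,
        "occupancy_rate": nights_per_year / DAYS_PER_YEAR,
        "frequently_rented": nights_per_year > FREQUENTLY_RENTED_NIGHTS,
        "highly_available": availability_365 > HIGHLY_AVAILABLE_DAYS,
    }


# Two made-up listings: an occasional let and a heavily booked one.
casual = estimate_listing_activity(0.5, 2, 30)
busy = estimate_listing_activity(5.0, 7, 300)
print(casual)
print(busy)
```

Derived measures of this kind (estimated nights per year, and the frequently rented and highly available flags) are candidates for clustering inputs alongside the raw variables in Table 1, provided their use is justified in the report.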
b) select what you think is the best segmentation/clustering based on the results obtained in a) and comment on its characteristics, e.g. clusters that best separate listings that are genuine lodging from those that could be illegal, i.e. running as a business.
c) develop a classification model to identify which listings/hosts are genuine and which could be considered illegal, based on your results obtained in b).
Useful information/websites:
• Clampet (2014) Airbnb in NYC: The Real Numbers Behind the Sharing Story – available at https://skift.com/2014/02/13/airbnb-in-nyc-the-real-numbers-behind-the-sharing-story/
• Inside Airbnb http://insideairbnb.com/index.html

What to deliver in the final report:
Your report should include the following sections:
1. Introduction: This should include background of Airbnb and Inside Airbnb, opportunities and challenges of the sharing economy to the business (Airbnb), home owners (hosts), local residents and governments, and guests/tourists, and how business intelligence and data mining could be used to address the opportunities and challenges for the various stakeholders. It should also outline how the report is structured. Justify your answer with examples/data and findings from literature and related work in this area.
2. Model building and Results Discussion
a) Identify clusters of listings
In this section, you should discuss the purpose of the data mining tasks, the data mining process (including data exploration and data preparation/preprocessing), and the approaches taken, e.g. the variables used for the clustering. You are expected to justify and discuss any action or decision you made during the data mining process and model building, and to make references to your output in SAS Enterprise Miner within your report where necessary.
Note: In deciding what k to use (and also how many variables to include), the following factors should be considered. How distinct are the clusters? Is good separation achieved? How consistent are they? If cluster #1 shows low values on one measure, does it also show low values on other measures? How simple are they to describe? Simple clusters are more interpretable by domain experts, easier to take action on, and more likely to be statistically stable rather than the result of random chance.
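One common way to put a number on "how distinct are the clusters" is the mean silhouette score, which compares each point's average distance to its own cluster with its average distance to the nearest other cluster. The Python sketch below is illustrative only, since the assignment itself is carried out in SAS Enterprise Miner. It uses a plain k-means loop with a deterministic, evenly spaced initialisation (a simplification, not k-means++) and a made-up two-dimensional data set. The function names and data are assumptions.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    # Simplified deterministic start: pick k evenly spaced points as centres.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(max_iter):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centers[c])  # keep the centre of an empty cluster
        if updated == centers:
            break
        centers = updated
    return labels


def mean_silhouette(points, labels):
    """Average silhouette score; values near 1 indicate well-separated clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)


# Made-up data: two tight groups that are far apart.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
scores = {k: mean_silhouette(points, kmeans(points, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real listings data the variables would normally be standardised first, because distance-based clustering is otherwise dominated by variables with large ranges such as price. A separation score of this kind is only one input to the choice of k and should be weighed against the consistency and interpretability questions raised above.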
b) Discuss what is the best segmentation/clustering based on the results obtained from the process in a). You should discuss what you think is the best segmentation and comment on the characteristics of these clusters. Consider how this information could be used by local government and residents. Use screenshots and/or make references to your output in SAS Enterprise Miner to illustrate important and interesting findings where necessary.
c) Develop a classification model that classifies the data into these segments.
In this section, you should discuss the purpose of the data mining (including the target segment/cluster), the data mining process (including data preparation/preprocessing), and the rationale and approaches taken, e.g. the variables used for the model building. You are expected to justify and discuss any action or decision you made during the data mining process and model building, as well as the model evaluation, and to make references to your output in SAS Enterprise Miner within your report where necessary.
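For the model evaluation, a confusion matrix for the target segment, together with accuracy, precision and recall, is a typical starting point. The Python sketch below shows how these figures are derived from actual and predicted segment labels. It is illustrative only, and the segment names "commercial" and "home", the function names, and the six example predictions are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one target class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def precision_recall(counts):
    """Precision and recall for the target class (0.0 when undefined)."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical segment labels for six held-out listings.
actual = ["commercial", "commercial", "home", "home", "home", "commercial"]
predicted = ["commercial", "home", "home", "commercial", "home", "commercial"]

counts = confusion_counts(actual, predicted, positive="commercial")
precision, recall = precision_recall(counts)
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
print(dict(counts), precision, recall, accuracy)
```

Which measure matters most depends on the stakeholder. A local government wanting to find every potentially illegal listing may weight recall for the target segment more heavily, whereas wrongly flagging genuine hosts argues for attention to precision.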
3. Conclusion, critical evaluation and suggestion for improvement
In this section, you are required to conclude and provide a summary of your key findings, and to discuss the limitations of your data models/mining/analyses and suggestions for improvement, taking into consideration current research issues in data mining.

The criteria used for grading the assignment:
| Aspects/Criteria | % Range | Descriptors |
|---|---|---|
| Introduction (ILO1, ILO3, ILO5) | 70% and above | A highly effective introduction, setting context and indicating content that will follow. Wide background reading; novel examples and use of relevant literature/sources in supporting the arguments/viewpoints. |
| | 60-69% | A very good introduction, setting context and indicating content that will follow. Good background reading; generally very good use of examples and relevant sources/literature in supporting the arguments/viewpoints. |
| | 50-59% | Adequate introduction incorporating one or more of the above, yet lacking in clarity in some area(s). Good use of examples and sources/literature in supporting the arguments/viewpoints. |
| | 49% and below | A basic introduction with a narrow or limited reference to defining the area, setting the context and indicating content that will follow. Little evidence of appropriate reading or ability to synthesise information. Few or no examples given. |
| Model Building, Results Discussion and Model Evaluation (ILO2, ILO3, ILO4, ILO6) | 70% and above | Novelty and originality. A coherent, well-focused, original approach to the model building, entirely relevant to the tasks, with excellent support and justification for the variables and techniques used for the modelling. Excellent discussion and interpretation of the obtained results/analysis with original insights. Excellent model evaluations and comparisons provided with clear evidence of critical analysis of findings. |
| | 60-69% | A generally clear and coherent discussion with good support or justification for the model building, which is directly relevant to the tasks. Clear rationale for the approaches taken. Very good discussion and interpretation of the obtained results/analysis. Very good model evaluations and comparisons provided with some critical analysis of findings. |
| | 50-59% | Reasonable attempt at the modelling but prone to being descriptive or narrative; little rationale for the approaches taken or justification of the variables used. Generally relevant to the stated tasks. Reasonable discussion and interpretation of the obtained results/analysis. Reasonable discussion of model evaluations and comparisons though with little evidence of critical analysis of findings. |
| | 49% and below | Little discussion and evidence of model building. Failure to understand the purpose of the task. Little discussion and interpretation of the obtained results/analysis. Little or no discussion of model evaluations and comparisons. |
| Conclusion, critical evaluation and future improvements (ILO1, ILO5 and ILO6) | 70% and above | Comprehensive and extremely well discussed with original insights drawing from the analyses conducted and suggestions for future improvements. |
| | 60-69% | Very well discussed with interesting insight, drawing from the results/analyses conducted. Very good critical evaluation and suggestions for future improvement. |
| | 50-59% | Reasonably discussed but prone to being descriptive with little critical analysis based on the results/analyses conducted. Generally relevant to the stated tasks. Some critical analysis but prone to being descriptive or narrative; evidence supports the conclusion, but not always very directly or clearly. The question is not fully addressed. |
| | 49% and below | Largely descriptive. The discussion is limited in scope and/or relevance. The question is only partially addressed. |