程序代写代做 chain Excel graph html INFS5710

INFS5710
C.L. Tseng
Homework 3
The total mark for this assignment is 100.
This homework assignment accounts for 5 of the total marks for this course.
This homework is due in Week 8 in the lecture.
This homework prepares you to do Project 2 including using the data visualisation functions
provided by Enterprise Guide. Please follow the instructions carefully. Mac users may use
Excel for data visualisation.
In some questions below, you will be asked to graph some results. You may use the graphic functions of SAS Enterprise Guide or Excel to graph. Provide the screenshot of the graphs if asked.
Please refer to the document Homework Submission Requirements to prepare your submission.
The following questions refer to the data set given by hw3transactions.csv file size 1.4 GB. In the file, you see detailed anonymised transaction data from thousands of shoppers in Jan 2013Feb 2013. It contains more than 2 million rows of transaction data, which cannot be completely open via Excel.
To begin, you need to use SAS Enterprise Guide to open the file by using File Import Data. You will then be asked to specify data; please try to understand what each step means. In this case, you may keep clicking next and finish. After the file is loaded to SAS, you need to save it as a SAS data file e.g., hw3transactions in your OrionDB folder to be linked by your queries.
The definition of each column is given below.
id: a unique id representing a customer
chain: an integer representing a store chain
dept: an aggregate grouping of the category e.g. water
category: the product category e.g. sparkling water
company: an id of the company that sells the item
brand: an id of the brand to which the item belongs
date: the date of purchase
productsize: the amount of the product purchase e.g. 16 oz of water
productmeasure: the units of the product purchase e.g. ounces
purchasequantity: the number of units purchased
purchaseamount: the dollar amount of the purchase
To interpret the transaction data, for example rows 1 10 describe what Customer id 86246 bought at Store id 205 on 2012013. This particular customer bought 10 different items. The exact items purchased by this customer are unknown, but we know their product categories, brands, and product companies. We also know what department section in a supermarket where each item is displayed. Since the only time stamp that we have from this data set is the date when a shopper shops, we assume in this homework that each shopper shops at one place no more than once per
1

day. This is a rich data set and there are a lot to explore from it. To limit our scope, we only focus on some basic analyses in this homework.
Given a data file opened in SAS Enterprise Guide, you can see some analysis and visualisation functions available from the tool bar below.
Most functions are straightforward to use. Graphs can be found under Graph; some useful analysis tools can be found under Analyze in the tool bar. You are expected to try them by yourself.
Note that the data visualisation functions only apply to a SAS data file only. When you write a query, before you can graph the table of the query outcome, you need to save the result table as a SAS data file. There are three ways of doing it.
Method 1: Create a table in your query. For such a big data set with raw data, executing a query may take a while. Sometimes it may be beneficial to consolidate data into new tables for analysis the same idea for having a data warehouse. You can create a table by having the first row in your query as follows:
Create table orion.yourfilename as
Select x as namex, y as namey, z as namez From …
Where …
This saves your query result as a data file. Later, you can even retrieve data from this new table by Select namex, namey, namez
From Orion.yourfilename
Where …
Method 2: copy your output table to a spreadsheet and import it as a data file. This, however, cannot work on large tables, but should be good enough for your output tables. To know how to plot a graph in Excel, there are many tutorials on the Internet, e.g., click this link. This approach will not be further addressed here.
Method 3: incorporate plotting functions in your query, which is too advanced for this course. You are recommended to use Method 1 or Method 2 for this homework and your Project 2. Question 1 25
This question focuses on shoppers.
a 15 Write a query to list the total number of shoppers who spent more than 100 each day in these two months. Your result should look like the following:
Date
Number of Shoppers
01Jan2013
02 Jan2013
2

:
:
28Feb2013
List the first and last 10 rows of the table. Furthermore, graph the output using a line plot with date as the horizontal axis and the number of shoppers as the vertical axis.
Provide the screenshot of the line chart.
b 10 Write a query to list the total number of shopper visits by weekdays. In SAS, weekday 1 for Sunday, 2 for Monday, etc. Your solution should look like the following table.
List the whole table. Graph the output using a bar chart with weekday as the horizontal axis and the number of shoppers as the vertical axis.
Weekday
Number of Shoppers
1
2
:
:
7
Provide the screenshot of the bar chart.
Question 2 25
This question analyses shopping behaviors of customers id.
a 10 Write a query to list the average amount of money each customer spent in each visit. Your result should look like the following table.
id
Avg. Amount
xxxxx.xx
3

Sort the data by amount in descending order. List the first and the last 10 rows of the table.
b 15 Each row displayed in the transaction data represents a purchased item. Write a query to find the average number of items purchased by each customer in a visit. List only those who bought on average less than 100 items in a visit. Your result should look like the following one.
Sort the table by the number of items in descending order. List the first and last 10 rows of the table. Furthermore, plot the histogram probability distribution of average number of items purchased in a visit. If you are not familiar with the concept of histogram, please read the following site about histogram. You may view the following link to learn how to plot a histogram using Excel. https:support.office.comenusarticlecreateahistogram85680173064b4024b39d 80f17ff2f4e8
To plot a histogram, choose Bar Chart Wizard. In Step 2 out of 4, choose Percentage for the Bar height. Provide the screenshot of the histogram. What can you observe from the histogram?
Question 3 25
This question analyses the correlation between total money spent and the total number of visits by each customer. If you are not familiar with the concept of correlation, please read it from the following site about correlation. Basically, if their correlation is positive or negative, we can say that a customer who visits the supermarket more frequently tends to spend more or less money there. It is possible that you may also find that these two things have no correlation or correlation close to 0, which means the total money spent and the number of visits are independent.
Write a query to list the total amount money spent and the number visits for each customer. Your result should look like the following table.
We shall focus on those customers whose total amount spent is positive and is less than 1,000 i.e., to excllude returnsrefunds and corporate customers. Sort your data based on id in ascending order. List the first and last 10 rows of the table. Furthermore, plot a 2D scatter chart for the Amount horizontal axis and the No. of Visits vertical axis.
id
Avg. No. of Purchased Items
xxx.xx
id
Amount
Visits
xxxxx.xx
4

Is there a correlation revealed from the scatter chart? Calculate the exact correlation between Amount and Visits of the result table. What is the Pearson correlation calculated by SAS? Provide a screenshot of the result.
Drag Amount and Visits from the left pane to the right pane.
PC Excel users, you will need the Analysis ToolPak addin to use the correlation function. Please see the following links to install the tool pack and learn how to calculate a correlation.
https:www.exceleasy.comdataanalysisanalysistoolpak.html
https:www.exceleasy.comexamplescorrelation.html
Mac Excel users, I believe Mac of a recent version has this addin already
https:support.office.comenusarticleloadtheanalysistoolpakinexcel6a63e598cd6d42e3 93176b40ba1a66b4. Otherwise, you can download a similar, free addin StatPlus:Mac LE https:www.analystsoft.comenproductsstatplusmacle
5

Question 4 25
Given transactions data, marketing people are interested in identifying very valuable customers who visit frequently with a certain level of spending. Here we focus on those customers who have spent in total between 500 and 1000 and have visited the stores over 6 times during these two months. We want to find out the most preferred product identified by both category and brand for each such customer and the total amount of money that the customer spends for this product during these two months. Your result should look like the following. You may write multiple queries for solving this question.
Sort the table by money in descending order. List the first and last 10 rows and detail how you obtain the result.
Customer id
Category
Brand
Money Spent
xxxxx.xx
6