Boston College
∥ Woods College of Advanced Studies ∥ Summer 2020 ∥ ADEC 7910 Software Tools Instructor: Anatoly Arlashin
R Homework 3
Draft due online through Canvas on Aug 2
Final version due online through Canvas on Aug 5
Objective
This assignment is about practicing data visualization in R using the ggplot2 package. Your goal is to reproduce the pictures in the questions below as closely as possible.
Instructions
1. You should submit this assignment as a single R script file that produces all the necessary output as required by the questions below. Important: your entire code should run without any errors! Having an error that breaks code execution, even if it is a small typo, will result in zero points for your assignment.
2. Make sure to use the template R script file from Canvas as a starting point. In particular, to ensure that the data is imported into R in a proper way, make sure to use the read.csv() function in the way it is used in the template file.
3. Use comments inside your file to explain what you are doing with each part of each question. Make sure to comment out any code that is redundant, e.g. View() calls and calls to display contents of objects.
4. Important: all questions below must be completed using base R functions, with the exception of the ggplot2, lemon, scales, ggrepel and gridExtra packages. In other words, do not load any external packages other than the ones loaded by the template file in the general section.
5. All questions are extensions of similar issues discussed in class (see class script/video for details) and can be solved in the same way with a few tweaks.
6. Make sure there is no redundant/unused code in your files. Your scripts should only include your comments and code that must be executed to obtain answers. If you create any temporary objects (variables, datasets, etc.), make sure to remove them at the end of your code.

7. Each question outlines what data should be used and what graphics features should be present in each chart (e.g. chart type, chart title, legend position). All other visual aesthetics are optional and can be set to your liking (e.g. font color, line type, group colors, etc.). General guidance on what each chart should look like can be found in the examples at the end of this file.
8. Your R script should produce 7 ggplot objects in the R environment and 7 .png files as the exported versions of your charts. See the template file for details.
Data
The files “r.hw3.sales.csv” and “r.hw3.items.csv”, available through Canvas, contain a subset of public data on all registered sales in Iowa liquor stores in 2015. The first file contains sales data (dates, quantities sold, prices, etc.) and the second file contains liquor descriptions (category, subcategory, etc.). You will need to import both files into R and merge them together to form a dataframe with the following variables from both files:
Variable        Description
date            Date of sale
category        Category of the liquor ordered
subcategory     Subcategory of the liquor ordered
item.name       Name/description of the specific liquor product ordered
volume          Volume of each individual liquor bottle in ml
price           The amount the store charges for each bottle of liquor sold
sale.bottles    The number of bottles of liquor sold
sale.volume     Total volume of liquor sold in liters
sale.dollars    Total cost of liquor sold in dollars
Note that the items data contains information on all liquor items ever sold in Iowa liquor stores, but only some of those items were sold in 2015. Make sure to do a proper merge to avoid having missing data in the merged dataframe.
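As a minimal sketch of what a merge without missing data could look like, the example below uses two toy data frames in place of the real files. The key column name item.id and all values are assumptions made for illustration only; the actual key column comes from the CSV files and the template.

```r
# Toy stand-ins for the two files; the key column name `item.id` is hypothetical.
sales <- data.frame(
  item.id      = c(1, 1, 2),
  date         = as.Date(c("2015-01-05", "2015-01-06", "2015-01-06")),
  sale.dollars = c(20.5, 41.0, 15.0)
)
items <- data.frame(
  item.id  = c(1, 2, 3),              # item 3 has no sales in the toy data
  category = c("Whisky", "Vodka", "Rum")
)

# By default merge() keeps only keys present in both data frames (all = FALSE),
# so the unsold item 3 does not create a row with missing sales values.
liquor <- merge(sales, items, by = "item.id")

nrow(liquor)    # one row per sale
anyNA(liquor)   # check that the merge introduced no missing values
```

Using all = TRUE or all.y = TRUE instead would keep item 3 and fill its sales columns with NA, which is the situation the note warns about.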

Questions
1. Calendar Heatmap. The chart should be a calendar heatmap similar to the one we did in class, showing total dollar sales of all liquor in Iowa for each calendar day in 2015. The required features of the chart are as follows:
• 6-color scale of daily total sales [1];
• legend at the bottom, as a single row, without title;
• weekday labels under the month labels;
• no grid lines for non-existing days in a month;
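One possible way to prepare a 6-level variable for the color scale is base R's cut(). The sketch below uses toy daily totals; the break points and labels are hypothetical and chosen only to show the mechanics, not taken from the assignment.

```r
# Toy daily totals; in the assignment these would come from summing
# sale.dollars by date.
daily <- data.frame(
  date  = as.Date("2015-01-01") + 0:5,
  total = c(0, 150000, 420000, 800000, 1250000, 2100000)
)

# Hypothetical break points: 7 breaks define 6 intervals.
breaks <- c(0, 100000, 250000, 500000, 1000000, 1500000, Inf)
labels <- c("0-100k", "100k-250k", "250k-500k", "500k-1M", "1M-1.5M", "1.5M+")

# include.lowest = TRUE keeps a total of exactly 0 in the first interval.
daily$sales.bin <- cut(daily$total, breaks = breaks, labels = labels,
                       include.lowest = TRUE)

table(daily$sales.bin)   # one toy day falls into each of the 6 intervals
```

The resulting factor can then be mapped to a discrete fill scale with six colors.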
2. Price per liter and volume of sales. In this question you will need to visualize how price per liter price.per.l differs across categories, and how the sales volume relates to average price per liter.
The first chart should show the distribution of the bottom 95% of values of price per liter across 10 categories [2] using box-and-whisker plots. Required features of the chart are as follows:
• 10-color scale for categories;
• horizontal box-plots spaced vertically across categories, colored according to category;
• outliers in box-plots must be colored according to category;
• legend on the right, no title;
• x-axis has breaks every $2;
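Keeping the bottom 95% of prices and then drawing a 10% random sample for plotting can be sketched in base R as follows. The data are simulated, and the fixed seed is an assumption added only to make the sketch reproducible.

```r
set.seed(1)  # fixed seed for reproducibility of this sketch only

# Toy data: 1000 sales with a simulated price-per-liter column.
toy <- data.frame(price.per.l = runif(1000, min = 5, max = 100))

# Keep the bottom 95% of prices: rows at or below the 95th percentile.
cutoff   <- quantile(toy$price.per.l, probs = 0.95)
bottom95 <- toy[toy$price.per.l <= cutoff, , drop = FALSE]

# Draw a 10% random sample of the remaining rows for plotting.
keep        <- sample(nrow(bottom95), size = round(0.10 * nrow(bottom95)))
plot.sample <- bottom95[keep, , drop = FALSE]

nrow(bottom95)      # rows left after dropping the top 5%
nrow(plot.sample)   # rows actually passed to the plot
```

Because the sample is random, box-plots built from it will vary slightly from run to run unless a seed is set.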
The second chart should visualize the relationship between two aggregate measures for each subcategory: total volume sold and weighted average price per liter. Required features of the chart are as follows:
• a scatter plot with points being colored circles;
• points correspond to subcategories, but are colored according to category;
• several most prominent points are labeled according to the subcategory using the ggrepel package, using subcategory name and weighted price per liter value as label [3];
• legend inside the chart area, top right corner, no title, transparent background;
• x-axis has breaks every $2;
• y-axis is measured in thousands of liters sold, has breaks every 250 thousand;
[1] A suggested way to do it is to use the cut() function to create a new categorical variable that maps sale.dollars into 6 intervals.
[2] Because it takes ggplot a while to generate boxplots for the entire sample of sales, it is recommended to take a random sample of 10% of your data after you filter out the top 5% of sales, and only plot that random sample. The picture at the end of this file shows one such example, but in your case distributions may look different due to the randomization of the sampling algorithm.
[3] See chart example at the end of this file for guidance and this tutorial on how to use ggrepel.
Weighted average price per liter price.per.l.w for subcategory j should be calculated as price per liter price.per.l for each sale instance i in subcategory j weighted with a ratio of liters sold for this instance relative to total liters sold for subcategory:
\[
\text{price.per.l.w}_{j}
  = \sum_{i} \left( \text{price.per.l}_{ij} \cdot \frac{\text{sale.volume}_{ij}}{\sum_{i} \text{sale.volume}_{ij}} \right)
  = \frac{\sum_{i} \left( \text{price.per.l}_{ij} \cdot \text{sale.volume}_{ij} \right)}{\sum_{i} \text{sale.volume}_{ij}}
\]
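The weighting described above can be sketched in base R with tapply(). The toy data frame and its values are hypothetical; only the column names follow the variable list in the Data section.

```r
# Toy data: two subcategories with made-up prices and volumes.
toy <- data.frame(
  subcategory = c("A", "A", "B", "B"),
  price.per.l = c(10, 20, 30, 50),
  sale.volume = c(1, 3, 2, 2)
)

# Sum of price * volume divided by sum of volume, within each subcategory.
num <- tapply(toy$price.per.l * toy$sale.volume, toy$subcategory, sum)
den <- tapply(toy$sale.volume, toy$subcategory, sum)
price.per.l.w <- num / den

price.per.l.w
# A: (10*1 + 20*3) / 4 = 17.5;  B: (30*2 + 50*2) / 4 = 40
```

Note that the weighted value for A (17.5) differs from the plain mean of its prices (15), because the larger sale carries more weight.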
3. Sales dynamics. The chart in this question should show the dynamics of total sales per category across months and weekdays. Required features of the chart are as follows:
• two line graphs placed next to each other (one row, two columns);
• each chart shows share of sales per month/weekday relative to total sales across all months/weekdays;
• legend is between charts, no title;
• both charts use the same 10 color scale;
• y-axis is measured in percentages for both charts;
• each line connection has a circle with a black outline and fill color equal to that of the line;
Note: the % of sales should be calculated per category, not per time unit, e.g. “how much of total whiskey sales is done on Mon, Tue, Wed,…”, not “how much of Mon sales is whiskey vs all others”. In other words, percentages must sum up to 100% over the time for each category, not over all categories for each date.
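The per-category shares described in the note can be computed in base R with ave(), which returns each row's group total. The toy numbers and the column name dollars below are assumptions for illustration.

```r
# Toy totals by category and weekday (made-up numbers).
toy <- data.frame(
  category = c("Whisky", "Whisky", "Vodka", "Vodka"),
  weekday  = c("Mon", "Tue", "Mon", "Tue"),
  dollars  = c(30, 70, 50, 50)
)

# ave() aligns the category total with every row, so the division gives
# each row's share of its own category rather than of its weekday.
toy$share <- toy$dollars / ave(toy$dollars, toy$category, FUN = sum)

# Shares sum to 1 (i.e. 100%) within each category.
tapply(toy$share, toy$category, sum)
```

Grouping by weekday instead of category in ave() would produce the other, unwanted reading ("how much of Monday sales is whiskey").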
4. Category rankings. The last chart is about ranking liquor categories based on total sales in dollars, liters and volume. Required features of the chart are as follows:
• no axis or gridlines;
• line segments for ranks are colored based on category, but category labels are not colored;
• three ranking metrics are displayed both at top and bottom of the chart;
• chart has wide empty gaps on left and right, making it appear centered and narrow;
• ranks are shown as colored circles with white numbers inside them;
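Before plotting, each category needs a rank per metric. A minimal base R sketch with rank() is shown below; the toy totals are hypothetical and only two of the metrics are included.

```r
# Toy category totals (made-up numbers).
totals <- data.frame(
  category     = c("Whisky", "Vodka", "Rum"),
  sale.dollars = c(300, 500, 100),
  sale.volume  = c(40, 30, 50)
)

# rank() of the negated totals assigns rank 1 to the largest value;
# ties.method = "first" guarantees distinct integer ranks.
totals$rank.dollars <- rank(-totals$sale.dollars, ties.method = "first")
totals$rank.volume  <- rank(-totals$sale.volume,  ties.method = "first")

totals
```

In this toy data Vodka ranks first by dollars while Rum ranks first by volume, which is the kind of crossing the line segments in the chart are meant to show.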