V Your Lab Task
You are given a data set (the same data set available on ICE) containing one month of records in 31 CSV files, all with the same format. The records are users' clicks on advertisements shown on a web page. Each row in a file represents a single user's record for one day; see the following table.
Age   Gender   Impressions   Clicks   Signed_In
36    0        3             0        1
73    1        3             0        1
0     0        3             1        0
49    1        3             0        1
19    1        11            2        1

The Impressions value in the raw data file is the number of ads that a web page requested from the server. You can read it as the number of advertisements shown, or the number of times a user saw an advertisement.
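Since the task defines the click-through rate as clicks divided by impressions, a row with zero impressions (if any occur in the files) would produce a division by zero. The following is a minimal sketch of one way to guard against that, using the five example rows from the table. The function name `ctr` and the choice to return NA for zero impressions are assumptions made for illustration, not part of the assignment.

```r
# Sketch only: the five example rows from the table above, built in memory.
# For a real file, something like read.csv("nyt1.csv") would be used instead
# (file name taken from the sample template below).
day <- data.frame(
  Age         = c(36, 73, 0, 49, 19),
  Gender      = c(0, 1, 0, 1, 1),
  Impressions = c(3, 3, 3, 3, 11),
  Clicks      = c(0, 0, 1, 0, 2),
  Signed_In   = c(1, 1, 0, 1, 1)
)

# Hypothetical helper: per-row click-through rate.
# Rows with zero impressions get NA (an assumption) so that they can be
# skipped later with na.rm = TRUE instead of yielding NaN or Inf.
ctr <- function(clicks, impressions) {
  ifelse(impressions > 0, clicks / impressions, NA_real_)
}

day$CTR <- ctr(day$Clicks, day$Impressions)
print(day)
```

Whether zero-impression rows should be dropped or kept as NA is a cleaning decision worth stating explicitly in the report.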
The project is asking you to do the following:
• For a single day:
  • Plot the distributions of the number of impressions and the click-through rate (CTR = # clicks / # impressions) for the 5 age categories "<30", "30-39", "40-49", "50-59", "60+".
  • Explore the data and make visual (plot) and quantitative (table) comparisons across user demographics: <18-year-old males versus <18-year-old females.
  • Create metrics that summarize the data. The metrics should include CTR, quantiles, mean, variance, and max, and these can be calculated across the various user segments. (A quantile is each of any set of values of a variate that divide a frequency distribution into equal groups, each containing the same fraction of the total population.)
• Extend your analysis across 10 days (select any ten days). Visualize the distributions of the metrics you created in 1.C (the single-day metrics above) over the 10 days.
• Write a report:
  • Describe ONE pattern you found.
  • Provide evidence: source code (which has to be working), data evidence (table) and graph evidence (plots).
    • Source code: organize your whole source code in one file, with "#" comments explaining the process you have been through (understand data, explore data, find problems, clean the wrong data, your key analysis, report (table)).
    • Graph evidence: each plot you provide should be annotated with your interpretation and the pattern you have found (over the 10-day period).

Submission: Put everything in a single Word document with the cover page (provided) and submit BOTH an e-copy on ICE and a hard (printed) copy to my pigeon hole (located on the 4th floor of the SD building, on the left).

Deadline: 17:00 on 11 October 2019.

The sample template is as follows:
• What you have found: (20 marks)
I find "how often users see and click advertisements changes over time: it peaks at the weekend and interest is lowest on Monday".
… (not more than 200 words)
• Here is my running source code, with comments showing what I did in each step of the BDA process (50 marks)

    # This is my source code to demonstrate the BDA process.
    # My process has six steps: data acquisition, explore data, …

    ## Read day 1 data
    outcomes <- read.csv("nyt1.csv", colClasses = "character")

    ## Explore data by finding errors
    states <- unique(outcomes$State)

    ## Clean data by re-assigning data to a new data frame and ignoring errors
    if (is.element(state, states)) {
    } else {
      stop("invalid state")
    }
    . . .

    ## These are my results in a table, showing quantitative evidence
    if (is.element(outcome, c("heart attack", "heart failure", "pneumonia"))) {
      if (outcome == "heart attack") {comparator = "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack"}
      if (outcome == "heart failure") {comparator = "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure"}
      if (outcome == "pneumonia") {comparator = "Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia"}
    } else {
      stop("invalid outcome")
    }

    ## This is the plot to show graphic evidence
    all_values <- outcomes[[comparator]]
    if (num == "best") {
      selected_value <- min(all_values[outcomes$State == state], na.rm = TRUE)
      chosen_indices <- (all_values == selected_value)
      chosen_indices <- intersect(which(chosen_indices), which(outcomes$State == state))
      hospitals_chosen <- outcomes$Hospital.Name[chosen_indices]
      return(min(hospitals_chosen, na.rm = TRUE))
    }
    if (num == "worst") {
      selected_value <- max(all_values[outcomes$State == state], na.rm = TRUE)
      chosen_indices <- (all_values == selected_value)
      chosen_indices <- intersect(which(chosen_indices), which(outcomes$State == state))
      hospitals_chosen <- outcomes$Hospital.Name[chosen_indices]
      return(min(hospitals_chosen, na.rm = TRUE))
    }
    if (is.numeric(num)) {
      indices_of_state <- which(outcomes$State == state)
      values_for_state <- all_values[indices_of_state]
      ordered_values <- values_for_state[order(values_for_state)]
      selected_value <- ordered_values[num]
      # See how many of these values there are, and which one of those the one we picked is

    ## Check that the patterns are valid over 10 days

• Here are the plots produced from the code, with explanation (30 marks)
This graph shows the comparison ….
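
Returning to the single-day tasks listed earlier, the age categories and summary metrics could be sketched roughly as follows. This is an illustration under stated assumptions, not a model answer: the helper names `age_group` and `summarise_metric` are hypothetical, rows with Signed_In equal to 0 are assumed to have an unknown age and are filtered out, and segment CTR is taken here as total clicks over total impressions within a segment, which is only one possible definition.

```r
# Hypothetical helper: bin ages into the five categories named in the task.
# right = FALSE makes each interval closed on the left: [30, 40), [40, 50), ...
age_group <- function(age) {
  cut(age,
      breaks = c(-Inf, 30, 40, 50, 60, Inf),
      labels = c("<30", "30-39", "40-49", "50-59", "60+"),
      right  = FALSE)
}

# Hypothetical helper: mean, variance, max and quartiles of a numeric vector.
summarise_metric <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    var  = var(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE),
    quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE))
}

# Toy data: the five example rows from the table in the task description.
day <- data.frame(
  Age         = c(36, 73, 0, 49, 19),
  Gender      = c(0, 1, 0, 1, 1),
  Impressions = c(3, 3, 3, 3, 11),
  Clicks      = c(0, 0, 1, 0, 2),
  Signed_In   = c(1, 1, 0, 1, 1)
)

# Assumption: users who are not signed in have no reliable age, so drop them.
known <- day[day$Signed_In == 1, ]
known$AgeGroup <- age_group(known$Age)

# Segment-level totals and CTR per age group (groups with no rows are omitted).
seg <- aggregate(cbind(Clicks, Impressions) ~ AgeGroup, data = known, FUN = sum)
seg$CTR <- seg$Clicks / seg$Impressions
print(seg)

# Summary metrics for impressions across the signed-in users.
print(summarise_metric(known$Impressions))
```

For the 10-day extension, the same helpers could be applied to each chosen file in turn (for example with lapply over the file names) and the per-day results combined with a day column before plotting.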