Programming Exercises for Descriptive Statistics and Probability
__Please do not use any external libraries to answer these questions. Do not use R's built-in functions for calculating probability or entropy in this part; the only help you should take from R is data frame manipulation. Report all probability calculations to 2 decimal places; this is an instruction, not a request. Follow coding standards: comment your code adequately and make sure there is no duplicated code (write helper functions to reduce duplication if required). You can check the coding standards from your previous programming unit. As a prerequisite for this unit you have already completed a programming unit, so not knowing coding standards will not be accepted as an excuse. Please show all working, including code and presentation, for this question.__
Sports analytics (i.e., the application of data science techniques to competitive sports) is a rapidly growing area of data science. In this question we will look at some very basic analytics applied to the outcomes of consecutive games of the English Premier League (EPL). The file EPL.csv contains a record of the outcomes of EPL games played by Premier League teams in the seasons from 2006/2007 to 2017/2018. The data is sequential: each row records the result of one game, namely whether the home team wins (H), the away team wins (A), or there is a draw (D).
There are 20 teams per season, and each team plays 38 games in total: 19 games at home and 19 games away.
The file holds the results of 4560 Premier League matches: 380 matches per season over 12 seasons, from 2006/2007 to 2017/2018.
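The match counts quoted above follow from the league format. As a quick arithmetic check, here is a minimal sketch that assumes a double round-robin in which every team hosts every other team exactly once (the variable names are illustrative only):

```r
# Sanity check of the match counts.
# Assumption: each team hosts every other team exactly once per season.
n_teams   <- 20
n_seasons <- 12

home_games_per_team <- n_teams - 1                     # 19 home games
games_per_team      <- 2 * home_games_per_team         # 38 games in total
matches_per_season  <- n_teams * home_games_per_team   # 380 matches
total_matches       <- matches_per_season * n_seasons  # 4560 matches

cat("Games per team:    ", games_per_team, "\n")
cat("Matches per season:", matches_per_season, "\n")
cat("Total matches:     ", total_matches, "\n")
```

Each match is counted once because it has exactly one home team.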
In [ ]:
# Read the data
EPL <- read.csv("EPL.csv")
In [ ]:
# Get the summary
summary(EPL)
             home_team               away_team      home_goals
 Arsenal          : 228   Arsenal          : 228   Min.   :0.000
 Chelsea          : 228   Chelsea          : 228   1st Qu.:1.000
 Everton          : 228   Everton          : 228   Median :1.000
 Liverpool        : 228   Liverpool        : 228   Mean   :1.543
 Manchester City  : 228   Manchester City  : 228   3rd Qu.:2.000
 Manchester United: 228   Manchester United: 228   Max.   :9.000
 (Other)          :3192   (Other)          :3192
   away_goals     result         season
 Min.   :0.000   A:1288   2006-2007: 380
 1st Qu.:0.000   D:1164   2007-2008: 380
 Median :1.000   H:2108   2008-2009: 380
 Mean   :1.144            2009-2010: 380
 3rd Qu.:2.000            2010-2011: 380
 Max.   :7.000            2011-2012: 380
                          (Other)  :2280
Take a look at entry number 3: the home team is Everton and the away team is Watford. The result is "H", indicating that Everton won the match; you can confirm this by comparing home_goals and away_goals.
In [ ]:
# Inspect the data
head(EPL)
home_team         away_team         home_goals  away_goals  result  season
Sheffield United  Liverpool         1           1           D       2006-2007
Arsenal           Aston Villa       1           1           D       2006-2007
Everton           Watford           2           1           H       2006-2007
Newcastle United  Wigan Athletic    2           1           H       2006-2007
Portsmouth        Blackburn Rovers  3           0           H       2006-2007
Reading           Middlesbrough     3           2           H       2006-2007
Part 1: Statistics for the whole dataset
Part 1.1. (2 Marks) Which season has the highest number of goals?
In [ ]:
highest <- function(dat = EPL){
    # Name of the highest season
    seasonH <- NULL
    ## Solution:

    # Printing out the max season
    cat("Highest goal scoring season is: ", seasonH)
}
highest()
Highest goal scoring season is:
Part 1.2. (2 Marks) Which season has the lowest number of goals?
In [ ]:
lowest <- function(dat = EPL){
    # Name of lowest season
    seasonL <- NULL
    ## Solution

    # Printing out the min season
    cat("Lowest goal scoring season is: ", seasonL)
}
lowest()
Lowest goal scoring season is:
Part 1.3. (2 Marks) Which team has the highest average goals per season, $\frac{\text{Total goals scored}}{\text{Seasons played}}$?
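The ratio in Part 1.3 can be illustrated on a small made-up data frame. This is a sketch only: the helper name `avg.goals.per.season` and the toy values are hypothetical, and it assumes that "seasons played" means the number of distinct seasons in which the team appears as either the home or the away side.

```r
# Toy data using the same column names as EPL.csv (all values are made up)
toy <- data.frame(
  home_team  = c("A", "B", "A", "C"),
  away_team  = c("B", "A", "C", "A"),
  home_goals = c(2, 1, 0, 3),
  away_goals = c(1, 1, 2, 2),
  season     = c("S1", "S1", "S2", "S2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total goals scored divided by seasons played
avg.goals.per.season <- function(team, dat) {
  at.home <- dat$home_team == team   # rows where the team is the home side
  away    <- dat$away_team == team   # rows where the team is the away side

  # Goals scored: home_goals when at home, away_goals when away
  total.goals <- sum(dat$home_goals[at.home]) + sum(dat$away_goals[away])

  # Seasons played: distinct seasons in which the team appears at all
  seasons.played <- length(unique(dat$season[at.home | away]))

  round(total.goals / seasons.played, 2)
}

avg.goals.per.season("A", toy)
```

Swapping the two goal columns inside the helper would give goals conceded instead of goals scored. A team that never appears in the data has zero seasons played, so a real implementation would need to guard against dividing by zero.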
In [ ]:
# Defining a function to calculate the goals scored by each team
team.goals <- function(dat = EPL){
    return()
}
In [ ]:
team.highest <- function(dat = EPL){
    # name of the team:
    teamN <- NULL
    goals.ratio <- 0
    ## Solution

    # Printing out the team:
    cat("The team with the highest goal score is: ", teamN, "the goals ratio is", round(goals.ratio,2))
}
team.highest()
The team with the highest goal score is: the goals ratio is 0
Part 1.4. (2 Marks) Which team concedes the most goals on average per season, $\frac{\text{Total goals conceded}}{\text{Seasons played}}$?
In [ ]:
team.lowest <- function(dat = EPL){
    # name of the team:
    teamN <- NULL
    goals.ratio <- 0
    ## Solution

    # Printing out the team:
    cat("The team that concedes the most goals is: ", teamN, "the goals ratio is", round(goals.ratio,2))
}
team.lowest()
The team that concedes the most goals is: the goals ratio is 0
Part 2: Statistics for the individual team (Manchester United) (27 Marks)
In [ ]:
# Lower-case all team names for ease of performing the task
EPL$home_team <- tolower(EPL$home_team)
EPL$away_team <- tolower(EPL$away_team)
In [ ]:
head(EPL)
home_team         away_team         home_goals  away_goals  result  season
sheffield united  liverpool         1           1           D       2006-2007
arsenal           aston villa       1           1           D       2006-2007
everton           watford           2           1           H       2006-2007
newcastle united  wigan athletic    2           1           H       2006-2007
portsmouth        blackburn rovers  3           0           H       2006-2007
reading           middlesbrough     3           2           H       2006-2007
Part 2.1. (2 Marks) Find the probabilities P(Manchester United Wins), P(Manchester United Loses), and P(Manchester United Draws). This includes all results, both home and away.
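Because the `result` column is recorded from the home side's point of view, a team's own outcome depends on where it played. The sketch below shows one possible way to translate H/A/D into win/draw/loss for a single team and turn the counts into relative frequencies. The helper `team.outcomes` and the toy data are hypothetical; only counting and division are used.

```r
# Toy data using the same column names as EPL.csv (all values are made up)
toy <- data.frame(
  home_team = c("a", "b", "a", "c", "b"),
  away_team = c("b", "a", "c", "a", "c"),
  result    = c("H", "A", "D", "H", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: outcomes ("W", "D", "L") from the team's own perspective
team.outcomes <- function(team, dat) {
  played  <- dat$home_team == team | dat$away_team == team
  games   <- dat[played, ]            # keep only the team's games, in order
  at.home <- games$home_team == team

  outcome <- rep("L", nrow(games))    # start from "loss" and overwrite
  outcome[games$result == "D"] <- "D"
  won <- (at.home & games$result == "H") | (!at.home & games$result == "A")
  outcome[won] <- "W"
  outcome
}

outcome <- team.outcomes("a", toy)

# Relative frequencies, rounded to 2 decimal places
p.win  <- round(sum(outcome == "W") / length(outcome), 2)
p.draw <- round(sum(outcome == "D") / length(outcome), 2)
p.loss <- round(sum(outcome == "L") / length(outcome), 2)

cat("P(win) =", p.win, " P(draw) =", p.draw, " P(loss) =", p.loss, "\n")
```

Restricting `games` to rows where the team is at home (or away) before counting gives the corresponding conditional probabilities.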
In [ ]:
Task2.1 <- function(team = "manchester united", dat = EPL){
    # Set up the variables
    P.W <- NULL
    P.L <- NULL
    P.D <- NULL
    ## Solution

    # Print out the results:
    cat("The probability that", team, "wins is: ", P.W, '\n')
    cat("The probability that", team, "loses is: ", P.L, '\n')
    cat("The probability that", team, "draws is: ", P.D, '\n')
}
Task2.1()
The probability that manchester united wins is:
The probability that manchester united loses is:
The probability that manchester united draws is:
Part 2.2. (2 Marks) Find the conditional probabilities:
1. P(Man Utd Wins | Playing at Home)
2. P(Man Utd Wins | Playing away)
3. P(Man Utd Draws | Playing at Home)
4. P(Man Utd Draws | Playing away)
5. P(Man Utd Loses | Playing at Home)
6. P(Man Utd Loses | Playing away)
Please compare them and draw a general conclusion.
In [ ]:
Task2.2 <- function(team = "manchester united", dat = EPL){
    # Set up the variables
    P.W.H <- NULL
    P.W.A <- NULL
    P.D.H <- NULL
    P.D.A <- NULL
    P.L.H <- NULL
    P.L.A <- NULL
    # Solution

    # Print out the results
    cat("P(", team, "Wins|Playing at Home) = ", P.W.H, '\n')
    cat("P(", team, "Wins|Playing away) = ", P.W.A, '\n')
    cat("P(", team, "Draws|Playing at Home) = ", P.D.H, '\n')
    cat("P(", team, "Draws|Playing away) = ", P.D.A, '\n')
    cat("P(", team, "Loses|Playing at Home) = ", P.L.H, '\n')
    cat("P(", team, "Loses|Playing away) = ", P.L.A, '\n')
}
Task2.2()
P( manchester united Wins|Playing at Home) =
P( manchester united Wins|Playing away) =
P( manchester united Draws|Playing at Home) =
P( manchester united Draws|Playing away) =
P( manchester united Loses|Playing at Home) =
P( manchester united Loses|Playing away) =
Part 2.3. (3 Marks) What is the probability that Man Utd will win a game given that they won their previous game?
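A probability conditioned on the previous game can be estimated by pairing each game with the one before it. The following is a minimal sketch on a made-up outcome sequence. It assumes the outcomes are already in chronological order and treats the sequence as one continuous run; whether pairs should be allowed to span a season boundary is a modelling choice that the question leaves open.

```r
# Made-up sequence of one team's outcomes, in chronological order
outcome <- c("W", "W", "L", "W", "D", "W", "W")

n    <- length(outcome)
prev <- outcome[-n]   # games 1 .. n-1 (the "previous" game of each pair)
curr <- outcome[-1]   # games 2 .. n   (the game that follows)

# P(win | won previous game) = count(prev W and curr W) / count(prev W)
n.prev.win <- sum(prev == "W")
n.win.win  <- sum(prev == "W" & curr == "W")
p.w.given.w <- round(n.win.win / n.prev.win, 2)

cat("P(win | won previous game) =", p.w.given.w, "\n")
```

The same pairing idea extends to conditioning on "did not win the previous game" by replacing `prev == "W"` with `prev != "W"`, and to two-game windows by aligning three shifted copies of the vector.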
In [ ]:
# Optional helper to make things easier: count the joint win/loss outcomes
# of consecutive games for a team (this is a suggestion, not a requirement)
count.win <- function(team, dat){
    ww <- 0   # win followed by win
    wl <- 0   # win followed by non-win
    lw <- 0   # non-win followed by win
    ll <- 0   # non-win followed by non-win
    return(c(ww, wl, lw, ll))
}
In [ ]:
Task2.3 <- function(team = "manchester united", dat = EPL){
    # Set up the variable
    P.W.W <- NULL
    ## Solution

    # Return the result
    return(P.W.W)
}
cat("P(manchester united Wins| winning the previous game) = ", Task2.3(), '\n')
P(manchester united Wins| winning the previous game) =
Part 2.4. (3 Marks) What is the probability that Man Utd will win a game given that they didn't win their previous game?
In [ ]:
Task2.4 <- function(team = "manchester united", dat = EPL){
    # Set up the variable
    P.L.W <- NULL
    ## Solution

    # Return the result
    return(P.L.W)
}
cat("P(manchester united Wins| not winning the previous game) = ", Task2.4(), '\n')
P(manchester united Wins| not winning the previous game) =
Part 2.5. (3 Marks) Calculate the probability of Man Utd not winning their next two games given that they won their previous game.
In [ ]:
Task2.5 <- function(team = "manchester united", dat = EPL){
    # Set up the variable
    P.W.L.L <- NULL
    ## Solution

    # Return the result
    return(P.W.L.L)
}
cat("P(not winning their next two games | winning the previous game) = ", Task2.5(), '\n')
P(not winning their next two games | winning the previous game) =
Part 2.6. (3 Marks) A win is worth three points, a draw one point, and a loss zero points. In which season did Man Utd receive the highest points tally?
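The points rule (win = 3, draw = 1, loss = 0) can be applied with a named lookup vector and then summed per season. The sketch below uses made-up outcomes and season labels; the variable names are illustrative, and `tapply` is used purely as a grouping (data manipulation) step.

```r
# Made-up outcomes for one team and the season each game belongs to
outcome <- c("W", "D", "L", "W", "W", "D")
season  <- c("S1", "S1", "S1", "S2", "S2", "S2")

# Named lookup vector implementing the points rule
points.map <- c(W = 3, D = 1, L = 0)
points     <- unname(points.map[outcome])

# Total points per season
tally <- tapply(points, season, sum)

# Seasons with the highest and lowest tallies
best  <- names(tally)[which.max(tally)]
worst <- names(tally)[which.min(tally)]

cat("Highest tally:", best, "with", tally[[best]], "points\n")
cat("Lowest tally: ", worst, "with", tally[[worst]], "points\n")
```

Note that `which.max` and `which.min` return only the first match, so tied seasons would need separate handling.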
In [ ]:
# Optional support function to compute the points gained by a team
agg.support <- function(team, dat){
    return()
}
In [ ]:
Task2.6 <- function(team = "manchester united", dat = EPL){
    # Set up the variables
    S.H <- NULL
    H.P <- NULL
    ## Solution

    # Print and return the result
    cat("The season that", team, "achieved the highest points is", S.H, " in which they achieved", H.P, "points")
    return(S.H)
}
Task2.6()
The season that manchester united achieved the highest points is in which they achieved points
NULL
Part 2.7. (3 Marks) A win is worth three points, a draw one point, and a loss zero points. In which season did Man Utd receive the lowest points tally?
In [ ]:
Task2.7 <- function(team = "manchester united", dat = EPL){
    # Set up the variables
    S.L <- NULL
    L.P <- NULL
    ## Solution

    # Print and return the result
    cat("The season that", team, "achieved the lowest points is", S.L, " in which they achieved", L.P, "points")
    return(S.L)
}
Task2.7()
The season that manchester united achieved the lowest points is in which they achieved points
NULL
Part 2.8. (3 Marks) Print out the same statistics as in Tasks 2.1 to 2.5, but only for the two seasons in which they achieved the highest and lowest points tallies (please also print out the name of each season).
In [ ]:
# Optional support function to print the result based on the task
print.tasks <- function(team, dat){

}
In [ ]:
Task2.8 <- function(team = "manchester united", dat = EPL){
    S.H <- NULL
    cat('\n')
    S.L <- NULL
    cat('\n')
    ## Solution

    # For highest season
    cat('\n')
    cat("The statistics for highest points season for", team, ",", S.H, ",are as follows:")
    cat('\n')
    # For lowest season
    cat("The statistics for lowest points season for", team, ",", S.L, ",are as follows:")
}
Task2.8()
The statistics for highest points season for manchester united , ,are as follows:
The statistics for lowest points season for manchester united , ,are as follows:
Part 2.9.
(3 Marks) Write a function that takes a season argument such as "2006-2007" and prints out the results from Tasks 2.1 to 2.5 as well as the total points received. Make sure that your function can handle errors.
In [ ]:
# Please write your own function here; only the frame is provided
Task2.9 <- function(team = "manchester united", dat = EPL, S = "2006-2007"){

}
Task2.9()
NULL
Part 2.10. (2 Marks) Polish the above function so that it takes a vector of seasons instead of just one season, and prints out the results from Tasks 2.1 to 2.5 together with the total points tally. Make sure that you can handle errors and duplicates.
In [ ]:
# Please write your own function here; only the frame is provided
Task2.10 <- function(team = "manchester united", dat = EPL, S = c("2006-2007", "2008-2009", "2012-2013")){

}
Task2.10()
NULL
Part 3: Statistics for the individual team - automated (5 Marks)
Write a function that takes two inputs. The first is the team name or a vector of team names (compulsory), e.g. "chelsea", "Manchester united", " Arsenal", etc. The second is the season or a vector of seasons (optional). Print out the statistics from Part 2, Task 1 to Task 7. Make sure that the function can handle errors; you can assume that "chelsea" should be treated the same as "Chelsea". Present the results in a way that makes the statistics easy to understand.
In [ ]:
# Please write your own function here; only the frame is provided
Task3 <- function(Team = c("Chelsea", "Manchester City", "Arsenal"), dat = EPL, S = c("2006-2007", "2008-2009", "2012-2013")){

}
Task3()
NULL