Homework 3 Solutions
Gabriel Young (gjy2107)

March 22, 2018

i.

Open the link http://www.espn.com/nba/team/schedule/_/name/BKN/seasontype/2. Display the source
code and copy and paste this code into a text editor. Then save the file as NetsSchedule1718 using a .html
extension. Once the file is saved, check that you can open the file and it displays the 2017-2018 Brooklyn
Nets Regular Season Schedule.

ii.

setwd("~/Desktop/Data")
nets1718 <- readLines("NetsSchedule1718.html")

## Warning in readLines("NetsSchedule1718.html"): incomplete final line found
## on 'NetsSchedule1718.html'

The number of lines in the file corresponds to the length of the vector nets1718.

length(nets1718)

## [1] 828

I can find the number of characters in each line of the file by running nchar(nets1718), since nchar() is vectorized. This returns a vector of length 828, with each element giving the number of characters in the corresponding line of the file. Summing these values gives the total number of characters.

sum(nchar(nets1718))

## [1] 129185

Finally, I can use the max() command, with nchar(nets1718) as its input, to find the maximum number of characters in any line of the file.

max(nchar(nets1718))

## [1] 9736

iii.

In the first game of the regular season, the Nets play the Indiana Pacers in Indianapolis on Wednesday, October 18 at 7:00 PM. In the last game of the season, the Nets play the Boston Celtics in Boston on Wednesday, April 11 at 8:00 PM.

iv.

The 321st line corresponds to the first game of the regular season and the 402nd line corresponds to the last game of the regular season.

v.

I use a regular expression to search for a capital letter, followed by two lowercase letters, a comma, a space, a capital letter, two lowercase letters, a space, and then one or more digits. This regular expression is stored in date_exp. Then I use grep() to search nets1718 for lines with dates in them. The indices of these lines are stored in game.lines. Looking at the first and last values of game.lines, I see information on the first and last games.

date_exp <- "[A-Z][a-z]{2},\\s[A-Z][a-z]{2}\\s[0-9]+"
game.lines <- grep(date_exp, nets1718)
game.lines[1]

## [1] 321

nets1718[game.lines[1]]

## [1] "\t\t\t\t

game.lines[length(game.lines)]

## [1] 402

nets1718[game.lines[length(game.lines)]]

## [1] "
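To make the behavior of grep() with this pattern concrete, here is a minimal sketch on a few made-up lines. The vector toy.lines and its contents are hypothetical and are not taken from the ESPN file; only the pattern is the one used above. grep() returns the positions of the matching elements, which is why game.lines can be used to index back into nets1718.

```r
# Hypothetical stand-in for a few lines of the saved HTML file
toy.lines <- c("<html>",
               "<td>Wed, Oct 18</td><td>7:00 PM</td>",
               "<p>No date on this line</p>",
               "<td>Wed, Apr 11</td><td>8:00 PM</td>")

# Same pattern as date_exp in the solution
date_exp <- "[A-Z][a-z]{2},\\s[A-Z][a-z]{2}\\s[0-9]+"

# grep() returns the positions of matching elements, not the text itself
toy.game.lines <- grep(date_exp, toy.lines)
toy.game.lines

# Index back into the vector to see the first and last matching lines
toy.lines[toy.game.lines[1]]
toy.lines[toy.game.lines[length(toy.game.lines)]]
```

Only the second and fourth toy lines contain a date, so toy.game.lines holds the positions 2 and 4.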

v.

gregexpr() returns the starting locations and the lengths of each of the game dates, and regmatches() then uses these to extract the matching text. Since the output of regmatches() is a list, we use the unlist() command to turn it into a vector.
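As a small illustration of these three calls, the sketch below runs them on two made-up strings. The names toy.lines, toy.locations, and toy.dates are hypothetical; the pattern is the same date_exp as before.

```r
# Hypothetical lines, each containing exactly one date
toy.lines <- c("<td>Wed, Oct 18</td><td>7:00 PM</td>",
               "<td>Fri, Nov 3</td><td>10:30 PM</td>")
date_exp <- "[A-Z][a-z]{2},\\s[A-Z][a-z]{2}\\s[0-9]+"

# gregexpr() gives one list element per input string, holding the match
# start positions with a "match.length" attribute
toy.locations <- gregexpr(date_exp, toy.lines)
toy.locations[[1]]

# regmatches() uses those positions to pull out the matched substrings;
# the result is a list with one element per input string
toy.dates <- regmatches(toy.lines, toy.locations)
is.list(toy.dates)

# unlist() flattens the list into a character vector
toy.dates <- unlist(toy.dates)
toy.dates
```

Because gregexpr() finds every match in a string, a line with several dates would contribute several elements after unlist(). The solution relies on each game line containing exactly one date.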

date.locations <- gregexpr(date_exp, nets1718[game.lines])
date <- regmatches(nets1718[game.lines], date.locations)
date <- unlist(date)

vi.

Extracting the game times is similar to extracting the dates, but now my regular expression searches for one or more digits followed by a colon, two digits, a space, and then either AM or PM.

time_exp <- "[0-9]+:[0-9]{2} (PM|AM)"
time.locations <- gregexpr(time_exp, nets1718[game.lines])
time <- regmatches(nets1718[game.lines], time.locations)
time <- unlist(time)

vii.

In my solution, I use the fact that in each line, the string <li ...> appears before the home or away information. So my regular expression searches for <li ...> followed by '@' or <li ...> followed by 'vs'. As in parts (v) and (vi), I use gregexpr() and regmatches() to actually extract the strings which match the regular expression. Since these strings include <li ...> before '@' or 'vs', I then use the substr() command, starting at character 25, to keep just the '@' or the 'vs'. Finally, I create the home vector from this information.

away_exp <- "<li ...>@|<li ...>vs"
away.locations <- gregexpr(away_exp, nets1718[game.lines])
away <- regmatches(nets1718[game.lines], away.locations)
away <- substr(away, 25, nchar(away))
home <- rep(1, length(away))
home[away == "@"] <- 0

viii.

In my solution, I use the fact that in each line, the string <li ...> appears before the opponent and a closing tag appears afterwards. So my regular expression searches for <li ...> followed by anything inside '<' and '>', letters or spaces, and the closing tag. gregexpr() and regmatches() are used to actually extract the strings which match the regular expression. Since these strings include extra information, I use another regular expression to search for the opponent's name, recognizing that this will take the form of letters or spaces coming after '>' and before '<'. Finally, we extract just the opponent names using substr().

opponent_exp <- "<li ...><.+>([a-zA-Z]|\\s)+</...>"
opponent.locations <- gregexpr(opponent_exp, nets1718[game.lines])
opponent <- regmatches(nets1718[game.lines], opponent.locations)
opponent <- unlist(opponent)
name_exp <- ">([a-zA-Z]|\\s)+<"
name.locations <- gregexpr(name_exp, opponent)
name <- regmatches(opponent, name.locations)
opponent <- substr(name, 2, nchar(name)-1)

ix.

schedule <- data.frame(date, time, opponent, home)
schedule[1:10,]

##           date     time    opponent home
## 1  Wed, Oct 18  7:00 PM     Indiana    0
## 2  Fri, Oct 20  7:30 PM     Orlando    1
## 3  Sun, Oct 22  3:30 PM     Atlanta    1
## 4  Tue, Oct 24  7:00 PM     Orlando    0
## 5  Wed, Oct 25  7:30 PM   Cleveland    1
## 6  Fri, Oct 27  7:30 PM   NY Knicks    0
## 7  Sun, Oct 29  6:00 PM      Denver    1
## 8  Tue, Oct 31  7:30 PM     Phoenix    1
## 9   Fri, Nov 3 10:30 PM Los Angeles    0
## 10  Mon, Nov 6  9:00 PM     Phoenix    0
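The whole pipeline from parts (v) through (ix) can be sketched end to end on a few made-up lines. Everything here is an assumption for illustration: the toy lines are hypothetical, "LOC:" is an invented 4-character marker standing in for the HTML tag that precedes '@' or 'vs' in the real file, and extract() is a small helper that is not part of the solution above.

```r
# Hypothetical lines; "LOC:" is a made-up marker standing in for the HTML tag
toy.lines <- c("Wed, Oct 18 | 7:00 PM | LOC:@ | >Indiana<",
               "Fri, Oct 20 | 7:30 PM | LOC:vs | >Orlando<",
               "Fri, Nov 3 | 10:30 PM | LOC:@ | >Los Angeles<")

# Helper (not in the original solution) wrapping the repeated
# gregexpr() / regmatches() / unlist() steps
extract <- function(pattern, x) unlist(regmatches(x, gregexpr(pattern, x)))

date <- extract("[A-Z][a-z]{2},\\s[A-Z][a-z]{2}\\s[0-9]+", toy.lines)
time <- extract("[0-9]+:[0-9]{2} (PM|AM)", toy.lines)

# Match the marker plus '@' or 'vs', then drop the 4-character marker
away <- extract("LOC:@|LOC:vs", toy.lines)
away <- substr(away, 5, nchar(away))

# home is 1 unless the game is marked '@'
home <- rep(1, length(away))
home[away == "@"] <- 0

# Match '>name<', then trim the surrounding '>' and '<'
opponent <- extract(">([a-zA-Z]|\\s)+<", toy.lines)
opponent <- substr(opponent, 2, nchar(opponent) - 1)

toy.schedule <- data.frame(date, time, opponent, home,
                           stringsAsFactors = FALSE)
toy.schedule
```

The substr() start position is always one more than the length of the prefix being dropped, which is why the marker of length 4 uses 5 here while the solution above uses 25.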
