程序代写代做代考 Homework 1 Solutions

Homework 1 Solutions

Homework 1 Solutions
Gabriel Young (gjy2107)

Febuary, 8th 2018

Part 1

i. Looking at Titanic.txt I see that the data is tab delimited and therefore I use read.table(). Since the
first row of data in the text file holds the column names, I use header = TRUE.

setwd(“~/Desktop/Data”)
titanic <- read.csv("Titanic.txt", header = TRUE, as.is = TRUE) ii. The function dim() provides the dimension of its input object. dim(titanic) ## [1] 891 12 str(titanic) ## 'data.frame': 891 obs. of 12 variables: ## $ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ... ## $ Survived : int 0 1 1 1 0 0 0 0 1 1 ... ## $ Pclass : int 3 1 3 1 3 3 1 3 3 2 ... ## $ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ... ## $ Sex : chr "male" "female" "female" "female" ... ## $ Age : num 22 38 26 35 35 NA 54 2 27 14 ... ## $ SibSp : int 1 1 0 1 0 0 0 3 0 1 ... ## $ Parch : int 0 0 0 0 0 0 0 1 2 0 ... ## $ Ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ... ## $ Fare : num 7.25 71.28 7.92 53.1 8.05 ... ## $ Cabin : chr "" "C85" "" "C123" ... ## $ Embarked : chr "S" "C" "S" "S" ... iii. There are multiple ways to do this. In the following, I add a new column called Survived.Word with each entry equal to “survived”. Then I reassign the values in the variable Survived.Word to “died” in the rows where Survived equals 0. titanic$Survived.Word <- "survived" titanic$Survived.Word[titanic$Survived == 0] <- "died" Alternatively, we could use ifelse() to complete this. The first argument is a logical vector that’s TRUE when Survived equals 1 and FALSE when Survived equals 0. titanic$Survived.Word <- ifelse(titanic$Survived == 1, "surived", "died") 1 Part 2 i. To solve this problem we create a sub-matrix of the variables of entry and then pass this sub-matrix into the apply() command. sub_mat <- titanic[, c("Survived", "Age", "Fare")] apply(sub_mat, 2, mean) ## Survived Age Fare ## 0.3838384 NA 32.2042080 The mean of “Survived” is the proportion of passengers that survived the disaster. The mean of the Age variable is NA, because some of the passenger’s ages are unknown (i.e. Age also has some missing values). ii. As in the last question, we can calculate the proportion of survivors by taking the mean of the Survived variable, but here we filter to only include female passengers. round(mean(titanic$Survived[titanic$Sex == "female"]), 2) ## [1] 0.74 iii. To answer this question we create a sub-matrix survivors which only includes the rows of titanic corresponding to those surviving the disater. Then we calculate the proportion as the number of female passengers in the survivors matrix divided by the total number of people in the survivors matrix. survivors <- titanic[titanic$Survived == 1, ] proportion <- sum(survivors$Sex == "female")/length(survivors$Sex) round(proportion, 2) ## [1] 0.68 Alternatively, we can use the table() command. survivors <- titanic[titanic$Survived == 1, ] proportion <- table(survivors$Sex)/length(survivors$Sex)[1] round(proportion, 2) ## ## female male ## 0.68 0.32 iv. classes <- sort(unique(titanic$Pclass)) Pclass.Survival <- vector("numeric", length = 3) names(Pclass.Survival) <- classes for (i in 1:3) { thisclass <- titanic[titanic$Pclass == i, ] Pclass.Survival[i] <- round(mean(thisclass$Survived), 2) } 2 v. Pclass.Survival2 <- round(tapply(titanic$Survived, titanic$Pclass, mean), 2) Pclass.Survival == Pclass.Survival2 ## 1 2 3 ## TRUE TRUE TRUE vi. Pclass.Survival ## 1 2 3 ## 0.63 0.47 0.24 There does appear to be a relationship between survival and class. We can see from the previous question that the survival rate decreases with ticket class, meaning fewer members of the lower class survived than members of the upper class. 3 Part 1 Part 2