Name: KaiChen Sun
Due: August 27th by 11:59pm.
MTH-245 Homework 1
Instructions: Save this .Rnw file in your homework folder on your computer; call it
“Last Name MTH 245 HW 1.” Unless otherwise specified, all work (e.g., interpretations, calculations, etc.) should be done using R Sweave. Be sure to print both your R code and output. Upon completion of this assignment (prior to the due date), upload your .pdf and .Rnw files to the “Homework Assignments” folder within your individual MTH-245 Box. I should be able to replicate your .pdf file by running your .Rnw file without issue. Please come see me if you have any questions! Enjoy.
Grading: A random subset of problems will be selected for grading; solutions will be posted to the course Box.
1. Calculate the square of the value 10.
2. Calculate the square root of the value 10.
3. Calculate 3^4 divided by 3^2.
4. Print “We love beautiful R!”
5. Print the first 50 odd integers using the “seq()” function.
6. Print the cube root of the first 50 odd integers.
10^2
## [1] 100
sqrt(10)
## [1] 3.162278
(3^4)/(3^2)
## [1] 9
print("We love beautiful R!")
## [1] "We love beautiful R!"
seq(1, by = 2, len = 50)
##  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
## [26] 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99
seq(1, by = 2, len = 50)^(1/3)
## [1] 1.000000 1.442250 1.709976 1.912931 2.080084 2.223980 2.351335 2.466212
## [9] 2.571282 2.668402 2.758924 2.843867 2.924018 3.000000 3.072317 3.141381
## [17] 3.207534 3.271066 3.332222 3.391211 3.448217 3.503398 3.556893 3.608826
## [25] 3.659306 3.708430 3.756286 3.802952 3.848501 3.892996 3.936497 3.979057
## [33] 4.020726 4.061548 4.101566 4.140818 4.179339 4.217163 4.254321 4.290840
## [41] 4.326749 4.362071 4.396830 4.431048 4.464745 4.497941 4.530655 4.562903
## [49] 4.594701 4.626065
7. Print the 1st, 20th, and 45th odd integer values.
f<-c(seq(1, by = 2, len = 50))
f[c(1, 20, 45)]
## [1]  1 39 89
8. Write a function that will square an input value. Test your function by making sure 5^2 is 25.
9. Write a function that will sum the squares of two input values. Test your function by making sure the sum of the squares of the 11th even integer and the 31st odd integer is 4121.
10. Write a function that will convert temperatures in degrees Fahrenheit to temperatures in degrees Celsius. Test your function by making sure it converts 32 degrees Fahrenheit to 0 degrees Celsius.
11. Print the first 6 rows and last 6 rows of the “Salaries” dataset in R. Start with “data(Salaries).” Note: It will be helpful to read over the data description by telling R “?Salaries.”
library(tidyverse)
## -- Attaching packages ------------------------------------------ tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.1
## v tidyr   1.1.1     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts --------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library("carData")
data("Salaries")
Salaries %>% slice_head(n=6)
##        rank discipline yrs.since.phd yrs.service  sex salary
## 1      Prof          B            19          18 Male 139750
## 2      Prof          B            20          16 Male 173200
## 3  AsstProf          B             4           3 Male  79750
## 4      Prof          B            45          39 Male 115000
## 5      Prof          B            40          41 Male 141500
## 6 AssocProf          B             6           6 Male  97000
Salaries %>% slice_tail(n=6)
##       rank discipline yrs.since.phd yrs.service  sex salary
## 1     Prof          A            30          19 Male 151292
## 2     Prof          A            33          30 Male 103106
## 3     Prof          A            31          19 Male 150564
## 4     Prof          A            42          25 Male 101738
## 5     Prof          A            25          15 Male  95329
## 6 AsstProf          A             8           4 Male  81035
square <- function(x)
  return(x^2)
square(5)
## [1] 25
f1 <- seq(0, by = 2, len = 11)
psquare <- function(x, y)
  return(x^2 + y^2)
psquare(f1[11], f[31])
## [1] 4121
dc <- function(x)
  return((x - 32) * (5/9))
dc(32)
## [1] 0
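The three test calls for problems 8 to 10 can also be written as assertions, so that knitting the .Rnw file stops if a function is wrong. The following is a minimal sketch using base R's stopifnot(); the names sum_squares and f_to_c are illustrative renamings, not names from the assignment, and 0 is assumed to be the first even integer (the convention that makes the stated answer 4121 work).

```r
# Restated helpers; sum_squares and f_to_c are illustrative names.
square      <- function(x) x^2
sum_squares <- function(x, y) square(x) + square(y)
f_to_c      <- function(temp_f) (temp_f - 32) * 5 / 9

odds  <- seq(1, by = 2, length.out = 50)  # 1, 3, 5, ...
evens <- seq(0, by = 2, length.out = 11)  # 0, 2, 4, ... (assumes 0 is the first even integer)

# stopifnot() raises an error as soon as one condition is not TRUE.
stopifnot(
  square(5) == 25,
  sum_squares(evens[11], odds[31]) == 4121,
  f_to_c(32) == 0
)
```

Because x^2 is vectorized, square(odds) would square all 50 odd integers in one call.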
12. Use two different functions to print the column names of this dataset.
13. How many variables are being treated as factors?
Salaries %>% colnames()
## [1] "rank"          "discipline"    "yrs.since.phd" "yrs.service"
## [5] "sex"           "salary"
names(Salaries)
## [1] "rank"          "discipline"    "yrs.since.phd" "yrs.service"
## [5] "sex"           "salary"
class(Salaries$rank)
## [1] "factor"
class(Salaries$discipline)
## [1] "factor"
class(Salaries$yrs.since.phd)
## [1] "integer"
class(Salaries$yrs.service)
## [1] "integer"
class(Salaries$sex)
## [1] "factor"
class(Salaries$salary)
## [1] "integer"
3
## [1] 3
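Checking class() column by column works, but the count can also be computed directly. The sketch below is one possible approach, shown on a small made-up data frame (toy is hypothetical, not the Salaries data): sapply() applies is.factor() to every column, and sum() counts the TRUE values.

```r
# Toy data frame, made up for illustration: two factor columns, one numeric column.
toy <- data.frame(
  rank   = factor(c("Prof", "AsstProf", "Prof")),
  sex    = factor(c("Male", "Female", "Male")),
  salary = c(139750, 79750, 115000)
)

is_fac <- sapply(toy, is.factor)  # named logical vector, one entry per column
sum(is_fac)                       # number of columns stored as factors
names(toy)[is_fac]                # which columns those are
```

Applied to the homework data, the same idea would read sum(sapply(Salaries, is.factor)).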
14. How many individuals are Assistant Professors?
15. How many individuals are from applied departments?
table(Salaries$rank)
##
##  AsstProf AssocProf      Prof
##        67        64       266
67
## [1] 67
?
## Error:
## 1: ?
## ^
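A bare ? is not a complete R expression, which is why the chunk above stops with a parse error. For problem 15, the same table() approach used for rank should work on the discipline column. The sketch below uses a made-up factor rather than the real data; it assumes, following the ?Salaries description, that level "B" codes applied departments.

```r
# Toy discipline column, made up for illustration.
# Assumption from ?Salaries: "A" = theoretical departments, "B" = applied departments.
discipline <- factor(c("B", "B", "A", "B", "A"))

table(discipline)        # counts for every level
sum(discipline == "B")   # count for the applied level only
```

On the homework data, the corresponding calls would be table(Salaries$discipline) and sum(Salaries$discipline == "B").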
16. How many individuals make more than the average salary?
17. Print the rows of the individuals who have the highest and lowest salaries.
18. What is the difference in median salary between males and females?
19. Based on your previous answer, are we able to conclude that males working in colleges and universities from around the world, overall, make higher salaries as compared to females? Explain your reasoning.
20. What are the 0th, 25th, 50th, 75th, and 100th percentiles in salary?
21. Let
“Very Low” denote an individual who makes between the 0th and 25th percentile in salary
“Low” denote an individual who makes between the 25th and 50th percentile in salary
“High” denote an individual who makes between the 50th and 75th percentile in salary
“Very High” denote an individual who makes between the 75th and 100th percentile in salary.
Create and attach a new column called “salary.status” based on the above designations. Print the first 10 rows of the new dataset. How many individuals are classified in each salary status designation (i.e., Very Low, Low, High, and Very High)? Note: There might be more than one correct solution.
22. Calculate the mean, median, standard deviation, interquartile range (IQR), minimum, and maximum of the salaries for each rank.
23. Refer to the “Seatbelts” dataset from Activity 1, and answer the questions that follow. I suggest attempting parts (a) − (c) using functionality from the tidyverse package, including the pipe operator; additional recommended functions are provided. However, creativity is valued; therefore, students should feel free to use alternative approaches.
(a) On average, what is the difference between drivers killed or seriously injured sitting in the rear versus the front? Recommended function: summarise_at().
Salaries %>% summarise(mean=mean(salary))
## mean
## 1 113706.5
length(which(Salaries$salary>113706.5))
## [1] 168
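Typing the rounded mean 113706.5 into the comparison works here, but it would silently go stale if the data changed. A slightly more robust sketch, shown on a made-up salary vector, compares against mean() directly; summing a logical vector counts its TRUE values.

```r
# Toy salaries, made up for illustration.
salary <- c(57800, 91000, 107300, 134185, 231545)

# TRUE counts as 1 and FALSE as 0, so sum() gives the number above the mean.
n_above <- sum(salary > mean(salary))
n_above
```

For the homework data this would be sum(Salaries$salary > mean(Salaries$salary)).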
Salaries %>% summarise(Lowest=min(salary),
Highest=max(salary))
## Lowest Highest
## 1 57800 231545
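The summary above reports the extreme salary values; problem 17 asks for the full rows, and problem 18 for a difference in group medians. A base R sketch of both, on a made-up data frame (toy is hypothetical), could look as follows. Note that which.max() and which.min() return only the first matching row if there are ties.

```r
# Toy data frame, made up for illustration.
toy <- data.frame(
  sex    = factor(c("Male", "Male", "Female", "Female", "Male")),
  salary = c(139750, 79750, 97000, 115000, 173200)
)

# Rows holding the highest and the lowest salary (first match only if tied).
extremes <- toy[c(which.max(toy$salary), which.min(toy$salary)), ]
extremes

# Median salary per sex, then the male-minus-female difference.
med      <- tapply(toy$salary, toy$sex, median)
med_diff <- med[["Male"]] - med[["Female"]]
med_diff
```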
quantile(Salaries$salary)
## 0% 25% 50% 75% 100%
## 57800 91000 107300 134185 231545
Salaries$status<-merge(
## Error:
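The merge( call above is incomplete, so it cannot run. One common alternative for problem 21 is cut(), which bins a numeric vector at given break points. The sketch below uses a made-up salary vector and assumes that a value sitting exactly on a percentile belongs to the lower group; the assignment does not state a boundary rule, so that choice is an assumption.

```r
# Toy salaries, made up for illustration.
salary <- c(57800, 79750, 91000, 97000, 107300,
            115000, 134185, 141500, 173200, 231545)

# quantile() returns the 0th, 25th, 50th, 75th, and 100th percentiles by default.
breaks <- quantile(salary)

# include.lowest = TRUE keeps the minimum inside the first bin;
# intervals are closed on the right, so boundary values go to the lower group.
salary.status <- cut(salary,
                     breaks = breaks,
                     labels = c("Very Low", "Low", "High", "Very High"),
                     include.lowest = TRUE)

toy <- data.frame(salary, salary.status)
head(toy, 10)
table(toy$salary.status)
```

On the homework data, the result could be attached with Salaries$salary.status <- cut(Salaries$salary, ...), using the same arguments.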
## # A tibble: 6 x 3
##      id variable value
## 1     1 var1       1
## 2     1 var2       0.2
## 3     1 var3       0.3
## 4     1 var4       0.6
## 5     2 var1       1.4
## 6    NA var2       1.9
(a) Manipulate the dataset to “wide format;” i.e., make “var1,” “var2,” “var3,” and “var4” their own columns. Print out the full dataset to make sure your code worked as intended. Recommended function: spread().
(b) It is pretty obvious that the two NA values in the fake.data dataset are supposed to be 2 and 4, respectively; make these changes. Recommended function: fill().
(c) Suppose variable 1 was measured on August 1, 2020, variable 2 on August 2, 2020, variable 3 on August 3, 2020, and variable 4 on August 4, 2020. Create and add a “date” variable. Print out the full dataset to make sure your code worked as intended. Recommended functions: mutate(), rep(), and full_seq().
(d) If, in the previous part, your printed dataset has correctly replaced the “NA” values in the id column and has a new “date” variable, then great; you are finished. If this is not the case, please continue to achieve this goal!
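Parts (a) to (d) can be sketched as one pipeline. The example below is a hedged illustration, not the intended solution: toy is a hypothetical stand-in for fake.data (its first six rows mirror the printout above, the last two are invented, and it has only one NA), and base seq() on dates is used in place of the recommended full_seq(). It fills the missing id while the data is still in long format, since spreading first would leave a separate row for the NA id.

```r
library(tidyr)
library(dplyr)

# Hypothetical stand-in for fake.data: rows 1-6 mirror the printout,
# rows 7-8 are invented so that id 2 has all four variables.
toy <- tibble(
  id       = c(1, 1, 1, 1, 2, NA, 2, 2),
  variable = rep(c("var1", "var2", "var3", "var4"), times = 2),
  value    = c(1, 0.2, 0.3, 0.6, 1.4, 1.9, 2.2, 2.7)
)

# Four consecutive measurement dates, one per variable.
dates <- seq(as.Date("2020-08-01"), by = "day", length.out = 4)

long <- toy %>%
  fill(id) %>%                                  # carry the last non-missing id downward
  mutate(date = rep(dates, times = n() / 4))    # repeat the four dates for each id

# Wide format: one row per id, one column per variable.
wide <- long %>%
  select(-date) %>%
  spread(key = variable, value = value)

print(long)
print(wide)
```

The date column is kept in the long table because each date belongs to a variable, not to an id; in the wide table the four variables already sit in separate columns.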