---
title: "In-class Exercise 4"
author: "Your Name Here"
output:
  html_document:
    highlight: tango
    theme: paper
    toc: yes
    toc_depth: 3
---

##### Remember to change the `author: ` field on this Rmd file to your own name.

### 1. Writing a data summarization function

**(a)** Rewrite the code you provided for question 5(a) in the 3rd in-class exercise as a function with the name `groupwise.sum.fun` and the parameter list `(data, columns, grouping)`.

The function should let users vary the data frame, the columns, and the grouping rule from call to call, and it should compute groupwise sums for the specified columns. More specific requirements are as follows:

1. The parameter `columns` supports indexing with integer, Boolean, and character vectors.
2. Calling the function returns a matrix with rows labeled with group number and columns labeled with variable names.
3. Matrix rows are sorted by group number in ascending order.
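
As a reminder of what requirement 1 means in practice, the sketch below selects the same two columns of a small data frame with an integer, a Boolean, and a character index. The data frame `df` and its column names are made up for illustration and are not part of the exercise data.

```r
# Hypothetical data frame, used only to illustrate the three index types
df <- data.frame(id = 1:3, score = c(2.5, 3.5, 4.5), label = c("x", "y", "z"))

by_int  <- df[, c(1, 2)]               # integer positions
by_bool <- df[, c(TRUE, TRUE, FALSE)]  # one TRUE/FALSE per column
by_name <- df[, c("id", "score")]      # column names

# All three calls select the columns `id` and `score`
names(by_int)
names(by_bool)
names(by_name)
```

Because `[` already accepts all three index types, a function that passes `columns` straight to `[` usually needs no special handling for them.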

**Tips**:

- `unique()` returns the distinct values in the `grouping` vector, but it won’t reorder them;
- use `sort()` to sort the distinct values in ascending order.
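
The difference between the two tips can be seen on a tiny vector. The values in `g` below are hypothetical and chosen only so that the first-appearance order differs from the sorted order.

```r
# Hypothetical group labels, made up for illustration
g <- c(3, 1, 2, 3, 1)

distinct <- unique(g)      # order of first appearance: 3 1 2
ordered  <- sort(distinct) # ascending order: 1 2 3
```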

```{r}
# Edit me

```

If you run the following code with your `groupwise.sum.fun` function:

```{r eval=F}
pros.dat <- read.table("pros.dat") # the data file used for the 3rd in-class exercise
set.seed(0)
grouping <- sample(1:10, size = 97, replace = T)
groupwise.sum.fun(data = pros.dat, columns = c(2, 8, 3, 6), grouping = grouping)
```

The output should look like:

```
##     lweight pgg45 age        lcp
## 1  29.18716   195 500  1.6724007
## 2  25.71542   110 432 -5.9904884
## 3  25.88829   249 444  1.6974904
## 4  21.74769    75 384 -7.1421928
## 5  32.04308   200 555  3.2147904
## 6  46.44447   346 850 -4.4274876
## 7  50.51550   380 904  0.2620661
## 8  29.21921   160 490 -1.1274941
## 9  40.30110   260 700 -6.4797308
## 10 50.94550   390 936  0.9221852
```

In the rendered HTML file, the `grouping` vector should contain the values below. Note that running the sampling code directly in the console or in the code chunk above will give you a different sequence, so **verify your output by knitting this Rmd file**.

```
##  [1]  9  4  7  1  2  7  2  3  1  5  5 10  6 10  7  9  5  5  9  9  5  5  2 10  9
## [26]  1  4  3  6 10 10  6  4  4 10  9  7  6  9  8  9  7  8  6 10  7  3 10  6  8
## [51]  2  2  6  6  1  3  3  8  6  7  6  8  7  1  4  8  9  9  7  4  7  6  1  5  6
## [76]  1  9  7  7  3  6  2 10 10  7  3  2 10  1 10 10  8 10  5  7  8  5
```

**(b)** Define default values for the `columns` and `grouping` parameters for the `groupwise.sum.fun` function in (a).

1. The default value of `columns` should be a Boolean indexing vector that selects all numeric columns (integer columns included) in the order they appear in the data frame.

2. The default value of `grouping` should place all observations in a single group.

**Tips**:

1. A default value can be defined in terms of other parameters.
2. See `rep.int`; try `rep.int(1, times = 3)` to see how it works.
3. `sapply(data, is.numeric)` finds all numeric columns.
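
The three tips can be tried out in isolation before combining them. The sketch below uses a hypothetical function `shift` and a made-up data frame `df`; neither belongs to the exercise, and the sketch deliberately stops short of the groupwise function itself.

```r
# Hypothetical function: the default for `center` is an expression in `x`.
# R evaluates it lazily, only when `center` is first needed in the body.
shift <- function(x, center = mean(x)) {
  x - center
}

shift(c(1, 2, 3))              # default center = mean(x) = 2, giving -1 0 1
shift(c(1, 2, 3), center = 1)  # explicit value overrides the default: 0 1 2

# rep.int repeats a value a given number of times
ones <- rep.int(1, times = 3)  # 1 1 1

# sapply applies is.numeric to every column and returns a named logical vector
df <- data.frame(id = 1:2, score = c(9.5, 7), label = c("a", "b"))
is_num <- sapply(df, is.numeric)  # id TRUE, score TRUE, label FALSE
```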

```{r}
# Edit me

```

If you run the following code with your revised `groupwise.sum.fun` function:

```{r eval=F}
library(MASS)
groupwise.sum.fun(UScereal)
```

The output should look like:

```
##   calories  protein      fat   sodium    fibre    carbo   sugars shelf
## 1 9711.537 239.4408 92.46499 15459.49 251.6049 1297.895 653.3047   141
##   potassium
## 1  10342.78
```

**(c)** Revise the `groupwise.sum.fun` function in (b) into a new function named `groupwise.fun` that also lets us vary the built-in aggregation function (e.g., `mean`, `sum`, `median`) from call to call. Set the default aggregation function to `mean`.

The new parameter list should look like this: `(data, columns, grouping, agg_fun)`.
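
Functions are ordinary values in R, so one can be passed as an argument and given a default like any other parameter. A minimal sketch on a plain vector, assuming a hypothetical helper named `collapse` (not the function asked for in this exercise):

```r
# Hypothetical helper: `fun` is any function that reduces a vector to one value
collapse <- function(x, fun = mean) {
  fun(x)
}

collapse(c(1, 2, 6))                # default, mean: 3
collapse(c(1, 2, 6), fun = median)  # 2
collapse(c(1, 2, 6), fun = sum)     # 9
```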

```{r}
# Edit me

```

**(d)** Use `...` to revise the `groupwise.fun` function in (c) so that the `na.rm` option of built-in aggregation functions can be used to handle missing values when needed.
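
For reference, `...` in a parameter list collects any extra arguments, and writing `...` inside a call forwards them unchanged. A minimal sketch on a plain vector, assuming a hypothetical wrapper named `collapse.dots` (not the function asked for in this exercise):

```r
# Hypothetical wrapper: extra arguments are forwarded untouched to `fun`
collapse.dots <- function(x, fun = mean, ...) {
  fun(x, ...)
}

collapse.dots(c(1, NA, 3))                # NA, because mean() keeps NAs by default
collapse.dots(c(1, NA, 3), na.rm = TRUE)  # 2, na.rm is passed through to mean()
```

The wrapper never names `na.rm` itself, so the same mechanism works for any option that the chosen aggregation function understands.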

```{r}
# Edit me

```

### 2. Free variables

Compare the following two code snippets and predict the result of calling `f(5)` in each case. Explain why.

```{r eval=F}
y <- 3
f <- function (x) {
  y <- 1
  g <- function (x) {
    (x + y) / 2
  }
  g(x)
}
```

```{r, eval=F}
y <- 3
f <- function (x) {
  y <- 1
  g(x)
}
g <- function (x) {
  (x + y) / 2
}
```

> Replace this with your answer (keep the greater-than sign)

### 3. Function factories

Define a “constructor” function named `NegLogLik` for some numerical optimization problems as follows:

```{r}
NegLogLik <- function(data, fixed=c(FALSE, FALSE)) {
  params <- fixed
  function(p) {
    params[!fixed] <- p
    mu <- params[1]
    sigma <- params[2]
    a <- -0.5 * length(data) * log(2 * pi * sigma^2)
    b <- -0.5 * sum((data - mu)^2) / sigma^2
    - (a + b)
  }
}
```

Then use it to create a function named `nLL` with a data frame named `a_data_frame`.

```{r }
nLL <- NegLogLik(a_data_frame)
nLL
```

What names exist in the **enclosing environment** of `nLL`? Explain why.

> Replace this with your answer (keep the greater-than sign)

### 4. Function counters

We define a function factory for creating counters that record the number of times a function has been called.

```{r}
new_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1
    i
  }
}
```

Then we create two counters by using this function factory as follows:

```{r}
counter_1 <- new_counter()
counter_2 <- new_counter()
```

Run the following R code:

```{r}
counter_1()
counter_1()
counter_2()
```

Please explain how these function counters work.

**Tips**: refer to the notes on `<<-`

> Replace this with your answer (keep the greater-than sign)

### 5. Optional practice

Try re-running as much of the lecture code as you have time for, taking time to understand what is happening in each line of code.

Instead of copying and pasting code, practice typing it yourself. This will help you to learn the syntax.

```{r}
# Edit me (optional)

```