____________________Name

____________________Name

_______________________Name

Fall 2018

1. (10 pts) With multi-category data, we often have to choose the type of correlation matrix

we generate to describe relationships. In an article in the British Journal of Mathematical and Statistical

Psychology, Conor Dolan indicated that a critical cutoff for determining whether a Pearson

correlation is appropriate occurs when you have a variable with around 5 ordinal categories

(Dolan, 1994). Given that, I would like you to consider the data set factdata.xls, which includes

data in which students answered questions related to food preferences using four different

scaling approaches (true/false, Likert from Strongly disagree to Strongly agree, semantic

differential, Likert from Disagree to Agree). If you’re having trouble reading in an Excel

spreadsheet, the first Likert scale necessary for this question is in the raw data file, fLikert.dat.

The header includes the variable names. I’ve included the original questionnaire so you can see

what the questions looked like.

For the first type of Likert variables flkrt1 to flkrt20 (on the second page of the questionnaire), I

would like you to pick variables from one of the following four subgroups:

Seafood I: flkrt1-flkrt5

Fast food: flkrt6-flkrt10

Challenging food: flkrt11-flkrt15

Seafood II: flkrt16-flkrt20

and create a function to generate descriptive statistics appropriate to an interval level variable:

mean and standard deviation, and statistics appropriate to an ordinal level variable: median,

minimum, maximum, and range, along with the N.

I would like you to put these statistics into an object similar to descripstat2() in the program

scndprog.cowdata.R. Make sure to return a matrix and label the dimensions of the matrix

appropriately. I’m including the formula for the median and some other statistics we will need

later in the semester in a file called add.stats.R. When calculating the median, I want you to

use this computational formula, not the median() function. The same for the other descriptive

statistics. Please use computational formulas, not the functions that are preprogrammed.

Compare your results to describe() in the “psych” package.
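
As a rough illustration of the kind of function described above, here is a minimal sketch. It assumes the usual median rule (the middle sorted value, or the average of the two middle values when N is even); the formula in add.stats.R may differ and should take precedence. The name descrip_sketch and the toy responses are hypothetical and are not taken from scndprog.cowdata.R or the course data.

```r
# Descriptive statistics from computational formulas
# (no calls to mean(), sd(), median(), min(), max(), or range()).
# The median rule below is an assumption; substitute the formula from add.stats.R.
descrip_sketch <- function(x) {
  x <- sort(x[!is.na(x)])               # drop missing values, then order the scores
  n <- length(x)
  m <- sum(x) / n                       # mean
  s <- sqrt(sum((x - m)^2) / (n - 1))   # sample standard deviation
  med <- if (n %% 2 == 1) x[(n + 1) / 2] else (x[n / 2] + x[n / 2 + 1]) / 2
  c(N = n, Mean = m, SD = s, Median = med,
    Min = x[1], Max = x[n], Range = x[n] - x[1])
}

# Hypothetical 1-5 responses standing in for two of the flkrt items
toy <- data.frame(flkrt1 = c(1, 2, 2, 4, 5, 3),
                  flkrt2 = c(5, 4, 4, 3, 1, NA))

# sapply() returns statistics in rows; t() puts variables in rows instead
stats <- t(sapply(toy, descrip_sketch))
print(round(stats, 3))
```

Because the function returns a named vector, the resulting matrix carries the variable names as row labels and the statistic names as column labels.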

Now, also in R, I would like you to create a table including three kinds of correlations: Pearson,

Spearman, and Kendall correlations. You can create this table by stacking the correlation

matrices. Once you have all of the correlations in a single table, you will have to rename the

dimensions (rows and columns) to let the reader know what is what.
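
One possible way to stack the three matrices is sketched below. The item values are randomly generated stand-ins rather than the real responses, and the "method:item" row-label scheme is only one of many reasonable choices.

```r
set.seed(1)
# Hypothetical 1-5 responses; replace with the items from your chosen subgroup
items <- as.data.frame(
  matrix(sample(1:5, 27 * 3, replace = TRUE), ncol = 3,
         dimnames = list(NULL, c("flkrt1", "flkrt2", "flkrt3"))))

methods <- c("pearson", "spearman", "kendall")

# One correlation matrix per method, then stacked row-wise
cor_list <- lapply(methods, function(m) cor(items, method = m))
cor_table <- do.call(rbind, cor_list)

# Relabel the rows so the reader can tell which block is which
rownames(cor_table) <- paste(rep(methods, each = ncol(items)),
                             colnames(items), sep = ":")
print(round(cor_table, 3))
```

If the real data contain missing values, cor() also accepts a use argument such as use = "pairwise.complete.obs".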

The difference between the Spearman and Kendall coefficients involves assumptions regarding

the underlying distributions of the variables. Spearman ρ assumes that the ranks are interval

scales, while Kendall τ does not. So, does it matter here? Are the correlation coefficients different?

What about the central tendency, does the median differ from the mean? What do you

conclude?

Dolan, C. V. (1994). Factor analysis of variables with 2, 3, 5 and 7 response categories: A

comparison of categorical variable estimators using simulated data. British Journal of

Mathematical and Statistical Psychology, 47, 309-326.

2. (6 pts) I would like you to write a program for intensive regression to simultaneously

regress dep4 of the portroy data onto dep1, dep2, and dep3. You can pull numbers representing

regression weights from a random uniform distribution. Make sure to vary each coefficient

between -1 and +1. Remember that you can place restrictions on the coefficients within the

runif() function. Use the lm() function to check your results. Note that if you are using three

variables to predict a fourth, the regression function would be:

lm(dep4 ~ dep1 + dep2 + dep3)

The standardized regression would be:

lm(scale(dep4) ~ scale(dep1) + scale(dep2) + scale(dep3))
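
The sketch below shows one way to read this task: a random search that repeatedly draws candidate weights from runif() and keeps the set with the smallest sum of squared errors. This interpretation, the number of draws, and the simulated data are all assumptions; the portroy data are not reproduced here, so the variables are generated only to make the example runnable.

```r
set.seed(2018)
# Hypothetical stand-in for the portroy data
n <- 100
dep1 <- rnorm(n); dep2 <- rnorm(n); dep3 <- rnorm(n)
dep4 <- 0.5 * dep1 - 0.3 * dep2 + 0.2 * dep3 + rnorm(n, sd = 0.5)

# Standardize so that no intercept is needed; the search only covers
# weights inside [-1, 1]
X <- scale(cbind(dep1, dep2, dep3))
y <- as.vector(scale(dep4))

best_sse <- Inf
best_b <- rep(NA_real_, 3)
for (i in 1:20000) {
  b <- runif(3, min = -1, max = 1)   # candidate weights restricted to [-1, 1]
  sse <- sum((y - X %*% b)^2)        # sum of squared errors for this candidate
  if (sse < best_sse) {              # keep the best candidate seen so far
    best_sse <- sse
    best_b <- b
  }
}

# Check against the standardized least-squares solution
fit <- lm(scale(dep4) ~ scale(dep1) + scale(dep2) + scale(dep3))
print(rbind(random_search = best_b, lm = coef(fit)[-1]))
```

The random-search weights should approach the lm() estimates as the number of draws grows, but they can never produce a smaller sum of squared errors than lm() does.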

3. (6 pts) For the 20 flkrt items, either factdata.xls or fLikert.dat, take the wide data set and

create a long data set creating the dependent variable flkrt. There will be 20 observations per

person, on four types of food: seafood_1, fast_food, challenging_food, and seafood_2.

Create a new indexing variable that identifies which type of food each item is measuring.

So, the long data set should have

id food_type flkrt

This requires one trick that we didn’t discuss in class. The varying variables are going to be all

20 items. Pick a good v.names value. I picked flkrt. I called the timevar variable food_type.

Rather than times, you want to provide the levels (values) for the food_type variable. There are

20 of them with 5 of each type. You can create that variable using

times = c(rep(1,5),rep(2,5),rep(3,5),rep(4,5)),

Once you create this long data set, print it. You will see that the data are sorted by food type and

not by id. While the data are sorted by food_type, calculate the mean for each food type

pooling across items of that food type and individuals. You can use the indexing ability in R

and the mean() function to calculate the mean of the first 135 lines (food_type 1), the next 135

lines (food_type 2), etc., to get the means for the four different food types. What are the means?

Which one is smallest, which one is largest?

Finally, sort the data by individual id, making all of each individual’s data contiguous. Make

sure to include your R output to show that you have done all of this successfully.
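
The steps in this question can be sketched as follows. The wide data are simulated for 27 hypothetical respondents (which gives the 135 rows per food type mentioned above); the new.row.names argument is an optional addition that keeps the row labels simple when the times values repeat.

```r
set.seed(3)
# Hypothetical wide data: 27 people by 20 items scored 1-5
n_id <- 27
wide <- data.frame(
  id = 1:n_id,
  matrix(sample(1:5, n_id * 20, replace = TRUE), ncol = 20,
         dimnames = list(NULL, paste0("flkrt", 1:20))))

long <- reshape(wide, direction = "long",
                varying = paste0("flkrt", 1:20),
                v.names = "flkrt",
                timevar = "food_type",
                times = c(rep(1, 5), rep(2, 5), rep(3, 5), rep(4, 5)),
                idvar = "id",
                new.row.names = seq_len(n_id * 20))
print(head(long))

# Rows are stacked item by item, so each food type is a contiguous block
# of 5 * n_id rows (135 rows when there are 27 respondents)
block <- 5 * n_id
means <- sapply(1:4, function(k) {
  mean(long$flkrt[((k - 1) * block + 1):(k * block)])
})
names(means) <- c("seafood_1", "fast_food", "challenging_food", "seafood_2")
print(means)

# Sort by id so that each person's 20 rows are contiguous
long_sorted <- long[order(long$id, long$food_type), ]
print(head(long_sorted, 20))
```

With simulated responses the four means carry no substantive meaning; the point is only the indexing and sorting pattern.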

4. (5 pts) I would like you to write a function that will calculate a running sum for each of the two

series listed below. You will initialize each sum at the value of the first number in each series, then start the

loop counter for each loop at 2 [Note that is a hint]. If you look at the two series, you will see

that they diverge. One simple way to show that they diverge is to calculate the difference

between the two series and show that the difference increases. I would like you to write a

function that returns the running sum for each series (the string of sums, not just the final sum),

and the running set of differences. You can then look at the differences and see that they

increase. Use the following series:

First: 1 2 3 5 4 3 6 4 3 5 7 7 9 8

Second: 2 4 5 8 7 10 10 11 11 14 17 18 21 24

Make sure to subtract the first series from the second, so that the differences are positive.

There are many different ways to do this. Any one that gets the correct answer is ok.
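
One of the many possible approaches is sketched here, following the hint about starting the loop at 2. The function name running_compare is hypothetical, and the sketch takes the "running set of differences" to mean the second running sum minus the first, which is an interpretation rather than something the assignment states outright.

```r
running_compare <- function(first, second) {
  stopifnot(length(first) == length(second), length(first) >= 2)
  n <- length(first)
  sum1 <- numeric(n)
  sum2 <- numeric(n)
  sum1[1] <- first[1]     # initialize each sum at the first value of its series
  sum2[1] <- second[1]
  for (i in 2:n) {        # start at 2 because element 1 is already set
    sum1[i] <- sum1[i - 1] + first[i]
    sum2[i] <- sum2[i - 1] + second[i]
  }
  # Second minus first, so the differences come out positive
  list(first = sum1, second = sum2, difference = sum2 - sum1)
}

first  <- c(1, 2, 3, 5, 4, 3, 6, 4, 3, 5, 7, 7, 9, 8)
second <- c(2, 4, 5, 8, 7, 10, 10, 11, 11, 14, 17, 18, 21, 24)
out <- running_compare(first, second)
print(out)
```

Since every element of the second series exceeds the matching element of the first, the running difference grows at each step.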