MAST90044 Thinking and Reasoning with Data Semester 1 2018
Assignment 1
Due: 5pm, Thursday 29 March

Instructions

  • Assignments are to be placed in the appropriate subject and lab box located just inside the north entrance to the Peter Hall Building. Assignments must be stapled.
  • Please label your assignment with the following information:

    – your name;
    – your student number;
    – your lab class;
    – your tutor’s name (Sandy, Steve or David).

  • You must sign the plagiarism declaration. The link is available on the LMS.
  • Your assignment should show all working and reasoning, as marks will be given for method as well as for correct answers. Please spell check your document.

  • Paste any R code and output into the appropriate places so that it can be seen easily along with your other work. Graphics from R can be resized within your document; make them smaller as necessary.
  • Assignments count for 50% of the assessment in this subject. This one is worth 15%, and covers the work done in weeks 1 to 3.
  • Tutors will not help you directly with assignment questions. However, they may give some help with R.
  • Solutions to the assignment questions will be made available later.
  • When constructing a panel of graphs with multiple plots, it is good to use the R command par(mfrow = c(nrows,ncols)) where nrows is the number of rows and ncols the number of columns in the panel. The default is (1,1).
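
As a minimal sketch of the panel layout described above, the following draws a 1-by-2 panel and then restores the previous settings. The values are simulated for illustration only and are not from the assignment data.

```r
# Simulated values for illustration only; not from the assignment data sets.
set.seed(1)
x <- rexp(100)

# par() returns the previous settings, so they can be restored afterwards.
old_par <- par(mfrow = c(1, 2))  # 1 row, 2 columns
hist(x, main = "Histogram")
boxplot(x, main = "Boxplot")
par(old_par)                     # back to the previous layout
```

Saving and restoring the old settings avoids later plots being squeezed into the same panel by accident.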

MAST90044 Thinking and Reasoning with Data Assignment 1

Q.1. The data set unesco.csv, available on the LMS, contains demographic and economic information from the 1990 UNESCO yearbook on about half the world’s countries. Definitions of the variables in the data set are as follows:

• Birth rate per 1,000 of population
• Death rate per 1,000 of population
• Infant deaths per 1,000 of population
• Life expectancy at birth for males
• Life expectancy at birth for females
• Gross National Product (GNP) per capita
• Geopolitical group
    1 Eastern Europe (former Soviet Satellite)
    2 South America and Mexico
    3 Western Europe, North America, Japan
    4 Middle East
    5 Asia
    6 Africa
• Country

Ignoring geopolitical group:

  (a)  Summarise the GNP values using summary statistics and two graphical tools. Briefly describe any obvious features of the distribution.
  (b)  Use two graphical tools to compare the observed distribution of infant deaths with a normal distribution. Briefly comment.
  (c)  Graphically examine the relationship between the infant death rate and GNP. Calculate the correlation coefficient between the two variables. Comment on how useful it is in this situation.
  (d)  Graphically examine the relationship between life expectancy at birth for females and the birth rate. Comment on the strength or otherwise of the relationship. Formulate a statistical model to describe the relationship. Graphically fit the model, and use it to roughly estimate one of the parameters in the model (excluding σ).
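
The kinds of R tools that parts (a) to (d) call for can be sketched as follows. This is an illustration on simulated data only: the column names (gnp, infant, birth, lifeF) are hypothetical and may not match those in unesco.csv, and the least-squares line is just one possible way to add a fitted line to a scatterplot.

```r
# Simulated stand-in for unesco.csv; the column names (gnp, infant, birth,
# lifeF) are hypothetical and the values are not real.
set.seed(1)
n <- 90
gnp    <- rlnorm(n, meanlog = 7, sdlog = 1.2)         # right-skewed values
infant <- 300 * gnp^(-0.4) * exp(rnorm(n, sd = 0.3))  # curved, decreasing in gnp
birth  <- runif(n, 10, 50)
lifeF  <- 85 - 0.6 * birth + rnorm(n, sd = 3)
dat <- data.frame(gnp, infant, birth, lifeF)

summary(dat$gnp)                        # quartiles, extremes and mean

old_par <- par(mfrow = c(2, 2))
hist(dat$gnp, main = "GNP")             # shape of the distribution
qqnorm(dat$infant); qqline(dat$infant)  # comparison with a normal distribution
plot(infant ~ gnp, data = dat)          # scatterplot of a curved relationship
plot(lifeF ~ birth, data = dat)
abline(lm(lifeF ~ birth, data = dat))   # least-squares line as a visual guide
par(old_par)

r <- cor(dat$gnp, dat$infant)           # Pearson correlation (linear association)
```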

Taking geopolitical group into account:

  (e)  Use two graphical tools to examine the relationship between life expectancy at birth for males and geopolitical group. Use suitable R functions to calculate the mean and standard deviation for each group, and the number of countries in each group. Comment on any obvious differences between the groups and identify any clear outliers.
  (f)  Write a statistical model to describe the relationship between life expectancy at birth for males and geopolitical group. Estimate one of the parameters in the model using the results in (e).
  (g)  Calculate the net population growth rate per 1,000 of population (we will call this “net growth”). Type library(lattice) in R to ensure that the xyplot() function is available. Use xyplot to examine the relationship between net growth and GNP for each geopolitical group separately. Note that in the matrix of plots, group 1 will be placed in the bottom left hand corner, and you proceed across the row of plots. Comment on what the plots show in regard to the relationship, and any limitations of this type of plot here.
  (h)  Create a plot of net growth vs GNP for group 2 on its own. Calculate the correlation coefficient, and comment on the strength and direction of the relationship.
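
A rough sketch of the grouped summaries and the conditioned xyplot() from parts (e) to (h), again on simulated data. The column names (birth, death, gnp, lifeM, group) are hypothetical, and taking net growth as births minus deaths is an assumption about the intended definition.

```r
library(lattice)  # provides xyplot()

# Simulated stand-in for unesco.csv; column names are hypothetical and the
# values are not real.
set.seed(2)
n <- 60
dat <- data.frame(
  birth = runif(n, 10, 50),
  death = runif(n, 5, 20),
  gnp   = rlnorm(n, meanlog = 7, sdlog = 1),
  lifeM = rnorm(n, mean = 62, sd = 8),
  group = factor(rep(1:6, each = 10))
)

# Per-group mean, standard deviation and count.
grp_mean <- tapply(dat$lifeM, dat$group, mean)
grp_sd   <- tapply(dat$lifeM, dat$group, sd)
grp_n    <- table(dat$group)

# Net growth taken here as births minus deaths per 1,000 (an assumption;
# it ignores migration).
dat$netgrowth <- dat$birth - dat$death

# One panel per group; lattice fills panels from the bottom left by default.
p <- xyplot(netgrowth ~ gnp | group, data = dat)
print(p)  # lattice plots need print() inside scripts and functions

# A single group on its own, with its correlation coefficient.
g2 <- subset(dat, group == "2")
plot(netgrowth ~ gnp, data = g2)
r2 <- cor(g2$netgrowth, g2$gnp)
```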


Q.2. The data in count10.csv [2, 3, 3, . . . , 0] were obtained as counts of the number of items in batches of ten, which had a particular characteristic.

  (a)  Describe the data (including appropriate descriptive statistics and plots).
  (b)  Show that for any binomial distribution, var(X) ≤ E(X).
  (c)  A binomial distribution would be appropriate for such data if the items were independent and each was equally likely to have the characteristic. Explain why these data are apparently incompatible with the binomial distribution.
  (d)  The following proposals have been put forward to explain the failure of the binomial distribution to describe these data.

      i. The batches are from different sources.
      ii. The proportion with the characteristic changes over time.

      Discuss briefly (a sentence or two at most) each proposal, indicating whether it could result in data like those obtained; and how it might be checked.
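
One hedged way to set a sample's mean and variance beside the variance a binomial model would imply is sketched below. The counts are simulated from a binomial distribution purely for illustration (the probability 0.2 is an arbitrary choice); the real data are in count10.csv.

```r
# Simulated binomial counts for illustration; not the count10.csv data.
set.seed(3)
n_items <- 10                      # batch size stated in the question
x <- rbinom(200, size = n_items, prob = 0.2)

m <- mean(x)
v <- var(x)
p_hat <- m / n_items               # estimate of p from the sample mean

# Variance a binomial(n_items, p_hat) distribution would have: n p (1 - p).
v_binom <- n_items * p_hat * (1 - p_hat)

c(mean = m, variance = v, binomial_variance = v_binom)
```

For genuinely binomial counts the sample variance should sit close to the implied binomial variance; a much larger sample variance points away from the binomial model.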

Q.3. The chi-squared distribution, denoted by X ∼ χ²_ν, is used a great deal in statistics and science, and we will meet it again later. The exact shape of the distribution depends on the degrees of freedom (ν) and smaller values of ν result in greater skewness, and therefore stronger departure from the normal distribution. Here we will examine how quickly the sampling distribution of the sample mean taken from an X ∼ χ² distribution converges to normality (or at least to symmetry).

  (a)  Take a large sample from the X ∼ χ² distribution and test its departure from normality using two graphical tools. You will need the R function rchisq. Comment on the result.
  (b)  Examine the sampling distribution of the sample mean from samples of size 5, by generating 1000 such samples and looking at a plot of the density (make a comment).
  (c)  Compare the sampling distribution of the sample mean for a range of sample sizes (e.g. 1, 5, 10, 20, 40, 80), and use your results to suggest how large the sample size needs to be for adequate convergence. The mean of an X ∼ χ²_ν distribution is ν.
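
The simulation described in (a) to (c) could be set up along the following lines. This is a sketch under stated assumptions: the degrees of freedom are not given in the text above, so nu = 2 is an arbitrary choice, and sample_means() is a hypothetical helper name.

```r
# Degrees of freedom are not specified in the text; nu = 2 is an arbitrary choice.
set.seed(4)
nu <- 2

# Large sample from the chi-squared distribution itself.
big <- rchisq(10000, df = nu)
old_par <- par(mfrow = c(1, 2))
hist(big, main = "Chi-squared sample")
qqnorm(big); qqline(big)
par(old_par)

# Hypothetical helper: `reps` sample means, each from a sample of size n.
sample_means <- function(n, reps = 1000) {
  replicate(reps, mean(rchisq(n, df = nu)))
}

sizes <- c(1, 5, 10, 20, 40, 80)
means <- lapply(sizes, sample_means)

# Density of the sample mean for each sample size.
old_par <- par(mfrow = c(2, 3))
for (i in seq_along(sizes)) {
  plot(density(means[[i]]), main = paste("n =", sizes[i]))
  abline(v = nu, lty = 2)  # population mean equals the degrees of freedom
}
par(old_par)
```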
