STAT 4110/7110 Final Take-home Exam Spring 2018
Lee
Due: Friday, May 4, 2018, 11:59 p.m. (CST)
Name ____________________________ Student ID _______________________
Instructions:
- The final exam is open book/open note.
- The exam is worth 50 points in total.
- Variable names are written in Courier New font in the problems.
- The significance level for all testing procedures is 0.05.
- Answer the following questions based on your output, and comment where necessary. You do not need to print out or submit the output.
- When you have finished programming, save the SAS program (first_last.sas) and the R program (first_last.R) separately, and submit both program files through the Exams link in Canvas.
- To receive full credit, you must submit both programs and the written exam.
- Good luck!
1. (SAS: 20 points total) The following lists the variables in the dataset prdsal2 in the SASHELP library.
Alphabetic List of Variables and Attributes
#   Variable  Type  Len  Format      Informat  Label
4   ACTUAL    Num     8  DOLLAR12.2            Actual Sales
1   COUNTRY   Char   10  $CHAR10.              Country
3   COUNTY    Char   20  $CHAR20.              County
10  MONTH     Num     8  MONNAME3.             Month
11  MONYR     Num     8  MONYY.      MONYY.    Month/Year
5   PREDICT   Num     8  DOLLAR12.2            Predicted Sales
6   PRODTYPE  Char   10  $CHAR10.              Product Type
7   PRODUCT   Char   10  $CHAR10.              Product
9   QUARTER   Num     8  8.                    Quarter
2   STATE     Char   22  $CHAR22.              State/Province
8   YEAR      Num     8  4.                    Year
(1) (2 pts.) Determine the mean and the standard deviation of the actual sales amount (=ACTUAL) in each country.
Country | Mean | Standard deviation |
U.S.A.  |      |                    |
Mexico  |      |                    |
Canada  |      |                    |
(2) (5 pts.) Create a new dataset that includes only the observations with an actual sales amount greater than $700, and then compute the mean and the standard deviation of the actual sales amount for Canada.
Mean =
Standard deviation =
Page 1 of 4
(3) (3 pts.) Using the dataset prdsal2, test whether there is an interaction between country (=COUNTRY) and product type (=PRODTYPE) in the actual sales amount (=ACTUAL).
Hypotheses H0:________________________ vs. HA:________________________ P-value =
(Circle one): Do not reject H0 Reject H0
(4) (5 pts.) Using the dataset prdsal2, test whether there are any differences among the countries in the actual sales amount. If there are, determine which countries differ significantly from the others.
Hypotheses H0:________________________ vs. HA:________________________ P-value =
(Circle one): Do not reject H0 Reject H0
(Circle one):
U.S.A. vs. Mexico: Different Not different
U.S.A. vs. Canada: Different Not different
Mexico vs. Canada: Different Not different
(5) (5 pts.) Using the dataset prdsal2, fit a regression model of the predicted sales amount (=PREDICT) on the actual sales amount (=ACTUAL). Discuss the normality assumption of the model.
2. (Both SAS & R: 5 points each.) Complete the steps below, writing one program in SAS and one in R.
- (1) Write a program that generates s = 100 samples of size m = 10 from a binomial distribution with n = 15 trials and probability of success p = 0.4.
- (2) Create a new variable that contains the average of each of the 100 samples.
- (3) Plot the 100 sample averages in a histogram.
3. (R: 20 points.) The alaska.pipeline data frame in the UsingR package consists of in-field ultrasonic measurements of the depths of defects in the Alaska pipeline. The depths of the defects were then re-measured in the laboratory. These measurements were performed in six different batches. The data frame has 107 observations on the following 3 variables.
- field.defect: Depth of defect as measured in field
- lab.defect: Depth of defect as measured in lab
- batch: One of 6 batches
(1) (5 pts.) Test whether there is any difference among the batches (=batch) in the depth of defect as measured in the field (=field.defect). Assume that the variances of the depth of defect in the field are equal across the batches.
Hypotheses H0:________________________ vs. HA:________________________ P-value =
(Circle one): Do not reject H0 Reject H0
(2) (5 pts.) Create a boxplot of the depth of defect in the field (=field.defect) by batch (=batch). Add the main title “Comparison of Depth of defect in field by Batch” at the top of the plot, and label the x- and y-axes “Batch” and “Depth of defect in field”, respectively.
(3) (5 pts.) Fit a linear model for the lab defect size (=lab.defect) as a function of the field defect size (=field.defect), and discuss the appropriateness of the model.
Fitted model: _________________________________________________
(4) (5 pts.) A log-transformation of each variable from Problem (3) seems to provide a better linear model for the data. Fit the model log(lab.defect) = β0 + β1 log(field.defect) + ε and determine whether any of the model assumptions are violated.
Fitted model: _________________________________________________