University of Western Ontario
CS2035 (Jan-May 2020)
Assignment 3
Due Date: 2020-03-20 (Friday) at 11:55pm
Marking Scheme
This assignment will count towards 9% of the total mark of the course. The table below shows the percentage contributed by each exercise of this assignment:
Exercise      Contribution
Exercise 1    3%
Exercise 2    3%
Exercise 3    3%
Total         9%
Submitting your assignment
Submit your assignment on OWL, in the “Assignments” section, as well-commented Matlab script files. The file names must follow this convention:
A3_STUDENTNUMBER_EXn.m, where STUDENTNUMBER is your 9-digit student number and n is the exercise number (hence n is 1, 2 or 3). Functions (if any) can either be in the same script (“all-in-one” style) or in separate files (with the file name the same as the function name).
In any case, make sure you submit all the files the grader needs to run your programs. This includes the csv data files (the ones provided with this assignment). Your grade will be based on what the grader can run. A program that does not run (i.e., stops because of any error) will be graded zero. One way to check this is to copy what you plan to submit into a new folder and run your scripts from this new folder (and/or send it to a friend and ask her/him to run it).
While you can work with other students, your submission must reflect your own work. Programs that are suspiciously similar will be graded zero.
Exercise 1 – Chicks Growth
You have been hired as a data analysis consultant by a large poultry farm. This farm has recently run an experiment to determine whether the protein diet currently given to their chicks can be improved. The experiment consisted of weighing the chicks at birth and every second day thereafter until day 20. A last measurement was also taken on day 21. The chicks were randomly divided into four distinct groups and each group received a different diet.
The company sent you the file chicks.csv that contains the results of this experiment. The variable weight is the chick weight in grams, Time is the day of the weighing, Chick is the identification number of the weighed chick and Diet is the diet this chick was fed. Diet 1 is the current protein diet and diets 2, 3 and 4 are the prospective ones.
The farm sells chicks for consumption when they are either 12 or 21 days old. The heavier the chick, the higher its price, hence the farm seeks to maximize the chick weight at 12 and 21 days of age.
The company would like you to determine whether one (or more) of the three prospective diets could replace the existing one in order to increase the chicks' weight. Write a short paragraph with your final recommendations to the poultry farm. Write a Matlab script that generates a rigorous statistical analysis, figures (no more than three) and a summary table to support your recommendations. The figure(s) and summary table should convey your conclusion(s) to the client as clearly as possible.
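As a starting point, one possible shape for the summary table and a group comparison is sketched below. This is only an illustration under stated assumptions: the weights are synthetic (chicks.csv is not read), the group size nPerDiet and the simulated diet effect are hypothetical, and a one-way ANOVA is just one candidate test, not a prescribed method. It assumes the Statistics and Machine Learning Toolbox is available.

```matlab
% Illustrative sketch with SYNTHETIC data (not the real chicks.csv).
% With the real file, readtable('chicks.csv') would supply the
% weight, Time, Chick and Diet columns instead.
rng(0);                                         % reproducible fake data
nPerDiet = 10;                                  % hypothetical group size
Diet   = repelem((1:4)', nPerDiet);             % diet label of each chick
weight = 180 + 15*Diet + 20*randn(size(Diet));  % fake day-21 weights (g)

% Summary statistics per diet: mean, standard error, sample size
[meanW, semW, n] = grpstats(weight, Diet, {'mean', 'sem', 'numel'});
summaryTable = table((1:4)', n, meanW, semW, ...
    'VariableNames', {'Diet', 'N', 'MeanWeight', 'SEM'});
disp(summaryTable);

% One-way ANOVA across the four diets ('off' suppresses the figures)
pValue = anova1(weight, Diet, 'off');
fprintf('One-way ANOVA p-value: %.4g\n', pValue);
```

With the real data, the same steps would be repeated on the rows where Time equals 12 and where Time equals 21, and the test's assumptions should be checked before relying on the p-value.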
Exercise 2 – Diabetes in Pima Women
You are the data analyst of a scientific team in the US National Institute of Diabetes and Digestive and Kidney Diseases. Your team investigates diabetes risk among Native Americans. The epidemiologists of your group have just finished collecting data on a population of adult women of Pima Indian heritage living near Phoenix, Arizona. In this dataset, the women were tested for diabetes according to World Health Organization criteria and the following six biological variables were also recorded:
• the number of pregnancies (npreg)
• plasma glucose concentration in an oral glucose tolerance test in mg/dL (glu)
• diastolic blood pressure in mm Hg (bp)
• triceps skin fold thickness in mm (skin)
• body mass index in kg m−2 (bmi)
• age in years (age)
The file diabetes-pima.csv contains the collected data, and the variable diabetes codes whether a woman is diabetic (Yes) or not (No).
Your clinical collaborators would like to infer from this dataset the probability that a woman of Pima Indian heritage who is not already present in this dataset is diabetic, considering each of the six variables above independently.
a) Provide your collaborators with a Matlab function that calculates the probability that a woman of Pima Indian heritage is diabetic given any one of the six biological variables (and independently of the other five).
b) Generate one figure that compares how that probability varies across a relevant range of values for all six biological variables (again, independently of one another).
c) One of your collaborators wants to know whether the patients represented in this dataset are homogeneous with respect to the six biological variables. To address her concern visually, produce a classical multi-dimensional scaling map in two dimensions where each data point is coloured according to the patient’s diabetic status. What is your conclusion?
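For part a), one common way to turn a single predictor into a probability is a logistic regression. The sketch below is a minimal illustration, not the required solution: the data are synthetic (diabetes-pima.csv is not read), the simulated relationship between glu and diabetes is invented, and the names isDiabetic and probDiabetic are hypothetical. It assumes the Statistics and Machine Learning Toolbox (glmfit, glmval).

```matlab
% Illustrative sketch with SYNTHETIC data (not the real diabetes-pima.csv).
rng(1);                                     % reproducible fake data
n     = 200;
glu   = 80 + 100*rand(n, 1);                % fake glucose values (mg/dL)
pTrue = 1 ./ (1 + exp(-(glu - 130)/15));    % invented true relationship
isDiabetic = double(rand(n, 1) < pTrue);    % 1 = Yes, 0 = No

% Fit P(diabetic | glu) with a logistic model; b = [intercept; slope]
b = glmfit(glu, isDiabetic, 'binomial', 'link', 'logit');

% Predicted probability for new values of the predictor
probDiabetic = @(x) glmval(b, x, 'logit');

% Probability curve over the observed range of the predictor
gluGrid = linspace(min(glu), max(glu), 100)';
plot(gluGrid, probDiabetic(gluGrid), 'LineWidth', 1.5);
xlabel('glu (mg/dL)');
ylabel('Estimated P(diabetic)');
```

The same fit can be repeated for each of the six variables in turn. For the submission, the anonymous function would be wrapped in a named Matlab function that takes the variable name and a value as inputs.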
Exercise 3 – Consumer Cars Market
The file cars.csv contains some features of consumer cars produced in the 1970s / early 1980s. We call the “engineering” variables all the variables of this dataset except the first three variables: manufacturer (Mfg), model and year of the model (Model year).
a) Perform a principal component analysis (PCA) on the engineering variables of this dataset and provide a visual representation of the PCA. Justify why we should, or should not, use inverse-variance weights for this PCA. Quantify how much of the variance in the data is explained by the first two principal components. Do you think this is enough?
b) Using a biplot, interpret how the first two principal components of the PCA attempt to segregate the data.
c) Perform a classical multi-dimensional scaling (MDS) on the engineering variables of this dataset.
d) Show, simply by creating two figures, that the MDS data cluster neither by manufacturer nor by the year of the model.
e) Perform and plot a clustering analysis on the MDS data using the k-means method with three clusters. For each cluster on the plot, annotate 5 data points chosen randomly with their model names.
f) Perform and plot a clustering analysis on the MDS data using two agglomerative methods: “single” and “centroid”. Is the clustering similar between these two methods? Why?
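The steps in parts a) to f) can be chained roughly as in the sketch below. It is only an illustration under stated assumptions: the data are synthetic (cars.csv is not read), the three simulated groups and the column scales are invented, and the biplot and model-name annotations are left out. It assumes the Statistics and Machine Learning Toolbox (pca, pdist, cmdscale, kmeans, linkage, cluster, gscatter).

```matlab
% Illustrative sketch with SYNTHETIC data (not the real cars.csv).
rng(2);                                     % reproducible fake data
% Three fake groups of "cars", four numeric variables on different scales
X = [randn(30, 4); randn(30, 4) + 4; randn(30, 4) - 4] .* [1 10 100 1000];

% PCA with inverse-variance weights, so that variables measured on
% large scales do not dominate the components
[~, score, ~, ~, explained] = pca(X, 'VariableWeights', 'variance');
fprintf('Variance explained by PC1 and PC2: %.1f%%\n', sum(explained(1:2)));

% Classical MDS on Euclidean distances between standardized observations
D  = pdist(zscore(X));
Y  = cmdscale(D);                 % columns ordered by decreasing eigenvalue
Y2 = Y(:, 1:2);                   % keep the first two MDS dimensions

% k-means with three clusters (several restarts to avoid poor local optima)
idxKmeans = kmeans(Y2, 3, 'Replicates', 5);

% Agglomerative clustering with two linkage methods, cut into three clusters
idxSingle   = cluster(linkage(Y2, 'single'),   'maxclust', 3);
idxCentroid = cluster(linkage(Y2, 'centroid'), 'maxclust', 3);

% MDS map coloured by k-means cluster
gscatter(Y2(:, 1), Y2(:, 2), idxKmeans);
xlabel('MDS dimension 1');
ylabel('MDS dimension 2');
```

Replacing idxKmeans in the gscatter call with idxSingle or idxCentroid, or with the manufacturer and model-year columns of the real table, gives the other maps the exercise asks for.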
END OF ASSIGNMENT 3
Document saved on 2020-03-06 17:10:17-05:00