In [ ]:
library(MASS)
library(testthat)
library(tidyverse)
library(gclus)
library(GGally)

# Set the plot size to 4 x 3; adjust depending on your screen resolution
options(repr.plot.width=4, repr.plot.height=3)

Assignment 1

For this assignment, you have to analyse two different data sets. For each plot, add an appropriate title and axis labels; do not leave the defaults. Your submission should run without errors on JupyterHub when you execute Kernel -> Restart & Run All.

For each data set and task, assign the resulting plot or data frame to the specified variable. For example, if the cell says p_3 <- ..., the plot has to be assigned to the variable p_3.

Dataset01 (3.5 points)

The dataset birthwt is in the package MASS and is available once you load the package. Have a quick look at the meaning of the variables in birthwt. We will use smoke and bwt.

1. First create a new factor variable called smoke and supply labels for the levels of the variable smoke. (Note that 0 means non-smoker and 1 means smoker.)
2. Create a new data frame called ds_a.df containing the factor smoke and the variable bwt. (Hint: the column name should just be bwt.)
3. Create a boxplot of the birth weights of the two groups: smokers and non-smokers.
4. Also provide a bar chart showing the number of smokers and non-smokers.

For both graphs, include a heading and appropriate axis labels.

From the Help file:
Risk Factors Associated with Low Infant Birth Weight

Description

The birthwt data frame has 189 rows and 10 columns. The data were collected at Baystate Medical Center, Springfield, Mass during 1986.

In [ ]:
## Dataset01 - Task 1 code here

# your code here
fail() # No Answer - remove if you provide an answer

In [ ]:
# This is the first public test to check that you assigned the correct variable, the rest is hidden
test_that("Creating a factor variable", {
    expect_is(smoke, "factor")
})

In [ ]:
## Dataset01 - Task 2 code here

# your code here
fail() # No Answer - remove if you provide an answer

In [ ]:
# This is the first public test to check that you assigned the correct variable, the rest is hidden
test_that("Creating a data frame", {
    expect_is(ds_a.df, "data.frame")
})

In [ ]:
## Dataset01 - Task 3 code here
# p_3 <- .....
# p_3

# your code here
fail() # No Answer - remove if you provide an answer

In [ ]:
# This is the first public test to check that you assigned the correct variable, the rest is hidden
test_that("Boxplot", {
    expect_is(p_3, "ggplot")
})

In [ ]:
## Dataset01 - Task 4 code here
# p_4 <- .....
# p_4

# your code here
fail() # No Answer - remove if you provide an answer

In [ ]:
# This is the first public test to check that you assigned the correct variable, the rest is hidden
test_that("Bar plot", {
    expect_is(p_4, "ggplot")
})

Dataset02 (6.5 points)

Consider the data set bank in the package gclus. To find information about this data set, look up the help file with ?bank. You will see that it has a variable Status and six dimension variables. The dataset must be loaded with the command data(bank).

1. Load the data and make the column Status into a factor with 0 = Genuine and 1 = Counterfeit. (Note: modify the data frame.)
2. Create a boxplot for each dimension (Length, Left, Right, Bottom, Top and Diagonal).
   Based on the boxplots, choose the two variables that you think are likely to give the clearest differentiation between forged and genuine notes, and put the graph objects, named p_var1 and p_var2, in the designated cell.
3. Create a scatter plot of the chosen dimensions and colour the points by Status. Use the function geom_segment to add a separating line between the coloured dots. Note: This is a new function; the goal of this exercise is to make yourself familiar with how to use the help function.
4. Using a grammar of graphics command, create a scatterplot matrix in which the points representing the forgeries and the genuine notes have different colours. Also add a title to your plot.
5. Using a grammar of graphics command, create a scatterplot matrix for the combined sample of 200 notes, with the overall distribution of each variable along the diagonal and the correlations in the upper panels. Also add a title to your plot.
6. Use ggcorrplot to create an appropriate correlation plot for the combined sample. Also add a title to your plot. Don't forget to include the library installation and loading commands.
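
Tasks 1 and 3 combine two ideas: recoding a 0/1 column as a labelled factor, and drawing a straight segment on top of a coloured scatter plot. The sketch below illustrates both on made-up toy data rather than on bank. The names toy, sep_line and p_demo, the labels "A" and "B", and the segment end points are all assumptions chosen for illustration. It is one possible way to use geom_segment, not the expected solution.

```r
library(ggplot2)

# Toy data (hypothetical, not the bank notes): two groups coded 0/1
set.seed(1)
toy <- data.frame(
  x = c(rnorm(30, mean = 2), rnorm(30, mean = 5)),
  y = c(rnorm(30, mean = 5), rnorm(30, mean = 2)),
  group = rep(c(0, 1), each = 30)
)

# Recode the 0/1 column as a labelled factor, modifying the data frame in place
toy$group <- factor(toy$group, levels = c(0, 1), labels = c("A", "B"))

# A one-row data frame holding the segment end points (arbitrary values),
# so that the segment is drawn once rather than once per data row
sep_line <- data.frame(x = 0, y = 0, xend = 7, yend = 7)

# Scatter plot coloured by group, plus a dashed segment from (x, y) to (xend, yend)
p_demo <- ggplot(toy, aes(x = x, y = y, colour = group)) +
  geom_point() +
  geom_segment(
    data = sep_line,
    aes(x = x, y = y, xend = xend, yend = yend),
    inherit.aes = FALSE,
    colour = "black",
    linetype = "dashed"
  ) +
  labs(
    title = "Toy scatter plot with a separating segment",
    x = "x (arbitrary units)",
    y = "y (arbitrary units)",
    colour = "Group"
  )
```

Evaluating p_demo on its own line in a notebook cell displays the plot. Passing the end points through a separate data argument is a design choice: if the segment layer inherited the plot data, the same segment would be drawn once for every row.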

From the Help file:
Swiss bank notes data

Description

Data from "Multivariate Statistics A practical approach", by Bernhard Flury and Hans Riedwyl, Chapman and Hall, 1988, Tables 1.1 and 1.2 pp. 5-8. Six measurements made on 100 genuine Swiss banknotes and 100 counterfeit ones.

In [ ]:
## Dataset02 - Task 1 code here

# your code here
fail() # No Answer - remove if you provide an answer

In [ ]:
# This is the first public test to check that you assigned the correct variable, the rest is hidden
test_that("Creating a factor variable within an existing data frame", {
    expect_is(bank, "data.frame")
})

In [ ]:
## Dataset02 - Task 2 code here
# p_var1 <- ..
# p_var2 <- ..

# your code here
fail() # No Answer - remove if you provide an answer

In [ ]:
# This is the first public test to check that you assigned the correct variable, the rest is hidden
test_that("Boxplot", {
    expect_is(p_var1, "ggplot")
    expect_is(p_var2, "ggplot")
})

In [ ]:
## Dataset02 - Task 3 code here
# p_3 <- .....
# p_3

# your code here
fail() # No Answer - remove if you provide an answer

In [ ]:
## Dataset02 - Task 4 code here
# mat <- .....
# mat

# your code here
fail() # No Answer - remove if you provide an answer

In [ ]:
# This is the first public test to check that you assigned the correct variable, the rest is hidden
test_that("Plot", {
    expect_is(mat, "ggplot")
})

In [ ]:
## Dataset02 - Task 5 code here
# mat_all <- .....
# mat_all

# your code here
fail() # No Answer - remove if you provide an answer

In [ ]:
# This is the first public test to check that you assigned the correct variable, the rest is hidden
test_that("Plot", {
    expect_is(mat_all, "ggplot")
})

In [ ]:
## Dataset02 - Task 6 code here
# corr <- ...     (The correlation matrix)
# p_cor <- .....  (The plot)

# your code here
fail() # No Answer - remove if you provide an answer

In [ ]:
# This is the first public test to check that you assigned the correct variable, the rest is hidden
test_that("Correlation plot", {
    expect_is(p_cor, "ggplot")
})
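
The scatterplot-matrix and correlation tasks for Dataset02 (Tasks 4 to 6) rely on GGally and ggcorrplot. As a rough orientation, the sketch below shows the kind of calls involved, using the built-in iris data as a stand-in instead of the assignment data. The names mat_demo, corr_demo and p_cor_demo are hypothetical, and the panel choices are only one reasonable configuration. The ggcorrplot call is guarded because the package may not be installed.

```r
library(ggplot2)
library(GGally)

# iris is used purely as a stand-in data set; it is not part of the assignment
mat_demo <- ggpairs(
  iris,
  columns = 1:4,                            # numeric measurement columns only
  mapping = aes(colour = Species),          # colour by the grouping factor
  upper = list(continuous = "cor"),         # correlations in the upper panels
  diag = list(continuous = "densityDiag"),  # density of each variable on the diagonal
  title = "Scatterplot matrix of the iris measurements by species"
)

# Correlation matrix of the numeric columns, the usual input to a correlation plot
corr_demo <- round(cor(iris[, 1:4]), 2)

# install.packages("ggcorrplot")  # run once if the package is missing
if (requireNamespace("ggcorrplot", quietly = TRUE)) {
  p_cor_demo <- ggcorrplot::ggcorrplot(
    corr_demo,
    lab = TRUE,                             # print the coefficients in the tiles
    title = "Correlations between the iris measurements"
  )
}
```

Dropping the mapping argument gives a matrix for the combined sample, where the upper panels then report a single overall correlation per pair of variables.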