CS570 Biomedical Science & Health IT
CS544 D1
Foundations of Analytics
Lecture 6
Guanglan Zhang
Bivariate Data – Un-summarized Data
If the data for two categorical variables are given in unsummarized form (the actual observed values), a contingency table can be created by counting how often each combination of values occurs.
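As a minimal sketch of this step, the example below builds a contingency table with table(). The survey data frame and its gender and smokes columns are made up for illustration and are not part of the lecture data.

```r
# Hypothetical unsummarized data: one row per surveyed person
survey <- data.frame(
  gender = c("F", "M", "F", "F", "M", "M", "F", "M"),
  smokes = c("No", "No", "Yes", "No", "Yes", "No", "No", "No")
)

# table() counts how often each (gender, smokes) combination occurs;
# dnn supplies the names shown for the two dimensions
tab <- table(survey$gender, survey$smokes, dnn = c("gender", "smokes"))
print(tab)
```

Each cell of the printed table is the number of rows in the data frame with that pair of values.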
Bivariate Data – Summarized Data
Sometimes data comparing two variables is available only in summarized form, for example as a two-way table of counts.
The distribution of each variable separately is the marginal distribution of that variable.
In a two-way table, summing across the rows or down the columns gives the marginal distribution of the corresponding variable.
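The marginal distributions can be computed along these lines. This is a sketch under assumed data: the counts in tab are invented, and margin.table() and addmargins() are one of several ways to get the totals in base R.

```r
# Hypothetical two-way table of counts (summarized data)
tab <- as.table(matrix(c(30, 10,
                         25, 15),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(gender = c("F", "M"),
                                       smokes = c("No", "Yes"))))

# Marginal distribution of each variable, as counts
row_totals <- margin.table(tab, 1)  # one total per gender (sums over columns)
col_totals <- margin.table(tab, 2)  # one total per smokes level (sums over rows)

# The same table with both sets of totals appended
print(addmargins(tab))

# Marginal distribution of gender as proportions of the grand total
print(row_totals / sum(tab))
```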
Graphical Summarization of Two-way Tables
The mosaic plot is a graphical display showing the relationship among two or more categorical variables.
The bar plot can also be used for the graphical presentation of the two-way data.
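Both plots can be drawn from a two-way table roughly as follows. The counts are the same kind of invented example data, and the titles and axis labels are placeholders.

```r
# Hypothetical two-way table of counts
tab <- as.table(matrix(c(30, 10,
                         25, 15),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(gender = c("F", "M"),
                                       smokes = c("No", "Yes"))))

# Mosaic plot: tile areas are proportional to the cell counts
mosaicplot(tab, main = "Smoking status by gender", color = TRUE)

# Grouped bar plot: one group of bars per column (smokes level),
# one bar per row (gender) within each group
bar_mids <- barplot(tab, beside = TRUE, legend.text = TRUE,
                    main = "Smoking status by gender",
                    xlab = "Smokes", ylab = "Count")
```

With beside = FALSE (the default), barplot() stacks the bars instead of placing them side by side.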
Bivariate Data – Relationships in Numeric Data
Graphical Representation
A scatterplot is used to visualize the relationship between two numerical variables.
Use the plot() function to draw the scatterplot:
> plot(data$explanatoryvariable, data$responsevariable)
Use main, xlab, and ylab to label the picture appropriately
Use xlim and ylim to control the ranges of the x and y axes
Change the point symbol using pch and/or the point color using col
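The options above can be combined in a single call. The sketch below uses R's built-in cars data set (speed and stopping distance) as a stand-in for the lecture data. The title, axis limits, symbol, and color are arbitrary choices.

```r
# cars is a built-in data set with columns speed (mph) and dist (ft)
plot(cars$speed, cars$dist,
     main = "Stopping distance vs. speed",      # plot title
     xlab = "Speed (mph)",                      # x-axis label
     ylab = "Stopping distance (ft)",           # y-axis label
     xlim = c(0, 30), ylim = c(0, 130),         # axis ranges
     pch = 16,                                  # filled circle
     col = "blue")                              # point color
```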
Multivariate Data
Three-way (or n-way) contingency tables show the relationships among three (or n) variables using multiple two-way tables.
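A three-way table can be sketched with the built-in mtcars data set, chosen here only for illustration. Passing three variables to table() prints one two-way table per level of the third variable.

```r
# Three categorical variables from the built-in mtcars data set:
# cyl (cylinders), gear (forward gears), am (0 = automatic, 1 = manual)
tab3 <- table(mtcars$cyl, mtcars$gear, mtcars$am,
              dnn = c("cyl", "gear", "am"))

# Printed as one cyl-by-gear table for each value of am
print(tab3)

# ftable() flattens the same counts into a single compact table
print(ftable(tab3))
```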
Graphical Summarization
The box plot can be used to compare independent samples graphically, with one box per sample.
The scatterplot matrix shows the pair-wise relationships between the given variables using a scatter plot for each pair.
The bar plot and the mosaic plot can be used to display summarized data graphically.
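The box plot and the scatterplot matrix might be produced as below. This is a sketch using the built-in mtcars data set. The grouping by cyl and the three variables passed to pairs() are illustrative choices, not the lecture's own example.

```r
# Side-by-side box plots: one box of mpg values per number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
        main = "Fuel economy by number of cylinders",
        xlab = "Cylinders", ylab = "Miles per gallon")

# Scatterplot matrix: one scatterplot for each pair of the chosen variables
vars <- mtcars[, c("mpg", "hp", "wt")]
pairs(vars, main = "Pairwise relationships in mtcars")
```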
Handling null values
R supports: NULL, NA, NaN, Inf/-Inf
NULL – a reserved word that represents the null object. It is often returned by expressions and functions whose value is undefined.
NA – a logical constant of length 1 indicating a missing value
NaN – stands for Not a Number. It is the result of an undefined numeric operation such as 0/0.
Inf / -Inf – stands for positive or negative infinity. It results from a number too large to be represented or from dividing a nonzero number by zero.
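The four kinds of values can be told apart with base R's predicate functions. The short sketch below uses a made-up vector x, and the comments show the results of each call.

```r
x <- c(4, NA, 7)               # hypothetical vector with one missing value

na_flags   <- is.na(x)                 # FALSE TRUE FALSE
mean_raw   <- mean(x)                  # NA: missing values propagate
mean_clean <- mean(x, na.rm = TRUE)    # 5.5: NA dropped before averaging

null_len <- length(NULL)       # 0: NULL is an empty object
null_chk <- is.null(NULL)      # TRUE

nan_val <- 0 / 0               # NaN: undefined numeric result
nan_chk <- is.nan(nan_val)     # TRUE
nan_na  <- is.na(nan_val)      # TRUE: is.na() is also TRUE for NaN

pos_inf  <- 1 / 0              # Inf
neg_inf  <- -1 / 0             # -Inf
overflow <- 2^1024             # Inf: too large for a double

finite_chk <- is.finite(c(1, Inf, NA, NaN))  # TRUE FALSE FALSE FALSE
```

Note that is.na() returns TRUE for both NA and NaN, while is.nan() returns TRUE only for NaN.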
In-class quizzes
Go to https://b.socrative.com
Enter classroom: ZHANG6334