Statistical Inference STAT 431
Lecture 3: Summarizing Data – Multiple Variables
Multiple Numerical Variables: Scatter Plots
• Goal: To visualize the relationship between a pair of numerical variables
• Example: Pearson’s Father-Son Height Data
[Figure: scatter plot of son's height against father's height, in inches; both axes run from 60 to 75]
Father’s height (inch)
65.05 63.25 64.95 65.75
Son’s height (inch)
59.78 63.21 63.34 62.79
…… 71.33 68.27 71.78 69.31 70.74 69.30 70.31 67.02
Example: House Prices
• 439 house prices from 2003 in ZIP code 30062
• Two numerical variables of interest: Building Area (in SQFT), Price (in $1000)
[Figure: scatter plot of price ($1000, from 100 to 600) against building area (SQFT, from 1000 to 4000)]
• Positive association between building area and price
• The points circled in red lie far from the majority of the data
Association between Numerical Variables
• Positively associated if increased values of one variable tend to occur with increased values of the other
• Negatively associated if increased values of one variable tend to occur with decreased values of the other
• Pearson’s Data: son’s height positively associated with father’s height
• House Price Data: house price positively associated with building area
• Caution: association is NOT proof of causation
E.g., reading ability of teenagers is positively associated with shoe size (both increase with age)
• Sometimes, associations in datasets are not just positive or negative, but also appear to be linear
Chocolate Consumption, Cognitive Function, and Nobel Laureates
Franz H. Messerli, M.D.
N Engl J Med 2012; 367:1562-1564 October 18, 2012
Linear Association between Two Numerical Variables
• Data: paired observations of two numerical variables
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
• Sample means and sample SDs: $\bar{x}$, $s_x$, $\bar{y}$, $s_y$
Two summary statistics of linear relationship:
• The sample covariance between x and y is
$s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
• The sample correlation (Pearson’s correlation) between x and y is
$r = \frac{s_{xy}}{s_x \cdot s_y}$
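The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the five data pairs are hypothetical and not taken from the course materials.

```python
import math

def sample_mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation with the n - 1 divisor.
    m = sample_mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def sample_cov(x, y):
    # s_xy = (1 / (n - 1)) * sum of (x_i - x_bar) * (y_i - y_bar)
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need paired observations with n >= 2")
    x_bar, y_bar = sample_mean(x), sample_mean(y)
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (len(x) - 1)

def sample_corr(x, y):
    # r = s_xy / (s_x * s_y)
    return sample_cov(x, y) / (sample_sd(x) * sample_sd(y))

# Hypothetical toy data, not from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print("s_xy =", sample_cov(x, y))
print("r    =", sample_corr(x, y))
```

For these toy values the deviations give $s_{xy} = 6/4 = 1.5$, $s_x^2 = 2.5$ and $s_y^2 = 1.5$, so $r = 1.5/\sqrt{3.75} \approx 0.77$.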
Sample Correlation
$r = \frac{s_{xy}}{s_x \cdot s_y}$
• Properties
– Has no unit
– Always satisfies $-1 \le r \le 1$
  • $r = 1$: the relationship between x and y is exactly positive linear
  • $r = -1$: the relationship between x and y is exactly negative linear
  • $r = 0$: no linear relationship (what if the $y_i$'s are all the same?)
– Symmetric (swapping x and y leaves r unchanged)
– Invariant under linear transformation of x and y (with positive slopes, e.g., a change of units)
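The properties listed above can be checked numerically. The sketch below uses a hypothetical helper `corr` and hypothetical toy data. It also illustrates two details the slide leaves implicit: a negative scale factor flips the sign of r, and r is undefined (0/0) when all the $y_i$'s are equal because $s_y = 0$.

```python
import math

def corr(x, y):
    """Sample correlation r; returns None when s_x or s_y is zero."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # r = 0/0 is undefined
    # The 1/(n - 1) factors in s_xy, s_x and s_y cancel.
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data, not from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = corr(x, y)

symmetric = math.isclose(corr(y, x), r)                  # swap x and y
unit_free = math.isclose(corr([2.54 * v for v in x], y), r)  # inches to cm
shifted = math.isclose(
    corr([10 + 3 * v for v in x], [7 + 0.5 * v for v in y]), r
)                                                        # positive slopes
sign_flip = math.isclose(corr([-v for v in x], y), -r)   # negative slope
exact_pos = corr(x, [1 + 2 * v for v in x])              # exactly linear, up
exact_neg = corr(x, [1 - 2 * v for v in x])              # exactly linear, down
constant_y = corr(x, [3, 3, 3, 3, 3])                    # all y_i equal

print(symmetric, unit_free, shifted, sign_flip)
print(exact_pos, exact_neg, constant_y)
```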
Pearson’s Data: r = 0.50
Housing Data: r = 0.82
[Figure: two scatter plots side by side. Left: son's height against father's height. Right: price ($1000) against building area (SQFT).]
Guessing Sample Correlations (1)
[Figure: four scatter plots of y against x, with x running from −2 to 2 and y from about −3 to 3]
Guessing Sample Correlations (2)
[Figure: four scatter plots of the paired variables V1 to V8, with all axes running from 0 to 20]
Guessing Sample Correlations (3)
[Figure: three scatter plots of y against x, with both axes running from −4 to 4]
Use Sample Correlation with Caution
• A good summary statistic only if relationship is (roughly) linear
• Cannot be used to measure strength of nonlinear relationships
• Simpson’s paradox
• It is always a good idea to plot the data first!
– Linearity
– Groups
– Potential outliers (robustness)
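Two of these cautions can be illustrated with a short Python sketch. The helper `corr` and all data values are hypothetical. The first data set has y completely determined by x yet gives r = 0, and the second shows a single outlying point turning a perfect positive correlation into a negative one.

```python
import math

def corr(x, y):
    """Sample correlation r of paired observations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Nonlinear relationship: y = x^2 on points symmetric around zero.
x_sym = [-2, -1, 0, 1, 2]
y_sq = [v ** 2 for v in x_sym]
r_parabola = corr(x_sym, y_sq)

# Robustness: five points exactly on a line, then one added outlier.
x_line = [1, 2, 3, 4, 5]
y_line = [1, 2, 3, 4, 5]
r_line = corr(x_line, y_line)
r_outlier = corr(x_line + [10], y_line + [0])

print("parabola:", r_parabola)
print("line:", r_line, "with outlier:", r_outlier)
```

A scatter plot would reveal both situations at a glance, which the single number r cannot do.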
Put a Line on a Scatter Plot
• When the relationship is (roughly) linear, we can put a line on a scatter plot to indicate it.
• Equation for a line: $y = \beta_0 + \beta_1 x$
– Regression coefficients: $\beta_0$ is the intercept and $\beta_1$ is the slope
• Q: How to find the “best” line?
A: Find $(\beta_0, \beta_1)$ that minimizes the objective function
$\sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2$
[the least squares (LS) method]
• Solution: $\hat{\beta}_1 = r \cdot \frac{s_y}{s_x}$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
• More details later in the course: simple linear regression
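As a rough sketch of the solution formulas, the Python below computes the slope as $r \cdot s_y / s_x$ and the intercept as $\bar{y}$ minus slope times $\bar{x}$. The function name and the toy data are hypothetical. It also checks a known consequence of the formulas: the residuals of the fitted line sum to zero.

```python
import math

def least_squares_line(x, y):
    """Return (intercept, slope) from the summary statistics."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    r = s_xy / (s_x * s_y)
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical toy data, not from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print("intercept:", b0, "slope:", b1)
print("sum of residuals:", sum(residuals))
```

For these values $s_{xy} = 1.5$ and $s_x^2 = 2.5$, so the slope is 0.6 and the intercept is $4 - 0.6 \cdot 3 = 2.2$.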
Examples
[Figure: two scatter plots side by side. Left: son's height against father's height. Right: price ($1000) against building area (SQFT).]
Examples
• Which line is a better fit: (1) solid or (2) dashed?
[Figure: scatter plot of son's height against father's height, showing a solid line and a dashed line]
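Under the least squares criterion, one way to compare two candidate lines is to evaluate the objective function (the sum of squared vertical distances) for each and prefer the smaller value. The sketch below uses hypothetical data and two hypothetical lines labelled "solid" and "dashed". It does not reproduce the lines on the slide.

```python
def sum_squared_errors(x, y, intercept, slope):
    """LS objective: sum of [y_i - (intercept + slope * x_i)]^2."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical toy data, not from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Two hypothetical candidate lines, given as (intercept, slope).
solid = (2.2, 0.6)
dashed = (0.0, 1.0)

sse_solid = sum_squared_errors(x, y, *solid)
sse_dashed = sum_squared_errors(x, y, *dashed)
better = "solid" if sse_solid < sse_dashed else "dashed"

print("solid:", sse_solid, "dashed:", sse_dashed, "better fit:", better)
```

Here the solid line has the smaller objective value (about 2.4 against 9), so it is the better fit in the least squares sense for this toy data.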
Multiple Categorical Variables: Contingency Tables
• Relationships between several categorical variables can be examined with a contingency table
• Construction: display the frequency for each possible combination of categories
• Example: Berkeley graduate admission data (1973)
– Three variables: (1) gender, (2) admission status, (3) major applied
– Only look at Gender vs. Admission status:

          Admitted   Denied   % Admitted
Men         1197      1494        44
Women        557      1278        30
(Reconstructed from Table 4.11, p.132 of textbook)

• Bias against women?
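The construction step (count the frequency of each combination of categories) can be sketched in Python with `collections.Counter`. The counts are those in the table above. Expanding them into one record per applicant is only an assumption about how the raw data might look.

```python
from collections import Counter

# Cell counts from the Gender vs. Admission status table.
counts = {
    ("Men", "Admitted"): 1197,
    ("Men", "Denied"): 1494,
    ("Women", "Admitted"): 557,
    ("Women", "Denied"): 1278,
}

# Assumed raw form: one (gender, status) record per applicant.
records = [combo for combo, k in counts.items() for _ in range(k)]

# The contingency table is the frequency of each combination.
table = Counter(records)

def percent_admitted(table, gender):
    admitted = table[(gender, "Admitted")]
    total = admitted + table[(gender, "Denied")]
    return 100 * admitted / total

for gender in ("Men", "Women"):
    print(gender, round(percent_admitted(table, gender)))
```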
An Interesting Phenomenon: Simpson’s Paradox
• Stratified by the third variable: major applied

               Men                 Women            % Admitted
Major    Admitted  Denied    Admitted  Denied     Men    Women
A           511      314         89       19       62      82
B           353      207         17        8       63      68
C           120      205        202      391       37      34
D           138      279        131      244       33      35
E            53      138         94      299       28      24
F            22      351         24      317        6       7
Total      1197     1494        557     1278       44      30

• Simpson’s paradox: the direction of an association is reversed after marginalization (here, pooling over majors)
• It is important to stratify!
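The stratified and pooled admission rates can be recomputed from the counts in the table. The Python sketch below uses hypothetical variable names. It lists the majors in which women have the higher admission rate and compares that with the pooled rates.

```python
# (admitted, denied) counts by major, from the stratified table.
men = {"A": (511, 314), "B": (353, 207), "C": (120, 205),
       "D": (138, 279), "E": (53, 138), "F": (22, 351)}
women = {"A": (89, 19), "B": (17, 8), "C": (202, 391),
         "D": (131, 244), "E": (94, 299), "F": (24, 317)}

def rate(admitted, denied):
    """Percent admitted."""
    return 100 * admitted / (admitted + denied)

def pooled_rate(by_major):
    """Percent admitted after marginalizing over major."""
    admitted = sum(a for a, _ in by_major.values())
    denied = sum(d for _, d in by_major.values())
    return rate(admitted, denied)

# Majors in which women are admitted at a higher rate than men.
women_higher = [m for m in men if rate(*women[m]) > rate(*men[m])]

for m in men:
    print(m, round(rate(*men[m])), round(rate(*women[m])))
print("Total", round(pooled_rate(men)), round(pooled_rate(women)))
print("Women higher in:", women_higher)
```

Women have the higher rate in four of the six majors, yet the lower pooled rate. The counts suggest why: about 51% of the men but only about 7% of the women applied to majors A and B, which have the highest admission rates.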
Another Example of Simpson’s Paradox
• Two treatments for kidney stones were compared
• Researchers reviewed hospital records and computed the success rate for each treatment

               Success   Failure   % Success
Treatment A      273        77        78
Treatment B      289        61        83

• Treatment B looks better, right?
Another Example of Simpson’s Paradox
• Next, researchers looked separately at patients with small stones, and patients with large stones…
                   Treatment A                     Treatment B
Stone Size   Success  Failure  % Success    Success  Failure  % Success
Small           81        6       93          234       36       87
Large          192       71       73           55       25       69
Total          273       77       78          289       61       83
C. R. Charig, D. R. Webb, S. R. Payne, O. E. Wickham (March 1986) Br Med J (Clin Res Ed) 292 (6524):879-882
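One way to see how the reversal arises is to write each treatment's overall success rate as a weighted average of its rates for small and large stones, with weights equal to the share of patients in each group. The Python sketch below uses the counts from the table. The function and variable names are hypothetical.

```python
# (success, failure) counts by stone size, from the table.
treatment_a = {"Small": (81, 6), "Large": (192, 71)}
treatment_b = {"Small": (234, 36), "Large": (55, 25)}

def success_rate(success, failure):
    return success / (success + failure)

def overall_rate(groups):
    """Overall rate as a weighted average of the per-group rates."""
    total = sum(s + f for s, f in groups.values())
    return sum(((s + f) / total) * success_rate(s, f)
               for s, f in groups.values())

def share_small(groups):
    """Fraction of a treatment's patients who had small stones."""
    total = sum(s + f for s, f in groups.values())
    return sum(groups["Small"]) / total

a_better_within = all(
    success_rate(*treatment_a[size]) > success_rate(*treatment_b[size])
    for size in ("Small", "Large")
)
b_better_overall = overall_rate(treatment_b) > overall_rate(treatment_a)

print("A better within each stone size:", a_better_within)
print("B better overall:", b_better_overall)
print("Share of small stones:", share_small(treatment_a), share_small(treatment_b))
```

Treatment A has the higher success rate within each stone size, but only about 25% of its patients had small stones (the easier cases), compared with about 77% for Treatment B, so B comes out ahead overall.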
Class Summary
• Key points of this class:
– Scatter plot
– Sample correlation / sample covariance
– The least squares method for putting a line on a scatter plot
– Contingency table & Simpson’s paradox
• Topics in R:
– Data input and manipulation
– R routines for sample mean / SD / median / quantile / IQR / histogram / box plot / normal plot / z-scores / correlation / covariance / scatter plot
• Reading: Section 4.4 of the textbook
• Next class: Basic Concepts of Inference (I) (Ch.6.1-6.2)