Review of Probability and Statistics
Zhenhao Gong University of Connecticut
Welcome
This course is designed to be:
1. Introductory
2. Focusing on the core techniques with the widest applicability
3. Less math, useful, and fun!
Most important:
Feel free to ask any questions!
Enjoy!
Goal
Review the core ideas of the theory of probability and statistics that are needed for regression analysis and forecasting:
The probability framework for statistical inference
Moments, Covariance, and Correlation
Sampling Distributions and Estimation
Hypothesis Testing and Confidence Intervals
Review of Probability
Randomness
Most aspects of the world around us have an element of randomness:
the gender of the next new person you meet
the number of times your computer will crash while you are writing a term paper
the change of the stock market price
The theory of probability provides mathematical tools for quantifying and describing this randomness.
Basic Concepts
Outcomes:
The mutually exclusive potential results of a random process.
Probability of an outcome:
the proportion of the time that the outcome occurs in the long run.
Population (Sample space):
the group or collection of all possible entities of interest.
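The long-run interpretation of probability can be illustrated with a small simulation. This is only a sketch: the function name `long_run_proportion`, the fair-coin probability of 0.5, and the fixed seed are assumptions made for illustration, not part of the course material.

```python
import random

def long_run_proportion(n_trials, p_heads=0.5, seed=0):
    """Fraction of heads observed in n_trials simulated coin flips."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < p_heads for _ in range(n_trials))
    return heads / n_trials

# The observed proportion tends to settle near p_heads as n_trials grows.
for n in (10, 1_000, 100_000):
    print(n, long_run_proportion(n))
```

With only 10 flips the proportion can be far from 0.5; with 100,000 flips it is typically very close.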
Random Variable
Random variable Y:
Mapping from sample space to the real numbers.
(e.g., the number of heads observed in two flips of a coin.)
Probability distribution function:
Assigns a probability $p_i$ to each value $y_i$ such that $\sum_i p_i = 1$.
Probability density function (p.d.f.):
A non-negative continuous function f(y) such that the area under f(y) between any two points a and b is the probability that Y takes a value between a and b.
Discrete random variable
Cumulative probability distribution (c.d.f.) of Y:
the probability that the random variable is less than or equal to a particular value.
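As a concrete case, take Y to be the number of heads in two flips of a coin. The sketch below assumes the coin is fair, so that the four outcomes are equally likely; the names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two flips of a fair coin: HH, HT, TH, TT (equally likely).
outcomes = list(product("HT", repeat=2))

# Random variable Y maps each outcome to its number of heads.
pmf = {}
for outcome in outcomes:
    y = outcome.count("H")
    pmf[y] = pmf.get(y, Fraction(0)) + Fraction(1, len(outcomes))

def cdf(c):
    """P(Y <= c): add up the probabilities of all values not exceeding c."""
    return sum(p for y, p in pmf.items() if y <= c)

for y in sorted(pmf):
    print(y, pmf[y], cdf(y))
```

The probabilities are 1/4, 1/2, and 1/4 for Y = 0, 1, 2, they sum to 1, and the c.d.f. rises from 1/4 to 3/4 to 1.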
Continuous random variable: normal
The normal probability density function of Y with mean $\mu_Y$ and variance $\sigma_Y^2$:
A bell-shaped curve, centered at $\mu_Y$.
The area under the normal p.d.f. between $\mu_Y - 1.96\sigma_Y$ and $\mu_Y + 1.96\sigma_Y$ is 0.95.
The normal distribution is denoted $N(\mu_Y, \sigma_Y^2)$.
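The 0.95 area can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the mean of 10 and standard deviation of 2 are hypothetical values chosen only for illustration, and any other choice gives the same area.

```python
from statistics import NormalDist

mu_Y, sigma_Y = 10.0, 2.0  # hypothetical mean and standard deviation
Y = NormalDist(mu=mu_Y, sigma=sigma_Y)

# Area under the p.d.f. between mu_Y - 1.96*sigma_Y and mu_Y + 1.96*sigma_Y.
area = Y.cdf(mu_Y + 1.96 * sigma_Y) - Y.cdf(mu_Y - 1.96 * sigma_Y)
print(round(area, 4))  # prints 0.95
```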
Standard normal
The standard normal distribution is the normal distribution with mean $\mu = 0$ and variance $\sigma^2 = 1$ and is denoted $N(0, 1)$.
Random variables that have a $N(0, 1)$ distribution are often denoted Z, and the corresponding c.d.f. is denoted by the Greek letter Φ: $P(Z \le c) = \Phi(c)$.
Suppose Y is distributed $N(\mu_Y, \sigma_Y^2)$. Then Y is standardized by subtracting its mean and dividing by its standard deviation, that is, by computing $Z = (Y - \mu_Y)/\sigma_Y$.
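Standardization is what lets a single table of Φ serve every normal distribution: $P(Y \le c) = \Phi((c - \mu_Y)/\sigma_Y)$. The sketch below compares the two routes; the function name `prob_at_most` and the numbers (mean 10, standard deviation 2, cutoff 13) are hypothetical.

```python
from statistics import NormalDist

def prob_at_most(c, mu_Y, sigma_Y):
    """P(Y <= c) for Y ~ N(mu_Y, sigma_Y^2), computed via the standard normal."""
    z = (c - mu_Y) / sigma_Y      # standardize the cutoff
    return NormalDist().cdf(z)    # Phi(z), the N(0, 1) c.d.f.

mu_Y, sigma_Y, c = 10.0, 2.0, 13.0  # hypothetical values

via_z = prob_at_most(c, mu_Y, sigma_Y)
direct = NormalDist(mu=mu_Y, sigma=sigma_Y).cdf(c)
print(via_z, direct)  # the two numbers agree up to rounding error
```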
Moments
Summaries of various aspects of the distribution of Y:
mean = expected value (expectation) of Y
= the first moment of Y = $E(Y) = \sum_i p_i y_i = \mu_Y$
variance = the second central moment of Y
= measure of the spread of the distribution = $E[(Y - \mu_Y)^2] = \sigma_Y^2$
standard deviation = $\sqrt{\text{variance}} = \sigma_Y$ (same units as Y)
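These definitions translate directly into weighted sums. The sketch below applies them to the number of heads in two flips of a fair coin (values 0, 1, 2 with probabilities 1/4, 1/2, 1/4), a distribution chosen here only as an example.

```python
from math import sqrt

values = [0, 1, 2]           # possible values y_i
probs = [0.25, 0.5, 0.25]    # probabilities p_i, summing to 1

# Mean: probability-weighted average of the values.
mean = sum(p * y for y, p in zip(values, probs))

# Variance: probability-weighted average squared deviation from the mean.
variance = sum(p * (y - mean) ** 2 for y, p in zip(values, probs))

# Standard deviation: square root of the variance, in the same units as Y.
std_dev = sqrt(variance)

print(mean, variance, std_dev)
```

For this distribution the mean is 1 and the variance is 0.5.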
Skewness and kurtosis
Skewness: $S = E[(Y - \mu_Y)^3] / \sigma_Y^3$
measure of asymmetry of a distribution
skewness = 0 : distribution is symmetric
skewness > (<) 0 : distribution has long right (left) tail
Kurtosis: $K = E[(Y - \mu_Y)^4] / \sigma_Y^4$
measure of how much probability mass lies in the tails (probability of large values)
kurtosis = 3 : normal distribution
kurtosis > 3 : heavy tails
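Both measures are ratios of central moments, so they can be computed for a discrete distribution in the same way as the variance. In this sketch the helper names (`central_moment`, `skewness`, `kurtosis`) and the two example distributions are hypothetical.

```python
def central_moment(values, probs, k):
    """E[(Y - mu_Y)^k] for a discrete distribution."""
    mu = sum(p * y for y, p in zip(values, probs))
    return sum(p * (y - mu) ** k for y, p in zip(values, probs))

def skewness(values, probs):
    sigma2 = central_moment(values, probs, 2)
    return central_moment(values, probs, 3) / sigma2 ** 1.5   # divide by sigma^3

def kurtosis(values, probs):
    sigma2 = central_moment(values, probs, 2)
    return central_moment(values, probs, 4) / sigma2 ** 2     # divide by sigma^4

symmetric = ([0, 1, 2], [0.25, 0.5, 0.25])    # symmetric around 1
right_tail = ([0, 1, 10], [0.5, 0.4, 0.1])    # rare large value on the right

print(skewness(*symmetric), kurtosis(*symmetric))
print(skewness(*right_tail), kurtosis(*right_tail))
```

The symmetric example has skewness 0, while the example with a rare large value has positive skewness, matching the long-right-tail case.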
Joint distribution and covariance
Two or more random variables, such as X and Y, have a joint distribution.
The covariance between X and Y:
$\mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sigma_{XY}$.
measure of the extent to which two random variables X and Y move together.
cov(X, Y ) > (<)0 means a positive (negative) relation between X and Y .
depends on units of measurement (e.g., dollars, cents).
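The dependence on units can be seen by computing the covariance from a joint distribution twice, once with X in dollars and once in cents. The joint probabilities below are made up for illustration, and `covariance` is a hypothetical helper.

```python
def covariance(joint):
    """cov(X, Y) from a joint distribution given as {(x, y): probability}."""
    mu_x = sum(p * x for (x, _), p in joint.items())
    mu_y = sum(p * y for (_, y), p in joint.items())
    return sum(p * (x - mu_x) * (y - mu_y) for (x, y), p in joint.items())

# Hypothetical joint distribution: X in dollars, Y a 0/1 indicator.
joint_dollars = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Same distribution with X re-expressed in cents.
joint_cents = {(100 * x, y): p for (x, y), p in joint_dollars.items()}

print(covariance(joint_dollars))  # approximately 0.15
print(covariance(joint_cents))    # approximately 15, i.e. 100 times larger
```

The sign (positive relation) is unchanged, but the magnitude scales with the unit of measurement.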
Correlation
Frequently we convert the covariance to a correlation by standardizing by the product of $\sigma_X$ and $\sigma_Y$:
$\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \gamma_{XY}$.
$-1 \le \mathrm{corr}(X, Y) \le 1$.
corr(X, Y) = 1: perfect positive linear association
corr(X, Y) = −1: perfect negative linear association
corr(X, Y) = 0: no linear association
does not depend on units of measurement
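A short numerical check of these properties, treating a handful of (x, y) pairs as equally likely outcomes. The data and the helper name `corr` are hypothetical, and the divisor n (rather than n − 1) reflects the equally-likely-outcomes assumption.

```python
from math import sqrt

def corr(xs, ys):
    """Correlation of paired values, each pair treated as equally likely."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return cov / sqrt(var_x * var_y)

dollars = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
cents = [100 * d for d in dollars]   # same quantity in different units

print(corr(dollars, y), corr(cents, y))       # identical: units do not matter
print(corr(dollars, [2 * d + 1 for d in dollars]))   # exact increasing line
print(corr(dollars, [-3 * d for d in dollars]))      # exact decreasing line
```

The last two lines print values equal to 1 and −1 up to floating-point rounding, the perfect linear association cases.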
Conditional distribution and mean
Conditional distribution of Y:
The distribution of Y , given value(s) of some other random variable, X
Example: the distribution of future values of a series conditional upon past values.
Conditional mean or variance of Y:
The mean or variance of the conditional distribution:
E(Y|X) and Var(Y|X) (important!)
Examples: the mean or variance of a series conditional upon its past values.
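For discrete variables, the conditional distribution of Y given X = x is obtained by keeping the joint probabilities with that x and rescaling them to sum to 1. The sketch below uses a made-up joint distribution and hypothetical helper names.

```python
def conditional_dist(joint, x):
    """Distribution of Y given X = x, from a joint {(x, y): probability}."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)   # P(X = x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

def conditional_mean(joint, x):
    """E(Y | X = x)."""
    return sum(p * y for y, p in conditional_dist(joint, x).items())

def conditional_var(joint, x):
    """Var(Y | X = x)."""
    m = conditional_mean(joint, x)
    return sum(p * (y - m) ** 2 for y, p in conditional_dist(joint, x).items())

# Hypothetical joint distribution of (X, Y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

for x in (0, 1):
    print(x, conditional_mean(joint, x), conditional_var(joint, x))
```

Here E(Y|X = 0) is about 0.2 and E(Y|X = 1) is about 0.8, so knowing X changes the expected value of Y, which is the idea forecasting exploits when it conditions on past values.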
Goal
Review the core ideas of the theory of probability and statistics that are needed for regression analysis and forecasting:
The probability framework for statistical inference
Moments, Covariance, and Correlation
Sampling Distributions and Estimation
Hypothesis Testing and Confidence Intervals