CS570 Biomedical Science & Health IT
CS544 D1
Foundations of Analytics
Lecture 8
Guanglan Zhang
Hypergeometric Distribution
In probability theory, the hypergeometric distribution is a discrete probability distribution that describes the probability of k successes in n draws, without replacement, from a finite population of size (M+N) that contains exactly M successes and N failures, wherein each draw is either a success or a failure. In contrast, the binomial distribution describes the probability of k successes in n draws with replacement.
In the hypergeometric distribution,
the sample data is selected without replacement
the outcomes are dependent on the previous observations
The hypergeometric distribution for a random variable X is defined as follows:
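Given a population of size M+N, where M is the number of successes and N is the number of failures, and a sample of size K drawn without replacement, the probability of x events of interest (successes) in the sample is
$$P(X = x) = \frac{\binom{M}{x}\binom{N}{K-x}}{\binom{M+N}{K}}, \quad \max(0, K-N) \le x \le \min(K, M)$$
The probability mass function of X is computed using the above formula.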
https://en.wikipedia.org/wiki/Hypergeometric_distribution
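A minimal sketch of the hypergeometric probability mass function, assuming the standard form P(X = x) = C(M, x) C(N, K − x) / C(M+N, K). The function name `hypergeom_pmf` and the card-deck numbers are illustrative assumptions, not part of the lecture.

```python
from math import comb

def hypergeom_pmf(x: int, M: int, N: int, K: int) -> float:
    """P(X = x): x successes in K draws without replacement
    from a population with M successes and N failures."""
    # Outside the support the probability is zero.
    if x < max(0, K - N) or x > min(K, M):
        return 0.0
    return comb(M, x) * comb(N, K - x) / comb(M + N, K)

# Hypothetical example: draw 5 cards from a 52-card deck and count hearts
# (M = 13 hearts are "successes", N = 39 other cards are "failures").
M, N, K = 13, 39, 5
pmf = [hypergeom_pmf(x, M, N, K) for x in range(K + 1)]

for x, p in enumerate(pmf):
    print(f"P(X = {x}) = {p:.4f}")
print("sum of probabilities:", round(sum(pmf), 10))
```

Because the draws are made without replacement, the probabilities over the whole support should sum to 1, which the last line checks.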
Hypergeometric Distributions
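The mean of the hypergeometric distribution is:
$$\mu = E(X) = \frac{KM}{M+N}$$
The variance of the hypergeometric distribution is:
$$\sigma^2 = K \cdot \frac{M}{M+N} \cdot \frac{N}{M+N} \cdot \frac{M+N-K}{M+N-1} = \frac{KMN(M+N-K)}{(M+N)^2(M+N-1)}$$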
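The mean and variance can be checked numerically by summing over the probability mass function. This is only a sketch: it assumes the closed forms KM/(M+N) for the mean and K · M/(M+N) · N/(M+N) · (M+N−K)/(M+N−1) for the variance, and the population numbers are hypothetical.

```python
from math import comb, isclose

def hypergeom_pmf(x, M, N, K):
    """P(X = x) for K draws without replacement (M successes, N failures)."""
    return comb(M, x) * comb(N, K - x) / comb(M + N, K)

M, N, K = 13, 39, 5          # hypothetical population and sample size
support = range(max(0, K - N), min(K, M) + 1)

# Mean and variance computed directly from the definition.
mean_by_sum = sum(x * hypergeom_pmf(x, M, N, K) for x in support)
var_by_sum = sum((x - mean_by_sum) ** 2 * hypergeom_pmf(x, M, N, K)
                 for x in support)

# Mean and variance from the assumed closed-form expressions.
total = M + N
mean_by_formula = K * M / total
var_by_formula = K * (M / total) * (N / total) * (total - K) / (total - 1)

print("mean:", mean_by_sum, "vs", mean_by_formula)
print("variance:", var_by_sum, "vs", var_by_formula)
print("agree:", isclose(mean_by_sum, mean_by_formula),
      isclose(var_by_sum, var_by_formula))
```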
Geometric Distributions
The geometric distribution concerns the number of failures before a success occurs in a sequence of Bernoulli trials. In other words, we count the number of trials that fail before the first success is obtained.
The random variable X is the number of failures before the first success, that is, the number of trials that occur before the first success.
If the probability of success is p (hence the probability of failure is 1 − p), the probability of getting a success after 2 failures is:
$$P(X = 2) = (1-p)^2 p$$
The probability mass function of X, P(X = x), is
$$P(X = x) = (1-p)^x p, \quad x = 0, 1, 2, \ldots$$
The mean of the geometric distribution is:
$$\mu = \frac{1-p}{p}$$
The variance of the geometric distribution is:
$$\sigma^2 = \frac{1-p}{p^2}$$
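A short sketch of the geometric distribution under the "number of failures before the first success" convention, assuming P(X = x) = (1 − p)^x p. The value p = 0.2 and the truncation point of the sums are illustrative assumptions.

```python
from math import isclose

def geom_pmf(x: int, p: float) -> float:
    """P(X = x): x failures, then one success."""
    return (1 - p) ** x * p

p = 0.2                      # hypothetical success probability
xs = range(2000)             # truncated support; the remaining tail is negligible

total = sum(geom_pmf(x, p) for x in xs)
mean_by_sum = sum(x * geom_pmf(x, p) for x in xs)
var_by_sum = sum((x - mean_by_sum) ** 2 * geom_pmf(x, p) for x in xs)

print("P(success after 2 failures):", geom_pmf(2, p))
print("sum of probabilities:", total)
print("mean:", mean_by_sum, "vs (1-p)/p =", (1 - p) / p)
print("variance:", var_by_sum, "vs (1-p)/p^2 =", (1 - p) / p ** 2)
```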
Negative Binomial Distribution
The negative binomial distribution concerns the number of failures until a total of r successes occur in a sequence of Bernoulli trials. In a negative binomial distribution, the random variable X is the number of failures required to produce r successes. The total number of successes, r, is fixed in the experiment.
A negative binomial experiment is a statistical experiment that has the following properties:
The experiment consists of X+r repeated trials.
Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.
The probability of success, denoted by p, is the same on every trial.
The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.
The experiment continues until r successes are observed, where r is specified in advance.
A negative binomial random variable is the number X of repeated failures to produce r successes in a negative binomial experiment. The probability distribution of a negative binomial random variable is called a negative binomial distribution. The negative binomial distribution is also known as the Pascal distribution.
http://stattrek.com/probability-distributions/negative-binomial.aspx?Tutorial=AP
Negative Binomial Distribution
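The probability mass function, P(X = x), of the negative binomial random variable X is:
$$P(X = x) = \binom{x+r-1}{r-1} p^r (1-p)^x, \quad x = 0, 1, 2, \ldots$$
The mean of the negative binomial distribution is:
$$\mu = \frac{r(1-p)}{p}$$
The variance of the negative binomial distribution is:
$$\sigma^2 = \frac{r(1-p)}{p^2}$$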
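The sketch below assumes the "failures before the r-th success" form P(X = x) = C(x+r−1, r−1) p^r (1 − p)^x and checks two things: that r = 1 reduces to the geometric distribution, and that the mean and variance match r(1 − p)/p and r(1 − p)/p². The values of r and p are hypothetical.

```python
from math import comb, isclose

def nbinom_pmf(x: int, r: int, p: float) -> float:
    """P(X = x): x failures occur before the r-th success."""
    return comb(x + r - 1, r - 1) * p ** r * (1 - p) ** x

r, p = 3, 0.4                # hypothetical parameters
xs = range(500)              # truncated support; the remaining tail is negligible

total = sum(nbinom_pmf(x, r, p) for x in xs)
mean_by_sum = sum(x * nbinom_pmf(x, r, p) for x in xs)
var_by_sum = sum((x - mean_by_sum) ** 2 * nbinom_pmf(x, r, p) for x in xs)

print("sum of probabilities:", total)
print("mean:", mean_by_sum, "vs r(1-p)/p =", r * (1 - p) / p)
print("variance:", var_by_sum, "vs r(1-p)/p^2 =", r * (1 - p) / p ** 2)

# With r = 1 the distribution is geometric: P(X = x) = (1 - p)^x p.
print("r = 1 matches geometric:",
      all(isclose(nbinom_pmf(x, 1, p), (1 - p) ** x * p) for x in range(20)))
```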
Poisson Distribution
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.
In the Poisson distribution, the random variable X counts the number of events occurring in the unit interval. The probability mass function of the random variable X, P(X = x), is:
$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$$
where λ is a positive real number (the average number of events per unit interval) and x = 0, 1, 2, …
If the random variable X counts the number of events in the interval [0, t], then X has the Poisson distribution X ~ pois(lambda = λt):
$$P(X = x) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}$$
The mean of the Poisson distribution is:
$$\mu = \lambda$$
The variance of the Poisson distribution is:
$$\sigma^2 = \lambda$$
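A minimal sketch of the Poisson probability mass function, assuming P(X = x) = e^(−λ) λ^x / x!. It also shows the interval version, where the rate passed to the function is λt. The rate of 4 events per unit interval and the interval length t = 0.5 are made-up values.

```python
from math import exp, factorial, isclose

def poisson_pmf(x: int, rate: float) -> float:
    """P(X = x) for a Poisson random variable with the given rate."""
    return exp(-rate) * rate ** x / factorial(x)

lam = 4.0                    # hypothetical average number of events per unit interval
xs = range(101)              # truncated support; the remaining tail is negligible

total = sum(poisson_pmf(x, lam) for x in xs)
mean_by_sum = sum(x * poisson_pmf(x, lam) for x in xs)
var_by_sum = sum((x - mean_by_sum) ** 2 * poisson_pmf(x, lam) for x in xs)

print("sum of probabilities:", total)
print("mean:", mean_by_sum, "variance:", var_by_sum, "lambda:", lam)

# Events in [0, t]: use rate lambda * t.
t = 0.5
p_no_events = poisson_pmf(0, lam * t)
print("P(no events in [0, 0.5]):", p_no_events)
```

The mean and the variance computed from the sums should both come out equal to λ, which is the defining feature of the Poisson distribution.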
Continuous Distributions
For continuous random variables, the probability density function (PDF) defines the distribution of values.
Common distributions include the normal distribution (also known as Gaussian distribution), uniform distribution (also known as rectangular distribution), and exponential distribution.
Continuous Uniform Distributions
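For the random variable X with the continuous uniform distribution over the interval [a,b], the probability density is the same everywhere in the range.
The PDF for the random variable X is:
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
The cumulative distribution function for the random variable X is:
$$F(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ 1, & x > b \end{cases}$$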
https://onlinecourses.science.psu.edu/stat414/node/135
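A small sketch of the continuous uniform PDF and CDF, assuming f(x) = 1/(b − a) on [a, b] and F(x) = (x − a)/(b − a) on [a, b]. The interval [2, 10] and the function names are illustrative assumptions.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x) for the continuous uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2.0, 10.0             # hypothetical interval

# The probability of a sub-interval depends only on its length.
prob = uniform_cdf(7.0, a, b) - uniform_cdf(4.0, a, b)
print("f(5) =", uniform_pdf(5.0, a, b))
print("P(4 <= X <= 7) =", prob)
```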
Continuous Uniform Distributions
The mean of the continuous uniform random variable X is:
$$\mu = \frac{a+b}{2}$$
The variance of the continuous uniform random variable X is:
$$\sigma^2 = \frac{(b-a)^2}{12}$$
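As a rough numerical check, the mean and variance can be approximated by integrating against the density with a midpoint rule and compared with the assumed closed forms (a + b)/2 and (b − a)²/12. The interval and the number of integration steps are arbitrary choices for illustration.

```python
from math import isclose

a, b = 2.0, 10.0             # hypothetical interval
n_steps = 100_000            # number of midpoint-rule sub-intervals
h = (b - a) / n_steps
density = 1.0 / (b - a)      # uniform density on [a, b]

# Midpoints of the sub-intervals.
midpoints = [a + (i + 0.5) * h for i in range(n_steps)]

# E(X) = integral of x f(x) dx; Var(X) = integral of (x - mean)^2 f(x) dx.
mean_num = sum(x * density * h for x in midpoints)
var_num = sum((x - mean_num) ** 2 * density * h for x in midpoints)

print("mean:", mean_num, "vs (a+b)/2 =", (a + b) / 2)
print("variance:", var_num, "vs (b-a)^2/12 =", (b - a) ** 2 / 12)
```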
Normal Distribution
The normal distribution is the most common continuous distribution in practice.
The density curve (a mathematical model that represents the pattern of data) of a normal distribution has a very familiar, bell curve shape with a single peak. It is perfectly symmetrical.
A 2006 paper (Low Birth Weight, a Risk Factor for Cardiovascular Diseases in Later Life, Is Already Associated with Elevated Fetal Glycosylated Hemoglobin at Birth) showed a histogram of the birth weights of newborns in their sample.
The normal density curve seems to fit the data well.
It is an “idealized” description of the data. It gives the general picture of the data, ignoring minor irregularities. In situations like this, we can use properties of the normal distribution to make statements about the quantitative variable.
The normal distribution is an important distribution in statistics. Here, “normal” refers to the regularity of the distribution.
In statistics, a data sample is a set of data collected and/or selected from a statistical population by a defined procedure. The elements of a sample are known as sample points, sampling units or observations.
Normal Distribution
A normal distribution with mean μ and standard deviation σ is often denoted N(μ, σ). The mean is the center of the distribution and is the point that splits the area under the bell-shaped curve in half.
The probability density function of the normal continuous random variable X is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$
for the given mean μ and standard deviation σ.
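A minimal sketch of the normal density, assuming f(x) = 1/(σ√(2π)) · exp(−(x − μ)²/(2σ²)), compared against `statistics.NormalDist` from the Python standard library. The values of μ and σ are hypothetical and are not taken from the birth-weight paper.

```python
from math import exp, pi, sqrt, isclose
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density written out from the formula."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

mu, sigma = 3400.0, 500.0    # hypothetical mean and standard deviation
dist = NormalDist(mu, sigma)

# The hand-written formula should agree with the library density.
for x in (2400.0, 3400.0, 4000.0):
    print(x, normal_pdf(x, mu, sigma), dist.pdf(x))

# Symmetry: equal densities at equal distances from the mean,
# and half of the area lies to the left of the mean.
symmetric = isclose(normal_pdf(mu - 300, mu, sigma), normal_pdf(mu + 300, mu, sigma))
area_left_of_mean = dist.cdf(mu)
print("symmetric:", symmetric, "area left of mean:", area_left_of_mean)
```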
68-95-99.7 Rule
For a normal distribution with mean μ and standard deviation σ,
about 68% of the observations fall within one standard deviation of the mean
about 95% of the observations fall within two standard deviations of the mean
about 99.7% of the observations fall within three standard deviations of the mean
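The three percentages in the rule can be reproduced from the standard normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library. The helper name `within` is an assumption made for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def within(k: float) -> float:
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal distribution."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within(k):.4f}")
```

The result does not depend on μ or σ, because measuring distance in units of σ reduces every normal distribution to the standard normal.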
Standardized Normal Distribution
Convert a value into standard deviation units by calculating its Z-score:
$$Z = \frac{x - \mu}{\sigma}$$
The Z-score tells us how many standard deviations x is from the mean.
The area under the standard Normal curve to the left of z is 0.9370.
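A short sketch of standardizing a value and looking up the area to its left, assuming Z = (x − μ)/σ. The numbers are hypothetical: x, μ, and σ were chosen so that z = 1.53, a value whose left-tail area under the standard normal curve rounds to 0.9370. The original z value is not given in the notes above.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 3400.0, 500.0    # hypothetical mean and standard deviation
x = 4165.0                   # hypothetical observation

z = z_score(x, mu, sigma)
area_left = NormalDist().cdf(z)          # standard normal CDF at z

# The same area is obtained without standardizing first.
area_direct = NormalDist(mu, sigma).cdf(x)

print("z =", z)
print("area to the left of z:", round(area_left, 4))
print("area computed directly:", round(area_direct, 4))
```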
Exponential Distributions
The exponential distribution is a continuous distribution and ranges from zero to positive infinity.
It is commonly used in queuing theory for waiting time distributions, the length of time between arrivals, patients entering a hospital, etc.
It is defined by a single parameter, λ, the mean number of arrivals per unit of time.
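The probability density function of the random variable X with exponential distribution is:
$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$
The cumulative distribution function is:
$$F(x) = P(X \le x) = 1 - e^{-\lambda x}, \quad x \ge 0$$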
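A minimal sketch of the exponential PDF and CDF, assuming f(x) = λe^(−λx) and F(x) = 1 − e^(−λx) for x ≥ 0. It also illustrates the memoryless property P(X > s + t | X > s) = P(X > t). The rate λ = 2 arrivals per unit of time and the times s and t are made-up values.

```python
from math import exp, isclose

def exp_pdf(x: float, lam: float) -> float:
    """Exponential density with rate lam."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x: float, lam: float) -> float:
    """P(X <= x) for the exponential distribution with rate lam."""
    return 1.0 - exp(-lam * x) if x >= 0 else 0.0

def exp_sf(x: float, lam: float) -> float:
    """P(X > x), the probability of waiting longer than x."""
    return 1.0 - exp_cdf(x, lam)

lam = 2.0                    # hypothetical mean number of arrivals per unit of time
print("P(wait <= 0.5) =", exp_cdf(0.5, lam))

# Memoryless property: having already waited s does not change
# the distribution of the remaining waiting time.
s, t = 1.0, 0.5
conditional = exp_sf(s + t, lam) / exp_sf(s, lam)
print("P(X > s + t | X > s) =", conditional)
print("P(X > t)             =", exp_sf(t, lam))
```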
The exponential distribution is the only continuous memoryless probability distribution. It is a continuous analog of the geometric distribution.
The exponential distribution is a continuous probability distribution used to model the time we need to wait before a given event occurs.
Exponential Distributions
The mean of the exponential random variable X is:
$$\mu = \frac{1}{\lambda}$$
The variance of the exponential random variable X is:
$$\sigma^2 = \frac{1}{\lambda^2}$$
The exponential distribution is closely related to the Poisson distribution.
If 1) an event can occur more than once and 2) the time elapsed between two successive occurrences is exponentially distributed and independent of previous occurrences, then the number of occurrences of the event within a given unit of time has a Poisson distribution.
https://www.statlect.com/probability-distributions/exponential-distribution
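The link between the two distributions can be illustrated with a small simulation: generate arrivals whose gaps are exponentially distributed, count the arrivals in each unit interval, and compare the sample mean and variance of the counts with λ. This is only a sketch. The rate, the seed, and the number of intervals are arbitrary, and a simulation suggests the relationship without proving it.

```python
import random
from statistics import mean, variance

random.seed(544)             # fixed seed so the run is repeatable
lam = 3.0                    # hypothetical rate: average arrivals per unit of time
n_intervals = 20_000         # number of unit intervals to observe

# Walk along the time axis, adding exponentially distributed gaps,
# and count how many arrivals land in each unit interval.
counts = [0] * n_intervals
t = random.expovariate(lam)
while t < n_intervals:
    counts[int(t)] += 1
    t += random.expovariate(lam)

sample_mean = mean(counts)
sample_var = variance(counts)

# For a Poisson distribution the mean and the variance both equal lam.
print("sample mean of counts:", sample_mean)
print("sample variance of counts:", sample_var)
print("lambda:", lam)
```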
In-class quizzes
Go to https://b.socrative.com
Enter classroom: ZHANG6334