Human-Oriented Robotics
Probability Refresher
Kai Arras
Social Robotics Lab, University of Freiburg
Probability Refresher
Introduction to Probability
• Random variables
• Joint distribution
• Marginalization
• Conditional probability
• Chain rule
• Bayes’ rule
• Independence
• Conditional independence
• Expectation and variance

We assume that you are familiar with the fundamentals of probability theory and probability distributions. This is a quick refresher; we aim at ease of understanding rather than formal depth. For a more comprehensive treatment, refer, e.g., to A. Papoulis or the references given on the last slide.

Common Probability Distributions
• Bernoulli distribution
• Binomial distribution
• Categorical distribution
• Multinomial distribution
• Poisson distribution
• Gaussian distribution
• Chi-squared distribution
Introduction to Probability
Why probability theory?
• Consider a human, animal, or robot in the real world whose tasks involve the solution of a set of problems (e.g. an animal looking for food, a robot serving coffee, …)
• In order to be successful, it needs to observe and estimate the state of the world around it and act in an appropriate way
• Uncertainty is an inescapable aspect of the real world
• It is a consequence of several factors, for example,
  • Uncertainty from partial, indirect and ambiguous observations of the world
  • Uncertainty in the values of observations (e.g. sensor noise)
  • Uncertainty in the origin of observations (e.g. data association)
  • Uncertainty in action execution (e.g. from limitations in the control system)
• Probability theory is the most powerful (and accepted) formalism to deal with uncertainty
Random Variables
• A random variable x denotes an uncertain quantity
• x could be the outcome of an experiment such as rolling a die (numbers from 1 to 6), flipping a coin (heads, tails), or measuring a temperature (value in degrees Celsius)
• If we observe several instances, then it might take a different value each time; some values may occur more often than others. This information is captured by the probability distribution p(x) of x
• A random variable may be continuous or discrete
  • Continuous random variables take values that are real numbers: finite (e.g. time taken to finish a 2-hour exam) or infinite (time until the next bus arrives)
  • Discrete random variables take values from a predefined set: ordered (e.g. outcomes 1 to 6), unordered (e.g. “sunny”, “raining”, “cloudy”), finite or infinite
Random Variables
• The probability distribution p(x) of a continuous random variable is called probability density function (pdf). This function may take any positive value; its integral always sums to one
• The probability distribution p(x) of a discrete random variable is called probability mass function and can be visualized as a histogram (less often: a Hinton diagram). Each outcome has a positive probability associated to it, and these probabilities always sum to one
(Figures in [1]: an example continuous distribution and an example discrete distribution)
Joint Probability
• Consider two random variables x and y
• If we observe multiple paired instances of x and y, then some outcome combinations occur more frequently than others. This is captured in the joint probability distribution of x and y, written as p(x,y)
• A joint probability distribution may relate variables that are all discrete, all continuous, or mixed discrete-continuous
• Regardless, the total probability of all outcomes (obtained by summing or integration) is always one
• In general, we can have p(x,y,z). We may also write p(x) to represent the joint probability of all elements of a random vector x
• We will write p(x, y) to represent the joint distribution of all elements of the random vectors x and y
• Examples of joint probability distributions p(x,y) for the continuous, discrete and mixed cases are shown in [1]
Marginalization
• We can recover the probability distribution of a single variable from a joint distribution by summing over all the other variables
• Given a continuous p(x,y):
  p(x) = ∫ p(x,y) dy
• The integral becomes a sum in the discrete case:
  p(x) = Σy p(x,y)
• Recovered distributions are referred to as marginal distributions. The process of integrating/summing is called marginalization
• We can recover any subset of variables, e.g., given w, x, y, z where w is discrete, by summing over the discrete variables and integrating over the continuous variables we want to remove
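A quick numerical illustration (a sketch with made-up values, not from the slides): for a discrete joint stored as a table, marginalization is just a sum over the axis of the unwanted variable.

```python
import numpy as np

# hypothetical joint distribution p(x, y): x has 3 outcomes (rows), y has 2 (columns)
p_xy = np.array([[0.10, 0.20],
                 [0.25, 0.15],
                 [0.05, 0.25]])
assert np.isclose(p_xy.sum(), 1.0)   # total probability is one

p_x = p_xy.sum(axis=1)   # p(x) = sum over y of p(x, y)
p_y = p_xy.sum(axis=0)   # p(y) = sum over x of p(x, y)

print(p_x)   # [0.3 0.4 0.3]
print(p_y)   # [0.4 0.6]
```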
Marginalization
• Calculating the marginal distribution p(x) from p(x,y) has a simple interpretation: we are finding the probability distribution of x regardless of y (in the absence of information about y)
• Marginalization is also known as the sum rule or the law of total probability
(Figures in [1]: continuous, discrete and mixed examples of marginalization)
Conditional Probability
• The probability of x given that y takes a fixed value y* tells us the relative frequency with which x takes its different outcomes, given the conditioning event that y equals y*
• This is written p(x|y = y*) and is called the conditional probability of x given y equals y*
• The conditional probability p(x|y) can be recovered from the joint distribution p(x,y)
• This can be visualized by a slice p(x, y = y*) through the joint distribution p(x,y) (the figure in [1] shows two such slices, p(x|y = y1) and p(x|y = y2))
Conditional Probability
• The values in the slice tell us about the relative probability of x given y = y*, but they do not themselves form a valid probability distribution
• They cannot sum to one as they constitute only a small part of p(x,y), which itself sums to one
• To calculate a proper conditional probability distribution, we hence normalize by the total probability in the slice:
  p(x|y = y*) = p(x, y = y*) / ∫ p(x, y = y*) dx = p(x, y = y*) / p(y = y*)
  where we use marginalization to simplify the denominator
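Using the same kind of made-up joint table as before (a sketch, not the figure from [1]), conditioning means taking the slice at y = y* and normalizing it by its sum:

```python
import numpy as np

p_xy = np.array([[0.10, 0.20],
                 [0.25, 0.15],
                 [0.05, 0.25]])          # hypothetical joint p(x, y)

y_star = 1                               # condition on the second outcome of y
slice_xy = p_xy[:, y_star]               # p(x, y = y*): not yet a valid distribution
p_x_given_y = slice_xy / slice_xy.sum()  # divide by p(y = y*) = sum_x p(x, y = y*)

print(slice_xy.sum())    # 0.6, the marginal p(y = y*)
print(p_x_given_y)       # [0.333 0.25 0.417], now sums to one
```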
Conditional Probability
• Instead of writing p(x|y = y*) = p(x, y = y*) / p(y = y*), it is common to use a more compact notation and write the conditional probability relation without explicitly defining the value y = y*:
  p(x|y) = p(x,y) / p(y)
• This can be rearranged to give p(x,y) = p(x|y) p(y)
• By symmetry we also have p(x,y) = p(y|x) p(x)
Bayes’ Rule
• In the last two equations, we expressed the joint probability in two ways. When combining them, we get a relationship between p(x|y) and p(y|x):
  p(x|y) p(y) = p(y|x) p(x)
• Rearranging gives Bayes’ rule:
  p(x|y) = p(y|x) p(x) / p(y) = p(y|x) p(x) / ∫ p(x,y) dx = p(y|x) p(x) / ∫ p(y|x) p(x) dx
  where we have expanded the denominator using the definition of marginal and conditional probability, respectively
Bayes’ Rule
• Each term in Bayes’ rule has a name:
  p(x|y) = p(y|x) p(x) / p(y)
  where p(x|y) is the posterior, p(y|x) the likelihood, p(x) the prior, and p(y) the normalizer (a.k.a. marginal likelihood, evidence)
• The posterior represents what we know about x given y
• Conversely, the prior is what is known about x before considering y
• Bayes’ rule provides a way to change your existing beliefs in the light of new evidence. It allows us to combine new data with the existing knowledge or expertise
• Bayes’ rule is important in that it allows us to compute the conditional probability p(x|y) from the “inverse” conditional probability p(y|x)
Bayes’ Rule Example
Suppose that a tuberculosis (TB) skin test is 95% accurate. That is, if the patient is TB-infected, then the test will be positive with probability 0.95, and if the patient is not infected, then the test will be negative with probability 0.95.
A person gets a positive test result. What is the probability that he is infected?
• Wanted: p(TB|Positive), given p(Positive|TB) = 0.95 and p(Positive|¬TB) = 0.05
• Naive reasoning: given that the test result is wrong 5% of the time, the probability that the subject is infected is 0.95
• Bayes’ rule: we need to consider the prior probability of TB infection, p(TB), and the probability of getting a positive test result, p(Positive)
Example from [2]
Bayes’ Rule Example (cont.)
• What is the probability of getting a positive test result, p(Positive)?
• Let’s expand the denominator:
  p(Positive) = p(Positive|TB) p(TB) + p(Positive|¬TB) p(¬TB)
• Suppose that 1 in 1000 of the subjects who get tested is infected: p(TB) = 0.001
• We see that 0.95 · 0.001 = 0.00095 infected subjects get a positive result, and 0.05 · 0.999 = 0.04995 uninfected subjects get a positive result. Thus,
  p(Positive) = 0.00095 + 0.04995 = 0.0509
• Applying Bayes’ rule, we obtain
  p(TB|Positive) = p(Positive|TB) p(TB) / p(Positive) = 0.95 · 0.001 / 0.0509 ≈ 0.0187
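The arithmetic of this example is easy to reproduce; a minimal sketch:

```python
# Bayes' rule for the TB test example
p_tb = 0.001                  # prior p(TB)
p_pos_given_tb = 0.95         # p(Positive | TB)
p_pos_given_not_tb = 0.05     # p(Positive | not TB), the false positive rate

# p(Positive) via the expanded denominator (law of total probability)
p_pos = p_pos_given_tb * p_tb + p_pos_given_not_tb * (1.0 - p_tb)

# posterior p(TB | Positive)
p_tb_given_pos = p_pos_given_tb * p_tb / p_pos

print(p_pos)            # 0.0509
print(p_tb_given_pos)   # ~0.0187
```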
Bayes’ Rule Example (cont.)
• Wait, only 2%?
• This is much more than the prior infection probability of 0.001, but still… what if we needed a more accurate result?
• Insights:
  • Our subject was a random person for which p(TB) = 0.001 is indeed low
  • Our clinical test is very inaccurate, in particular the false positive rate p(Positive|¬TB) = 0.05 is high
  • If we set p(Positive|¬TB) = 0.0001 (0.1 ‰), leaving all other values the same, we obtain a posterior probability of 0.90
  • If we set p(Positive|TB) = 0.9999, leaving all other values the same, we obtain a posterior of only 0.0196
• The false positive rate is important in this case (see also later in this course)
Chain Rule

• Another immediate result of the definition of conditional probability is the chain rule:
  p(x, y) = p(x|y) p(y)
• In general,
  p(x1, x2, …, xK) = p(x1) p(x2|x1) p(x3|x1, x2) ··· p(xK|x1, x2, …, xK−1)
  which is compactly expressed as
  p(x1, x2, …, xK) = ∏_{i=1}^{K} p(xi | x1, …, xi−1)
Chain Rule
• In other words, we can express the joint probability of random variables in terms of the probability of the first, the probability of the second given the first, and so on
• Note that we can expand this expression using any order of variables; the result will be the same
• The chain rule is also known as the product rule
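A quick numerical sanity check of the chain rule (a sketch on a randomly generated three-variable joint; all values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((4, 3, 2))
p /= p.sum()                                      # arbitrary joint p(x1, x2, x3)

p1 = p.sum(axis=(1, 2))                           # p(x1)
p2_given_1 = p.sum(axis=2) / p1[:, None]          # p(x2 | x1)
p3_given_12 = p / p.sum(axis=2, keepdims=True)    # p(x3 | x1, x2)

# chain rule: p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x1, x2)
reconstructed = p1[:, None, None] * p2_given_1[:, :, None] * p3_given_12
print(np.allclose(reconstructed, p))              # True
```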
Independence
• Assume that the value of variable x tells us nothing about variable y and vice versa. Formally,
  p(x|y) = p(x)   and   p(y|x) = p(y)
• Then, we say x and y are independent
• When substituting this into the conditional probability relation p(x,y) = p(x|y) p(y), we see that for independent variables the joint probability is the product of the marginal probabilities:
  p(x,y) = p(x) p(y)
Independence
• Let us visualize this for the joint distribution of two independent variables x and y (figure in [1])
• Independence of x and y means that every conditional distribution is the same (recall that the conditional distribution is the “normalized version of the slice”)
• The value of y tells us nothing about x and vice versa
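A small sketch of the same idea with made-up marginals: an independent joint is the outer product of its marginals, and every conditional slice equals the marginal p(x).

```python
import numpy as np

p_x = np.array([0.2, 0.5, 0.3])
p_y = np.array([0.6, 0.4])
p_xy = np.outer(p_x, p_y)            # independent joint: p(x, y) = p(x) p(y)

# every conditional p(x | y = y*) is the same and equals p(x)
for y_star in range(len(p_y)):
    cond = p_xy[:, y_star] / p_xy[:, y_star].sum()
    print(np.allclose(cond, p_x))    # True for every slice
```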
Conditional Independence
• While independence is a useful property, it is not often that we encounter two independent events. A more common situation is when two variables are independent given a third one
• Consider three variables x1, x2, x3. Conditional independence of x1 and x3 given x2 is written as
  p(x1|x2, x3) = p(x1|x2)
• Conditional independence is always symmetric
• Note that when x1 and x3 are conditionally independent given x2, this does not mean that x1 and x3 are themselves independent. It implies that if we know x2, then x1 provides no further information about x3
• This typically occurs in a chain of events: if x1 causes x2 and x2 causes x3, then the dependence of x3 on x1 is entirely “contained” in x2
Conditional Independence
• Example: entering a hip nightclub
• Suppose we want to reason about the chance that a student enters the two hottest nightclubs in town. Denote A the event “student passes the bouncer of club A”, and B the event “student passes the bouncer of club B”
• Usually, these two events are not independent, because if we learn that the student could enter club B, then our estimate of his/her probability of entering club A is higher, since it is a sign that the student is hip, properly dressed and not too drunk
• Now suppose that the doormen base their decisions only on the looks of the student’s company, and we know their preferences. Then learning that event B has occurred should not change the probability of event A: the looks of the company contain all relevant information about his/her chances of passing. Finding out whether he/she could enter club B does not change that
• Formally, denoting the looks of the company by C,
  p(A|B, C) = p(A|C)
• In this case, we say A is conditionally independent of B given C
Conditional Independence
• Example: rolling a blue and a red die
• The two results are independent of each other
• Now someone tells you “the blue result isn’t a 6 and the red result isn’t a 1”
  • From this information, you cannot gain any knowledge about the red die by looking at the blue die. The probability for each number except 1 on the red die is still 1/5
  • The information does not affect the independence of the results
• Now someone tells you “the sum of the two results is even”
  • This allows you to learn a lot about the red die by looking at the blue die
  • For instance, if you see a 3 on the blue die, the red die can only be 1, 3 or 5
  • The result probabilities are not conditionally independent given this information
• Conditional independence is always relative to the given condition
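The two scenarios can be checked with a small Monte Carlo simulation (a sketch, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(1)
blue = rng.integers(1, 7, size=1_000_000)
red = rng.integers(1, 7, size=1_000_000)

# Condition 1: "blue isn't a 6 and red isn't a 1" keeps the dice independent
keep = (blue != 6) & (red != 1)
print(np.mean(red[keep & (blue == 2)] == 3))   # ~0.2 (= 1/5)
print(np.mean(red[keep & (blue == 5)] == 3))   # ~0.2, the same: blue tells us nothing

# Condition 2: "the sum is even" makes red depend on blue
even = (blue + red) % 2 == 0
print(np.mean(red[even & (blue == 3)] == 2))   # ~0.0, impossible (red must be odd)
print(np.mean(red[even & (blue == 3)] == 3))   # ~1/3
```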
Conditional Independence
• Variable x1 is said to be conditionally independent of variable x3 given variable x2 if – given any value of x2 – the probability distribution of x1 is the same for all values of x3 and the probability distribution of x3 is the same for all values of x1
• Let us look at a graphical example
• Consider the joint distribution of three discrete random variables x1, x2, x3 which take 4, 3, and 2 possible values, respectively (figure in [1])
• All 24 probabilities sum to one
Conditional Independence
First, let’s consider independence:
• Figure b, marginalization of x3: no independence between x1 and x2
• Figure c, marginalization of x2: no independence between x1 and x3
• Figure d, marginalization of x1: no independence between x2 and x3
(Figures in [1])
Conditional Independence
Now let’s consider conditional independence given x2:
• Figures e, f, g: the value of x2 is fixed at 1, 2, 3 respectively
• For fixed x2, variable x1 tells us nothing more about x3 and vice versa
• Thus, x1 and x3 are conditionally independent given x2
(Figures in [1])
Expectation
• Intuitively, the expected value of a random variable is the value one would “expect” to find if one could repeat the random variable process an infinite number of times and take the average of the values obtained
• Let x be a discrete random variable; then the expectation of x under the distribution p is
  E[x] = Σx x · p(x)
• In the continuous case, we use density functions and integrals:
  E[x] = ∫ x · p(x) dx
• It is a weighted average of all possible values where the weights are the corresponding values of the probability mass/density function
Expectation
• For example, if x models the outcome of rolling a fair die, then
  E[x] = 1 · 1/6 + 2 · 1/6 + ··· + 6 · 1/6 = 3.5
• With a biased die where p(x = 6) = 0.5 and p(x = x*) = 0.1 for x* < 6, then
  E[x] = 1 · 0.1 + ··· + 5 · 0.1 + 6 · 0.5 = 4.5
• Often, we are interested in expectations of a function of random variables. Thus, we extend the definition to
  E[f(x)] = Σx f(x) · p(x)
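A minimal check of the two die expectations above:

```python
import numpy as np

values = np.arange(1, 7)
p_fair = np.full(6, 1 / 6)
p_biased = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.5])

print(values @ p_fair)          # 3.5
print(values @ p_biased)        # 4.5

# expectation of a function of x, e.g. f(x) = x^2
print((values ** 2) @ p_fair)   # ~15.17 = E[x^2] for the fair die
```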
Expectation

• This idea also generalizes to functions of more than one variable, e.g.
  E[f(x, y)] = ∫∫ f(x, y) · p(x, y) dx dy
• Note, however, that any function g of a random variable x, or of a set of random variables, is essentially a new random variable y
• For some choices of the function f, the expectation is given a special name:
  • f(x) = x : mean μx
  • f(x) = x^k : k-th moment about zero
  • f(x) = (x − μx)^k : k-th central moment, μ^x_k = E[(x − E[x])^k]
  • f(x) = (x − μx)^2 : variance
  • f(x) = (x − μx)^3 : skew
  • f(x) = (x − μx)^4 : kurtosis
  • f(x, y) = (x − μx)(y − μy) : covariance of x and y
• Skew and kurtosis are also defined as standardized moments
Expectation
• The expected value of a specified integer power of the deviation of the random variable from the mean is called central moment or moment about the mean of a probability distribution
• Ordinary moments (or raw moments) are defined about zero
• Moments are used to characterize the shape of a distribution
• The mean is the first raw moment. It is actually a location measure
• The variance describes the distribution’s width or spread
• The skew describes – loosely speaking – the extent to which a probability distribution “leans” to one side of the mean. A measure of asymmetry
• The kurtosis is a measure of the “peakedness” of the probability distribution
Expectation
• There are four rules for manipulating expectations, which can be easily proved from the original definition
• Expected value of a constant: E[a] = a
• Expected value of a constant times a random variable: E[a · x] = a E[x], thus E[a · x + b] = a E[x] + b
• Expected value of the sum of two random variables: E[x + y] = E[x] + E[y]
• Expected value of the product of two random variables: E[x · y] = E[x] E[y] if x, y are independent
Expectation
• These properties also apply to functions of random variables
• Expected value of a constant: E[a] = a
• Expected value of a constant times a function: E[a · f(x)] = a E[f(x)], thus E[a · f(x) + b] = a E[f(x)] + b
• Expected value of the sum of two functions: E[f(x) + g(x)] = E[f(x)] + E[g(x)]
• Expected value of the product of two functions: E[f(x) · g(y)] = E[f(x)] E[g(y)] if x, y are independent
Variance
• The variance is the second central moment, defined as
  Var[x] = E[(x − E[x])^2] = ∫ (x − μx)^2 p(x) dx
• Alternative formulation: Var[x] = E[x^2] − E[x]^2
• Its square root is called the standard deviation
• The rules for manipulating variances are as follows:
• Variance of a linear function: Var[a · x + b] = a^2 Var[x]
• Variance of a sum of random variables: Var[x + y] = Var[x] + Var[y] if x, y are independent
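A quick Monte Carlo sanity check of these rules (a sketch; the equalities are exact for the underlying distributions, so the sample estimates only match approximately):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(1.0, 2.0, size=1_000_000)    # Var[x] = 4
y = rng.exponential(3.0, size=1_000_000)    # Var[y] = 9, independent of x

a, b = 2.5, -1.0
print(np.var(a * x + b), a**2 * np.var(x))        # both ~25 (linear function rule)
print(np.var(x + y), np.var(x) + np.var(y))       # both ~13 (sum rule, independence)
print(np.mean(x**2) - np.mean(x)**2, np.var(x))   # alternative formulation, both ~4
```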
Common Probability Distributions
Bernoulli Distribution
• Given a Bernoulli experiment, that is, a yes/no experiment with outcomes 0 (“failure”) or 1 (“success”)
• The Bernoulli distribution is a discrete probability distribution which takes value 1 with success probability λ and value 0 with failure probability 1 − λ
• Probability mass function: p(x) = λ^x (1 − λ)^(1−x) for x ∈ {0, 1}
• Notation: x ∼ Bern(λ)
• Parameters: λ (probability of observing a success)
• Expectation: E[x] = λ
• Variance: Var[x] = λ (1 − λ)
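A minimal sketch with scipy.stats (the value of λ is arbitrary):

```python
from scipy import stats

lam = 0.3                      # success probability
x = stats.bernoulli(lam)

print(x.pmf(0), x.pmf(1))      # 0.7, 0.3
print(x.mean(), x.var())       # 0.3, 0.21  (= lam and lam * (1 - lam))
```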
Binomial Distribution
• Given a sequence of Bernoulli experiments
• The binomial distribution is the discrete probability distribution of the number of successes m in a sequence of N independent yes/no experiments, each of which yields success with probability λ
• Probability mass function: p(m) = (N choose m) λ^m (1 − λ)^(N−m)
• Notation: m ∼ Bin(N, λ)
• Parameters: N (number of trials), λ (success probability)
• Expectation: E[m] = N λ
• Variance: Var[m] = N λ (1 − λ)
(Figure: pmfs for λ = 0.5, N = 20; λ = 0.7, N = 20; λ = 0.5, N = 40)
Binomial Distribution
• For N = 1, the binomial distribution is the Bernoulli distribution
• For fixed expectation N λ, the binomial converges towards the Poisson distribution as N goes to infinity
• The quantity
  (N choose m) = N! / (m! (N − m)!)
  is the binomial coefficient (“N choose m”) and denotes the number of ways of choosing m objects out of a total of N identical objects
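A sketch of the two remarks above using scipy.stats (parameter values are arbitrary): the N = 1 case equals a Bernoulli, and for fixed expectation Nλ the binomial pmf approaches the Poisson pmf as N grows.

```python
import numpy as np
from scipy import stats

# N = 1 reduces to a Bernoulli
print(stats.binom.pmf([0, 1], n=1, p=0.3))    # [0.7 0.3]

# Poisson limit: keep the expectation N * lam = 4 fixed while N grows
m = np.arange(15)
for N in (10, 100, 10_000):
    lam = 4.0 / N
    err = np.max(np.abs(stats.binom.pmf(m, N, lam) - stats.poisson.pmf(m, 4.0)))
    print(N, err)   # the maximum pmf difference shrinks as N increases
```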
Categorical Distribution
• Consider a single experiment with K possible outcomes
• The categorical distribution is a discrete distribution that describes the probability of observing one of K possible outcomes
• Generalizes the Bernoulli distribution
• The probability of each outcome is specified as λk, with Σk λk = 1
• Probability mass function: p(x = k) = λk
• Notation: x ∼ Cat(λ)
• Parameters: λ = (λ1, …, λK), the vector of outcome probabilities
• Expectation and variance (per outcome, in the one-hot representation): E[xk] = λk, Var[xk] = λk (1 − λk)
Multinomial Distribution
• Given a sequence of experiments, each with K possible outcomes
• The multinomial distribution is the discrete probability distribution of the number of observations of the values {1, 2, …, K}, with counts m1, m2, …, mK, in a sequence of N independent trials
• In other words: for N independent trials, each of which leads to a success for exactly one of K categories, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories
• Parameters: N (number of trials), λ = (λ1, …, λK) (success probabilities)
• Expectation: E[mk] = N λk
• Variance: Var[mk] = N λk (1 − λk)
Multinomial Distribution
• Each category has a given fixed success probability λk, subject to Σk λk = 1
• Probability mass function:
  p(m1, …, mK) = N! / (m1! m2! ··· mK!) · λ1^m1 λ2^m2 ··· λK^mK,  with Σk mk = N
• Notation: (m1, …, mK) ∼ Mult(N, λ)
Multinomial Distribution
• The quantity
  N! / (m1! m2! ··· mK!)
  is the multinomial coefficient and denotes the number of ways of taking N identical objects and assigning mk of them to bin k
• Generalizes the binomial distribution to K outcomes
• Generalizes the categorical distribution to sequences of N trials
Multinomial Distribution
• Example: N = 10, λ values 0.01, 0.4, 0.49; maximum at m1 = 1, m2 = 4 (the figure shows the successes for m1, m2)
• Example: N = 40, λ values 0.5, 0.25, 0.25; maximum at m1 = 20, m2 = 10 (the figure shows the successes for m1, m2)
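The second example can be reproduced with scipy.stats.multinomial (a sketch; the most probable count vector is found by brute force over all valid combinations):

```python
import numpy as np
from scipy import stats

N, lam = 40, [0.5, 0.25, 0.25]
dist = stats.multinomial(N, lam)

best, best_p = None, -1.0
for m1 in range(N + 1):
    for m2 in range(N - m1 + 1):
        m = (m1, m2, N - m1 - m2)      # counts must sum to N
        p = dist.pmf(m)
        if p > best_p:
            best, best_p = m, p

print(best)     # (20, 10, 10): the most probable combination of successes
print(best_p)   # its probability
```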
Poisson Distribution
• Consider independent events that happen with an average rate of λ over time
• The Poisson distribution is a discrete distribution that describes the probability of a given number of events occurring in a fixed interval of time
• Can also be defined over other intervals such as distance, area or volume
• Probability mass function: p(x = k) = λ^k e^(−λ) / k!
• Notation: x ∼ Pois(λ)
• Parameters: λ (average rate of events over time or space)
• Expectation: E[x] = λ
• Variance: Var[x] = λ
(Figure: pmfs for λ = 1, λ = 4, λ = 10)
Gaussian Distribution
• Most widely used distribution for continuous variables
• Reasons: (i) simplicity (fully represented by only two moments, mean and variance) and (ii) the central limit theorem (CLT)
• The CLT states that, under mild conditions, the mean (or sum) of many independently drawn random variables is distributed approximately normally, irrespective of the form of the original distribution
• Probability density function:
  p(x) = 1 / √(2πσ²) · exp(−(x − μ)² / (2σ²))
• Parameters: μ (mean), σ² (variance)
• Expectation: E[x] = μ
• Variance: Var[x] = σ²
(Figure: densities for μ = 0, σ² = 1; μ = −3, σ² = 0.1; μ = 2, σ² = 2)
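A small illustration of the CLT (a sketch with arbitrary choices): averages of uniform draws are approximately Gaussian even though the underlying distribution is far from normal.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50                                                     # samples per experiment
means = rng.uniform(0.0, 1.0, size=(100_000, n)).mean(axis=1)

# CLT prediction: approximately N(mu, sigma^2 / n) with mu = 0.5, sigma^2 = 1/12
print(means.mean())                                        # ~0.5
print(means.var(), 1 / 12 / n)                             # both ~0.00167
print(np.mean(np.abs(means - 0.5) < np.sqrt(1 / 12 / n)))  # ~0.68, the +-1 sigma mass
```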
Gaussian Distribution
• Notation: x ∼ N(μ, σ²)
• Called standard normal distribution for μ = 0 and σ² = 1
• About 68% (roughly two thirds) of the values drawn from a normal distribution are within a range of ±1 standard deviation around the mean
• About 95% of the values lie within a range of ±2 standard deviations around the mean
• Important e.g. for hypothesis testing
Multivariate Gaussian Distribution
• For d-dimensional random vectors, the multivariate Gaussian distribution is governed by a d-dimensional mean vector μ and a d × d covariance matrix Σ that must be symmetric and positive semi-definite
• Probability density function:
  p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
• Notation: x ∼ N(μ, Σ)
• Parameters: μ (mean vector), Σ (covariance matrix)
• Expectation: E[x] = μ
• Variance (covariance matrix): Cov[x] = Σ
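A sketch of sampling from a bivariate Gaussian with numpy (mean and covariance values are arbitrary) and checking that the sample moments match μ and Σ:

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])     # symmetric, positive definite

samples = rng.multivariate_normal(mu, Sigma, size=200_000)

print(samples.mean(axis=0))        # ~[ 1.0, -2.0]
print(np.cov(samples.T))           # ~[[2.0, 0.8], [0.8, 1.0]]
```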
Multivariate Gaussian Distribution
• For d = 2, we have the bivariate Gaussian distribution
• The covariance matrix (often C) determines the shape of the distribution (see the video and the figures in [1])
Chi-squared Distribution
• Consider k independent, standard normally distributed random variables
• The chi-squared distribution is the continuous distribution of the sum of the squares of k independent standard normal random variables
• Parameter k is called the number of “degrees of freedom”
• It is one of the most widely used probability distributions in statistical inference, e.g., in hypothesis testing
• Parameters: k (degrees of freedom)
• Expectation: E[x] = k
• Variance: Var[x] = 2k
(Figure: densities for k = 1, 2, 3, 5, 8)
Chi-squared Distribution
• Probability density function (for x ≥ 0):
  p(x) = 1 / (2^(k/2) Γ(k/2)) · x^(k/2 − 1) e^(−x/2)
• Notation: x ∼ χ²(k)
• For hypothesis testing, values of the cumulative distribution function are taken, typically from tables in statistics text books or online sources
• Parameters: k (degrees of freedom)
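A sketch relating this definition to samples (the value of k is arbitrary): summing the squares of k standard normal draws and comparing against scipy's chi2 moments and quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
k = 5
z = rng.standard_normal(size=(500_000, k))
chi2_samples = (z ** 2).sum(axis=1)               # sum of k squared standard normals

print(chi2_samples.mean(), chi2_samples.var())    # ~5 and ~10 (= k and 2k)
print(stats.chi2.mean(k), stats.chi2.var(k))      # 5.0, 10.0
# cumulative distribution values used in hypothesis testing, e.g. the 95% quantile
print(stats.chi2.ppf(0.95, k))                    # ~11.07
```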
Summary
• Uncertainty is an inescapable aspect of every system in the real world
• Probability theory is a very powerful framework to represent, propagate, reduce and reason about uncertainty
• The rules of probability are remarkably compact and simple
• The concepts of marginalization, joint and conditional probability, independence and conditional independence underpin many of today’s algorithms in robotics, machine learning, computer vision and AI
• Two immediate results of the definition of conditional probability are Bayes’ rule and the chain rule
• Together with the sum rule (marginalization), they form the foundation of even the most advanced inference and learning methods. Memorize them!
• There are also alternative approaches to uncertainty representation: fuzzy logic, possibility theory, set theory, belief functions, qualitative uncertainty representations
References
Sources Used for These Slides and Further Reading
The first section, Introduction to Probability, follows in large part chapter 2 of Prince [1]; in particular, the nice figures are taken from this book. The section also contains material from chapters 1 and 2 in Koller and Friedman [2].
Another good compact summary of probability theory can be found in the book by Bishop [3]. A comprehensive treatment of probability theory is, for instance, the book by Papoulis and Pillai [4].
[1] S.J.D. Prince, “Computer vision: models, learning and inference”, Cambridge University Press, 2012. See www.computervisionmodels.com
[2] D. Koller, N. Friedman, “Probabilistic graphical models: principles and techniques”, MIT Press, 2009. See http://pgm.stanford.edu
[3] C.M. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2nd ed., 2007. See http://research.microsoft.com/en-us/um/people/cmbishop/prml
[4] A. Papoulis, S.U. Pillai, “Probability, Random Variables and Stochastic Processes”, McGraw-Hill, 4th edition, 2002. See http://www.mhhe.com/engcs/electrical/papoulis