COMP2610 / COMP6261 Information Theory – Lecture 2: First Steps and Basic Probability
COMP2610 / COMP6261 Information Theory
Lecture 2: First Steps and Basic Probability
Robert C. Williamson
Research School of Computer Science
The Australian National University
1 L O G O U S E G U I D E L I N E S T H E A U S T R A L I A N N A T I O N A L U N I V E R S I T Y
ANU Logo Use Guidelines
Deep Gold
C30 M50 Y70 K40
PMS Metallic 8620
PMS 463
Black
C0 M0 Y0 K100
PMS Process Black
Preferred logo Black version
Reverse version
Any application of the ANU logo on a coloured
background is subject to approval by the Marketing
Office, contact
brand@anu.edu.au
The ANU logo is a contemporary
reflection of our heritage.
It clearly presents our name,
our shield and our motto:
First to learn the nature of things.
To preserve the authenticity of our brand identity, there are
rules that govern how our logo is used.
Preferred logo – horizontal logo
The preferred logo should be used on a white background.
This version includes black text with the crest in Deep Gold in
either PMS or CMYK.
Black
Where colour printing is not available, the black logo can
be used on a white background.
Reverse
The logo can be used white reversed out of a black
background, or occasionally a neutral dark background.
July 24, 2018
1 / 35
Outline
1 A General Communication System
2 The Role of Uncertainty
3 Basic Concepts In Probability
4 Relating Joint, Marginal and Conditional Probabilities
5 Wrapping Up
2 / 35
1 A General Communication System
2 The Role of Uncertainty
3 Basic Concepts In Probability
4 Relating Joint, Marginal and Conditional Probabilities
5 Wrapping Up
3 / 35
A General Communication System
Source : The information source that generates the message to be
communicated
Encoder : Operates on the message to produce a signal suitable for
transmission
Channel : The medium used to transmit the signal
Decoder : Reconstructs the message from the signal
Destination : The entity that receives the message
4 / 35
Communication over Noisy Channels
A channel is some medium for transmitting messages
A noisy channel is a channel that potentially introduces errors in the
sender’s message
The Problem of Communication
“The fundamental problem of communication is that of reproducing at one
point either exactly or approximately a message selected at another point.”
(Claude Shannon, 1948)
5 / 35
Example: Telephone Network
“Hi”% “Hi”%
Source : Aditya
Encoder : Telephone handset
Channel : Analogue telephone line
Decoder : Telephone handset
Destination : Mark
6 / 35
Examples of Noisy Communication Channels
Other examples of noisy channels:
A radio communication link
VDSL NBN connection
The complete link from camera, through editing, Netflix, the rest of the
internet, VDSL link to home, wifi to TV screen
Reproducing cells
A magnetic hard disk drive
I Channel does not need to involve physical movement
What would the other components be for each of these channels?
7 / 35
1 A General Communication System
2 The Role of Uncertainty
3 Basic Concepts In Probability
4 Relating Joint, Marginal and Conditional Probabilities
5 Wrapping Up
8 / 35
Uncertainty in Communication
We can not avoid uncertainty when
1 Dealing with noise (imperfections) in the channel
2 “Compressing” the messages (compare a high-resolution photograph
of a manuscript with the typed text that captures the essence; or a
transcript of a spoken utterance)
9 / 35
Channel Noise
A noisy channel introduces errors in sender’s message
Thus, receiver is uncertain that the message is what the sender intended
How to model, quantify, and mitigate this uncertainty?
10 / 35
Message Compression – I
Cover and Thomas, Example 1.1.2
Suppose we’d like to relay the outcome of an 8 horse race to a friend
I We wish to convey one of { A, B, . . ., H }
Suppose we encode the message as a binary string. A natural coding
scheme is
A → 000
B → 001
C → 010
…
H → 111
11 / 35
Message Compression – II
Cover and Thomas, Example 1.1.2
Now say the probabilities of the horses winning are
(1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64)
Encoding messages based on their probability of the being chosen
will give shorter expected lengths:
A→ 0
B→ 10
C→ 110
D→ 1110
E→ 11110
F→ 111100
G→ 111101
H→ 111111
12 / 35
What is “Information”?
For noise correction and message compression, we will need to quantify
the information contained in a message
Roughly, “information” measures how much:
Unexpected data a message contains
The receiver’s uncertainty is reduced on seeing the message
13 / 35
The Case for Probability
We run into the notion of uncertainty when trying to pin down:
1 How to deal with channel noise
2 How to compress messages
3 What “information” means
To make progress, we need to formalise uncertainty
We will do this using probability theory
We now commence our review of probability; this will be hard going if you
have not met it before!
14 / 35
1 A General Communication System
2 The Role of Uncertainty
3 Basic Concepts In Probability
4 Relating Joint, Marginal and Conditional Probabilities
5 Wrapping Up
15 / 35
Probability: Example
Quantification and Manipulation of Uncertainty (Bishop, PRML, 2006)
1 Pick a box at random
2 From the selected box, pick a fruit (apple or orange) at random
3 Observe the fruit type and return it back to the original box
16 / 35
Probability: Example
Quantification and Manipulation of Uncertainty (Bishop, PRML, 2006)
1 Pick a box at random
2 From the selected box, pick a fruit (apple or orange) at random
3 Observe the fruit type and return it back to the original box
16 / 35
Probability: Example
Quantification and Manipulation of Uncertainty (Bishop, PRML, 2006)
1 Pick a box at random
2 From the selected box, pick a fruit (apple or orange) at random
3 Observe the fruit type and return it back to the original box
16 / 35
Probability: Example
Quantification and Manipulation of Uncertainty (Bishop, PRML, 2006)
1 Pick a box at random
2 From the selected box, pick a fruit (apple or orange) at random
3 Observe the fruit type and return it back to the original box
16 / 35
Probability: Example
Quantification and Manipulation of Uncertainty (Bishop, PRML, 2006) — Cont’d
Identity of the box is a random variable B ∈ {r , b}
Identity of the fruit is a random variable F ∈ {a, o}
Probability of an event: Proportion of times it happens out of a large
number of trials
17 / 35
Probability: Example
Quantification and Manipulation of Uncertainty (Bishop, PRML, 2006) — Cont’d
Identity of the box is a random variable B ∈ {r , b}
Identity of the fruit is a random variable F ∈ {a, o}
Probability of an event: Proportion of times it happens out of a large
number of trials
17 / 35
Probability: Example
Suppose we repeat the selection process many times, and ended up
picking up the blue box 60% of the time and the red box 40% of the time
p(B = r) = 0.4, p(B = b) = 0.6
18 / 35
Probability: Basic Properties
By definition, 0 ≤ p(B = b) ≤ 1 and 0 ≤ p(B = r) ≤ 1
Outcomes are mutually exclusive:
p(B = r ANDB = b) = p(B = r ,B = b)
= 0
Outcomes are jointly exhaustive:
p(B = r ORB = b) = p(B = r) + p(B = b)− p(B = r ANDB = b)
= p(B = r) + p(B = b)
= 1
19 / 35
Probability
What Types of Questions Can We Answer?
What is the probability of picking the red box, and an apple within that
box?
What is the (overall) probability of picking up an apple?
Given that we selected a red box, what is the probability of selecting
an apple?
We can answer these and more complex questions by using the rules of
probability.
20 / 35
Joint Probability
What is the probability of selecting the red box and selecting an apple?
Joint Probability of a Set of Events
The proportion of times these events happened together out of the total
number of trials.
If we repeated our experiment many (say N = 100) times, and in 10 of the
trials we saw B = r and F = a, then we may estimate
p(B = r ANDF = a) = p(B = r ,F = a)
=
10
100
= 0.1
21 / 35
Joint Probability
What is the probability of selecting the red box and selecting an apple?
Joint Probability of a Set of Events
The proportion of times these events happened together out of the total
number of trials.
If we repeated our experiment many (say N = 100) times, and in 10 of the
trials we saw B = r and F = a, then we may estimate
p(B = r ANDF = a) = p(B = r ,F = a)
=
10
100
= 0.1
21 / 35
Joint Probability
What is the probability of selecting the red box and selecting an apple?
Joint Probability of a Set of Events
The proportion of times these events happened together out of the total
number of trials.
If we repeated our experiment many (say N = 100) times, and in 10 of the
trials we saw B = r and F = a, then we may estimate
p(B = r ANDF = a) = p(B = r ,F = a)
=
10
100
= 0.1
21 / 35
Marginal Probability
What is the probability of an apple being picked, regardless of the box we
selected?
Marginal Probability of an Event
The proportion of times that this event happened out of the total number of
trials.
Remember that we selected a red box and an apple in 10 out of 100 trials
Say that in 45 of the trials, we selected a blue box and an apple
So, irrespective of B, an apple was selected 45 + 10 = 55 times, and:
p(F = a) =
55
100
=
11
20
22 / 35
Marginal Probability
What is the probability of an apple being picked, regardless of the box we
selected?
Marginal Probability of an Event
The proportion of times that this event happened out of the total number of
trials.
Remember that we selected a red box and an apple in 10 out of 100 trials
Say that in 45 of the trials, we selected a blue box and an apple
So, irrespective of B, an apple was selected 45 + 10 = 55 times, and:
p(F = a) =
55
100
=
11
20
22 / 35
Marginal Probability
What is the probability of an apple being picked, regardless of the box we
selected?
Marginal Probability of an Event
The proportion of times that this event happened out of the total number of
trials.
Remember that we selected a red box and an apple in 10 out of 100 trials
Say that in 45 of the trials, we selected a blue box and an apple
So, irrespective of B, an apple was selected 45 + 10 = 55 times, and:
p(F = a) =
55
100
=
11
20
22 / 35
Conditional Probability
What is the probability of an apple being picked up, given that a red box
was selected?
Conditional Probability of an Event
The conditional probability of an event X with respect to an event Y is the
proportion of times that X happens out of the times that Y happens.
The trials where we selected a blue box are irrelevant, whether or not an
apple was selected
We selected a red box and an apple 10 out of 100 times
We selected a red box (regardless of the fruit) 40 out of 100 times
p(F = a GIVENB = r) = p(F = a|B = r) =
10
40
=
1
4
Can we write this in terms of the joint and marginal probabilities?
23 / 35
Conditional Probability
What is the probability of an apple being picked up, given that a red box
was selected?
Conditional Probability of an Event
The conditional probability of an event X with respect to an event Y is the
proportion of times that X happens out of the times that Y happens.
The trials where we selected a blue box are irrelevant, whether or not an
apple was selected
We selected a red box and an apple 10 out of 100 times
We selected a red box (regardless of the fruit) 40 out of 100 times
p(F = a GIVENB = r) = p(F = a|B = r) =
10
40
=
1
4
Can we write this in terms of the joint and marginal probabilities?
23 / 35
Conditional Probability
What is the probability of an apple being picked up, given that a red box
was selected?
Conditional Probability of an Event
The conditional probability of an event X with respect to an event Y is the
proportion of times that X happens out of the times that Y happens.
The trials where we selected a blue box are irrelevant, whether or not an
apple was selected
We selected a red box and an apple 10 out of 100 times
We selected a red box (regardless of the fruit) 40 out of 100 times
p(F = a GIVENB = r) = p(F = a|B = r) =
10
40
=
1
4
Can we write this in terms of the joint and marginal probabilities?
23 / 35
Conditional Probability
What is the probability of an apple being picked up, given that a red box
was selected?
Conditional Probability of an Event
The conditional probability of an event X with respect to an event Y is the
proportion of times that X happens out of the times that Y happens.
The trials where we selected a blue box are irrelevant, whether or not an
apple was selected
We selected a red box and an apple 10 out of 100 times
We selected a red box (regardless of the fruit) 40 out of 100 times
p(F = a GIVENB = r) = p(F = a|B = r) =
10
40
=
1
4
Can we write this in terms of the joint and marginal probabilities?
23 / 35
1 A General Communication System
2 The Role of Uncertainty
3 Basic Concepts In Probability
4 Relating Joint, Marginal and Conditional Probabilities
5 Wrapping Up
24 / 35
Joint, Marginal and Conditional Probabilities:
A More General Formulation (1)
Consider the more general case of two random variables:
X ∈ {x1, x2, . . . , xM} and Y ∈ {y1, y2, . . . , yL}
N : Total number of trials
nij : ](X = xi ,Y = yj) = ] of times that xi and yi happen
ci : ](X = xi) =
∑
j nij = ] of times that xi happens
rj : ](Y = yj) =
∑
i nij = ] of times that yj happens
25 / 35
Joint, Marginal and Conditional Probabilities:
A More General Formulation (1)
Consider the more general case of two random variables:
X ∈ {x1, x2, . . . , xM} and Y ∈ {y1, y2, . . . , yL}
N : Total number of trials
nij : ](X = xi ,Y = yj) = ] of times that xi and yi happen
ci : ](X = xi) =
∑
j nij = ] of times that xi happens
rj : ](Y = yj) =
∑
i nij = ] of times that yj happens
25 / 35
Joint, Marginal and Conditional Probabilities:
A More General Formulation (1)
Consider the more general case of two random variables:
X ∈ {x1, x2, . . . , xM} and Y ∈ {y1, y2, . . . , yL}
N : Total number of trials
nij : ](X = xi ,Y = yj) = ] of times that xi and yi happen
ci : ](X = xi) =
∑
j nij = ] of times that xi happens
rj : ](Y = yj) =
∑
i nij = ] of times that yj happens
25 / 35
Joint, Marginal and Conditional Probabilities:
A More General Formulation (1)
Consider the more general case of two random variables:
X ∈ {x1, x2, . . . , xM} and Y ∈ {y1, y2, . . . , yL}
N : Total number of trials
nij : ](X = xi ,Y = yj) = ] of times that xi and yi happen
ci : ](X = xi) =
∑
j nij = ] of times that xi happens
rj : ](Y = yj) =
∑
i nij = ] of times that yj happens
25 / 35
Joint, Marginal and Conditional Probabilities:
A More General Formulation (2)
By definition:
p(X = xi ,Y = yj) =
nij
N
(Joint)
p(X = xi) =
ci
N
(Marginal)
p(Y = yj) =
rj
N
(Marginal)
p(Y = yj |X = xi) =
nij
ci
(Conditional)
26 / 35
Joint, Marginal and Conditional Probabilities:
A More General Formulation (1)
Bins and fruit example:
Orange Apple
Blue 15 45
Red 30 10
Verify the computations from previous section with this table
27 / 35
Joint, Marginal and Conditional Probabilities:
A More General Formulation (3)
Observe:
p(X = xi) =
∑
j nij
N
=
∑
j
p(X = xi ,Y = yj)
p(Y = yj |X = xi) =
nij
ci
=
nij
N
/ci
N
= p(X = xi ,Y = yj)/p(X = xi)
28 / 35
The Rules of Probability
Sum Rule / Marginalization :
p(X = xi) =
∑
j
p(X = xi ,Y = yj)
Product Rule :
p(X = xi ,Y = yj) = p(Y = yj |X = xi)p(X = xi)
and by symmetry:
P(Y = yj ,X = xi) = p(X = xi |Y = yj)p(Y = yj)
Therefore:
P(X = xi) =
∑
j
P(X = xi ,Y = yj) =
∑
j
P(X = xi |Y = yj)P(Y = yj)
29 / 35
The Rules of Probability
Sum Rule / Marginalization :
p(X = xi) =
∑
j
p(X = xi ,Y = yj)
Product Rule :
p(X = xi ,Y = yj) = p(Y = yj |X = xi)p(X = xi)
and by symmetry:
P(Y = yj ,X = xi) = p(X = xi |Y = yj)p(Y = yj)
Therefore:
P(X = xi) =
∑
j
P(X = xi ,Y = yj) =
∑
j
P(X = xi |Y = yj)P(Y = yj)
29 / 35
The Rules of Probability
Sum Rule / Marginalization :
p(X = xi) =
∑
j
p(X = xi ,Y = yj)
Product Rule :
p(X = xi ,Y = yj) = p(Y = yj |X = xi)p(X = xi)
and by symmetry:
P(Y = yj ,X = xi) = p(X = xi |Y = yj)p(Y = yj)
Therefore:
P(X = xi) =
∑
j
P(X = xi ,Y = yj) =
∑
j
P(X = xi |Y = yj)P(Y = yj)
29 / 35
The Rules of Probability
Sum Rule / Marginalization :
p(X = xi) =
∑
j
p(X = xi ,Y = yj)
Product Rule :
p(X = xi ,Y = yj) = p(Y = yj |X = xi)p(X = xi)
and by symmetry:
P(Y = yj ,X = xi) = p(X = xi |Y = yj)p(Y = yj)
Therefore:
P(X = xi) =
∑
j
P(X = xi ,Y = yj) =
∑
j
P(X = xi |Y = yj)P(Y = yj)
29 / 35
An Illustration of a Distribution over Two Variables
joint
marginal
marginal conditional
30 / 35
An Illustration of a Distribution over Two Variables
joint marginal
marginal conditional
30 / 35
An Illustration of a Distribution over Two Variables
joint marginal
marginal
conditional
30 / 35
An Illustration of a Distribution over Two Variables
joint marginal
marginal conditional
30 / 35
Joint, Marginal and Conditional Probabilities:
An even More General Formulation
Given D random variables X1, . . . ,XD:
p(X1, . . . ,Xi−1,Xi+1, . . . ,XD) =
∑
Xi
p(X1, . . . ,XD)
Chain Rule: We can also express:
p(X1,X2) = p(X1)p(X2|X1) What are we using here?
p(X1,X2,X3) = p(X1,X2)p(X3|X1,X2) = p(X1)p(X2|X1)p(X3|X1,X2)
p(X1, . . . ,XD) = p(X1)p(X2|X1)p(X3|X2,X1) . . . p(XD|X1, . . . ,XD−1)
31 / 35
Joint, Marginal and Conditional Probabilities:
An even More General Formulation
Given D random variables X1, . . . ,XD:
p(X1, . . . ,Xi−1,Xi+1, . . . ,XD) =
∑
Xi
p(X1, . . . ,XD)
Chain Rule: We can also express:
p(X1,X2) = p(X1)p(X2|X1) What are we using here?
p(X1,X2,X3) = p(X1,X2)p(X3|X1,X2) = p(X1)p(X2|X1)p(X3|X1,X2)
p(X1, . . . ,XD) = p(X1)p(X2|X1)p(X3|X2,X1) . . . p(XD|X1, . . . ,XD−1)
31 / 35
Joint, Marginal and Conditional Probabilities:
An even More General Formulation
Given D random variables X1, . . . ,XD:
p(X1, . . . ,Xi−1,Xi+1, . . . ,XD) =
∑
Xi
p(X1, . . . ,XD)
Chain Rule: We can also express:
p(X1,X2) = p(X1)p(X2|X1) What are we using here?
p(X1,X2,X3) = p(X1,X2)p(X3|X1,X2) = p(X1)p(X2|X1)p(X3|X1,X2)
p(X1, . . . ,XD) = p(X1)p(X2|X1)p(X3|X2,X1) . . . p(XD|X1, . . . ,XD−1)
31 / 35
Joint, Marginal and Conditional Probabilities:
An even More General Formulation
Given D random variables X1, . . . ,XD:
p(X1, . . . ,Xi−1,Xi+1, . . . ,XD) =
∑
Xi
p(X1, . . . ,XD)
Chain Rule: We can also express:
p(X1,X2) = p(X1)p(X2|X1) What are we using here?
p(X1,X2,X3) = p(X1,X2)p(X3|X1,X2) = p(X1)p(X2|X1)p(X3|X1,X2)
p(X1, . . . ,XD) = p(X1)p(X2|X1)p(X3|X2,X1) . . . p(XD|X1, . . . ,XD−1)
31 / 35
1 A General Communication System
2 The Role of Uncertainty
3 Basic Concepts In Probability
4 Relating Joint, Marginal and Conditional Probabilities
5 Wrapping Up
32 / 35
Summary
General architecture for communication systems
Why we need probability
Probability theory: joint, marginal and conditional distribution
Reading: Mackay § 2.1 and § 2.2; Bishop § 1.2
33 / 35
Exercise
Coming Back to our Original Example
Given: p(B = r) = 0.4, p(B = b) = 0.6
Assume the fruit are selected uniformly from each box
Write down all conditional probabilities P(F |B)
Evaluate the overall probabilities P(F )
34 / 35
Next time
More on joint, marginal and conditional distributions
When can we say that X ,Y do not influence each other?
What, if anything, does p(X = x |Y = y) tell us about
p(Y = y |X = x)?
35 / 35
Next time
Sign-up for tutorials open at 9am wednesday 25 July. Will offer three
tutorials. If we need more, I will add.
Class rep.
36 / 35
A General Communication System
The Role of Uncertainty
Basic Concepts In Probability
Relating Joint, Marginal and Conditional Probabilities
Wrapping Up