COMP2610 / COMP6261 Information Theory
Lecture 2: First Steps and Basic Probability

Robert C. Williamson

Research School of Computer Science
The Australian National University

July 24, 2018

1 / 35

Outline

1 A General Communication System

2 The Role of Uncertainty

3 Basic Concepts In Probability

4 Relating Joint, Marginal and Conditional Probabilities

5 Wrapping Up

2 / 35


A General Communication System

Source : The information source that generates the message to be
communicated

Encoder : Operates on the message to produce a signal suitable for
transmission

Channel : The medium used to transmit the signal
Decoder : Reconstructs the message from the signal

Destination : The entity that receives the message

4 / 35

Communication over Noisy Channels

A channel is some medium for transmitting messages

A noisy channel is a channel that potentially introduces errors in the
sender’s message

The Problem of Communication
“The fundamental problem of communication is that of reproducing at one
point either exactly or approximately a message selected at another point.”
(Claude Shannon, 1948)

5 / 35
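The corruption such a channel introduces is easy to simulate. A minimal sketch (Python; the binary symmetric channel with crossover probability f is a standard model, not one this slide specifies):

```python
import random

def binary_symmetric_channel(bits, f, rng=random.Random(0)):
    """Transmit a list of 0/1 bits over a binary symmetric channel:
    each bit is flipped independently with crossover probability f."""
    return [b ^ (rng.random() < f) for b in bits]

message = [0, 1, 1, 0, 1, 0, 0, 1]
received = binary_symmetric_channel(message, f=0.1)
errors = sum(m != r for m, r in zip(message, received))
print(f"sent:     {message}")
print(f"received: {received}  ({errors} bit(s) corrupted)")
```

The receiver sees only `received`; reproducing `message` "exactly or approximately" despite the flips is precisely Shannon's problem.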

Example: Telephone Network

[Figure: Aditya says “Hi”; Mark hears “Hi”]

Source : Aditya

Encoder : Telephone handset

Channel : Analogue telephone line

Decoder : Telephone handset

Destination : Mark

6 / 35

Examples of Noisy Communication Channels

Other examples of noisy channels:

A radio communication link

VDSL NBN connection

The complete link from camera, through editing, Netflix, the rest of the
internet, VDSL link to home, wifi to TV screen

Reproducing cells

A magnetic hard disk drive
    – The channel need not involve physical movement

What would the other components be for each of these channels?

7 / 35


Uncertainty in Communication

We cannot avoid uncertainty when
1 Dealing with noise (imperfections) in the channel

2 “Compressing” the messages (compare a high-resolution photograph
of a manuscript with the typed text that captures the essence; or a
transcript of a spoken utterance)

9 / 35

Channel Noise

A noisy channel introduces errors in the sender’s message

Thus, the receiver is uncertain whether the message is what the sender intended

How to model, quantify, and mitigate this uncertainty?

10 / 35

Message Compression – I
Cover and Thomas, Example 1.1.2

Suppose we’d like to relay the outcome of an 8-horse race to a friend
    – We wish to convey one of {A, B, . . . , H}

Suppose we encode the message as a binary string. A natural coding
scheme is

A → 000
B → 001
C → 010
...
H → 111

11 / 35

Message Compression – II
Cover and Thomas, Example 1.1.2

Now say the probabilities of the horses winning are
(1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64)

Encoding messages based on their probability of being chosen
will give shorter expected lengths:

A → 0
B → 10
C → 110
D → 1110
E → 111100
F → 111101
G → 111110
H → 111111

12 / 35
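The gain from matching codeword lengths to probabilities can be verified by computing the expected length of each code. A small check (Python; the probabilities and prefix-free codewords are those of Cover and Thomas, Example 1.1.2):

```python
# Winning probabilities for horses A..H
probs = {'A': 1/2, 'B': 1/4, 'C': 1/8, 'D': 1/16,
         'E': 1/64, 'F': 1/64, 'G': 1/64, 'H': 1/64}

# Fixed-length code: A -> 000, B -> 001, ..., H -> 111
fixed_code = {h: format(i, '03b') for i, h in enumerate(probs)}

# Probability-based (prefix-free) code
variable_code = {'A': '0', 'B': '10', 'C': '110', 'D': '1110',
                 'E': '111100', 'F': '111101', 'G': '111110', 'H': '111111'}

def expected_length(code, probs):
    """Expected number of bits needed to report one race outcome."""
    return sum(probs[h] * len(code[h]) for h in probs)

print(expected_length(fixed_code, probs))     # 3.0 bits per race
print(expected_length(variable_code, probs))  # 2.0 bits per race
```

The probability-based code needs only 2 bits per race on average, versus 3 for the fixed-length code.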

What is “Information”?

For noise correction and message compression, we will need to quantify
the information contained in a message

Roughly, “information” measures:

How much unexpected content a message contains

How much the receiver’s uncertainty is reduced on seeing the message

13 / 35

The Case for Probability

We run into the notion of uncertainty when trying to pin down:
1 How to deal with channel noise

2 How to compress messages

3 What “information” means

To make progress, we need to formalise uncertainty

We will do this using probability theory
We now commence our review of probability; this will be hard going if you
have not met it before!

14 / 35


Probability: Example
Quantification and Manipulation of Uncertainty (Bishop, PRML, 2006)

1 Pick a box at random

2 From the selected box, pick a fruit (apple or orange) at random

3 Observe the fruit type and return it back to the original box

16 / 35


Probability: Example
Quantification and Manipulation of Uncertainty (Bishop, PRML, 2006) — Cont’d

Identity of the box is a random variable B ∈ {r , b}

Identity of the fruit is a random variable F ∈ {a, o}

Probability of an event: Proportion of times it happens out of a large
number of trials

17 / 35


Probability: Example

Suppose we repeat the selection process many times, and end up
picking the blue box 60% of the time and the red box 40% of the time

p(B = r) = 0.4, p(B = b) = 0.6

18 / 35
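The “proportion of times” reading of probability can be seen directly by simulation: as the number of trials grows, the relative frequency of picking the red box settles near 0.4. A sketch (Python; the simulation set-up is my addition, only the 0.4/0.6 values come from the slide):

```python
import random

rng = random.Random(42)
N = 100_000  # number of simulated trials

# Each trial: pick the red box with probability 0.4, otherwise the blue box
trials = ['r' if rng.random() < 0.4 else 'b' for _ in range(N)]

# Relative frequency estimates of the probabilities
p_red_hat = trials.count('r') / N
p_blue_hat = trials.count('b') / N
print(f"p(B=r) ≈ {p_red_hat:.3f}, p(B=b) ≈ {p_blue_hat:.3f}")
```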

Probability: Basic Properties

By definition, 0 ≤ p(B = b) ≤ 1 and 0 ≤ p(B = r) ≤ 1

Outcomes are mutually exclusive:

p(B = r AND B = b) = p(B = r, B = b) = 0

Outcomes are jointly exhaustive:

p(B = r OR B = b) = p(B = r) + p(B = b) − p(B = r AND B = b)
                  = p(B = r) + p(B = b)
                  = 1

19 / 35

Probability
What Types of Questions Can We Answer?

What is the probability of picking the red box, and an apple within that
box?

What is the (overall) probability of picking up an apple?

Given that we selected a red box, what is the probability of selecting
an apple?

We can answer these and more complex questions by using the rules of
probability.

20 / 35

Joint Probability

What is the probability of selecting the red box and selecting an apple?

Joint Probability of a Set of Events
The proportion of times these events happened together out of the total
number of trials.

If we repeated our experiment many (say N = 100) times, and in 10 of the
trials we saw B = r and F = a, then we may estimate

p(B = r AND F = a) = p(B = r, F = a) = 10/100 = 0.1
21 / 35


Marginal Probability

What is the probability of an apple being picked, regardless of the box we
selected?

Marginal Probability of an Event
The proportion of times that this event happened out of the total number of
trials.

Remember that we selected a red box and an apple in 10 out of 100 trials

Say that in 45 of the trials, we selected a blue box and an apple

So, irrespective of B, an apple was selected 45 + 10 = 55 times, and:

p(F = a) = 55/100 = 11/20
22 / 35


Conditional Probability

What is the probability of an apple being picked up, given that a red box
was selected?

Conditional Probability of an Event
The conditional probability of an event X with respect to an event Y is the
proportion of times that X happens out of the times that Y happens.

The trials where we selected a blue box are irrelevant, whether or not an
apple was selected

We selected a red box and an apple 10 out of 100 times

We selected a red box (regardless of the fruit) 40 out of 100 times

p(F = a GIVEN B = r) = p(F = a | B = r) = 10/40 = 1/4

Can we write this in terms of the joint and marginal probabilities?

23 / 35



Joint, Marginal and Conditional Probabilities:
A More General Formulation (1)

Consider the more general case of two random variables:

X ∈ {x1, x2, . . . , xM} and Y ∈ {y1, y2, . . . , yL}

N : Total number of trials

nij : #(X = xi, Y = yj) = number of times that xi and yj happen together

ci : #(X = xi) = Σj nij = number of times that xi happens

rj : #(Y = yj) = Σi nij = number of times that yj happens

25 / 35


Joint, Marginal and Conditional Probabilities:
A More General Formulation (2)

By definition:

p(X = xi, Y = yj) = nij / N      (Joint)

p(X = xi) = ci / N               (Marginal)

p(Y = yj) = rj / N               (Marginal)

p(Y = yj | X = xi) = nij / ci    (Conditional)

26 / 35

Joint, Marginal and Conditional Probabilities:
A More General Formulation — Example

Boxes and fruit example:

         Orange   Apple
  Blue     15       45
  Red      30       10

Verify the computations from the previous section with this table

27 / 35
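The slide's suggestion can be carried out mechanically. A sketch (Python; the counts come from the table, everything else is illustrative naming):

```python
# Counts from the table (N = 100 trials): (box, fruit) -> number of trials
counts = {('b', 'o'): 15, ('b', 'a'): 45,
          ('r', 'o'): 30, ('r', 'a'): 10}
N = sum(counts.values())

# Joint: p(B=r, F=a) = n_{r,a} / N
p_joint_ra = counts[('r', 'a')] / N

# Marginal: p(F=a) = (apple counts summed over boxes) / N
p_apple = (counts[('r', 'a')] + counts[('b', 'a')]) / N

# Conditional: p(F=a | B=r) = n_{r,a} / c_r, where c_r counts red-box trials
c_r = counts[('r', 'a')] + counts[('r', 'o')]
p_apple_given_red = counts[('r', 'a')] / c_r

print(p_joint_ra, p_apple, p_apple_given_red)  # 0.1 0.55 0.25
```

These match the values computed on the earlier slides.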

Joint, Marginal and Conditional Probabilities:
A More General Formulation (3)

Observe:

p(X = xi) = (Σj nij) / N = Σj p(X = xi, Y = yj)

p(Y = yj | X = xi) = nij / ci = (nij / N) / (ci / N) = p(X = xi, Y = yj) / p(X = xi)

28 / 35

The Rules of Probability

Sum Rule / Marginalization :

p(X = xi) = Σj p(X = xi, Y = yj)

Product Rule :

p(X = xi, Y = yj) = p(Y = yj | X = xi) p(X = xi)

and by symmetry:

p(Y = yj, X = xi) = p(X = xi | Y = yj) p(Y = yj)

Therefore:

p(X = xi) = Σj p(X = xi, Y = yj) = Σj p(X = xi | Y = yj) p(Y = yj)
29 / 35
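Both rules can be checked numerically against the box-and-fruit counts. A sketch (Python; assumes the counts from the earlier table):

```python
# Counts from the boxes-and-fruit table: (box, fruit) -> number of trials
counts = {('b', 'o'): 15, ('b', 'a'): 45,
          ('r', 'o'): 30, ('r', 'a'): 10}
N = sum(counts.values())

def p_joint(b, f):
    return counts[(b, f)] / N

def p_box(b):
    # Sum rule: the marginal is the joint summed over the other variable
    return sum(p_joint(b, f) for f in ('a', 'o'))

def p_fruit_given_box(f, b):
    return p_joint(b, f) / p_box(b)

# Product rule: p(B, F) = p(F | B) p(B), for every outcome
for b in ('r', 'b'):
    for f in ('a', 'o'):
        assert abs(p_joint(b, f) - p_fruit_given_box(f, b) * p_box(b)) < 1e-12

# Sum rule via the product rule: p(F=a) = Σb p(F=a | B=b) p(B=b)
p_apple = sum(p_fruit_given_box('a', b) * p_box(b) for b in ('r', 'b'))
print(f"p(F=a) = {p_apple:.2f}")  # prints p(F=a) = 0.55
```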


An Illustration of a Distribution over Two Variables

[Figure: plots of a joint distribution p(X, Y), its two marginals, and a conditional distribution]

30 / 35


Joint, Marginal and Conditional Probabilities:
An Even More General Formulation

Given D random variables X1, . . . , XD:

p(X1, . . . , Xi−1, Xi+1, . . . , XD) = ΣXi p(X1, . . . , XD)

Chain Rule: We can also express:

p(X1, X2) = p(X1) p(X2|X1)    What are we using here?
p(X1, X2, X3) = p(X1, X2) p(X3|X1, X2) = p(X1) p(X2|X1) p(X3|X1, X2)
p(X1, . . . , XD) = p(X1) p(X2|X1) p(X3|X1, X2) · · · p(XD|X1, . . . , XD−1)

31 / 35
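The chain rule can be verified by brute force on a small joint distribution. A sketch (Python; the three-variable distribution is randomly generated purely for illustration):

```python
import itertools
import random

rng = random.Random(1)

# A made-up joint distribution over three binary variables X1, X2, X3
outcomes = list(itertools.product([0, 1], repeat=3))
weights = [rng.random() for _ in outcomes]
total = sum(weights)
p = {o: w / total for o, w in zip(outcomes, weights)}

def marginal(fixed):
    """Probability that the leading variables take the given values,
    obtained by the sum rule: sum the joint over the remaining variables."""
    return sum(pr for o, pr in p.items() if o[:len(fixed)] == fixed)

# Chain rule: p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x1, x2)
for (x1, x2, x3), pr in p.items():
    chain = (marginal((x1,))
             * marginal((x1, x2)) / marginal((x1,))
             * marginal((x1, x2, x3)) / marginal((x1, x2)))
    assert abs(pr - chain) < 1e-12
print("chain rule verified on all 8 outcomes")
```

Each conditional is a joint divided by a marginal (the product rule), so the factors telescope back to the full joint.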



Summary

General architecture for communication systems

Why we need probability

Probability theory: joint, marginal and conditional distribution

Reading: MacKay § 2.1 and § 2.2; Bishop § 1.2

33 / 35

Exercise
Coming Back to our Original Example

Given: p(B = r) = 0.4, p(B = b) = 0.6
Assume the fruit are selected uniformly from each box

Write down all conditional probabilities P(F |B)
Evaluate the overall probabilities P(F )

34 / 35

Next time

More on joint, marginal and conditional distributions

When can we say that X, Y do not influence each other?

What, if anything, does p(X = x | Y = y) tell us about p(Y = y | X = x)?

35 / 35

Next time

Sign-up for tutorials opens at 9am Wednesday 25 July. We will offer
three tutorials; if we need more, I will add them.

Class rep.

36 / 35
