COMP2610 / COMP6261 – Information Theory Lecture 3: Probability Theory and Bayes’ Rule
Robert C. Williamson
Research School of Computer Science
July 30, 2018
A general communication system
Why do we need probability?
Basics of probability theory
Joint, marginal and conditional distributions
Review Exercise
Suppose I go through the records for N = 1000 students, checking their admission status, A ∈ {0, 1}, and whether they are “brilliant” or not, B ∈ {0, 1}.
(Aside: “Brilliance” is a dodgy concept, and does not predict scientific achievement as well as persistence and combinatorial ability; see e.g. Simonton, Scientific Genius: A Psychology of Science, Cambridge University Press, 2009; this is just a toy example!)
Say that the counts for admission and brilliance are

          B = 0   B = 1
  A = 0     680      10
  A = 1     220      90

Then:
  p(A = 1, B = 0) = 220/1000   (Joint)
  p(B = 1)        = 100/1000   (Marginal)
  p(A = 0)        = 690/1000   (Marginal)
  p(B = 1|A = 1)  =  90/310    (Conditional)
  p(A = 0|B = 0)  = 680/900    (Conditional)
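As a sanity check, these five quantities can be recomputed mechanically from the count table. A minimal Python/NumPy sketch (the array layout and variable names are my own, not from the slides):

import numpy as np

# Count table: rows index A (admission), columns index B (brilliance).
counts = np.array([[680, 10],    # A = 0
                   [220, 90]])   # A = 1
N = counts.sum()                 # 1000

joint = counts / N                          # p(A = a, B = b)
p_A = joint.sum(axis=1)                     # marginal p(A = a), sum rule
p_B = joint.sum(axis=0)                     # marginal p(B = b), sum rule

print(joint[1, 0])                          # p(A = 1, B = 0) = 0.22
print(p_B[1])                               # p(B = 1)        = 0.10
print(p_A[0])                               # p(A = 0)        = 0.69
print(counts[1, 1] / counts[1, :].sum())    # p(B = 1|A = 1)  = 90/310
print(counts[0, 0] / counts[:, 0].sum())    # p(A = 0|B = 0)  = 680/900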
More on joint, marginal and conditional distributions
When can we say that X, Y do not influence each other?
What, if anything, does p(X = x|Y = y) tell us about p(Y = y|X = x)?
Philosophically related to “How do we know / learn about the world?”
I am not providing a general answer; but keep it in mind!
1 More on Joint, Marginal and Conditional Distributions
2 Statistical Independence
3 Bayes’ Theorem
4 Wrapping up
Document Modelling Example
Suppose we have a large document of English text, represented as a sequence of characters:
x_1 x_2 x_3 … x_N, e.g. hello how are you
Treat each consecutive pair of characters as the outcome of “random variables” X, Y, i.e.
X = ‘h’, Y = ‘e’
X = ‘e’, Y = ‘l’
X = ‘l’, Y = ‘l’
…
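To make this concrete, here is a minimal sketch of forming empirical joint, marginal and conditional distributions from character pairs (the toy string and all names are my own; this is not MacKay’s exact procedure):

from collections import Counter

text = "hello how are you"                 # stand-in for a large document
pairs = list(zip(text, text[1:]))          # consecutive character pairs (x, y)
N = len(pairs)

joint = {xy: c / N for xy, c in Counter(pairs).items()}   # p(X = x, Y = y)

p_x = Counter()
for (x, _), p in joint.items():
    p_x[x] += p                                           # p(X = x), sum rule

def p_y_given_x(y, x):
    """Conditional p(Y = y | X = x) via the product rule."""
    return joint.get((x, y), 0.0) / p_x[x]

print(p_y_given_x('e', 'h'))               # p(Y = 'e' | X = 'h') = 0.5 here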
Document Modelling: Marginal and Joint Distributions
[Figure: unigram (marginal) and bigram (joint) distributions for the English alphabet, estimated from the “FAQ manual for Linux”. From MacKay (ITILA, 2003); areas of squares proportional to probability (the right way to do it!).]
Document Modelling: Conditional Distributions
Conditional distributions for the English alphabet, estimated from the “FAQ manual for Linux”. Are these distributions “symmetric”? Figure from MacKay (ITILA, 2003).
Is P(X = x|Y = y) = P(Y = y|X = x)?
Is P(X = x|Y = y) = P(X = y|Y = x)?
Recap: Sum and Product Rules
Sum rule:
p(X = x_i) = Σ_j p(X = x_i, Y = y_j)
Product rule:
p(X = x_i, Y = y_j) = p(Y = y_j|X = x_i) p(X = x_i)
Relating the Marginal, Conditional and Joint
Suppose we knew p(X = x, Y = y) for all values of x, y. Could we compute all of p(X = x|Y = y), p(X = x) and p(Y = y)? Yes.
Now suppose we knew p(X = x) and p(Y = y) for all values of x, y. Could we compute p(X = x, Y = y) or p(X = x|Y = y)? No.
The difference in answers above is of great significance:

          B = 0   B = 1              B = 0   B = 1
  A = 0     680      10      A = 0     640      50
  A = 1     220      90      A = 1     260      50

These have the same marginals, but different joint distributions.
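The two tables above are easy to check numerically; a small sketch, assuming the same row/column layout as before:

import numpy as np

J1 = np.array([[680, 10], [220, 90]])      # first count table
J2 = np.array([[640, 50], [260, 50]])      # second count table

for J in (J1, J2):
    # The marginals agree across the two tables...
    print(J.sum(axis=1), J.sum(axis=0))    # [690 310] [900 100] both times
    # ...but the conditionals (hence the joints) differ:
    print(J[1, 1] / J[1, :].sum())         # p(B = 1|A = 1): 90/310 vs 50/310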
Joint as the “Master” Distribution
In general, many joint distributions are consistent with a given set of marginal distributions
The joint distribution is the “master” source of information about the dependence between the variables: marginals and conditionals can be derived from it, but not vice versa
1 More on Joint, Marginal and Conditional Distributions
2 Statistical Independence
3 Bayes’ Theorem
4 Wrapping up
Recall: Fruit-Box Experiment
Statistical Independence
Suppose that both boxes (red and blue) contain the same proportion of apples and oranges.
If fruit is selected uniformly at random from each box:
p(F = a|B = r) = p(F = a|B = b) (= p(F = a))
p(F = o|B = r) = p(F = o|B = b) (= p(F = o))
The probability of selecting an apple (or an orange) is independent of the box that is chosen.
We may study the properties of F and B separately: this often simplifies analysis
Statistical Independence: Definition
Definition: Independent Variables
Two variables X and Y are statistically independent, denoted X ⊥ Y, if and only if their joint distribution factorizes into the product of their marginals:
X ⊥ Y ↔ p(X, Y) = p(X)p(Y)
This definition generalises to more than two variables.
Are the variables in the language example statistically independent?
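One way to answer such questions numerically is to compare the joint with the product of its marginals. A minimal sketch (the function name is my own); applied to the bigram table of the language example, the same check would answer the question above:

import numpy as np

def is_independent(joint, tol=1e-9):
    """Test X ⊥ Y by checking p(X, Y) = p(X) p(Y) elementwise."""
    p_x = joint.sum(axis=1, keepdims=True)   # marginal p(X)
    p_y = joint.sum(axis=0, keepdims=True)   # marginal p(Y)
    return np.allclose(joint, p_x * p_y, atol=tol)

# The admissions table is clearly not independent:
print(is_independent(np.array([[680, 10], [220, 90]]) / 1000))   # False

# A joint built as the outer product of marginals is independent:
print(is_independent(np.outer([0.69, 0.31], [0.9, 0.1])))        # True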
A Note on Notation
When we write
p(X, Y) = p(X)p(Y)
we have not specified the outcomes of X, Y explicitly
This statement is a shorthand for
p(X = x, Y = y) = p(X = x)p(Y = y) for every possible x and y
This notation is sometimes called implied universality
Conditional independence
We may also consider random variables that are conditionally independent
given some other variable
Definition: Conditionally Independent Variables
Two variables X and Y are conditionally independent given Z, denoted X ⊥ Y|Z, if and only if
p(X, Y|Z) = p(X|Z)p(Y|Z)
Intuitively, Z is a common cause for X and Y
Example:
X = whether I have a cold
Y = whether I have a headache
Z = whether I have the flu
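A small numeric illustration of the cold/headache/flu example. The probabilities below are made up (mine, not from the slides), chosen only to show that X ⊥ Y|Z can hold while X and Y remain marginally dependent:

# Hypothetical numbers: Z = flu, X = cold, Y = headache.
p_z1 = 0.1                                  # p(Z = 1)
p_x1_given_z = {1: 0.9, 0: 0.1}             # p(X = 1|Z = z)
p_y1_given_z = {1: 0.8, 0: 0.2}             # p(Y = 1|Z = z)

def p_xyz(x, y, z):
    """Joint built so that X ⊥ Y | Z holds by construction."""
    pz = p_z1 if z == 1 else 1 - p_z1
    px = p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]
    py = p_y1_given_z[z] if y == 1 else 1 - p_y1_given_z[z]
    return pz * px * py

# Marginally, X and Y are NOT independent:
p_x1 = sum(p_xyz(1, y, z) for y in (0, 1) for z in (0, 1))   # 0.18
p_y1 = sum(p_xyz(x, 1, z) for x in (0, 1) for z in (0, 1))   # 0.26
p_x1y1 = sum(p_xyz(1, 1, z) for z in (0, 1))                 # 0.09
print(p_x1y1, p_x1 * p_y1)   # 0.09 vs 0.0468: the common cause Z couples them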
1 More on Joint, Marginal and Conditional Distributions
2 Statistical Independence
3 Bayes’ Theorem
4 Wrapping up
Revisiting the Product Rule
The product rule tells us:
p(X,Y) = p(Y|X)p(X)
This can equivalently be interpreted as a definition of conditional probability:
p(Y|X) = p(X, Y) / p(X)
Can we use these to relate p(X|Y) and p(Y|X)?
Posterior Inference: Example 1 (MacKay, 2003)
Dicksy Sick had a test for a rare disease
Only 1% of people of Dicksy’s background have the disease
The test simply classifies a person as having the disease, or not
The test is reliable, but not infallible:
It correctly identifies a sick individual 95% of the time: p(identifies sick | sick) = 95%.
It correctly identifies a healthy individual 96% of the time: p(identifies healthy | healthy) = 96%.
Dicksy has tested positive (apparently sick)
What is the probability of Dicksy having the disease?
Posterior Inference: Example 1: Formalization
Let D ∈ {0, 1} denote whether Dicksy has the disease, and T ∈ {0, 1} the outcome of the test:

p(D = 1) = 0.01    p(T = 1|D = 1) = 0.95    p(T = 0|D = 1) = 0.05
p(D = 0) = 0.99    p(T = 1|D = 0) = 0.04    p(T = 0|D = 0) = 0.96

We need to compute p(D = 1|T = 1), the probability of Dicksy having the disease given that the test has come back positive.
Posterior Inference: Example 1: Solution

p(D = 1|T = 1)
  = p(D = 1, T = 1) / p(T = 1)                                        [def. conditional prob.]
  = p(T = 1, D = 1) / p(T = 1)
  = p(T = 1|D = 1) p(D = 1) / p(T = 1)                                [product rule]
  = p(T = 1|D = 1) p(D = 1) / Σ_d p(T = 1|D = d) p(D = d)             [sum rule]
  = p(T = 1|D = 1) p(D = 1) / [p(T = 1|D = 1) p(D = 1) + p(T = 1|D = 0) p(D = 0)]
  = (0.95 × 0.01) / (0.95 × 0.01 + 0.04 × 0.99)
  ≈ 0.19.

Despite testing positive and the high accuracy of the test, the probability of Dicksy having the disease is only about 0.19!
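The arithmetic is easy to verify; a short Python check using the numbers from the formalization:

# Posterior p(D = 1|T = 1) via Bayes' theorem, numbers from the slide.
p_d1 = 0.01                       # prior p(D = 1)
p_t1_d1, p_t1_d0 = 0.95, 0.04     # likelihoods p(T = 1|D = 1), p(T = 1|D = 0)

posterior = p_t1_d1 * p_d1 / (p_t1_d1 * p_d1 + p_t1_d0 * (1 - p_d1))
print(posterior)                  # 0.1934... ≈ 0.19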