Information Theory
Crypto & SecDev 2017 © Ron Poet: Lecture 2
Entropy
• The entropy of a set of messages is the number of bits needed to encode all possible messages with an optimal encoding.
• Let X = {Xi} be the set of all possible messages, with message Xi occurring with probability p(Xi).
• The entropy H is defined as
  H(X) = Σi p(Xi) log2(1/p(Xi))
• Σ means sum.
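The definition can be checked with a short calculation; a minimal sketch in Python (the function name `entropy` is my own):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = sum of p_i * log2(1 / p_i)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Two equally likely messages need exactly one bit:
print(entropy([0.5, 0.5]))  # 1.0
```

The `if p > 0` guard skips impossible messages, which contribute nothing to the sum.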
A Note On Logs
• The log function is the inverse of the power function.
• If a = 2^b then b = log2(a).
• Special values
  – If b = 1 then a = 2, so log2(2) = 1.
  – If b = 0 then a = 1, so log2(1) = 0.
• Note: a^b a^c = a^(b+c)
• Note: (a^b)^c = a^(bc)
• Note: 1/a^b = a^(-b)
• Logs base 2 are convenient for computing.
Entropy Example
• Consider the Gender field in a database.
  – The two possible values, MALE and FEMALE, have equal probability.
  – p1 = p2 = 1/2
• H = 1/2 log2(2) + 1/2 log2(2)
    = 1/2 + 1/2
    = 1
• Therefore 1 bit is needed to encode this information.
Non Equal Probabilities
• Now change the probabilities
  – p1 = 7/8
  – p2 = 1/8
• H = 7/8 log2(8/7) + 1/8 log2(8) = 0.5436
• Fewer bits are needed when one value is more likely than the other.
• We could group the values in 10's and use 6 bits per group (10 × 0.5436 ≈ 5.44, rounded up to 6 bits).
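Plugging the skewed probabilities into the entropy formula confirms the figure, and shows where the 6-bits-per-group claim comes from:

```python
import math

# Skewed Gender field: p1 = 7/8, p2 = 1/8.
h = sum(p * math.log2(1 / p) for p in [7/8, 1/8])
print(round(h, 4))        # 0.5436

# Grouping values in 10's approaches this bound:
print(math.ceil(10 * h))  # 6 bits per group of 10
```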
Null Values Allowed
• Now assume that the database allows NULL values for the Gender field.
  – The gender information is not provided in half of the entries.
  – p1 = p2 = 1/4
  – p3 = 1/2
• H = 1/4 log2(4) + 1/4 log2(4) + 1/2 log2(2)
    = 1/2 + 1/2 + 1/2
    = 1.5
Another Example
• There are n messages, all equally likely.
  – pi = 1/n for each i
• H = n × (1/n log2(n)) = log2(n)
• log2(n), rounded up, is the number of bits needed to represent n different messages in binary.
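A quick check of this rule for a few alphabet sizes (a sketch; the values are just illustrative):

```python
import math

# n equally likely messages carry H = log2(n) bits of information;
# a fixed-length binary code needs ceil(log2(n)) bits.
for n in [2, 8, 26, 256]:
    print(n, math.log2(n), math.ceil(math.log2(n)))
```

For example, the 26-letter English alphabet carries log2(26) ≈ 4.7 bits per letter, so a fixed-length code needs 5 bits.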
The Redundancy of a Language
• All ways of encoding data have some structure which makes them different from purely random collections.
• English is highly structured, with a lot of redundancy.
• Compressing a file of English text reduces the structure, but there is still some structure in the compression encoding table.
• Information theory provides quantitative estimates of the amount of redundancy.
• Code breaking attacks that exploit this redundancy are called statistical attacks.
The Actual Rate of a Language
• The actual rate of a language (r) is defined as the average number of bits of information per character in the language.
  – It is essentially the entropy per character.
• We can calculate it by defining the rate per character for all N-character strings, rN, and making N large.
• We can define it in terms of entropy as:
  rN = H(X) / N
  – where X is the set of all messages of length N using characters in the language.
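A crude estimate of the rate can be made from single-letter frequencies alone; this is only the N = 1 case, so it ignores digram and trigram structure and overestimates the true r (the function name is my own):

```python
import math
from collections import Counter

def rate_estimate(text):
    """First-order (N = 1) estimate of bits per character from
    single-letter frequencies; an upper bound on the true rate r."""
    counts = Counter(text)
    total = len(text)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(rate_estimate("the quick brown fox jumps over the lazy dog"))
```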
The Actual Rate of a Language (2)
• As N increases, the rate decreases because there are fewer choices and certain choices are more likely.
  – This decrease tapers off quickly to a constant value.
• In English, rN is between 1.0 and 1.5 bits per character for large N.
• r is the value of rN when N is large.
The Absolute Rate of a Language
• The absolute rate (R) is defined as the maximum number of bits of information that could be encoded by the language, assuming all sequences of characters are equally likely.
• Let L be the number of characters in the alphabet.
• R = log2(L) from earlier.
  – Note, this means L = 2^R.
• In English, R = log2(26) = 4.7.
The Redundancy of a Language
• The redundancy (D) of a language is defined as
  – D = R – r
• In English, D = 3.7 .. 3.2. We will use 3.2.
• This corresponds to 68% redundancy.
• The redundancy comes from
  – Uneven single letter distribution;
  – Uneven digram (2 letter) frequency;
  – Uneven trigram (3 letter) frequency.
• By deleting vowels and double letters, mst ids cn b xprsd n fwr ltrs, bt th xprnc s mst nplsnt.
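The figures for English follow directly from the definitions, taking r = 1.5 as on the earlier slide:

```python
import math

# Redundancy of English: R = log2(26), r taken as 1.5 bits/char.
R = math.log2(26)        # about 4.70
r = 1.5
D = R - r                # about 3.20
print(round(D, 2))       # 3.2
print(round(D / R, 2))   # 0.68 -> roughly 68% redundancy
```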
Unicity Distance
• We are now in a position to ask an interesting question.
• If we decode some cipher text to produce plain text, how can we tell whether we have succeeded?
• If the text reads as English rather than gibberish then we are part of the way there.
• On the other hand, could we have produced a seemingly meaningful message by accident?
Unicity Distance (2)
• Assume we have an alphabet of L symbols.
• The number of possible messages of length N is
  – L^N = (2^R)^N since L = 2^R
  – = 2^(RN)
• The number of meaningful messages can be expressed in terms of the rate for the language.
  – 2^(rN)
Unicity Distance (3)
• If we assume that all messages are equally likely, then the probability of getting a meaningful message by chance is
  number of apparently meaningful messages / number of possible messages
  = 2^(rN) / 2^(RN)
  = 2^((r-R)N)
  = 2^(-DN)
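This probability shrinks very fast with message length; a quick check with the English figure D = 3.2:

```python
# Probability that N random characters happen to read as meaningful
# English: 2^(-D*N), with D = 3.2 bits per character.
D = 3.2
for N in (1, 5, 10, 20):
    print(N, 2 ** (-D * N))
```

By N = 10 the probability is already around 2^-32, i.e. about one in four billion.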
The Keyspace
• Try all possible keys and keep those that look right.
  – An exhaustive search or brute force attack.
• Let K be the set of all possible keys.
• Now the number of keys is 2^H(K).
  – We must use the entropy of the keyspace, since the chosen key values may exhibit some redundancy.
• So the expected number of wrong keys is
  – 2^H(K) – 1, approximately 2^H(K)
• Where we assume there are a lot of possible keys, all but one of which is wrong.
The Keyspace (2)
• The expected number of false solutions that look like they might be right is
  – number of wrong keys × probability of a chance meaningful message
• = 2^H(K) × 2^(-DN)
• = 2^(H(K) – DN)
• This is a rapidly decreasing function of N.
• We can define the value of N at the crossover point, where the exponent = 0, as the unicity distance Nu.
• H(K) – D Nu = 0, so Nu = H(K) / D
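The crossover can be seen numerically; a minimal sketch (the function names, and the illustrative 40-bit keyspace, are my own):

```python
def false_solutions(hk, d, n):
    """Expected number of right-looking false solutions: 2^(H(K) - D*N)."""
    return 2 ** (hk - d * n)

def unicity(hk, d):
    """Unicity distance Nu = H(K) / D (where the exponent crosses zero)."""
    return hk / d

# Illustrative 40-bit keyspace, English redundancy D = 3.2:
print(unicity(40, 3.2))              # 12.5
print(false_solutions(40, 3.2, 5))   # far above 1: many false hits
print(false_solutions(40, 3.2, 20))  # far below 1: answer is unambiguous
```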
Unicity Distance (4)
• If N > Nu then the chance of getting a false positive is negligible, since the number of correct looking false messages is much less than 1.
• On the other hand, if N < Nu then many of the correct looking messages will be false.
• Nu is roughly the number of characters needed to unambiguously break the code and determine the key.

DES Example
• The DES algorithm has a key length of 56 bits. If the keys are chosen at random then the entropy H(K) is 56.
• Assume an English language document is encrypted and that one byte is used for each letter.
• Thus R = 8 and hence D = R – r = 6.5.
• Hence H(K) / D = 56 / 6.5 = 8.6
• So we need to decode about 9 letters to make sure that a message that looks like English is the real message.

Removing Redundancy
• Reducing the value of D increases the unicity distance, and so makes it harder to break the code.
  – More characters are needed before an unambiguous solution is found.
• If the plain text were nearly random, and so D were very small, then the real message would be hard to tell apart from actual random text.

The Random Assumption
• The preceding model assumes a random encryption system.
• In particular, there is no relationship between:
  – similar keys and the cipher text;
  – similar plain text and the cipher text.
• The unicity distance is less if the encryption is non-random.

Confusion and Diffusion
• This randomness is usually expressed in two terms: confusion and diffusion.
• An encryption system has good confusion if changing one bit in the key changes roughly half the bits of the cipher text.
• An encryption system has good diffusion if changing one bit in the plain text changes roughly half the bits of the cipher text.

Usefulness of Unicity
• The unicity distance does not provide a way of breaking the code.
• It just indicates how much information we need to gather before we are able to attempt to break it.
• It is based on the rate for a language, a statistical property, which is less valid for very short character strings.
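The DES figures work out as follows; a two-line check using the slides' numbers (H(K) = 56, R = 8, r = 1.5):

```python
# DES example: 56-bit key chosen at random, one byte per letter,
# so R = 8 and D = R - r = 8 - 1.5 = 6.5.
HK = 56
D = 8 - 1.5
Nu = HK / D
print(round(Nu, 1))  # 8.6 -> about 9 letters to pin down the key
```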