Computational Complexity and Computability
Lecture 9 – Information & Kolmogorov Complexity
Koushik Pal
University of Toronto
February 8, 2021
Information
Two fundamental concepts in computer science: Algorithm
Information
Unlike algorithms, there is no universally accepted comprehensive definition for information.
One definition of information is via computability theory.
Information
Question: Can we quantify how much information is contained in a string?
Example
x = y = z =
Idea: The more we can “compress” a string, the less “information” it contains.
Thesis: The amount of information in a string is equivalent to the shortest way of describing that string.
Information
Question: How to describe strings? Answer: Use Turing machines with inputs.
To be more specific, describe a string x as ⟨M,w⟩ such that M is a TM that, on input w, halts with only x on its tape.
Kolmogorov Complexity
Definition
Let x ∈ {,}∗.
The minimal description of x, denoted d(x), is the lexicographically shortest string ⟨M,w⟩ such that M is a TM that on input w halts with only x on its tape.
The descriptive complexity (also known as Kolmogorov complexity) of x, denoted K(x), is the length of the minimal description of x, i.e.,
K(x) = |d(x)|.
Encoding
Problem: How to figure out where M ends and w starts in the encoding ⟨M, w⟩?
There are many possible solutions. Here are two examples. Assume the alphabet is {, }.
Write each bit of ⟨M⟩ twice, i.e., as and as , and use as a delimiter between ⟨M⟩ and w; e.g.
⟨M, w⟩ = · · · .
⟨M⟩
In this case, |⟨M, w⟩| = |⟨M⟩| + |w| + .
w
If ⟨M⟩ = zz …zk ∈ {,}∗ and w = ww …wn ∈ {0,1}∗, let
⟨M,w⟩ = zz…zkww …wn. In this case, |⟨M, w⟩| = |⟨M⟩| + |w| + .
Properties of Kolmogorov Complexity
Property 1
The amount of “information” in x isn’t much more than |x|. Theorem
There is a constant c such that for all x ∈ {, }∗, K(x) ≤ |x| + c.
Proof.
Define the TM Mid, which on any input w, immediately halts, thereby leaving w on the tape.
Clearly, ⟨Mid,x⟩ is a description of x, and hence
K(x) ≤ |⟨Mid, x⟩| = |⟨Mid⟩| + |x| + = |x| + c,
where c = |⟨Mid⟩| + .
Properties of Kolmogorov Complexity
Property 2
The amount of “information” in xx isn’t much more than in x, i.e., repetitive strings have low information.
Theorem
There is a constant c such that for all x ∈ {, }∗, K(xx) ≤ K(x) + c.
Proof.
Define a TM N as follows: On input ⟨M, w⟩:
Simulate M on w; let s be the result. Output ss.
Let ⟨M, w⟩ be the minimal description of x. Then ⟨N, ⟨M, w⟩⟩ is a description of xx. Hence,
K(xx) ≤ |⟨N, ⟨M, w⟩⟩| = |⟨N⟩| + K(x) + = K(x) + c, where c = |⟨N⟩| + .
Properties of Kolmogorov Complexity Corollary
There is a constant c such that for all n ≥ and x ∈ {,}∗, K(xn) ≤ K(x) + c log n.
In particular, K(()n) = O(log n).
Proof.
Define a TM T as follows: On input ⟨n, ⟨M, w⟩⟩:
Simulate M on w; let s be the result. Print s for n times.
Let ⟨M, w⟩ be the minimal description of x. Then ⟨T, ⟨n, ⟨M, w⟩⟩⟩ is a description of xn. Hence,
K(xn)≤|⟨T,⟨n,⟨M,w⟩⟩⟩| ≤ |⟨T⟩|+⌈logn⌉+K(x)+ ≤ K(x) + O(log n).
Does the model matter?
Question: TMs are programming languages. If we used another programming language, could we get significantly shorter descriptions?
Answer: No!
Does the model matter?
Definition
An interpreter is a semi-computable function p : {, }∗ → {, }∗ (which takes programs as inputs and prints their outputs).
The minimal description of x under p, denoted dp(x), is the lexicographically shortest string s for which p(s) = x. (For example, dPython(x) is the shortest binary encoding of a Python program that outputs x.)
Finally, define Kp(x) = |dp(x)| as the descriptive complexity of x under p.
Does the model matter?
Theorem
For every interpreter p, there is a constant c such that for all x ∈ {,}∗,
K(x) ≤ Kp(x) + c.
(In other words, using any other programming language would only
change K(x) by some constant.) Proof.
Define the TM Mp, which on any input w, outputs p(w).
Then ⟨Mp,dp(x)⟩ is a description of x. Hence,
K(x) ≤ |⟨Mp, dp(x)⟩| = |⟨Mp⟩| + Kp(x) + = Kp(x) + c,
where c = |⟨Mp⟩| + .