COMPUT ING T HEORY
COSC1107/1105: A Computable Odyssey Course Material
Semester 2, 2020
Cryptography
Complexity
Grammars
Automata
The persons mainly responsible for assembling this material are and .
However, there are a number of people who have contributed to the preparation of this material, including , , , , , , , , , , Lavindra de Silva, , , Lai, , , and .
Since 2012, this notes uses tikz LaTex drawing package (www.texample.net/tikz/). Please send any corrections or comments to .
Last Edited: Tuesday 3rd August, 2021 14:41
Table of Contents
InitialQuestions………………………………. ii Introduction………………………………… iii Tutorials………………………………….. vii BackgroundMaterial ……………………………. viii RandomNumberGeneration ………………………… x
Lecture Notes 1 Section1:Complexity …………………………… 1 Section2:Cryptography ………………………….. 5 Section3:GrammarsandParsing………………………. 9 Section4:FiniteStateMachines ………………………. 23 Section5:TuringMachines…………………………. 32 Section6:Computability ………………………….. 45 Section7:Complexity(revisited)………………………. 51 Section8:Cryptography(revisited)……………………… 58 Section9:Grammars(revisited) ………………………. 61 Section10:Automata(revisited) ………………………. 64 Section11:ChomskyHierarchy ………………………. 74 Section12:Extensiontopics ………………………… 79 TilesRevisited……………………………….. 83 AndFinally………………………………….. 83
i
Some Initial Questions
Below are three problems, all to do with tiles, which will hopefully set the scene for this subject. How would you go about attacking these problems?
Tiling Problem 1:
Given a set of tile designs T , is it possible to cover any rectangular area using only designs from T ? Rotations are not allowed, and the patterns on each of the four sides of a tile must match the neighbouring tiles.
Informally, can we tile any bathroom this way? (including some enormous examples as the bathrooms at Versailles, or the , or …)
Tiling Problem 2:
Given a set of tile designs T and a bathroom wall of size s × t, is it possible to cover the wall using only designs from T ? Rotations are not allowed, and the patterns on each of the four sides of a tile must match the neighbouring tiles.
Informally, can we tile my bathroom this way? This is clearly a more specific question that the earlier version.
Tiling Problem 3:
Given a set of tile designs T and a bathroom wall of size s × t, is it possible to cover the wall using only designs from T ? Rotations are allowed, but the patterns on each of the four sides of a tile must still match the neighbouring tiles.
Informally, can we tile my bathroom by turning the tiles around, if necessary?
ii
Introduction
There are a number of topics, but there are five main ones: • Complexity.
• Cryptography.1 • Grammars.
• Automata.
• Computability.
Complexity is concerned with measuring the cost of computations. There are many possible aspects to this; however, we concentrate on maximum execution time. Even then, it is not hard to find problems which cannot be solved in any reasonable time, no matter how fast hardware improves.
Cryptography is concerned with encrypting information so that no unauthorised person can read it. The main technical issue is how we can be sure of this; the answer is to rely on problems which no-one can solve in any reasonable time.
Grammars are concerned with the specifications of syntax rules — in particular, how do you indicate to a machine what programming language constructs are legal and which are not? Automata are concerned with the modelling of computation, and in particular how we can reason about computation. In particualr, the key role of memory is emphasised in three dif- ferent approaches to memory, being none (finite state machines), a single stack (a pushdown
automaton) and a semi-infinite tape (a Turing machine).
Computability is concerned with the fundamental limits of computation (in contrast with the
resources view used in complexity). In particular, it may be surprising to learn that there are some problems for which it can be proved that computers (at least in their current conception) cannot solve.
As may be noticed, there is an inter-dependence between these topics.
Also, observe that the order of the topics in this course notes may not correspond to the order delivered during the weeks. However, it should be straightforward to find the relevant section for each week.
1No longer covered in the course.
iii
Cryptography
Complexity
Computability
Grammars
Automata
This picture depicts the general topics touched during the course. Notice, however, that in the current course we do NOT include Cryptography. Nonetheless, the section on Cryptog- raphy was left in the course notes for interest of the reader and as an example of when difficult computational problems are very useful.
We will mostly use a sequence compatible with several textbooks, starting with Automata and finishing with Complexity.
Specific Skills Covered
• analyze algorithmic complexity • read and write automata
• NFA → DFA conversion
• read and write grammars
• parsing and derivations
Concepts
• decidability
• tractability and O(n) notation
• intractability and NP-completeness • grammar and language classes
• automata relationships
• grammar relationships
Motivating Questions
• How can we reason about programs?
• What is possible and impossible?
• How can we tell the difference?
• How do we show that some things are impossible?
• Can we ever say “Never”? “Always”?
(e.g. How do we reason about cryptographic attacks?)
• What precisely are programs anyway?
• How can we be precise about complexity costs?
iv
“Put the right kind of software into a computer, and it will do whatever you want it to. There may be limits on what you can do with the machines them-
selves, but there are no limits on what you can do with software.” April, 1984.
• What limits are there to computation? • How do we establish them?
• Can we do anything about such limits? • What is an algorithm?
• Do algorithms always exist?
• What can and cannot be computed? • What can be feasibly computed?
– Time,
We can establish claims about “exists”, “sometimes”, “possible”, etc. by construction, such as “there is at least one bearded lecturer at RMIT”.
It is generally harder to establish claims about “all”, “always”, “impossible”, etc., such as “Humans will never reach the moon”.
“Never” is a very long word! Or, as Treebeard the Ent puts it “Never is too long a word, even for me”. 2
Overview Problem
The problem described below is intended as an overview of the material studied in this subject. You are not required to provide a solution to it; it is intended as a means of showing how the material fits into the “big picture”. In particular, you will not be required to know any of the details of quantum computing.
Problem Statement
You are employed as an IT professional at a company called Incandescent Technolo- gies. One day your boss comes into your office and says: “I hear that there is a new kind of computer called a quantum computer, which can apparently perform some computations much faster than those we have now. No such machines are currently commercially available, but we need to be prepared for the possibility that they will be in the near future. I want you to investigate this topic and prepare a comprehensive report on it, specifically addressing the points below:
1. Will the security of our systems be affected? Could some hacker utilise a quan- tum computer to break into our systems?
2. Can we harness the power of quantum computers to make our compilers run faster?
3. Are the benefits of quantum computing limited to faster processing alone? Or can we now solve problems which were unsolvable before?
Please, have your report on my desk by the end of the semester.”
2J.R.R. Tolkien, ‘The Lord of the Rings’, p.1016, Harper-Collins, 1991.
v
Addressing this problem
In order to tackle this problem (which is (deliberately 🙂 very broad), you may find the following questions useful.
1. (a) How do modern-day cryptographic systems work?
(b) What relationship is there between high complexity problems and cryptography? (c) How are factorisation and an algorithm for finding primes linked to cryptography? (d) How do we estimate the execution time of a program?
(e) What are intractable problems and why are they important?
(f) How much faster hardware do we need to solve problems with exponential com- plexity?
(g) What approaches can we use to deal with difficult problems?
(h) What are NP-complete problems and why are they important?
2. (a) How does parsing work?
(b) What are grammars used for?
(c) What are some different classes of grammars and how do they differ? (d) What can regular expressions express?
(e) What are finite state machines and how do they work?
(f) what are the differences between deterministic and non-deterministic finite state automata?
(g) What are pushdown automata and how do they work?
(h) What are turing machines and how do they work?
3. (a) Can we always find an algorithm to solve a problem?
(b) What are the relationships between different machines and grammars?
(c) WhatformalmodelsofcomputationaretherewhicharemorepowerfulthanTuring
machines?
(d) What is the most powerful kind of Turing machine?
(e) Can we show without any doubt whether all problems could or couldn’t be solved
by computers? How?
Questions such as these will be discussed at lectures and answered by the problems set for
each week of tutorials.
vi
Tutorials
Whereas lectures are meant to give the “big picture” of each aspect of the course, tutorials are meant to deal with the actual technical details. Both lectures and tutorials are important, but tutorials are more personal, and a tutor will expect students to attend them unless there is a very good reason not to. It is extremely difficult, if not impossible, to get the technical training required for passing the course without attending tutorials.
During tutorials you will be working in learning groups of around 2-4 people. Your tutor will spend time with each group, assisting as necessary to ensure that you are on the right track. You should discuss and work actively together to ensure that everyone fully understands the subject material. In some tutorials you will have specific exercises to do — these are designed to assist you in mastering the various skills and concepts. Tutorial time can also be spent discussing assignments as well as material from lectures or the textbook. In the last part of each tutorial, tutors will go over some questions of the previous tutorial in the whiteboard, so make sure to attend and stay around until the end!
During each week you should:
• Attendlecturestogetthegeneralbigpictureofthematerial.Sometimeslecturesprovide
essential core information not covered in tutorials.
• Attend tutorials to dive into the technical details and get the necessary training.
• Aftertutorials,completebyyourselftheremainingquestionsthatwereleftincomplete
in the tutorial. You may want to collaborate with other fellow students. However, you are meant to develop yourself the solutions. Just knowing the solution (e.g., because another student explained it to us) means very little. Indeed, a moderate form of collaboration is encouraged as a useful part of any educational process. Nevertheless, students are expected to do their own thinking and writing. Never copy another student’s work, even if the other student “explains it to you first.” Never give your written work to others. Do not work together to form a collective solution, from which the members of the group copy out the final solution. Rather, walk away and recreate your own solution later.
• Usetheonlineforumtohelpcompleteeachtutorialaswellasclarifyingaspectscovered in the lecture that you have not fully grasped.
• Ask yourself: would I be able to solve similar exercise completely by myself?
After each week’s tutorial, it is a very good idea to write out the complete solutions to the problems set that week, particularly as for some tutorials your group may not complete all the problems in class. This will help you gather the content of the course up to the necessary details. This means that you can commence the following week’s class with a brief discussion of the solutions, possibly by asking your tutor for feedback on your written work. Such regular clarification of your own understanding of the topic is the best form of preparation for the final
exam.
vii
Background Material
The material in this section is not examinable. However, you should make yourself familiar with this material, and it will be useful for reference.
Mathematical Formalism
To answer such questions, we use mathematical formalisms to describe computation.
• We can reason about the program and its behaviour.
• We can estimate the cost of the program.
• We can be rigorous and precise.
• We can (sometimes) establish “negative” results, e.g. “There is no algorithm to solve problem X”.
• For this subject, mathematics is generally “read-only”.
Mathematical Language
Definition A precise mathematical statement of what we mean, such as “A prime number is a positive integer which has no factors other than itself and 1”.
Theorem/Proposition Apropertywhichwecanshowalwaysholdsandisinterestingbyitself, such as “All even numbers except 2 are not prime” or “There are an infinite number of primes”.
property that always holds and that is used as a stepping stone to a larger result rather than as a statement of interest by itself.
Corollary A property that always holds and is interesting by itself, but that can be readily deduced from another result, generally a Theorem or Proposition.
Proof The justification for our statement. This must cover all cases and leave no room for doubt. Techniques used here include proof by induction and proof by contradiction.
Mathematical Methodology
We will often use mathematical methods to state precise results, usually in the form of a theo- rem and a proof. This requires us to
• write a precise summary,
• provide guaranteed behaviour (proof), • concentrate on important parts,
• prove it can be applied elsewhere.
Two common techniques used for proofs are proof by induction and proof by contradiction.
Mathematical Induction
Let P be a property defined on N. How can we show that ∀x ∈ N, P(x) is true? A generally applicable method is the principle of mathematical induction. If
1. P(0)istrue,and
2. if P(n) is true, then P(n + 1) is true.
viii
Then by the principle of mathematical induction, ∀x ∈ N, P (x) is true. Note that step 2 is an abbreviation of
P(0) ⇒ P(1) P(1) ⇒ P(2) P(2) ⇒ P(3) …
Proof by Contradiction
To prove a statement P , it is sometimes easier to assume its opposite, written as ¬P , and derive a contradiction.
Key Idea: if ¬P cannot hold, then P must be true.
Step 1 Assume ¬P is true.
Step 2 Based on this assumption draw a number of conclusions. Step 3 Establish that the conclusions lead to a contradiction. Step 4 Therefore ¬P cannot be true (as was assumed).
Step 5 Therefore P is true.
Limits of Computation
Hardware speeds currently tend to double every two years (Moore’s Law). “Computers can do anything!”
• Some problems cannot be solved even in principle (i.e. given unlimited resources)
• Some problems are too difficult to be solved in practice (i.e. the needed resources are too
vast)
• Some problems are too difficult to solve on a given hardware platform.
Some Example Problems
You may like to consider which category these problems fall into:
Problem 1
Problem 2
Problem 3
Write a program which finds the cheapest route to visit a given list of cities. Assume you have a list of all cities and the cost of flying between each two cities. The program should take as input a start point and a list of cities to visit. Write a program which determines whether another program terminates on all possible inputs (i.e. is there an input that puts the program into an infinite loop?).
Write a program to index and make available all full text articles from the last 10 years of The Age newspaper.
ix
Random Number Generation
A nice example of the need to think things through before coding is that of random number generation. A sequence of seemingly random numbers has many applications.
• Simulation
• Program testing
• Numerical analysis
• Decision making and scheduling • Games ( )
Many of these problems can be mapped into the problem of generating a random sequence of integers between 0 and a given number M .
• Random 10 letter words: sequences of length 10 between 0 and 25. 0 becomes ‘a’, 1 becomes ‘b’ etc. (n − 1 becomes nth letter of alphabet).
• Random arrival times, a sequence of real numbers in the interval [1, 10): for 0 ≤ X < M , useY =1+9(X/M).
• Random words from a dictionary: if there are N items in the dictionary, use the nth word, where n = ⌈1 + N(X/M)⌉.
Problem: To generate a sequence of seemingly random integers between 0 and M from a computer program.
Examples:
23, 654, 1, 54, 777, 233, 4232, . . . 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, . . . Non-examples:
1, 2, 3, 4, 5, . . .
1, 4, 9, 16, 25, 36, . . .
x
“Spaghetti attempt”
This was an attempt by to write a program tha was as unpredictable as possible, and hence would (hopefully) produce a sequence of random numbers.
Input: A 10-digit number X
Output: Next number in the sequence
K1 Y ← ⌊X/109⌋
K2 Z ← ⌊X/108⌋ mod 10. Goto Step K(Z + 3)
K3 If X < 5 × 109 then X ← X + (5 × 109)
K4 X ← ⌊X2/105⌋ mod 1010
K5 X ← (1001001001 × X) mod 1010
K6 If X < 108 then X ← X + 9814055677, else X ← 1010 − X K7 Swap the lower 5 digits of X with the higher 5 digits
K8 Same as K5
K9 Decrease each non-zero digit of X by one
K10 If X < 105 then X ← X2 + 99999, else X ← X − 99999 K11 IfX<109 thenX←10×X.GotoK11
K12 X ← ⌊X(X − 1)/105⌋ mod 1010
K13 IfY >0thenY ←Y −1;GotoK2
The hope was that starting with a number such as 1234567890 and repeatedly applying this algorithm would produce a random sequence of numbers. Unfortunately, what came out was a sequence that soon just repeatedly produced the number 6065038420 — not very random at all!
Moral (Knuth): Random numbers should not be generated with a method chosen at ran- dom. Some theory should be used.
Desirable properties of a random sequence:
• each integer 0, 1, . . . M − 1 occurs equally often,
• each pair, triple, etc. occurs consecutively equally often,
• the sequence should have mean (M − 1)/2 and variance M2/12, and • successive numbers should be independent.
xi
Linear congruential generator
One method to do this is to use a linear congruential generator.
Xn+1 =(a∗Xn +c)modM,wherea,c≤M and0≤Xn
seed = random(seed);
printf(“%d\n”, seed);
} }
int random(int seed) {
return(((25173*seed + 13849) % 65535));
}
Generated Numbers:
7564, 43246, 43522, 44560, 25669, 4486, 23122, 47620, 51424, 62881, 50407, 20590, 9604, 16726, 60607, 19060, 29494, 20296, 14197, 32575, 50404, 10606, 9097, 33340, 40459, 8821, 32302, 59350, 30004, 13666, 34852, 26200, 2209, 47326, 56017, 13195, 40204, 12136, 54742, 29770, 21334, 60841, 11392, 3505, 35104, 12901, 44797, 27985, 44539, 21316, 937, 8350, 37654, 45286, 17002, 61645, 169, 8311, 38932, 38695, 36379, 61861, 63667, 44815, 22354, 47581, 52702, 56290, 4249, 20806, 7567, 53230, 44029, 27946, 45817, 14725, 20314, 8566, 35617, 16255, 424, 4996, 16492, 2740, 45049, 14686, 21592, 1975, 54994, 16471, 63922, 41500, 65449, 11626, 61372, 9115, 27709, 43501, 40207, 22120, …
Moral: There is no substitute for carefully thinking about the problem in an organised way.
xii
Section 1: Complexity Computation “Size”
• How do we estimate the cost of a computation?
• What cost is deemed “acceptable”?
• Given that input size varies, how do we “standardise” costs? • What properties of such costs are important?
• If the cost is too high, what can we do?
All computing resources are finite, no matter how large they may seem: • processor speed,
• memory (RAM, ROM),
• disk, tape, etc.,
• screen size.
Hence, there is always some problem which is too large to be solved by a given platform.
For example, the number of operations need to solve the travelling salesman problem for 100 cities is a 161-digit number. The number of protons in the known universe is a 79-digit number. How can the resources needed by a program be measured? What resources are required by the computation?
The actual resources used by a program depend upon the programming language, CPU speed, available memory, disk speed, operating system, bus traffic, etc. The theoretical analysis of computational time
• should not depend on a particular implementation; • should not limit available memory or time;
• must allow for all possible algorithms.
Time Complexity
The measurement of complexity is usually described in terms of how the time (or possibly space) requirements increase as the size of the input to the program increases. The most com- mon measure of complexity is the amount of time the program takes to run.
“The difference between time and space is that you can re-use space”
In general, we perform a worst-case analysis; i.e. we work out the maximum number of operations that could occur. Worst case analysis:
• guarantees termination within a certain number of steps; • provides an upper bound on the resources needed.
An average (or even minimum) performance measurement may sometimes be more useful, but in general it makes the analysis significantly more difficult. You have to ask yourself, what is average? How often do “bad” cases occur? etc.
Consider an electronic phone directory.
1
Aristotle Archimedes Da Gallileo Newton Tesla Turing
Consider the following “pseudo-code” to find a supplied name:
i := 1;
while (name <> book[i] and i < n) do
i := i + 1;
This has a complexity n, as in the worst case it will take n comparisons to find an entry (i.e. the last one in the book). A better method is to use a binary search (assuming that the address book is sorted).
Aristotle Archimedes Da Gallileo Newton Tesla Turing
left := 1; right := n;
found := false;
while (not found and left <= right) do
i := (left + right)/2;
if (name = book[i]) then found := true
else if (name > book[i] then
left := i+1
else right := i-1
This algorithm has a complexity log2 n. In order to calculate the maximum number of operations that will be done, we analyse the algorithm as explained below.
Get array[ind-id]
will always take constant time no matter how big the array is for a particular problem. 2
For i = 1 to n print i
will increase linearly as n increases.
For i = 1 to n
for j = 1 to n
print i – j
will have n2 steps and will increase quadratically.
We often only count the most expensive operation involved. • Sorting programs — comparisons
• Numerical programs — floating point operations
• Graph algorithms — edge traversals
Rates of Growth
We measure the time complexity of a program by a function. The rate of growth of the function is usually what is most important.
n
20n + 500
n2
n2 + 2n + 5
0 10 100 1000 500 700 2,500 20,500 0 100 10,000 1,000,000 5 125 10,205 1,002,005
Note that the lower order terms have progressively less influence as n gets large.
A function f is of order g, written f ∈ O(g) if g is an approximation of the rate of growth of f for large n.
For example:
• n2 +2n+5isO(n2)
• 20n+500isO(n)
• n3 +500n2 +6isO(n3) • 2n + 30n6 is O(2n)
To determine the order we always take the fastest growing term.
O(1)
O(loga n)
O(n)
O(n loga n) O(n2)
O(n3) O(nr) O(an)
O(n!)
constant logarithmic linear
“n log n” quadratic cubic polynomial exponential factorial
3
Often anything larger than exponential is referred to simply as exponential (or sometimes
√
hyperexponential).O(a n)isbecomingmoreimportantinpractice.
n logn n2 5 2 25 10 3 100 20 4 400 30 4 900 40 5 1,600 50 5 2,500 100 6 10,000 200 7 40,000
2n 32
1,024 1,048,576
∼ 109 ∼ 1012
∼ 1015 ∼ 1030 ∼ 1060
n!
120
3,628,800
∼ 1018 ∼ 1032 ∼ 1048 ∼ 1064
∼ 10161 ∼ 10374
Consider a problem with factorial complexity. Assume 1 computation per 10−12 seconds. Computing continuously from 1 A.D. until now, we would still not have solved the problem for n = 23 (!?!). At a rate 1000 times faster, we would have barely finished solving the problem for n = 21 (!?!?!?).
Polynomial time
The complexity of a problem is measured by the most efficient solution to it. For example, a problem with a linear solution and a quadratic solution is linear. A problem is decidable in polynomial time if there is a program which solves the problem with time complexity O(nr) for some r. The class of all such problems is denoted as P. A problem for which there is no efficient solution is said to be intractable.
Generally, it is considered that any problem outside P is intractable. Note that this is a broad categorisation only, because:
• sometimes a polynomial-time algorithm is not efficient enough, or
• sometimes an exponential algorithm will suffice, or
• sometimes an algorithm is “in-between”, such as O(2 n).
For example, an O(n2) algorithm is often sufficiently less efficient than an O(n log n) al-
gorithm. For n = 25, n2 = 625, whereas n log n ≈ 80. Also, the constant factors can make a
difference. For example, 1 2n < 1000n2 for n < 25. 55
4
√
Section 2: Cryptography
Note: this section will not be covered during the 2016 edition. However, this section was left in the course notes for interest of the reader and as as an example of when difficult computational problems are very useful.
Intractable problems can actually be very useful. Consider the problem of keeping infor- mation private on a public network, such as transmitting credit card numbers over the Internet. We want to make sure that only the right people can access this information. One way to ensure this is to ensure that unauthorised decryption is intractable.
Consider the problem of encryption. Write a message as a (large) number — encoding letters as numbers as in ASCII. A message M, is encrypted by applying E, so the encrypted message is E(M). To decrypt, we apply a decryption key, D, so that M = D(E(M)). The idea is that no “unauthorised” person can see the message M. In order to do this, we need to ensure that “cracking” the system is sufficiently hard that no-one will attempt it.
Simple Ciphers Substitution Ciphers Shift each letter n places.
A B C D E F G ... Z E F G H I J K ... D
The message “SINK THE BISMARCK” becomes “WMRO XLI FMWQEVGO”. Such codes are very easy to break — there are only 25 possible substitution ciphers (why?). This is sometimes also known as Caesar’s Cipher. used n = 3.
A B C D E F G ... Z D E F G H I J ... C
Rot 13 (moving 13 places either way) was often used to hide offensive material, quiz an- swers, punch lines, etc. Historically, many ciphers were based on this idea.
Polyalphabetic Ciphers
Thse are multiple substitution ciphers. These were invented in 1568, and were still in use in the US Civil War (1860-5). For example, one using 5 substitution ciphers might work as follows:
1st letter 2nd letter 3rd letter 4th letter 5th letter 6th letter
...
Rotors
• Machines
4 to the right
3 to the left 10 to the left
7 to the right 2 to the right 4 to the right
with a mechanical wheels (rotors), one per substitution cipher 5
• Rotors move at different rates; period for an n-rotor machine was 26n • Enigma machine used by Germans in World War II
• Enigma codes broken by a British team including
The Unix utility crypt is based on a similar idea. This provides minimal security and so- phisticated attack methods are known.
Moral: substitution ciphers of any form are insecure since the invention of the computer. A more secure system would be to use an arbitrary mixing of the letters.
A B C D E F G ... Z X C J Q B Z R ... Y
There are 25! = 1.5×1025 combinations, making it difficult to check them all. However, this makes it more difficult to specify the key. Other factors come into play, such as the distribution of letters, keywords, contextual knowledge, ...
There are many other old techniques like these, however, all of them are insecure when computers get involved. Note, that ciphers are secret key systems, i.e. the key must be kept private.
The DES Cryptosystem
One cipher which has been used in more modern times is the Data Encryption Standard (DES). It is a
• widely used cryptosystem (certified 1976)
• 64-bit “block” cipher (i.e. text encoded in 64-bit blocks);
• 56-bit (secret) keys;
• complex combination of bitwise permutations and substitutions; • encryption and decryption algorithms are same;
• decryption key = reverse of encryption key;
• usually implemented in hardware;
• intended for 5-10 year span.
At the time, 56-bit keys were considered outside the cracking range of all but very large organisations. 128-bit numbers were considered more appropriate. Today DES is considered insecure; various credible cracking claims have surfaced. Secure variations include triple-DES (i.e. using 3 different keys).
Moral: Always make the numbers bigger than is strictly needed. Public key cryptosystems
In general, we wish to be able to ensure secure communication between two arbitrary users. The standard way to do this is public key cryptosystems. In public key encryption,
• individual encryption keys E are published; • decryption keys D are kept secret;
• anyone can encrypt a message for Fred;
• only Fred can decrypt it.
Hence we publish E, and keep D secret. To work, this requires that D be not deducible from E. We can ensure this if the task of computing D from E is intractable. This gives us
6
confidence that no-one can find D from E. To sum up, requirements of a public key encryption are:
• D must not be (easily) deducible from E; • increase the numbers until this happens;
• computing E(M) must be simple enough.
Hence, we want E(M) to be a one-way trapdoor function. For authentification, as well as security, we require E(D(M)) = M (as well as D(E(M)) = M to decrypt the message). This allows us to have signatures, i.e. a way of determining whether or not a message is authentic.
For example, B sends a secret message to A. B sends EA(DB(M)) to A, rather than just EA(M). A decrypts with DA(EB(M)) rather than just DA(M).
DA(EB(EA(DB(M)))) = DA(EB(DB(EA(M)))) = DA(EA(M)) = M
MM
B encrypts A decrypts
EA(DB(M)) Transmission EA(DB(M)) The RSA Cryptosystem
How do we find the appropriate functions? One method is to use the RSA cryptosystem. This method chooses two large primes p and q, and let n = p × q. Now, find k, g such that
k × g ≡ 1 mod ((p − 1) × (q − 1)), where
• g and n are made public;
• k, p and q are kept secret;
• divide message into chunks ≤ n − 1; • E(m) = mg(modn);
• D(m) = mk(modn).
So D(E(m)) = E(D(m)) = m(g×k)(modn), and we can show that: m(g×k)(modn) ≡ m(modn).
To break this system, it is necessary to determine k from n and g. If we can factor n into p and q, k can easily be determined from g, p and q. The way that this is done is to solve the equation (k × g) + (x × r) = 1, where r = (p − 1) × (q − 1). This is known as a Diophantine equation, and k and x can be found from g and r quite easily using a variation of Euclid’s algorithm for greatest common divisor. Determining k from g, p and q like this is how the system is set up in the first place.
Hence modern cryptography uses an algorithm 2,500 years old ... Hence security depends on not being able to factor n in any reasonable time. By making n large enough to defeat the best algorithms known, the method is secure. Factoring is known to be a very intractable
7
problem, and so it is unlikely that a “smart” algorithm will be able to defeat the system. Note also that probabilistic methods can be used to easily find appropriate primes, and so setting up the system is computationally tractable.
8
Section 3: Grammars and Parsing
Consider the following encrypted message. WKHVHYHQWLHV
The 25 possible decryptions of this as a simple cipher are below. xliwizirxmiw
ymjxjajsynjx
znkykbktzoky
aolzlcluaplz bpmamdmvbqma cqnbnenwcrnb drocofoxdsoc espdpgpyetpd ftqeqhqzfuqe gurfriragvrf hvsgsjsbhwsg iwthtktcixth jxuiuludjyui kyvjvmvekzvj lzwkwnwflawk maxlxoxgmbxl nbymypyhncym ocznzqziodzn pdaoarajpeao qebpbsbkqfbp rfcqctclrgcq sgdrdudmshdr theseventies uiftfwfoujft vjgugxgpvkgu
How do we find the “right” one? For this, we need to solve two problems:
1. Find the right way to organise streams of characters into “building blocks” (lexical anal-
ysis).
2. Determine whether the current sequence is a “legal” one (parsing).
This makes it important to know about formal languages.
9
Languages
Languages come in all kinds of different forms:
• Natural Languages: English, Chinese, heiroglyphics, Russian, Greek, ...
• ComputerLanguages:Java,C,HTML,SQML,Cobol,(programming,formatting,query-
ing, etc) ...
• Mathematical Languages: predicate calculus, sets, relations, functions, ...
Given the nature and expanse of such languages, how do we specify such languages? What can we express in such languages? How do we describe “legal” phrases in the language? We need much more than a dictionary. How do we write programs to recognise and manipulate the language?
A language is a set of symbols that can be combined together in particular ways to form the strings of the language.
Definition 1 An alphabet is a finite set of symbols.
Examples:
• Roman: {a,b,c,d,e,f, ... z}
• Greek: {α, β, γ, δ, ε, ...ω}
• Binary: {0,1}
• Numeric: {0,1,2,3,4,5,6,7,8,9}
• Alphanumeric: {a-z, A-Z, 0-9}
• “Keyboard”: {a-z, A-Z, 0-9, :’?!()&@ ...}
• C Tokens: {if, while, main, return, +=, ==, case, . . . }
Definition 2 A string over an alphabet Σ is a finite sequence of symbols from Σ.
Examples:
• watermelon and banana are strings over {a,b,c,d,e,f, ... z} • 1011010111 and 110 are strings over {0,1}
• “if((x+=1)>=y)whilez”isastringofCtokens
Strings can be empty; in this notes, we denote the empty string by λ. Another symbol commonly used to denote the empty string is ε. The set of all strings over the alphabet Σ (including λ) is denoted by Σ∗. Note that Σ∗ represents all possible strings, some of which may not make sense. A language then places restrictions on what set of strings are valid (or legal). For example, this sentence is not a valid English sentence, although almost is it.
Also, the syntax of programming languages places restrictions on the ordering of constructs such as if, while, and return. Natural languages can be very difficult to get right. For e.g.,
incorrect syntax correct syntax sensible meaning sensible meaning (?)
An arrow like flies time An arrow flies like time Time flies like an arrow Fruit flies like a banana
Definition 3 A language is a subset of Σ∗.
Hence a language is just a “certain class” of strings over Σ. Examples:
• Σ = {0,1}, L = {0,01,011,0111,01111,…} 10
• Σ = {a,…z}, L = {ab,cd,efghi,s,z}
• Σ = {0, 1}, L = { (representations of) prime numbers } • Σ=CTokens,L={legalCprograms}
• Σ = {0, 1}, L = {strings containing at least 2 0’s}
• Σ={a,…z},L=∅
In general, languages are specified via
L={w∈Σ∗ |whaspropertyP} Describing a Language
We need to have some mechanism to describe what are the valid strings within a language. Consider the ”language” of correct mathematical expressions (infixnotation), involving variable names, *, + (, ). How can we describe legal phrases in this language?
Example of valid strings: • a*(b*c+d)
• a+b
• (c + d)
Example of invalid strings: • *+a
• +b+*
• (* c d
Language Rules
In order to formally specify languages, we follow certain conventions/forms. For example,
Textbook form
E→E+E E→E*E E → (E)
E → id
BNF form
—
— (
—
string
Other conventions for specifying language rules include:
• DTD’s (Data Type Definitions) for languages such as HTML
• Regular Expressions – simple languages with only union, concatenation, repetition.
11
Regular Expressions
We introduce ⊘ to represent the empty language. Hence regular expressions are strings over the alphabet {(, ), ⊘, ∪,∗ } ∪ Σ.
Definition 4
1. ⊘, λ and each member of Σ is a regular expression. 2. If α and β are regular expressions, then so is (αβ).
3. If α and β are regular expressions, then so is (α ∪ β). 4. If α is a regular expression, then so is α∗.
5. Nothing else is a regular expression.
Regular expressions are used as a finite representation of languages. Hence the two lan- guages above can be represented as
0∗10∗10∗ and 0∗10∗10∗ ∪ 0∗1110∗
We define the language L(α) of a regular expression α as follows:
Definition 5
1. L(⊘)=∅.L(a)={a}foreacha∈Σ.
2. If α and β are regular expressions, then L(αβ) = L(α)L(β).
3. If α and β are regular expressions, then L(α ∪ β) = L(α) ∪ L(β). 4. If α is a regular expression, then L(α∗) = L(α)∗.
Examples:
• L((a∪b)∗a)={a,aa,ba,aaa,aba,baa,bba,…} ={w∈{a,b}∗ |wendsina}
• L((01∪1)∗)={1,01,011,0101,01011,…} = {w | w does not contain 00}
• L(0∗(10∗)∗)={0,1,00,01,10,11,001,…} = {0, 1}∗
To see why, consider w ∈ {0, 1}∗. Each occurrence of 1 in w is preceded or followed by
a (possibly empty) sequence of 0’s.
• L(0∗(1∪⊘)0∗)={0,1,00,01,010,000,…}
={w∈{0,1}∗ |
w contains at most one 1}
• Let L = L(c∗(a∪(bc∗))∗). What is L?
Note that no string in this language contains ac — each occurrence of a is followed either byanotheraorbyb.SoL⊆A={w∈{a,b,c}∗ |wdoesnotcontainac}.
Is A ⊆ L?
Consider w ∈ A. If w starts with a or b, then w ∈ (a ∪ (bc∗))∗, as any sequence of c’s must be preceded by a b. Hence it only remains to allow for a sequence of c’s at the beginning.
Hence L(c∗(a∪(bc∗))∗) is exactly the strings in {a,b,c}∗ which do not contain ac, i.e. L = A.
12
• “The language over {0, 1}∗ in which all strings contain 11”. Does this mean
1. 0∗110∗ (exactly two 1’s)?
2. (0 ∪ 1)∗11(0 ∪ 1)∗ (at least two 1’s)?
Hence informal descriptions can be ambiguous, but there is no ambiguity in the formal verisons.
• “The language over {0, 1}∗ in which all strings begin with 00 or end with 11”. 00(0∪1)∗ ∪(0∪1)∗11
There can be many different regular expressions for a given language. For example,
(0∪1)∗ =((0∪1)∗)∗
= (0 ∪ 1)∗(0 ∪ 1)∗
= (⊘ ∪ 0 ∪ 1)(0 ∪ 1)∗ = (0∗ ∪ 1∗)∗
= 1∗(01∗)∗.
Hence we often want the “simplest” expression (usually minimal number of nested *’s).
• Can be used to represent some (simple) languages.
• Representation is itself a string (and hence can be typed at a keyboard).
• Used in search facilities (vi, emacs, grep, egrep, fgrep, . . . ) and in compilers.
• Cannot handle some “easy” languages — for example, there is no regular expression for
{0n1n | n ≥ 0}.
Grammar Rules
Grammars are a way of describing language rules, defining the strings that are allowed within a language. Given the alphabet of symbols, the grammar tells us how these symbols can be combined. The grammar itself also uses some language form. Some (simple) languages can be described using the mathematical language of regular expressions Other languages require a more powerful specification format. If we try to specify the expression (+,*,(,), etc.) as a regular expression, we might try
id((+ ∪ ∗)id)∗
But how can we capture the brackets? And get the matching number of each? No way to do this as we can’t “count” brackets. Similarly we can’t express {w ∈ anbn | n ≥ 0} as a regular expression. We will discuss classes of languages that can be expressed by different kinds of grammars.
Equivalent Grammars
The same language can be specified in different ways, even within the same specification lan- guage. For example, consider the language of strings over {a, b} which do not contain the substring aa
Regular Expressions:
b∗(abb∗)∗ ∪ b∗(abb∗)∗a (1)
= b∗(abb∗)∗λ ∪ b∗(abb∗)∗a (2)
13
= b∗(abb∗)∗(λ ∪ a) = (b∪ab)∗(λ∪a)
Equivalent grammars:
(3) (4)
C → bC | λ (from 1)
A → AC | λ,
Two different grammars which use the same specification language and specify the same
set of strings are equivalent. Generating Language Strings
Consider the language defined by a(a∗ ∪ b∗)b. First output is ‘a’. Then output a string of ‘a’’s or string of ‘b’’s. Then output ‘b’.
Let S = a string, M = the “middle part”, A = a string of a’s, B = a string of b’s.
S → aMb (1) M→A (2) M→B (3) A → aA (4) A→e (5) B → bB (6) B→λ (7)
To generate the string aaab:
S
aMb rule (1) aAb rule (2) aaAb rule (4) aaaAb rule (4) aaab rule (5)
This is an example of Context Free Grammar. They are an important class of grammars, in which rules can be applied to symbols (e.g. A) regardless of context of symbol. This makes them computationally useful.
Example Grammars
• Even length strings over {a, b} S → aO | bO | λ
O → aS | bS
S ⇒ aO ⇒ abS ⇒ abbO ⇒ abbbS ⇒ abbb
S → A | B, D → abC,
A → CD, B → Aa
S → AB, B → a | λ
C → b | ab (from 4)
14
• Strings with an even number of b’s (i.e. a∗(ba∗ba∗)∗)
S → aS | bB | λ B → aB | bS
S ⇒ aS ⇒ abB ⇒ abbS ⇒ abbbB ⇒ abbbaB ⇒ abbbabS ⇒ abbbab
• Strings not containing abc
S → aB | bS | cS | λ B → aB | bC | cS | λ C → aB | bS | λ
S ⇒ aB ⇒ abC ⇒ abaB ⇒ abacS ⇒ abac
• Palindromic strings, w = wR, over {a, b}
S→a|b|λ S → aSa | bSb
S ⇒ aSa ⇒ abSba ⇒ ababa Formal Definition
A context-free grammar G is a quadruple (V, Σ, R, S) where
• V is a finite set of non-terminals;
• Σ is the set of terminals (the symbols of the language);
• S is a distinguished element of V called the start symbol; and • Risasetofrules.
Note that the set of non-terminals V and the set of terminals Σ are assumed to be disjoint. Example Grammar
Grammar G = (V,Σ,R,S), where V = {S}, Σ = {a,b} and R consists of rules {S → aSb and S → λ}. Using these rules we can get:
S ⇒aSb⇒aaSbb⇒aabb
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
The language of G, denoted L(G), is the set {w ∈ Σ∗ | S ⇒∗ w}. This is clearly {anbn | n ≥ 0}. Recall that this is not a language which can be specified by a regular expression.
15
Derivations
Rule application: Given the string uAv and the rule A → w, we obtain the string uwv. We denote this as uAv ⇒ uwv. A string w is derivable from v if there is a finite sequence of rule applications form v to w.
v⇒w1 ⇒w2 ⇒…⇒wn =w
Usuallywrittenasu⇒∗ w.Forexample,S⇒AA⇒AbA⇒abA,soS⇒∗ abA.The
length of a derivation v ⇒∗ Derivations
S → aMb M→A M→B
A → aA A→e
B → bB B→λ
w is the number of rule applications in the derivation.
(1) (2) (3) (4) (5) (6) (7)
aaab is derivable from S:
S ⇒aMb⇒aAb⇒aaAb⇒aaaAb⇒aaab S⇒∗ aaab.
The length of this derivation is 5.
Parse Trees and Derivations
Derivations can be written in a graphical form as a parse tree. Given a grammar and a string, there may be different derivations to get the same string. Equivalent derivations (same meaning) have the same parse tree. Any parse tree has unique leftmost and rightmost derivations. Gram- mars with strings having 2 or more parse trees are ambiguous. Some ambiguous grammars can be rewritten as equivalent unambiguous grammars.
Please refer to Chapter 3 in Sudkamp’s course book (page 71 in 3rd edition; page 60 in 2nd edition) for definition of left-most derivations, as well as Definition 3.5.2 (3rd edition) or Definition 4.1.2 (2nd edition) for precise definition of ambiguous grammars.
Derivation as Parse Tree
Consider the following grammar.
S → AS | SB | λ A → aB | bA | λ B → bS | c | λ
The string abc can be derived by applying the above rules in the following order. S ⇒ AS ⇒ bAS ⇒ baBS → bacS → bac
A parse tree of S ⇒∗ w is then obtained as follows:
• The parse tree has root S
16
• If S → AS is the rule applied to S, then add A and S as children of S. A → bA, then add b and A as children of A …
• IfA→eistheruleappliedtoA,thenaddeastheonlychildofA. S
Equivalent Derivations
AS
e
B c
bA a
A grammar can admit different derivations yielding the same string. For example, consider this simple grammar G = (V, Σ, R, S), where
• V ={S}.
• Σ={(,)}.
• R={S→SS|(S)|ε}.
The string “(())()” can be derived from S by several derivations, e.g.: (D1) S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())() (D2) S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())() (D3) S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ ((S))() ⇒ (())()
In the derivations above, D1 and D2 differ in that steps 4 and 5 are reversed. These all have the same parse tree, namely:
S
SS (S)(S) (S) e
e
17
Leftmost/Rightmost Derivations
A left-most derivation is one in which the left-most variable is used for expansion at all points in the derivation. In other words, when there is a choice to be made between two or more variables, the left-most one is always chosen. Similarly, a right-most derivation is one where the right-most variable is always chosen. If there is only one variable at every derivation step, then the derivation is both left-most and right-most.
See Chapter 3 in Sudkamp’s course book (page 71 in 3rd edition; page 60 in 2nd edition) for definition of left-most derivations.
Let G be the following grammar:
S → AB
A → aA | Aa | a B → bB | Bb | b
A left-most derivation for aaabb: S ⇒ AB ⇒ aAB ⇒ aAaB ⇒ aaaB ⇒ aaaBb ⇒ aaabb
A right-most derivation for aaabb: S ⇒ AB ⇒ ABb ⇒ Abb ⇒ aAbb ⇒ aAabb ⇒ aaabb
A derivation that is neither left nor right most: S ⇒ AB ⇒ ABb ⇒ aABb ⇒ aAaBb ⇒ aAabb ⇒ aaabb
Ambiguous Grammars
If you can have two left-most derivations for the same string from a given grammar then that grammar is ambiguous. Please see Sudkamp’s book section on left-most derivations and ambiguous grammars: Definition 3.5.2 in 3rd edition or Definition 4.1.2 in 2nd edition.
Recall the grammar G = (V, Σ, R, S), where
• V ={S,E};
• Σ = {∗,+,(,)}; and
• R={S→E,E→E+E|E∗E|(E)|id}.
The string id + id ∗ id can be generated by this grammar with two different left-most derivations according to two different parse trees.
• E⇒E+E⇒id+E⇒id+E∗E⇒id+id∗E⇒id+id∗id
• E⇒E∗E⇒E+E∗E⇒id+E∗E⇒id+id∗E⇒id+id∗id
Accordingly, these derivations yield two different parse trees, namely: EE
E+E E∗E id E ∗ E E + E id idid idid
18
Observe that one of these corresponds to the “natural” semantics of id + id ∗ id, where ∗ takes precedence over +. Which one?
Ambiguous Languages
Many ambiguous grammars (such as the one on the previous slide) can easily be converted to an unambiguous grammar representing the same language. Some context free languages have the property that all grammars that generate them are ambiguous. Such languages are inherently ambiguous.
Inherently ambiguous languages are not useful for programming languages, formatting lan- guages or other languages which must be parsed automatically.
Parsing
Given a context-free grammar G and input w, determine if w ∈ L(G). The problem is: • how do we determine this for all possible strings?;
• multiple derivations may exist;
• must also discover when no derivation exists.
A procedure to perform this function is called a parsing algorithm or parser. Some gram- mars allow deterministic parsing, i.e. each sentential form has at most one derivation.
Ambiguity and Parsing
A grammar is unambiguous if at each step in a leftmost derivation there is only one rule which can lead to the desired string. Deterministic parsing is based upon determining which rule to apply.
• top-down parsing From start symbol generate a leftmost derivation of w guided by w.
• bottom-up parsing From w generate in reverse order a rightmost derivation of w ⇐∗ S
guided by rules of G. Top Down Parsing
Leftmost derivation from S with choice of rule guided by w. For example, a grammar for {aibi |i≥0}isS →aSb|λ
Derivation for aaabbb
Derivation terminal
prefix
S e aSb a aaS bb aa aaaSbbb aaa aaabbb
Grammar for arithmetic expressions
next char
a a a b
rule
S→aSb S → aSb S → aSb S→λ
19
S→A A→T|A+T T →b|(A)
20
Derivation for (b + b)
Derivation terminal Derivation terminal prefix prefix
SλSλ AλAλ TλTλ (A) ( (A) ( (A+T) ( (T) ( (T+T) ( (b) (b) (b+T) (b+
(b+b) (b+b) Grammar for arithmetic expressions
S
A
T (A) (A + T) (T + T) (b + T) (b + b)
Bottom Up Parsing
Reductions for (b) + b
S→A A→T|A+T T →b|(A)
Construct a rightmost derivation in reverse by doing the leftmost production backwards. A reduction (reverse production) is of the form w = u1qu2 and A → q gives v = u1Au2.
Reduction Rule
(b) + b (T ) + b (A) + b T+b A+b A+T A
S
T → b
A → T
T → (A) A→T T→b A→A+T S→A
21
Rightmost derivation in reverse
S⇒A
⇒ A+T
⇒ A+b ⇒T+b ⇒ (A)+b ⇒ (T)+b ⇒ (b)+b
Again there are multiple possibilities. For string w ∈ (V ∪ Σ)∗ we can split it into w = uv and apply reductions to the rightmost part of u.
u v Rule
e (A+T)
( A+T)
(A +T) S→A (A+ T)
(A+T )
A → T (A + A)
Current derivation string w = u · v. Apply reductions to rightmost part of u, or shift one
character from v to u. For e.g. u · az becomes ua · z. Further Parsing Methods
More sophisticated forms of parsing: • top-down parsing
– Lookahead
– LL(k) grammars • bottom-up parsing
– LR(0) grammars (and machines) – LR(1) grammars (and machines)
Reduction
shift shift shift shift
(S+T) A→A+T (A)
shift
(A+T) e
22
Section 4: Finite State Machines
Our model of computation comes down to two parts: input (and output) language and proces- sor. For example:
• Coke machine (input money, output drinks + change if right coins inserted).
• Hardware adder (input bits, output bits representing sum of input).
• Credit card checker (input card number + expiry date, output validity as ‘yes’