COMSM1201 : Exercises in C Neill Campbell
Department of Computer Science, University of Bristol
Copyright © 2019 Neill Campbell
Formatted in LATEX, based on the Legrand Orange Book from BOOK-WEBSITE.COM
This work is licensed under a Creative Commons Attribution-NonCommerical-ShareAlike 4.0 International License.
Contents
1 HelloWorld(Week0&1) ………………………………. 5
1.1 Lecture Notes Chapter 1 5
1.2 Twice the Sum 5
1.3 Letter C 5
1.4 Lecture Notes Chapter 2 6
1.5 ++a++ 6
1.6 Randomness 6
1.7 Lecture Notes Chapter 3 7
1.8 find_max 7
1.9 Loving Oddness 7
1.10 Lecture Notes Chapter 4 7
1.11 Hailstone 7
1.12 Primes 7
1.13 Triangles 8
1.14 Lecture Notes Chapter 5 8
1.15 Vowelness 8
1.16 Lecture Notes Chapter 6 8
1.17 Unit Circle 8
1.18 Time Flies 9
1.19 Lecture Notes Chapter 7 9
1.20 Roulette 9
1.21 Linear Congruent Generator 9
2 Pre-Arrays(Week2) …………………………………. 11
2.1 Planet Trium 11
2.2 Planet Bob 12
2.3 Secret Codes 12
2.4 Cash Machine (ATM) 13
2.5 Hailstone Sequence 13
2.6 Monte Carlo Π 14
2.7 Leibniz Π 14
2.8 Triangle Numbers 14
2.9 Higher-Lower 15
2.10 Irrational numbers 15
3 1DArraysandStrings ………………………………… 17
3.1 Neill’s Microwave 17
3.2 Music Playlisters 17
3.3 Rule 110 18
3.4 Yahtzee 19
3.5 Palindromes 22
3.6 Int to String 22
3.7 Merging Strings 22
3.8 Roman Numerals 23
3.9 Soundex Coding 24
4 2DArrays ………………………………………….. 27
4.1 The Game of Life 27
4.2 Life Wars 28
4.3 Wireworld 29
4.4 Forest Fire 30
4.5 Diffusion Limited Aggregation 31
4.6 Langton’s Ant 32
4.7 ANSI Escape Sequences 33
4.8 ncurses 33
5 Strings,RecursionandSDL ……………………………. 35
5.1 Anagrams 35
5.2 Draw to Unlock 36
5.3 SDL – Intro 37
5.4 Word Ladders 37
5.5 The Devil’s Dartboard 40
5.6 Maze 40
6 Lists,InsertionSort&moreRecursion ……………………. 43
6.1 A Simple Spelling Checker 43
6.2 Prime Factors 43
6.3 Sierpinski Carpet 45
6.4 Sierpinski Squares 45
7 SearchingBoards……………………………………. 47
7.1 Conway’s Soldiers 47
7.2 The 8-Tile Puzzle 48
7.3 Happy Bookcases 48
9 Huffman,Trees&Bitwise………………………………. 53
9.1 Depth 53
9.2 Two Trees 55
9.3 Huffman Encoding 55
9.4 Binary Tree Visualisation 56
9.5 Boolean Arrays 57
9.6 Advent of Code 57
9.7 Movember 58
10 ADTsandAlgorithmsI………………………………… 59
10.1 Indexed Arrays 59
10.2 Sets 59
10.3 Towards Polymorphism 60
10.4 Double Hashing 60
10.5 Separate Chaining 60
11 ADTsandAlgorithmsII ……………………………….. 63
11.1 Polymorphic Hashing 63
11.2 Cuckoo Hashing 64
11.3 MultiValue Maps 64
11.4 Rhymes 64
11.5 Faster MVMs 66
12 ParsingData ……………………………………….. 67
12.1 Teletext 67
12.2 Guido van Robot 73
12.3 Turtle Graphics 77
12.4 The UNIX awk program 80
12.5 Neill’s Adventure Language 82
A HouseStyle ………………………………………… 87
A.1 Correctness 87
A.2 Prettifying 89
A.3 Readability 90
B LabTests …………………………………………… 91
B.1 Anagrams 92
B.2 Isograms 93
B.3 Mutating Boards 94
Lecture Notes Chapter 1 Twice the Sum
Letter C
Lecture Notes Chapter 2 ++a++
Randomness
Lecture Notes Chapter 3 find_max
Loving Oddness
Lecture Notes Chapter 4 Hailstone
Primes
Triangles
Lecture Notes Chapter 5 Vowelness
Lecture Notes Chapter 6 Unit Circle
Time Flies
Lecture Notes Chapter 7 Roulette
Linear Congruent Generator
1. Hello World (Week 0 & 1)
The exercises in this Chapter are largely taken from the book “C by Dissection”.
1.1 Lecture Notes Chapter 1
1.2 Twice the Sum
Here is part of a program that begins by asking the user to input three integers:
…
Exercise 1.1 Once you’ve studied Chapter 1 of the lecture notes, type in, compile and run the examples given in the handout.
#include
int main(void) {
int a, b, c;
printf(“Input three integers: “);
Exercise 1.2 Complete the program so that when the user executes it and types in 2, 3, and 7, this is what appears on the screen:
Input three integers: 2 3 7
Twice the sum of integers plus 7 is 31 !
1.3 Letter C
Execute this program so you understand the output:
#include
8 Chapter 1. Hello World (Week 0 & 1)
#define HEIGHT 17
int main(void) {
int i = 0;
printf(“\n\nIIIIIII\n”); while(i < HEIGHT){ printf(" III\n");
i = i + 1;
} printf("IIIIIII\n\n\n"); return 0;
}
Exercise 1.3 Write a similar program that prints a large letter C on the screen (it doesn’t need to be curved!).
1.4 Lecture Notes Chapter 2
1.5 ++a++
Study the following code and write down what you think it prints.
Exercise 1.4 Once you’ve studied Chapter 2 of the lecture notes, type in, compile and run the examples given in the handout.
int a, b = 0, c = 0;
a = ++b + ++c;
printf("%d %d %d\n", a, b, c);
a = b++ + c++;
printf("%d %d %d\n", a, b, c); a = ++b + c++;
printf("%d %d %d\n", a, b, c); a = b−− + −−c;
printf("%d %d %d\n", a, b, c);
Exercise 1.5 Then write a test program to check your answers.
1.6 Randomness
The function rand() returns values in the interval [0, RAND_MAX]. If we declare the variable median and initialise it to have the value RAND_MAX/2, then rand() will return a value that is sometimes larger than median and sometimes smaller.
Exercise1.6 Writeaprogramthatcallsrand(),say500times,insideaforloop,increments the variable minus_cnt every time rand() returns a value less than median. Each time through the for loop, print out the value of the difference of plus_cnt and minus_cnt. You might think that this difference should oscillate near zero. Does it ?
1.7 Lecture Notes Chapter 3 9
1.7 Lecture Notes Chapter 3
1.8 find_max
1.9 Loving Oddness
Suppose that you detest even integers but love odd ones.
1.10 Lecture Notes Chapter 4
1.11 Hailstone
The next number in a hailstone sequence is n/2 if the current number n is even, or 3n + 1 if the
current number is odd. If the initial number is 77, then the following sequence is produced:
Exercise 1.7 Once you’ve studied Chapter 3 of the lecture notes, type in, compile and run the examples given in the handout.
Exercise 1.8 Write a program that finds the largest number entered by the user. Executing the program will produce something like:
How many numbers do you wish to enter ? 5 Enter 5 real numbers: 1.01 −3 2.2 7.0700 5 Maximum value: 7.07
Exercise1.9 Modifythefind_maxprogramsothatallvariablesareoftypeintandthatonly odd integers are processed. Explain all this to the user via appropriate printf() statements.
Exercise 1.10 Once you’ve studied Chapter 4 of the lecture notes, type in, compile and run the examples given in the handout.
77 232
116 58 29 88 44 22
11 34
Exercise1.11 Writeaprogramthat,givenanumbertypedbytheuser,printsoutthesequence of hailstone numbers. The sequence terminates when it gets to 1.
1.12 Primes
A prime number can only be exactly divided by itself or 1. The number 17 is prime, but 16 is not because the numbers 2, 4 and 8 can divide it exactly. (Hint 16%4 == 0).
10 Chapter 1. Hello World (Week 0 & 1)
Exercise1.12 Writeaprogramthatprintsoutthefirstnprimes,wherenisinputbytheuser. The first 8 primes are:
What is the 3000th prime ?
2 3
5 7 11 13 7
19
1.13 Triangles
A triangle can be equilateral (all three sides have the same length), isosceles (has two equal
length sides), scalene (all the sides have a different length), or right angled where if the three
sidesarea,bandc,andcisthelongest,then: c=
√
a2+b2
Exercise1.13 Writeaprogramsothatyoucanprocessanumberoftriplesofsidelengthsin a single run of your program using a suitable unlikely input value for the first integer in order to terminate the program. e.g. -999.
Think hard about the test data for your program to ensure that all possible cases are covered and all invalid data results in a sensible error message. Such cases can include sides of negative length, and impossible triangles (e.g. one side is longer than the sum of the other two).
1.14 Lecture Notes Chapter 5
1.15 Vowelness
Vowels are the letters a, e, i, o and u.
1.16 Lecture Notes Chapter 6
1.17 Unit Circle
In mathematics, for all real x, it is true that:
sin2(x) + cos2(x) = 1
Exercise 1.14 Once you’ve studied Chapter 5 of the lecture notes, type in, compile and run the examples given in the handout.
Exercise 1.15 Write a program that reads characters from the keyboard and writes to the screen. Write all vowels as uppercase letters, and all non-vowels as lowercase letters. Do this by using writing a function isvowel() that tests whether or not a character is a vowel.
Exercise 1.16 Once you’ve studied Chapter 6 of the lecture notes, type in, compile and run the examples given in the handout.
1.18 Time Flies 11
i.e. sin(x) ∗ sin(x) + cos(x) ∗ cos(x) = 1.
Exercise 1.17 Write a program to demonstrate this for values of x input by the user.
1.18 Time Flies
Exercise 1.18 Write a program which allows the user to enter two times in 24-hour clock format, and computes the length of time between the two, e.g.:
Enter two times : 23:00 04:15
or,
Difference is : 5:15
Enter two times : 23:40 22:50 Difference is : 23:10
1.19 Lecture Notes Chapter 7
1.20 Roulette
This is an chance for you to practice self-document techniques such as sensible identifier naming,
commenting, typedefs and enumeration.
1.21 Linear Congruent Generator
One simple way to generate ‘random’ numbers is via a Linear Congruent Generator which might
look something like this:
here, A, C and M are constants defined in the code. All of these pseudo-generators have a period of repetition that is shorter than M. For instance, with A set to 9, C set to 5 and M set to 11 the sequence of numbers is :
Exercise 1.19 Once you’ve studied Chapter 7 of the lecture notes, type in, compile and run the examples given in the handout.
Exercise 1.20 Write a roulette program. The roulette machine will select a number between 0 and 35 at random. The player can place an odd/even bet, or a bet on a particular number. A winning odd/even bet is paid off at 2 to 1, except that all odd/even bets lose if the roulette selects 0. If the player places a bet on a particular number, and the roulette selects it, the player is paid off a 35 to 1.
int seed = 0;
/* Linear Congruential Generator */ for(i=0; i
#include
#define MAXTHROW 6 #define NUMDICE 5
#define TESTS 10000000 enum bool {false, true};
typedef enum bool bool;
/* Fill the array d with random numbers 1..6 */
void randomthrow(int d[NUMDICE]);
/* Decide if the number n occurs anywhere in the histogram h */ bool histo_has(const int h[MAXTHROW], const int n);
/* Compute a histogram, given a dice hand */
void makehist(const int d[NUMDICE], int h[MAXTHROW]); /* Check that the histograms h1 & h2 are the same */
bool hists_same(const int h1[MAXTHROW], const int h2[MAXTHROW]); /* Does this hand have 2 lots of one number and 3 lots of another */
bool isfullhouse(const int d[NUMDICE]);
/* Does this hand have 4 lots of one number and 1 of another ? */
bool is4ofakind(const int d[NUMDICE]);
/* Do some testing of the functions required */
void test(void); int main(void)
{
int dice[NUMDICE];
int k4 = 0; int fh = 0; int i;
test();
for(i=0; i
#include
3.8 Roman Numerals 25
Hint : Functions such as strncmp() and strcat might be useful.
#include
typedef enum bool {false, true} bool;
void strmerge(const char* s1, const char* s2, char*s3);
#define LARGESTRING 1000
int main(void) {
char s[LARGESTRING];
strmerge(“Hello World!”, “World! & Everyone.”, s); assert(strcmp(s, “Hello World! & Everyone.”)==0);
strmerge(“The cat sat”, “sat on the mat.”, s); assert(strcmp(s, “The cat sat on the mat.”)==0);
strmerge(“The cat sat on the mat”, “The cat sat on the mat.”, s); assert(strcmp(s, “The cat sat on the mat.”)==0);
strmerge(“One “, “Two”, s); assert(strcmp(s, “One Two”)==0);
strmerge(“”, “The cat sat on the mat.”, s); assert(strcmp(s, “The cat sat on the mat.”)==0);
strmerge(“The cat sat on the mat.”, “”, s); assert(strcmp(s, “The cat sat on the mat.”)==0);
assert(strcmp(s, “123412341234”)==0); }
3.8 Roman Numerals Adapted from:
http://mathworld.wolfram.com/RomanNumerals.html
“Roman numerals are a system of numerical notations used by the Romans. They are an additive (and subtractive) system in which letters are used to denote certain “base” numbers, and arbitrary numbers are then denoted using combinations of symbols. Unfortunately, little is known about the origin of the Roman numeral system.
The following table gives the Latin letters used in Roman numerals and the corres- ponding numerical values they represent :
www
26
Chapter 3. 1D Arrays and Strings
I V X L C D M
1
5
10
50
100
500
1000
For example, the number 1732 would be denoted MDCCXXXII in Roman numerals. However, Roman numerals are not a purely additive number system. In particular, instead of using four symbols to represent a 4, 40, 9, 90, etc. (i.e., IIII, XXXX, VIIII, LXXXX, etc.), such numbers are instead denoted by preceding the symbol for 5, 50, 10, 100, etc., with a symbol indicating subtraction. For example, 4 is denoted IV, 9 as IX, 40 as XL, etc.”
It turns out that every number between 1 and 3999 can be represented as a Roman numeral made up of the following one- and two-letter combinations:
I V X L C D M
1
5
10
50
100
500
1000
IV IX XL XC CD CM
4
9
40
90
400
900
Exercise 3.10 Write a program that reads a roman numeral (in the range 1 – 3999) and outputs the corresponding valid arabic integer. Amongst others, check that MCMXCIX returns 1999, MCMLXVII returns 1967 and that MCDXCI returns 1491.
You should use the following template :
You need to add the function romanToArabic().
#include
int romanToArabic( char *roman );
int main(int argc, char **argv) {
if( argc==2 ){
printf(“The roman numeral %s is equal to %d\n”, \ argv[1], romanToArabic(argv[1]));
}else{
printf(“ERROR: Incorrect usage, try e.g. %s XXI\n”, argv[0]);
}
return 0; }
3.9 Soundex Coding
First applied to the 1880 census, Soundex is a phonetic index, not a strictly alphabetical one. Its key feature is that it codes surnames (last names) based on the way a name sounds rather
3.9 Soundex Coding 27
www
than on how it is spelled. For example, surnames that sound the same but are spelled differently, like Smith and Smyth, have the same code and are indexed together. The intent was to help researchers find a surname quickly even though it may have received different spellings. If a name like Cook, though, is spelled Koch or Faust is Phaust, a search for a different set of Soundex codes and cards based on the variation of the surname’s first letter is necessary.
To use Soundex, researchers must first code the surname of the person or family in which they are interested. Every Soundex code consists of a letter and three numbers, such as B536, representing names such as Bender. The letter is always the first letter of the surname, whether it is a vowel or a consonant.
The detailed description of the algorithm may be found at :
http://www.highprogrammer.com/alan/numbers/soundex.html
The first letter is simply the first letter in the word. The remaining numbers range from 1 to 6, indicating different categories of sounds created by consanants following the first letter. If the word is too short to generate 3 numbers, 0 is added as needed. If the generated code is longer than 3 numbers, the extra are thrown away.
Code
Letters Description
1
B, F, P, V Labial
2
C, G, J, K, Q, S, X, Z Gutterals and sibilants
3
D, T Dental
4
L Long liquid
5
M, N Nasal
6
R Short liquid
SKIP
A, E, H, I, O, U, W, Y Vowels (and H, W, and Y) are skipped
There are several special cases when calculating a soundex code:
• Letters with the same soundex number that are immediately next to each other are discarded. So Pfizer becomes Pizer, Sack becomes Sac, Czar becomes Car, Collins becomes Colins, and Mroczak becomes Mrocak.
• If two letters with the same soundex number seperated by “H” or “W”, only use the first letter. So Ashcroft is treated as Ashroft.
Sample Soundex codes:
Word
Soundex
Washington
W252
Wu
W000
DeSmet
D253
Gutierrez
G362
Pfister
P236
Jackson
J250
Tymczak
T522
Ashcraft
A261
28 Chapter 3. 1D Arrays and Strings
Exercise 3.11 Write a program that takes the name entered as argv[1] and prints the corresponding soundex code for it.
4. 2D Arrays
4.1 The Game of Life
The Game of Life was developed by British mathematician John Horton Conway. In Life, a board represents the world and each cell a single location. A cell may be either empty or inhabited. The game has three simple rules, which relate to the cell’s eight nearest neigbours :
1. Survival An inhabited cell remains inhabited if exactly 2 or 3 of its neighbouring cells are inhabited.
2. Death An inhabited cell becomes uninhabited if fewer than 2, or more than 3 of its neighbours are inhabited.
3. Birth An uninhabited cell becomes inhabited if exactly 3 of its neighbours are inhabited. The next board is derived solely from the current one. The current board remains unchanged while computing the next board. In the simple case shown here, the boards alternate infinitely
between these two states.
The 1.06 format
A general purpose way of encoding the input board is called the Life 1.06 format :
http://conwaylife.com/wiki/Life_1.06
This format has comments indicated by a hash in the first column, and the first line is always:
#Life 1.06
Every line specifies an x and y coordinate of a live cell; such files can be quite long. The coordinates specified are relative to the middle of the board, so 01 means the middle row, one cell to the right of the centre.
0
0
0
0
0
0
0
1
0
0
0
0
1
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
0
0
0
0
0
0
0
0
0
0
0
www
The Game of Life
Life Wars
Wireworld
Forest Fire
Diffusion Limited Aggregation Langton’s Ant
ANSI Escape Sequences ncurses
30 Chapter 4. 2D Arrays There are hundreds of interesting patterns stored like this on the above site.
Exercise 4.1 Write a program which is run using the argc and argv parameters to main. The usage is as follows :
where file1.lif is a file specifying the inital state of the board, and 10 specifys that ten iterations are required.
Display the output to screen every iteration using plain text, you may assume that the board is 150 cells wide and 90 cells tall.
% life file1.lif 10
www
Alternative Rules for The Game of Life
The rules for life could also be phrased in a different manner, that is, give birth if there are two neighbours around an empty cell (B2) and allow an ‘alive’ cell to survive only if surrounded by 2 or 3 cells (S23). Other rules which are life-like exist, for instance 34 Life (B34/S34), Life Without Death (B3/S012345678) and HighLife (B36/S23).
http://en.wikipedia.org/wiki/Life-like_cellular_automaton
Exercise 4.2 Write a program that allows the user to input life-like rules e.g. :
or
and display generations of boards, beginning with the inital board in the input file.
life B34/S34 lifeboard.lif
life B2/S lifeboard.lif
4.2 Life Wars
Inspired by the classic game, Core Wars
http://en.wikipedia.org/wiki/Core_War
here we look at a two player version of Conway’s Game of Life.
In our game, each of two ‘players’ submit a Life 1.06 file and cells from these inserted into
an empty board. The cells are coded based on which player created them (say ‘+’ and ‘@’). The game is then run, and the player having most cells left after a fixed number of iterations, over many games, is deemed the winner.
The rules are :
1. The board is 150 cells wide, and 90 cells tall.
2. The board is toroidal; that is, it wraps around from the left edge to the right, and the top to
the bottom.
3. Each player can submit a Life 1.06 file that is maximum of 100 lines long, including the
header line.
4. Since each of the Life 1.06 files describe absolute positions for cells, each player is
assigned a random origin for their cells. If there is any collision when attempting to add the cells initially (i.e. both players happen to specify a cell in the same square), then new origins are chosen randomly and the process begun from scratch.
www
4.3 Wireworld 31
5. The ‘standard’ B3/S23 rules are used.
6. The colour of cells is never taken into account (i.e. cells are still either ‘alive’ or ‘dead’).
The sole exception to this is that when a cell is born, it takes the colour of the majority its
neighbours.
7. There are 5000 generations played every game.
8. A running count of how many cells each player has left at the end of each game is kept.
The board is cleared and the next game randomly restarted.
9. There are 50 games run in total.
10. The player with the highest number of cells over all 50 games is the winner.
It’s easy to extend these rules, of course, to allow for three or more players, but it’s unclear for three players what would happen in the majority vote if there was a draw (i.e. a new cell is born based on three cells around it, one belonging to each player. Other rule variants are also possible (e.g. Highlife (B36/S23), but once again, the birth rule for 3 or 6 neighbours would
cause majority voting issues for two and three players.
Exercise 4.3 Write a program that accepts two Life 1.06 files and reports which of them is the winner. The first few games might look like:
% /lifewars blinkerpuffer2.lif bigglider.lif
0 50
1 370 141
2 437 281
3 450 602
12 Player 1 Player 1 Player 1 Player 2
4 540 623 5 991 629 6 1063 674 7 1211 707 8 1263 735 9 1358 758
Player 2 Player 1 Player 1 Player 1 Player 1 Player 1
. . .
Player 1 wins by 7857 cells to 2373 cells
4.3 Wireworld
Wireworld is a cellular automaton due to Brian Silverman, formed from a 2D grid of cells, designed to simulate digital electronics. Each cell can be in one of four states, either ‘empty’,
‘electron head’, ‘electron tail’ or ‘copper’ (or ‘conductor’).
The next generation of the cells follows the rules, where n is the number of electron heads
found in the 8-surrounding cells: • empty − > empty
• electron head − > electron tail
• electron tail − > copper
• copper−>electronheadifn==1orn==2 • copper − > copper otherwise
See also:
www
www
32 Chapter 4. 2D Arrays https://en.wikipedia.org/wiki/Wireworld
http://www.heise.ws/fourticklogic.html
Exercise 4.4 Write a program which is run using the argc and argv parameters to main. The usage is as follows :
where wirefile.txt is a file specifying the initial state of the board. This file codes empty cells as ‘ ’, heads as ‘H’, tails as ‘t’ and copper as ‘c’. Display the board for 1000 generations using plain text. You may assume that the grid is always 40 cells by 40
$ wireworld wirefile.txt
Ensure your code is C90 (ANSI) compliant, and fully follows the house-style guidelines. Show that your code has been developed using short, well-tested functions via the use of assert() testing.
4.4 Forest Fire
You can create a very simple model of forest fires using a cellular automaton in the form a 2D grid of cells. Each cell can be in one of three states; either ‘empty’, ‘tree’, or ‘fire’. The next generation of cells follows these rules:
• A ‘fire’ cell will turn into an ‘empty’ cell.
• A ‘tree’ that is within the 8-neighbourhood of a ‘fire’ cell will itself become ‘fire’. • A ‘tree’ will burn (due to a lightning strike) 1 time in L.
• An ‘empty’ space will become a ‘tree’ (spontaneous growth) 1 time in G.
See also:
https://en.wikipedia.org/wiki/Forest-fire_model
https://www.aryan.app/randomstuff/forestfire.html
You can experiment with different values of L and G, but a useful starting point is G = 250 and L = 10×G.
Ensure your code is C90 (ANSI) compliant, and fully follows the house-style guidelines. Show that your code has been developed using short, well-tested functions via the use of assert() testing.
www
www
Exercise4.5 Writeaprogramwhichcreatesanempty2Dgridofcells,80wideand30high. Then, apply the rules above to iterate the simulation, so that the next generation is created from the previous one.
Print out every generation onto the screen, using a space to represent an empty cell, the @ character for a tree, and * for fire cells. Display the board for 1000 generations using plain text. You may assume that the grid is always 80 cells by 30
4.5 Diffusion Limited Aggregation 33
4.5 Diffusion Limited Aggregation
https://en.wikipedia.org/wiki/Diffusion-limited_aggregation
In its simplest form, DLA occurs on a grid of square cells. The cell at the center of the grid is the location of the seed point, a particle stuck at that square. Now pick a square on the perimeter of the grid and place a wandering particle on that square. At each iteration, this particle moves to one of the four adjacent squares, left, right, above, or below. When a wandering particle arrives at one of the four squares adjacent to the seed, it sticks there forming a cluster of two particles, and another edge particle is released. When a moving particle arrives at one of the squares adjacent to the cluster, it sticks there.
+++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++@+++++++++++++++++++++++++++ + + + + + + + + + + + + + + + + + + + + + +@@@@@@+ + + + + + + + + + + + + + + + + + + + + + + +++++++++++++++++++++++++@+++++++++++++++++++++++++ +++++++++++++++++++++++++@+++@+++++++++++++++++++++ + + + + + + + + + + + + + + + + + + + + + + + + +@@@+@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+ + + +@@@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ + + + + + @ + + + @ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ + + + + + @ + @@@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+ + + +@@+@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@+ + +@@@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@@ + @@ + + + @ + @ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@@+@@@@@@@@@@+ + + + + + + + + + + + + + + + ++++++++++++++++++++++++++@+++@+@++++++++++++++++++ ++++++++++++++++++++++++++@+++++@++++++++++++++++++ + + + + + + + + + + + + @ + + + + + + @@ + + + + + @@ + + + + + + + + + + + + + + + + + + + + + + + ++++++++++++@+++++++@+@+@+@++++++++++++++++++++++++ + + + + + + + + + + +@@+ + + + +@@@@@@@@@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ + + + + + + + + + + + @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+@@@@@@@+ +@@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@@@@@@@+@@@@@@@@@@@@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + @@ + @ + @ + + + @ + + + + + + + @ + + + + @ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@+ + + + + + +@@+ + + + + + + + + + + + + + + + + + + + + + + + +++++++++++++++++++++++++@++++++@++++++++++++++++++ + + + + + + + + + + + + + + + + + + + + + + + + +@+ + + + +@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @ + + + @@ + + @@@@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+ + +@@@@@@@@@@+@+ + + + +@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@@@@+ +@+ +@+ + + + + + + + +@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+ + + +@@@+@@+ + + + + + +@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @ + + + + @ + @ + + @@ + + @ + + + @ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@@+ + + + + +@+ +@@+ +@+ +@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+ + + + + + + + +@@@@@@+@@+@+@+ + + + + + + + + + + + + + + + + + + + + + + + + + + @@@ + + + + + + + + @ + + @ + @@@@@@@@@ + + + + + + + + + + + + + + + + + + + + + + + + + + +@@+ + + + + + + +@@@@+ +@@+@+@+@@@+ + + + + + + + + + + + + + + + + + + + + + + + + +@+ + + + + +@@@+@+ + +@+ +@+@+@@@@+ + + + + + +++++++++++++++++++@++++++++++++++++++++++@++++++++ ++++++++++++++++++++++++++++++++++++++++++@++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++
www
www
Note that the grid is toroidal – this means that if a particle goes off the top of the grid, it reappears at the bottom. Likewise, if it goes off the edge, it will appear on the other side.
Exercise4.6 Usingthealgorithmoutlinedabove,usinga50×50board,outputeachof250 iterations (here iteration means when each particle has become ‘stuck’). The output should be in plain text.
In the basic version of a DLA, a particle ‘sticks’ to the seed points when it gets adjacent to one. Here we introcude the probability of stickiness, a number between 0 and 1, called ps. The particle only sticks to a seed point with a probabilty of ps; if it doesn’t stick it continues wandering. This allows for the patterns that emerge to be more hairy and solid:
http://paulbourke.net/fractals/dla/
Exercise 4.7 Implement this stickiness concept by extending the program writen for Exer- cise 4.6, allowing the user to specify a ‘stickiness’ probablity via the command line using argv[1]. Here a value of 1.0 means that the particle will always stick, and 0.5 means that it will stick one time in 2.
34 Chapter 4. 2D Arrays Here’s an example when setting ps = 0.25:
+++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+@+@@@@@+ +@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@+ + +@@+ + +@@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@+ +@@@@@@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@+@@@@+@+@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@@+@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@+ +@@+ + + + + +@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@+@@@+ + +@+@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@+@+@@@@@@@+ + +@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@@@@@@+ + + + +@+ +@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@@@+ + + + + + + +@@@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ + + @ + + + + + @ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+ +@@+ + + +@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@@ + + + + @ + @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@@@@@@@@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @ + @@@@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+ + +@@@+@+@@@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+ + +@@@@@+@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@+ +@@@@@+ + +@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@@@@@@@ + @ + + + + @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+ +@@+ +@@+ + + + +@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+ + + +@@+ + +@@@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @ + + + + @ + + + @@ + @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@+ +@@+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @ + + + @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +@@@+ + + + + + + + + + + + + + + + + + +++++++++++++++++++++++++++++++@+++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++
Using different planar walks. e.g. in base-4: http://demonstrations.wolfram.com/UsingIrrationalSquar
4.6 Langton’s Ant
http://en.wikipedia.org/wiki/Langton’s_ant
Exercise 4.8 Write a program that shows the behaviour of Langton’s Ant and displays it on the screen using text. The ant will start in the middle, and each cell will have 2 states : ‘#’ (white) or ‘.’ (black).
At each step, the ant:
• At a white square, turn 90° right, flip the color of the square, move forward one unit • At a black square, turn 90° left, flip the color of the square, move forward one unit
After each step, display the board, and wait for the user to type a character before repeating.
After 30 or so iterations, the board might look something like :
++++++++++++++++ ++++++++++++++++ ++++++++++++++++ ++++++++++++++++ ++++++++++++++++ ++++++++++++++++ ++++++++++++++++ ++++++++++++++++ ++++++++++++++++ ++++++++++++++++ ++++++++++++++++ ++++++++++++++++ ++++++++++++++++ ++++++++++++++++ ++++++++++++++++
www
4.7 ANSI Escape Sequences 35
4.7 ANSI Escape Sequences
C has no inherent functionality to allow printing in colour etc. Therefore, some terminals allow certain control codes to be printed to move the cursor, change the foreground and background colour, and so on. Note that only some terminals support these control code – if you’re using a different one, garbage may appear on the screen instead.
https://en.wikipedia.org/wiki/ANSI_escape_code
I’ve created a very simple set of functions which shows a small fraction of this functionality called demo_neillsimplescreen.c which using the functions found in neillsimplescreen.c
To build and execute this code, use the Makefile provided by typing:
www
$ make demo_neillsimplescreen
gcc demo_neillsimplescreen.c neillsimplescreen.c
-o demo_neillsimplescreen -Wall -Wextra -pedantic -std=c90 -g3 -fsanitize=undefined -fsanitize=address -lm $ ./demo_neillsimplescreen
Exercise 4.9 Adapt the Forest Fire code in Exercise 4.5 so that the output is displayed using the simple funtions demonstrated above, with trees being green, fire being red and background being empty. The main loop will update the forest, clear the screen, display it, wait for a very short time (e.g. 0.10s) and repeats.
4.8 ncurses
C has no inherent functionality to allow printing in colour etc. Therefore, a programming library know a ncurses was created in 1993 to allow terminals to interpret certain control-codes as colours and other effects.
The library itself is somewhat complex, allowing keyboard and mouse events to be captured and a whole range of simple graphics functionality. On the web page is my ‘wrapper’ for the library, along with a program demonstrating its use. This will only work in unix-style terminals. Note that after you begin ncurses mode (using Neill_NCURS_Init()) that you can’t print to stdout or stderr, until you switch it off (using Neill_NCURS_Done()).
To compile the code you’ll have to use both my code neillncurses.c and also link in the ncurses library. A typical compile might look like
If you’re running a virtual box you may also need to install the ncurses developer files, including ncurses.h, using:
sudo apt install libncurses -dev
Some terminals do not support ncurses, so make sure you are using an ‘xterm’ or equaivalent.
gcc yourcode.c neillncurses.c -Wall -Wfloat-equal -Wextra -O2 -pedantic -std=c90 -lncurses -lm
Exercise4.10 AdaptthewireworldcodeinExercise4.4sothattheoutputisdisplayedusing this library, with tails being red, heads being blue, copper being yellow and background being black. The main loop will update the board, display it, and repeat until a quit event occurs (e.g. a mouse click or the ESC key is pressed).
36 Chapter 4. 2D Arrays
Exercise 4.11 Adapt the life code in Exercise 4.1 so that the output is displayed using this library, with sensible choices made for cell colours. The main loop will update the board, display it, and repeat until a quit event occurs (e.g. a mouse click or the ESC key is pressed).
5.1 Anagrams
5. Strings, Recursion and SDL
An anagram is a collection of letters that when unscrambled, using all the leters, make a single word. For instance magrana can be rearranged to make the word anagram.
Exercise 5.1 Using a file of valid words, allow the user to enter an anagram, and have the answer(s) printed. For instance :
% ./anagram sternaig angriest
astringe
ganister
gantries ingrates
rangiest reasting stearing
Exercise 5.2 Using a file of valid words, find all words which are anagrams of each other. Each word should appear in a maximum of one list. Output will look something like :
% ./selfanagram .
.
7 merits mister miters mitres remits smiter timers
.
.
.
6 noters stoner tenors tensor toners trones .
.
.
6 opts post pots spot stop tops
Anagrams
Draw to Unlock
SDL – Intro
Word Ladders
The Devil’s Dartboard Maze
38 Chapter 5. Strings, Recursion and SDL
.
.
.
6 restrain retrains strainer terrains trainers transire .
If you wished to create “interesting” anagrams, rather than simply a random jumble of letters, you could combine together two shorter words which are an anagram of a longer one.
Exercise 5.3 Write a program which uses an exhaustive search of all the possible pairs of short words to make the target word to be computed. For instance, a few of the many pairs that can be used to make an anagram of compiler are :
% ./teabreak compiler LiceRomp
LimeCrop
LimpCore
MileCrop MoreClip PermCoil PromLice RelicMop
The name Campbell comes out as CalmPleb which is a bit harsh. Can’t ever remember being called calm …
5.2 Draw to Unlock
Rather than remembering passwords or passcodes, many mobile devices now allow the user to
draw a pattern on the screen to unlock them.
Here we will explore how many unique patterns are available when drawing such patterns to connect “dots”, such as shown in the figure. We assume that people put their finger on one “dot” and then only ever move one position left, right, up or down (but never diagonally) at a time.
5.3 SDL – Intro 39
You are not allowed to return to a “dot” once it has been visited once. If we number the first position in our path as 1, the second as 2 and so on, then beginning in the top left-hand corner, some of the possible patterns of 9 moves are :
123 123 123 654 894 874 789 765 965
Exercise 5.4 Write a program that computes and outputs all the valid paths. Use recursion to achieve this.
• How many different patterns of length 9 are available on a 3 × 3 grid, if the user begins in the top left corner ?
• How many different patterns of length 9 are available on a 3 × 3 grid, if the user begins in the middle left ?
• How many different patterns of length 7 are available on a 3 × 3 grid, if the user begins in the top left corner ?
• How many different patterns of length 25 are available on a 5 × 5 grid, if the user begins in the top left corner ?
5.3 SDL – Intro
Many programming languages have no inherent graphics capabilities. To get windows to appear on the screen, or to draw lines and shapes, you need to make use of an external library. Here we use SDL1, a cross-platform library providing the user with (amongst other things) such graphical capabilities.
https://www.libsdl.org/
The use of SDL is, unsurprisingly, non-trival, so some simple wrapper files have been created (neillsdl2.c and neillsdl2.h). These give you some simple functions to initialise a window, draw rectangles, wait for the user to press a key etc.
An example program using this functionality is provided in a file demo_neillsdl2.c.
This program initialises a window, then sits in a loop, drawing randomly positioned and coloured squares, until the user presses the mouse or a key.
5.4 Word Ladders
In this game, made famous by the author Lewis Caroll, and investigated by many Computer Scientists including Donald Knuth, you find missing words to complete a sequence. For instance, you might be asked how go from “WILD” to “TAME” by only changing one character at a time:
1 actually, we are using the most recent version SDL2, which is installed on all the lab machines
www
Exercise5.5 UsingtheMakefileprovided,compileandrunthisprogram,usingsomething like : make -f sdl_makefile
SDL is already installed on lab machines. At home, if you’re using a ubuntu-style linux machine, use: sudo apt install libsdl2-dev to install it.
40
Chapter 5. Strings, Recursion and SDL
WILD WILE TILE TALE TAME
In a heavily constrained version of this game we make some simplifying assumptions: • Words are always four letters long.
• We only seek ladders of five words in total.
• Only one letter is changed at a time.
• A letter is only changed from its initial state, to its target state. This is important, since if you decide to change the second letter then you will always know what it’s changing from, to what it’s changing to.
So, in the example above, it is enough to give the first word, the last word, and the position of the character which changed on each line. On line one, the fourth letter ‘D’ was changed to an ‘E’, on the next line the first character ‘W’ was changed to a ‘T’ and so on. The whole ladder can
be defined by “WILD”, “TAME” and the sequence 4,1,2,3.
WILD WILE TILE TALE TAME
Since each letter changes exactly once, the order in which this happens is a permutation of the numbers 1,2,3,4, which we have looked at elsewhere.
Exercise 5.6 For the constrained version of the game, given a file of valid four letter words, write a program which when given two words on the command line (argv[1] and argv[2]) outputs the correct solution, if available. Use an exhaustive search over the 24 permutations until one leads to no invalid words being required. Make sure your program works, with amongst others, the following:
COLD POKE CUBE CORD POLE CUBS CARD POLL TUBS WARD MOLL TUNS WARM MALL TONS
Exercise 5.7 Adapt the program above so that if the first and last words share a letter (the edit distance is less than 4), you can find the word ladder required, as in:
5.4 Word Ladders
41
but but cat cat cob cob cod cod cog cog con con con con cot cot cow cow cub cub cud cud cue cue cup cup cur cur cut cut dog dog dot dot got got gut gut hut hut jot jot lot lot not not nut nut pot pot put put rot rot rut rut tot tot
but cat cob cod cog con con cot cow cub cud cue cup cur cut dog dot got gut hut jot lot not nut pot put rot rut tot
Figure 5.1: Word Ladder from CAT to DOG. (Left) Words which are distance one from CAT are COT and CUT. (Middle) Words which are distance one from CUT and COT, which includes amongst others, DOT. (Right) DOG is distance one from DOT. We now have our route, via the pointers, back to CAT.
WASP WASH WISH
FISH
For the “full” version of Wordladder, you make no assumptions about the number of words that are needed to make the ladder, although we do assume that all the words in the ladder are the same size.
Exercise 5.8 To achieve this, you could make a list of all the words, and for all words an edit distance of 1 away from the initial word, mark these and store their ‘parent’. Now, go through this list, and for all words marked, find words which are distance 1 from these, and hence distance 2 from the initial word. Mark these and retain their parent. Be careful you don’t use words already marked. If the word ladder is possible, you’ll eventually find the solution, and via the record of the parents, have the correct route. This is shown for the word ladder CAT to DOG in Figure 5.1 using a very small subset of the possible three letter words.
42 Chapter 5. Strings, Recursion and SDL 5.5 The Devil’s Dartboard
Treble 20 Double 4
12 5 20 1 18 94
14 13 11 6 8 10
15
In the traditional ‘pub’ game, darts, there are 62 different possible scores : single 1 – 20 (the white and black areas), double 1 – 20 (the outer red and green segments) (i.e. 2, 4, 6, 8 . . .), treble 1 – 20 (i.e. 3, 6, 9, 12 . . .) (the inner red or green segments), 25 (small green circle) and 50 (the small red inner circle).
It’s not obvious, if you were inventing darts from scratch, how best to lay out the numbers. The London board shown seems to have small numbers near high numbers, so that if you just miss the 20 for example, you’ll hit a small number instead.
Here we look at a measure for the ‘difficulty’ of a dartboard. One approach is to simply sum up the values of adjacent triples, squaring this number. So for the London board shown, this would be: (20+1+18)2 +(1+18+4)2 +(18+4+13)2 …(5+20+1)2 = 20478
For our purposes a lower number is better2. For more details see : http://www.mathpages.com/home/kmath025.htm
16
7 19 3 17 2
www
Exercise 5.9 Write a program that repeatedly chooses two positions on the board and swaps their numbers. If this leads to a lower cost, keep the board. If not, unswap them. Repeat this greedy search 5000000 times, and print out the best board found. Begin with the trivial monotonic sequence. The output may look something like :
or
Total=19966: 31911 21812 12010 416 8
14 5 13 15 6 7 17 9
Total=19910: 31810 516 9 81411 419 6 720 21315 11712
The score of 19874 seems to be the lowest possible that may be obtained via this technique.
5.6 Maze
Escaping from a maze can be done in several ways (ink-blotting, righthand-on-wall etc.) but
here we look at recursion.
2It’s beyond the scope here to explain why!
5.6 Maze 43
Exercise 5.10 Write a program to read in a maze typed by a user via the filename passed to argv[1]. You can assume the maze will be no larger than 20 × 20, walls are designated by with a # and the rest are spaces. The entrance can be assumed to be the gap in the wall closest to (but not necessarilty exactly at) the top lefthand corner. The sizes of the maze are given on the first line of the file (width,height). Write a program that finds the route through a maze, read from this file, and prints out the solution (if one exists) using full stops. If the program succeeds it should exit with a status of 0, or if no route exists it should exit with a status of 1.
##########
..
#
……
#
#
.
#
.
#
.
##
.
#
#
.
#
.
####
.
#
#
.
#
….
#
.
#
#
.
#
.
####
.
#
#
.
#
….
#
.
#
#
.
####
.
#
.
#
#
……
#
..
##########
6. Lists, Insertion Sort & more Recursion
6.1 A Simple Spelling Checker
Here, we reads words one at a time from file and carefully place them in the correct part of our data structure. This has a complexty of O(n2).
For this purpose, a list of valid words (unsorted) is available from the usual place.
Exercise 6.1 Write a program which, based on a list implemented using arrays, reads the words in one at a time, inserting them into the correct part of the list so that the words are alphabetically sorted. The name of the file should be passed as argv[1], and you can assume the array is of a fixed-size, and large enough to hold all words. How long does it take to build the list ?
Exercise6.4 Writeaprogramwhich,basedonalinkedlistdatastructure,readsthewordsin one at a time, inserting them into the correct part of the list so that the words are alphabetically sorted. The name of the file should be passed as argv[1]. How long does it take to build the list ?
6.2 Prime Factors
en.wikipedia.org/wiki/Prime_factor
It is well known that any positive integer has a single unique prime factorization, e.g.: 210 = 7×5×3×2 (the numbers 7,5,3 and 2 are all prime).
www
Exercise 6.2 Now extend Exercise 6.1 so that when the user is prompted for a word, they are told whether this word is present in the list or not. Use a binary search to achieve this. How much faster is this than a linear search ?
Exercise 6.3 Now extend Exercise 6.1 so that when the user is prompted for a word, they are told whether this word is present in the list or not. Use an interpolation search to achieve this. How much faster is this than a linear search ?
A Simple Spelling Checker Prime Factors
Sierpinski Carpet Sierpinski Squares
46 Chapter 6. Lists, Insertion Sort & more Recursion
117 = 13×3×3 (the numbers 13 and 3 are all prime).
197 is prime, so has only itself (and 1, which we ignore) as a factor.
Exercise 6.5 Write a program that, for any given positive integer input using argv[1], lists the prime factors, e.g.:
[campbell@icy]% ./primefacts 210 7532
[campbell@icy]% ./primefacts 117 13 3 3
Exercise 6.6 To make the output of the above program briefer, many prefer to show the factors expressed by their power, as in :
768 = 2×2×2×2×2×2×2×3 could be better expressed as :
768 = 28 × 3
Write a program to show the factorisation of a number in this more compact style :
% ./primefactors 27000 27000 = 1 x 2^3 x 3^3 x 5^3
% ./ primefactors 31 31 = 1 x 31
% ./primefactors 38654705664
38654705664 = 1 x 2^32 x 3^2
Exercise 6.7 For a beautiful visualisation of prime factors, see:
www.datapointed.net/visualizations/math/factorization/animated-diagrams
Adapt the program above to output a pattern similar to the animated display above, using SDL, but only for a single number, not an animation.
www
6.3 Sierpinski Carpet 47
6.3 Sierpinski Carpet
en.wikipedia.org/wiki/Sierpinski_carpet
The square is cut into 9 congruent subsquares in a 3-by-3 grid, and the central subsquare is removed. The same procedure is then applied recursively to the remaining 8 subsquares, ad infinitum.
Exercise 6.8 Write a program that, in plain text, produces a Sierpinski Carpet.
Exercise 6.10 Write a program that, using SDL, produces a Sierpinski Carpet.
www
www
Exercise 6.9 Write a program that, using neillsimplescreen, produces a Sierpinski Car- pet.
6.4 Sierpinski Squares See also :
en.wikipedia.org/wiki/Sierpinski_triangle
The Sierpinski triangle has the overall shape of an equilateral triangle, recursively subdivded into four smaller triangles :
www
48 Chapter 6. Lists, Insertion Sort & more Recursion
However, we can approximate it by recursively drawing a square as three smaller squares, as show below :
The recursion should terminate when the squares are too small to draw with any more detail (e.g. one pixel, or one character in size).
Exercise 6.11 Write a program that, in plain text, produces a Sierpinski Triangle.
Exercise 6.13 Write a program that, using SDL, produces a Sierpinski Triangle.
Exercise 6.12 Write a program that, using neillsimplescreen, produces a Sierpinski Triangle.
7. Searching Boards
7.1 Conway’s Soldiers
The one player game, Conway’s Soldiers (sometimes known as Solitaire Army), is similar to peg solitaire. For this exercise, Conway’s board is a 7 (width) × 8 (height) board with tiles on it. The lower half of the board is entirely filled with tiles (pegs), and the upper half is completely empty. A tile can move by jumping another tile, either horizontally or vertically (but never diagonally) onto an empty square. The jumped tile is then removed from the board. A few possible moves are shown below:
…….
…….
…….
…….
XXXXXXX
XXXXXXX
XXXXXXX
XXXXXXX
..
…..
..
…..
.
……
..
X
….
XX
. .
XXXX
XX
XXXX
XXXXXXX
XXXXXXX
…
….
…
….
…
….
..
X
X
…
XX
. .
. .
XXX
XX
XXX
XXX
XXXX
XXX
XXXX
…….
…….
…….
….
X
..
XX
..
XXX
XX
..
XXX
XXXXXXX
XXXXXXX
The user enters the location of an empty square they’d like to get a tile into, and the program demonstrates the moves that enables the tile to reach there (or warns them it’s impossible). To do this you will use a list of boards. The initial board is put into this list. Each board in the list is, in turn, read from the list and all possible moves from that board added into the list. The next board is taken, and all its resulting boards are added, and so on.
Each structure in the list will contain (amongst other things) a board and a record of its parent board, i.e. the board that it was created from.
Use the algorithm described above and not anything else.
Exercise 7.1 Write a program that:
• Inputs a target location for a tile to reach (x in argv[1], y in argv[2]).
• Demonstrates the correct solution (reverse order is fine) using plain text.
Conway’s Soldiers The 8-Tile Puzzle Happy Bookcases
50 Chapter 7. Searching Boards 7.2 The 8-Tile Puzzle
The Chinese 8-Tile Puzzle is a 3 × 3 board, with 8 numbered tiles in it, and a hole into which
neighbouring tiles can move:
After the next move the board could look like: or The problem generally involves the board starting in a random state, and the user returning the board to the ‘ordered’ ”12345678” state.
In this problem, a solution could be found in many different ways; the solution could be recursive, or you could implement a queue to perform a breadth-first seach, or something more complex allowing a depth-first search to measure ‘how close’ (in some sense) it is to the correct solution.
123
456
78
123
45
786
1
23
4
56
7
8
Exercise 7.2 Read in a board using argv[1], e.g.:
To do this you will use a list of boards. The initial board is put into this list. Each board in the list is, in turn, read from the list and all possible moves from that board added into the list. The next board is taken, and all its resulting boards are added, and so on. This is, essentially, a queue.
However, one problem with is that repeated boards may be put into the queue and ‘cycles’ occur. This soon creates an explosively large number of boards (several million). You can solve this by only adding a board into the queue if an identical one does not already exisit in the queue. A linear search is acceptable for this task of identifying duplicates. Each structure in the queue will contain (amongst other things) a board and a record of its parent board, i.e. the board that it was created from.
Be advised that a solution requiring as ‘few’ as 20 moves may take 10’s of minutes to compute. If the search is successful, display the solution to the screen using plain-text.
Use the method described above and only this one. Use a static data structure to acheive this (arrays) and not a dynamic method such as linked-lists; a (large) 1D array of structures is acceptable. Because this array needs to be so large, it’s best to declare it in main() using something like:
$ 8tile “513276 48”
static boards[NUMBOARDS];
Exercise 7.3 Repeat Exercise 7.2, but use SDL for the ouput rather than plain-text.
Exercise7.5 RepeatExercise7.4,butusinga5×5boardinstead. 7.3 Happy Bookcases
In a quiet part of our building, there are some rather strange bookcases. They are (like most bookcases) generally happy, but they become unhappy when their books are not arranged
Exercise 7.4 Repeat Exercise 7.2, but using a dynamic (linked-list), so that you never have to make any assumptions about the maximum numbers of boards stored.
7.3 Happy Bookcases 51
correctly (which, even in a Computer Science Department, is somewhat unusual). After years of dedicated research, a team of scientists led by Simon Lock and Sion Hannuna came to understand the trick to making the bookcases happy again. It turned out that a bookcase is only happy if :
• Each shelf only has books of one colour (or is empty).
• All books of the same colour are on the same shelf.
• Theonlybooksthatexistsareblack(K),red(R),green(G),yellow(Y),blue(B),magenta(M),
cyan(C) or white(W).
However, to make things worse, there are some complex rules about how books may be re- arranged :
1. You can only move one book at a time.
2. The only book that can move is the rightmost one from each shelf. 3. The book must move to become the rightmost book on its new shelf. 4. You can’t put more books on a shelf than its maximum size.
So, for instance, the bookcase below has three shelves, each of which can fit three books; the first contains only one book, the second has three books, and the third shelf has two books on it.
By following the rules, highly-trained librarians can rearrange the books to make the bookcase happy again. One such way of re-arranging the books correctly is shown below :
Here’s another example of an unhappy bookcase, and how to reaarange the books to make it happy :
Y..
BBY
YB.
YY.
BB.
YB.
YY.
BBB
Y..
YYY
BBB
…
RG..
GR..
KK
..
KK
..
R…
GR..
KK
G.
KK
..
RR..
G…
KK
G.
KK
..
RR..
GG..
KK
..
KK
..
RR..
GG..
K K
…
KK
.
RR..
GG..
….
KKKK
Exercise 7.6 Write a program that reads in a bookcase definition file (specified on the command line), and shows the ‘moves’ to make the bookcase happy. Such a file looks something like :
The first line has two or three numbers on it; the height of the bookcase (number of shelves), the width (maximum books per shelf) and an optional hint as to the minimum number of bookcases involved when ‘solving’ this bookcase. (This number is meaningless for a bookcase that cannot be made happy.) The number includes the original bookcase, and the final ‘happy’ one in the count. For the bookcase shown in this file, one possible solution is :
In the file, an empty space is defined by a full-stop character. You may assume that the
437 RG.
GR. CY. YC.
RG.
GR.
CY.
YC.
R..
GR.
CYG
YC.
RR.
G..
CYG
YC.
RR.
GG.
CY.
YC.
RRY
GG.
C..
YC.
RRY
GG.
CC.
Y..
RR.
GG.
CC.
YY.
52 Chapter 7. Searching Boards
maximum height and width of a bookcase is 9.
The brute-force algorithm for searching over all moves to make the bookcase happy goes like this :
1. You will use a list of bookcases (here list could either be an array, or a linked list).
2. The initial bookcase is put into the front of this list.
3. Take a bookcase from the front of the list.
4. For this (parent) bookcase, find the resulting (child) bookcases which can be created
from all the valid possible single book moves. Put each of these bookcases into the end of the list. There may be as many as height × (height − 1) of these. If you have found a happy bookcase, stop. Else, go to 3.
To help with printing out the correct moves when a solution has been found, each structure in the list will need to contain (amongst other things) a bookcase and a record of its parent bookcase, i.e. the bookcase that it was created from. For an array, this could simply be which element of the array was the parent, or for a linked list, this will be a pointer.
The program reads the name of the bookcase definition file from argv[1]. If it finds a successful way to make the bookcase happy, it prints out the number of bookcases that would be printed in the solution and nothing else, or else exactly the phrase ‘No Solution?” if none can be found :
If the ‘verbose’ flag is used (argv[2]), your program will additionally print out the solution (reverse order is fine) :
Your program :
• Must use the algorithm detailed above (which is similar to a queue and therefore a
breadth-first search). Other search algorithms are possible (e.g. best-first, guided, recursive etc.) but the quality of coding is being assessed, not the quality of the algorithm used!
$ ./bookcase 7
$ ./ bookcase No Solution?
$ ./ bookcase 1
rrggccyy -437.bc rrrr -22. bc
ccbb -23. bc
$ ./bookcase ccbb -23.bc verbose 1
CC. BB.
$ ./bookcase rgbrmrykwrrr -3521.bc verbose No Solution?
$ ./ bookcase yby -222. bc verbose 2
Y. BY
YY B.
7.3 Happy Bookcases 53
• Should check for invalid bookcase definition files, and report in a graceful way if there is a problem, aborting with exit(EXIT_FAILURE) if so.
• May display the bookcases in colour if you wish – if so use neillsimplescreen to do so.
• Should not print anything else out to screen after successfully completing the search, except that which is shown above. Automated checking may be used, and therefore the ouput must be precise.
• Should call the function test() to perform any assertion testing etc. Extension
Basic assignment = 90%. Extension = 10%.
If you’d like to try an extension, make sure to submit extension.c and a brief description in a extension.txt file. This could involve a faster search technique, better graphical display, user input or something else of your choosing. The extension will be marked in the same way as the main assignment.
9. Huffman, Trees & Bitwise
9.1 Depth
The following program builds a binary tree at random:
#include
#include
#define STRSIZE 5000
struct node{ char c;
};
struct node *left; struct node *right;
typedef struct node Node;
Node *MakeNode(char c);
void InsertRandom(Node *t, Node *n); char *PrintTree(Node *t);
int main(void) {
char c;
Node *head = MakeNode(’A’); Node *n;
srand(time(NULL));
for(c = ’B’; c < ’G’; c++){
n = MakeNode(c);
InsertRandom(head, n); }
printf("%s\n", PrintTree(head));
return 0; }
Depth
Two Trees
Huffman Encoding Binary Tree Visualisation Boolean Arrays
Advent of Code Movember
56 Chapter 9. Huffman, Trees & Bitwise
Node *MakeNode(char c) {
Node *n = (Node *)calloc(1, sizeof(Node));
assert(n != NULL); n−>c = c;
return n;
}
void InsertRandom(Node *t, Node *n) {
if((rand()%2) == 0){ /* Left */ if(t−>left == NULL){
t−>left = n;
} else{
InsertRandom(t−>left, n); }
}
else{ /* Right */
if(t−>right == NULL){ t−>right = n;
} else{
InsertRandom(t−>right, n); }
} }
char *PrintTree(Node *t) {
char *str;
assert((str = calloc(STRSIZE, sizeof(char))) != NULL); if(t == NULL){
strcpy(str, “*”);
return str; }
sprintf(str, “%c (%s) (%s)”, t−>c, PrintTree(t−>left), PrintTree(t−>right)); return str;
}
Each node of the tree contains one of the characters ’A’ … ’F’. At the end, the tree is printed out in the manner described in the course lectures.
Exercise 9.1 Adapt the code so that the maximum depth of the tree is computed using a recursive function. The maximum depth of the tree is the longest path from the root to a leaf. The depth of a tree containing one node is 1.
9.2 Two Trees 57
9.2 Two Trees
Adapt the code shown in Exercise 9.1, so that two random trees are generated.
Exercise 9.2 Write a Boolean function that checks whether two trees are identical or not.
9.3 Huffman Encoding
Huffman encoding is commonly used for data compression. Based on the frequency of occurence of characters, you build a tree where rare characters appear at the bottom of the tree, and commonly occuring characters are near the top of the tree.
For an example input text file, a Huffman tree might look something like:
00101( 5* 125)
110 (
111001010 (
00100000 (
01100000100 (
01100001101 (
1001001 (
0010010 (
1001100 (
00100110000 (
11100110010 (
00100010 (
01100000101 (
01100001001 (
11100110011 (
01100001000 (
01100001100 (
001001101 (
10010000 (
01100001011 (
00100111 (
111001101 (
10011011 (
111001110 (
10011010 (
111001000 (
010:
‘’:
‘”’ :
‘’’ :
‘(’ :
‘)’ :
‘,’ :
‘-’ :
‘.’ :
‘/’ :
‘0’ :
‘1’ :
‘3’ :
‘4’ :
‘5’ :
‘6’ :
‘7’ :
‘8’ :
‘9’ :
‘:’ :
‘A’ :
‘B’ :
‘C’ :
‘D’ :
‘E’ :
‘F’ :
‘G’ : 0110000000 ( 10 * 4) ‘H’ : 1110011111 ( 10 * 7) ‘I’ : 1110010011 ( 10 * 6) ‘J’ : 11100111101 ( 11* 3) ‘K’ : 1110010111 ( 10 * 6)
‘L’ :
‘M’ :
‘N’ :
‘O’ :
‘P’ : 1110011000 ( 10 * 6) ‘R’ : 0110000111 ( 10 * 5) ‘S’ : 10010001 ( 8* 19) ‘T’ : 0010011001 ( 10 * 4) ‘U’ : 1110010010 ( 10 * 5) ‘W’ : 0110000001 ( 10 * 4) ‘a’ : 1010 ( 4* 339) ‘b’ : 1111110 ( 7* 60)
00100011 ( 8* 15) 11100111100 ( 11* 3) 01100001010 ( 11* 2) 01100000111 ( 11* 2)
3 * 792) 9* 12) 8* 15)
11* 2) 11* 2) 7* 39) 7* 31) 7* 40) 11* 1) 11* 3) 8* 15) 11* 2) 11* 2) 11* 3) 11* 2) 11* 2) 9* 8) 8* 18) 11* 2) 8* 16) 9* 13) 8* 22) 9* 13) 8* 19) 9* 11)
58
Chapter 9. Huffman, Trees & Bitwise
‘c’ :
‘d’ :
‘e’ :
‘f’ :
‘g’ :
‘h’ :
‘i’ :
‘j’ :
‘k’ :
‘l’ :
‘m’ :
‘n’ :
‘o’ :
‘p’ :
‘q’ :
‘r’ :
‘s’ :
‘t’ :
‘u’ :
‘v’ :
‘w’ :
‘x’ : 1110010110 ( 10 * 6) ‘y’ : 011001 ( 6* 72)
100101 ( 6*77) 01101 ( 5*143) 000 ( 3*473)
100111 (
111000 (
11110 (
0100 (
01100000110 (
00100001 (
10110 (
101111 (
0111 (
0101 (
101110 (
00100110001 (
11101 (
0011 (
1000 (
111110 (
0110001 (
1111111 (
6* 84) 6* 94) 5* 223) 4* 266)
11* 2) 8* 15) 5* 176) 6* 92) 4* 288) 4* 269) 6* 89)
11* 2) 5* 214) 4* 260) 4* 305) 6* 108) 7* 37) 7* 60)
2916 bytes
Each character is shown, along with its Huffman bit-pattern, the length of the bit-pattern and the frequency of occurrence. At the bottom, the total number of bytes required to compress the file is displayed.
9.4 Binary Tree Visualisation
The course notes showed a simple way to print out integer binary trees in this form :
20(10(5(*)(*))(17(*)(*)))(30(21(*)(*))(*))
You could also imagine doing the reverse operation, that is reading in a tree in the form above and displaying it in a ‘friendlier’ style :
The tree has left branches vertically down the page and right branches horizontally right. Another example is :
17(2(*)(3(*)(4(*)(*))))(6(8(*)(*))(*))
which is displayed as:
Exercise 9.3 Write a program that reads in a file (argv[1]) and, based on the characters it contains, computes the Huffman tree, displaying it as above.
20—-30 || 10 -17 21
| 5
9.5 Boolean Arrays 59
17—-6
|| 2-3-4 8
The above examples show the most ‘compact’ form of displaying the trees, but you can use simplifying assumptions if you wish:
• The integers stored in the tree are always ≥ 0.
• The integers stored in the tree are 5 characters (or less) in length. • It is just as valid to print the tree in either of these ways :
1-6 00001-00006 00001————-00006 ||||||
2 7 00002 00008 00002 00007 |||
3-4-5 00003-00004-00005 00003-00004-00005
Exercise 9.4 Write a program that reads in a tree using argv[1] and the tree displayed to stdout with no other printing if no error has occurred.
Exercise9.5 Writeaprogramthatreadsinatreeusingargv[1]anddisplaysthetreeusing SDL.
9.5 Boolean Arrays
In C90 there is no real way of having Booleans. You can typedef and/or enumerate one, but in reality this is just an integer, not truly a single bit. Therefore, we’d like to be able to create an ADT that allows Boolean Arrays to be created, such that the storage of a single bit, requires only one bit of memory.
Thiswillinvolvemanipulationofthebitsofa(forinstance)unsigned charusingtheCbitwise logical operators ‘xor’ (^), ‘or’ (|) and ‘and’ (&).
https://en.wikipedia.org/wiki/Bitwise_operation
www
Exercise 9.6 Given the files testboolarr.c and boolarr.h, complete the Boolean Array ADT using a reallocating array of unsigned chars. You will need to write both realloc.c, which will define the functions in boolarr.h, and specific.h, which will contain your header information. Each cell of the underlying array stores 8 bits of the data. Ignoring the overhead of the main structure, which might store the capacity of the underlying array, and the number of valid bits in use, the array should be nbits/8 bytes in length.
9.6 Advent of Code
Have a look through the archives of the rather wonderful ‘Advent of Code’ website, which is a
good place to begin your lifelong love of recreational programming.
https://adventofcode.com
and find a simple puzzle to solve.
www
60 Chapter 9. Huffman, Trees & Bitwise Exercise 9.7 Choose one of the simpler puzzles and solve it.
9.7 Movember
This year I’ve been growing a moustache to show my support for a charity called ‘Movember’ that, amongst other things, raises awareness of mental health issues. There’s some information on staying connected with people and lending your support at:
https://uk.movember.com/mens-health/mental-health
www
Exercise 9.8 Phone a friend you haven’t spoken to in over six months and check that they are doing OK.
10. ADTs and Algorithms I
10.1 Indexed Arrays
In the usual places are the files arr.h, arr.c, testarr.c and a makefile. These files enable you to build, and test, the ADT for a 1D Indexed Array. This simple replacement for C arrays is ‘safe’ in the sense that if you write the array out-of-bounds, it will be automatically resized correctly (using realloc()). The interface to this ADT is in arr.h and its implementation is in arr.c. Typing make testarr, compiles these files together with the test file testarr.c. Executing ./testarr should result in all tests passing correctly.
% make -f 1d_adt.mk run Basic Array Tests … Start Basic Array Tests … Stop
Exercise 10.1 Build testarr, and check that you understand the use of the functions, including initialization, reading, writing and freeing. Use the makefile provided to run the code, and do some memory-leak checking etc.
10.2 Sets
Sets are an important concept in Computer Science. They enable the storage of elements (members), guaranteeing that no element appears more than once. Operations on sets include initializing them, copying them, inserting an element, returning their size (cardinality), finding if they contain a particular element, removing an element if it exists, and removing one element from a random position (since sets have no particular ordering, this could be the first element). Other set operations include union (combining two sets to include all elements), and intersection (the set containing elements common to both sets).
https://www.mathsisfun.com/sets/sets-introduction.html
https://en.wikipedia.org/wiki/Set_(mathematics)
The definition of a Set ADT is given in set.h, and a file to test it is given in testset.c.
www
www
Indexed Arrays
Sets
Towards Polymorphism Double Hashing Separate Chaining
62 Chapter 10. ADTs and Algorithms I
Exercise 10.2 Write set.c, so that:
works correctly. Your Set ADT will build on top of the Indexed Array ADT introduced in Exercise 10.1. Only write set.c. Alter no other files, including arr.c, arr.h, set.h or the Makefile.
% make -f set_adt.mk run ./ testset
Basic Set Tests … Start
Basic Set Tests … Stop
10.3 Towards Polymorphism
Polymorphism is the concept of writing functions (or ADTs), without needing to specify which particular type is being used/stored. To understand the quicksort algorithm, for instance, doesn’t really require you to know whether you’re using integers, doubles or some other type. C is not very good at dealing with polymorphism – you’d need something like Python, Java or C++ for that. However, it does allow the use of void* pointers for us to approximate it.
10.4 Double Hashing
Here we use double hashing, a technique for resolving collisions in a hash table.
Exercise 10.3 Extend the array ADT discussed in Exercise 10.1, so that any type can be used – files varr.h and testvarr.c are available in the usual place – use the Makefile used there, simply swapping arr for varr at the top.
Exercise 10.4 Use double hashing to create a spelling checker, which reads in a dictionary file from argv[1], and stores the words.
Make sure the program:
• Use double hashing to achieve this.
• Makes no assumptions about the maximum size of the dictionary files. Choose an
initial (prime) array size, created via malloc(). If this gets more than 60% full, creates a new array, roughly twice the size (but still prime). Rehash all the words into this new array from the old one. This may need to be done many times as more and more words are added.
• Uses a hash, and double hash, function of your choosing.
• Once the hash table is built, reads another list of words from argv[2] and reports on
the average number of look-ups required. A perfect hash will require exactly 1.0 look- up. Assuming the program works correctly, this number is the only output required from the program.
10.5 Separate Chaining
Separate chaining deals with collisions by forming (hopefully small) linked lists out from a base array.
Exercise 10.5 Adapt Exercise 10.4 so that:
• A linked-list style approach is used.
• No assumptions about the maximum size of the dictionary file is made.
10.5 Separate Chaining 63
• The same hash function as before is used.
• Once the hash table is built, reads another list of words from argv[2] and reports
on the average number of look-ups required. A perfect hash will require exactly 1.0 look-up, on average. Assuming the program works correctly, this number is the only output required from the program.
11. ADTs and Algorithms II
11.1 Polymorphic Hashing
Polymorphism is the concept of writing functions (or ADTs), without needing to specify which particular type is being used/stored. To understand the quicksort algorithm, for instance, doesn’t really require you to know whether you’re using integers, doubles or some other type. C is not very good at dealing with polymorphism – you’d need something like Java or C++ for that. However, it does allow the use of void* pointers for us to approximate it.
Exercise 11.1 Here we write an implementation of a polymorphic associative array using hashing, see assoc.h. Both the key & data to be used by the hash function are of unknown types, so we will simply store void pointers to both of these, and the user of the associative array (and not the ADT itself) will be responsible for creating and maintaining such memory, and also ensuring it doesn’t change when in use by the associative array. In the testassoc.c file we show two uses for such a type : a simple string example (to find the longest word in English that is also a valid (but different) word when spelled backwards), and a simple integer example where we keep a record of how many unique random numbers in a range are chosen.
Since we do not know what the type of the key is, we need to be careful when comparing or copying keys. Therefore, in the assoc_init(int keysize) function, the user has to pass the size of the key used (e.g. sizeof(int)) or in the case or strings, the special value 0. Now we can use memcmp() and memcpy(), or in the case of strings, strcmp() and strcpy() for dealing with the keys. In the case of data, the ADT only ever needs to return a pointer to the data (not process it) via assoc_lookup(), so its size is not important.
Your hash table used to implement the ADT should be resizeable, and you may use open- addressing/double-hashing or separate chaining to deal with collisions. Make no assumptions about the maximum size of the array, and make the initial size of the array small e.g. 16 or 17 (a prime is useful if you’re double hashing).. You can use any hash function you wish, but if it’s off the internet (etc.) cite the source in a comment..
Submit a single assoc.zip file containing your code. Your standard submission will contain the directory structure, including the two files:
Realloc/specific.h and Realloc/realloc.c. I will use my assoc.mk Makefile to compile your code, so check that it works correctly.
Hint : when you do a resize you cannot simply copy the old table, but must rehash your data,
Polymorphic Hashing Cuckoo Hashing MultiValue Maps Rhymes
Faster MVMs
66 Chapter 11. ADTs and Algorithms II one entry at a time, into the new table. This is because your hash function is based on the table
size, and if the table size has changed you will be hashing keys into different locations.
11.2 Cuckoo Hashing
11.3 MultiValue Maps
Many data types concern a single value (e.g. a hash table), so that a string (say) acts as both the key (by which we search for the data) and also as the object we need to store (the value). An example of this a spelling checker, where one word is stored (and searched for) at a time. However, sometimes there is a need to store a value based on a particular key – for instance an associative array in Python allows you to perform operations such as :
population[“Bristol”] = 536000
where a value (the number 536000) is stored using the key (the string “Bristol”). One decision you need to make when designing such a data type is whether multiple values are allowed for the same key; in the above example this would make no sense – Bristol can only have one population size. But if you wanted to store people as the key, with their salary as the value, you might need to use a MultiValue Map (MVM) since people can have more than one job.
Here we write the abstract type for a MultiValueMap that stores key-value pairs, where both the key and the value are strings.
Exercise 11.2 Rewrite your associative array described in Exercise 11.1 using cuckoo hashing instead. This involves two tables, and a key is guaranteed to be found (if it has been inserted) in one of its two locations. Different hash functions must be used for each of the two tables. If you do the extension, add :
Cuckoo/specific.h and Cuckoo/cuckoo.c to your directory structure in assoc.zip.
Exercise11.3 ThedefinitionofanMVMADTisgiveninmvm.h,andafiletotestitisgiven in testmvm.c. Write mvm.c, so that:
works correctly. Use a simple linked list for this, inserting new items at the head of the list. Make no changes to any of my files.
Submit : mvm.c.
% make -f mvm_adt.mk
./ testmvm
Basic MVM Tests … Start
Basic MVM Tests … Stop
11.4 Rhymes
In the usual place is a dictionary which, for every word, lists the phonemes (a code for a distinct unit of sound) used to pronounce that word in American English. In this file the word itself, and its phonemes, are separated by a ‘#’). For instance:
BOY#B OY1
shows that the word BOY has two phonemes : B and OY1.
A simple attempt at finding rhymes for boy would match every word that has OY1 as its final
phoneme. This gives you:
11.4 Rhymes 67
POLLOI MCVOY LAFOY ALROY ILLINOIS CROIX DECOY REDEPLOY CLOY
LAVOY MOYE LOYE STOY PLOY KNOY EMPLOY ELROY JOY COY LACROIX DEVROY ENJOY LOY COYE FOYE MOY DOI BROY TOY LABOY ROI HOY ROYE NEU CROY SOY YOY MCCOY CHOY GOY ROY BOLSHOI MALLOY JOYE DESTROY DELACROIX(1) DEBOY MCROY CHOI UNDEREMPLOY FLOY MCKOY TOYE AHOY BOY OYE SGROI FOIE(1) TROY DEPLOY SAVOY UNEMPLOY SCHEU WOY BOYE HOYE FOY OI HOI KROY EMPLOY(1) FLOURNOY OIE
MCCLOY ANNOY OY DEJOY
Using two phonemes to do the matching is too many, since the only matches are for words that have exactly the same pronunciation:
LABOY DEBOY BOY BOYE
which are not really rhymes, but homophones. Therefore, using the correct number of phonemes will be key to finding ‘good’ rhyming words.
Here we will use the MutliValue Map written in Exercise 11.3 to create two maps. An MVM map1 stores the word (as the key) and its final n phonemes as a single string (the value). Now map2 stores the word (value), keyed by its final n phonemes as a single string. Looking up the phonemes associated with a word can be done using the word as a key 1 (via map1), and looking up a word given its phonemes can be achieved using map2.
Exercise 11.4 Read in the dictionary, and for each line in turn, store the data in the two maps. Now for a requested word to be ‘rhymed’, search for its phonemes using map1 and then search for matching words using map2. The number of phonemes specified for this rhyming is given via the command line, as are the words to be rhymed:
The -n flag specifies the number of phonemes to use (you may assume the number associated with it always is always separated from the flag by a space). If no -n flag is given then the value 3 is assumed. It only makes sense to use a value of n ≤ the number of phonemes in the two words being checked. If you use a value greater than this, the results are undefined.
The list of words to be matched may be found on the command line:
$ ./ homophones -n 3 RHYME
RHYME (R AY1 M): PRIME RHYME ANTICRIME(1) CRIME ANTICRIME GRIME RIME
$ ./homophones -n 4 CHRISTMAS PROGRAM PASSING
CHRISTMAS (S M AH0 S): ISTHMUS CHRISTMAS CHRISTMAS’ PROGRAM (G R AE2 M): CENTIGRAM ENGRAM HISTOGRAM WOLFGRAM
MONOGRAM LOGOGRAM HOLOGRAM MICROGRAM SONOGRAM ANGIOGRAM TELEGRAM PROGRAMME ELECTROCARDIOGRAM ELECTROPHORETOGRAM REPROGRAM MILLIGRAM ANAGRAM PEGRAM POLYGRAM DIAGRAM EPIGRAM PROGRAM MAILGRAM MILGRAM INGHRAM CABLEGRAM MAMMOGRAM KILOGRAM
PASSING (AE1 S IH0 NG): GASSING MASENG SURPASSING KASSING
HASSING PASSING AMASSING MASSING CLASSING HARASSING
Use the Makefile supplied for this task. Make the output as similar to that shown above as possible.
1Strictly speaking, we don’t need the map1 to be capable of storing multiple values, since every word in the dictionary is unique. We’ll use it here though for simplicity.
68 Chapter 11. ADTs and Algorithms II Submit : homophones.c.
11.5 Faster MVMs
The ADT for MVMs used in Exercise 11.3 is a simple linked list – insertion is fast, but searching is slow.
Exercise11.5 WriteanewversionofthisMVMADTcalledfmvm.cthatimplementsexactly the same functionality but has a faster search time. The file fmvm.h will change very little from mvm.h, with maybe only the structures changing, but not the function prototypes. A similar testing file to that used previously, now called testfmvm.c should also be written. Note that any ordering of data when using the mvm_print and mvm_multisearch functions is acceptable, so these can’t be tested in exactly the same manner.
By simply changing the #include from
Make it clear what you have done to speed up your searching (and how) using comments at the top of fmvm.h
Submit : fmvm.c, fmvm.h and testfmvm.c
12.1 Teletext
12. Parsing Data
In the early 1970s, Phillips Labs began work on transmitting digital information across the television network. The aim was to provide up-to-date news and weather information via a television set. This system was trialled first by the BBC in a system that eventually became known as “Ceefax” and then on other independant British terrestrial stations as “Oracle”. A very similar system was implemented on the BBC microcomputer, known as Mode 7. An example of
Figure 12.1: An example Ceefax page circa 1983. Taken from
http://teletext.mb21.co.uk/gallery/ceefax/main1.shtml
Teletext
Guido van Robot
Turtle Graphics
The UNIX awk program Neill’s Adventure Language
70 Chapter 12. Parsing Data
0x8 Unused/Reserved Red Alphanumeric Green Alphanumeric Yellow Alphanumeric Blue Alphanumeric Magenta Alphanumeric Cyan Alphanumeric White Alphanumeric Unused/Reserved Unused/Reserved Unused/Reserved Unused/Reserved Single Height Double Height Unused/Reserved Unused/Reserved
0x9 Unused/Reserved Red Graphics Green Graphics Yellow Graphics Blue Graphics Magenta Graphics Cyan Graphics White Graphics Unused/Reserved Contiguous Graphics Separated Graphics Unused/Reserved Black Background New Background Hold Graphics Release Graphics
0xA
! ” £ $ % & ’ ( ) * + , – . /
0xB 0 1 2 3 4 5 6 7 8 9
:
; < = > ?
0xC @ A B C D E F G H I
J K L M N O
0xD P Q R S T U V W X Y Z ← 1/2 → ↑ #
0xE – a b c d e f g h i
j k l m n o
0xF p q r s
t u v w x y z 1/4 || 3/4 ÷
0 1 2 3 4 5 6 7 8 9 A B C D E F
Table 12.1: The control codes and characters for alphanumeric mode. Note here (because we’re using white paper) foreground is shown in black and background in white. On a teletext screen we use white on a black background.
such a Ceefax screen is shown in Figure 12.1.
This project, inspired by such teletext systems, will allow a 40 × 25 (1000) character file
to be rendered to the screen, using similar control codes. However, some control codes are not implemented, including those to do with flashing or hidden text, and transparent backgrounds. In particular, our definition of the double height control code differs from that of the traditional one.
The Control Codes
This section is based to a large extent to Richard Russell’s description of Mode 7 on the BBC Micro: http://www.bbcbasic.co.uk/tccgen/manual/tcgen2.html.
Coloured Text
By using the control codes 129 − 135 (0x81 − 0x87 in hexadecimal) the rest of the line will have text in the selected foreground colour.
To change the background colour, you issue a foreground colour code first, and then the “New Background” character. All the following line will now have the appropriate background colour. You’ll typically then need to choose a new foreground text colour.
Block Graphics
Teletext has a very limited ability to output low-resolution block graphics. These shapes take the place of other characters in the font and are enabled by issuing one of the coloured graphics codes e.g. red graphics. At this point the characters available for printing are as displayed in Table 12.2. These new graphics characters are made up of six smaller boxes, known as sixels. Each individual sixel has a code, either, 1,2,4,8,16 or 64 as shown in Figure 12.2. By adding these values together we can define which of these sixels are ‘lit’ or not. If we wish the three left-hand ones to be lit we’d use the base code (160) plus 1, 4 and 16 = 181 (0xB5 in hexadecimal). Therefore issuing the coding green graphics and then code 181 puts a green vertical bar on the screen.
12.1 Teletext 71
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x8 Unused/Reserved Red Alphanumeric Green Alphanumeric Yellow Alphanumeric Blue Alphanumeric Magenta Alphanumeric Cyan Alphanumeric White Alphanumeric Unused/Reserved Unused/Reserved Unused/Reserved Unused/Reserved Single Height Double Height Unused/Reserved Unused/Reserved
0x9 Unused/Reserved Red Graphics Green Graphics Yellow Graphics Blue Graphics Magenta Graphics Cyan Graphics White Graphics Unused/Reserved Contiguous Graphics Separated Graphics Unused/Reserved Black Background New Background Hold Graphics Release Graphics
0xA
0xB
0xC @ A B C D E F G H I
J K L M N O
0xD P Q R S T U V W X Y Z ← 1/2 → ↑ #
0xE
0xF
Table 12.2: The control codes and characters for contiguous graphics mode.
72 Chapter 12. Parsing Data
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x8 Unused/Reserved Red Alphanumeric Green Alphanumeric Yellow Alphanumeric Blue Alphanumeric Magenta Alphanumeric Cyan Alphanumeric White Alphanumeric Unused/Reserved Unused/Reserved Unused/Reserved Unused/Reserved Single Height Double Height Unused/Reserved Unused/Reserved
0x9 Unused/Reserved Red Graphics Green Graphics Yellow Graphics Blue Graphics Magenta Graphics Cyan Graphics White Graphics Unused/Reserved Contiguous Graphics Separated Graphics Unused/Reserved Black Background New Background Hold Graphics Release Graphics
0xA
0xB
0xC @ A B C D E F G H I
J K L M N O
0xD P Q R S T U V W X Y Z ← 1/2 → ↑ #
0xE
0xF
Table 12.3: The control codes and characters for separated graphics mode.
12.1 Teletext 73
1
2
4
8
16
64
Figure 12.2: Values for computing graphics codes, as added to the base code 160 (0xA0 in hexadecimal).
Notice in Table 12.2 that some other characters are still available, particularly all capital letters. This allow simple printing of capitals, even when in graphics mode, and is know as blast-through Text.
There is another set of block graphics, as shown in Table 12.3. For these, each sixel is separated from others by thin vertical and horizontal lines. This mode is known as separated graphics mode.
Held Graphics
Generally all control codes are displayed as spaces, in the current background colour. In the held graphics mode, control code 158 (0x9E in hexadecimal), control codes are instead displayed as a copy of the most recently displayed graphics symbol. This permits a limited range of abrupt display colour changes. The held graphics character is displayed in the same contiguous/separated mode as when it was first displayed. If there has been a change in the text/graphics mode or the normal/double-height mode since the last graphics character was displayed, the held graphics character is cleared and control codes once again display as spaces.
To switch held graphics mode off, use the release graphics control code.
Double Height
By using the double height control code, characters are displayed as twice their normal size. Since they span two lines, the control codes and characters have to be repeated on the next line too, for them to be correctly displayed. The rule here, is that if a character is to be displayed as double height, the top half of the character is displayed on the first line, and the bottom half on the next line. The bottom half is only displayed as double height if the character vertically above it was also displayed in double height mode. The character in question need not be the same one.
Note: here we deviate from other definitions of this control code.
Some General Guidelines
• Characters are considered 7-bit (the 8th bit was typically used for parity checking over the noisy television signal). Therefore any character less than 128(0x80) should have 128 added to it. For, example if you read in character 32 (space), it should be ‘converted’ to character 160.
• Each newline on the Teletext page automatically begins with White text, single height, contiguous graphics, black background, release graphics.
• With the the exception of hold graphics (see above), control characters are generally rendered in the same manner as a space would be. If the background is currently red and text colour yellow, say, then the control code would show as an empty red background.
Exercise 12.1 Implement a teletext rendering system. The 1000 character input file should be read in using argv[1].
There are many ambiguities as to how various sequences of codes should be rendered. To help with this, several example files have been posted on the unit web page. If there is still doubt, make a best-guess and state your assumptions in the code.
74 Chapter 12. Parsing Data
Submit the testing you have undertaken, including examples and a description of your strategies. This should convince us that you have tested every line of code, and that it works correctly. If there are still issues/bugs state them clearly. Also, point out any bugs that you have successfully found using these approaches. Submit a file named testing.txt, along with any other files you feel necessary in the appropriate directory.
No particular strategy is mandated. You may wish to explore a couple and briefly discuss strengths and weaknesses.
Undertake an extension of your choosing. Examples of these include:
• A system that allows you to quickly author teletext pages (perhaps using a recursive-
descent parser?)
• Automatic image to teletext conversion.
• Automatic (simple) html to teletext conversion (and/or vice-versa).
Remember, that the assessment is based on the quality of your coding, so choose something to demonstrate an aspect of programming or software engineering that you haven’t had a chance to use in the main assignment. Submit a file named extension.txt outlining, in brief, your contribution.
Hints
• Don’t add graphics too early – the code is easier to test and debug with textual output to begin with.
• I advise you to use SDL for your graphics output. The library provided previously contained two functions to deal with printing characters : Neill_SDL_ReadFont() and Neill_SDL_DrawString(). The font file m7fixed.fnt provides the basic characters required here, but not the sixels. By understanding how the font data is rendered, the double height version of the characters should be relatively simple.
• Don’t try to do all aspects of this at once – begin with coloured characters only. Add more advanced functionality later.
• Plan how you are going to store your data early in the design process. Does each character need its own data structure? Does each line? Can this be abstracted?
Please create a directory structure, so that I can easily find the different subsections. Your testing strategy will be explained in testing.txt, and your extension as extension.txt. For the source and extension sections, make sure there’s a Makefile, so that I can easily build the code.
——Top Directory—— ||| ||| ||| |||
source testing extension |||
…
… Makefile
… extension.txt
… Makefile
… …
testing.txt
Bundle all of these up as one single .zip submission – not one for each subsection.
12.2 Guido van Robot 75
12.2 Guido van Robot
www
From
http://gvr.sourceforge.net/
Guido van Robot can face in one of four directions, north, east, south, and west. He turns only 90 degrees at a time, so he can’t face northeast, for instance. In Guido’s world, streets run east-west, and are numbered starting at 1. There are no zero or negative street numbers. Avenues run north-south, and are also numbered starting at 1, with no zero or negative avenue numbers. At the intersection of a street and avenue is a corner. Guido moves from one corner to to the next in a single movement. Because he can only face in one of four directions, when he moves he changes his location by one avenue, or by one street, but not both!
Simple .wld File
Robot 5 4 N 1
Wall 3 2 N 6
Wall 2 3 E 4
Wall 3 6 N 6
76 Chapter 12. Parsing Data Wall 8 3 E 2
Wall 8 6 E
Simple .gvr File
move
move
move
move
turnoff
12.2 Guido van Robot 77
Do Loops
do 2 :
putbeeper
move turnoff
Conditional Loop
while front_is_clear :
putbeeper
move turnoff
Branching
do 13 :
if front_is_clear :
putbeeper
move else :
turnleft
turnoff
The Formal Grammar
::=
::= “turnoff” |
“move”
“turnleft”
“pickbeeper”
“putbeeper”
::= “do”
::= “while”
::= “if”
“if”
“else” “:”
::=
::= “front_is_clear” |
“front_is_blocked” |
“left_is_clear” |
“left_is_blocked” |
“right_is_clear” |
“right_is_blocked”
::= “next_to_a_beeper” |
“not_next_to_a_beeper” |
“any_beepers_in_beeper_bag” |
78
Chapter 12. Parsing Data
“no_beepers_in_beeper_bag”
::= “facing_north” |
“not_facing_north” |
“facing_south” |
“not_facing_south” |
“facing_east” |
“not_facing_east” |
“facing_west” |
“not_facing_west”
This ignores some Guido instructions, e.g. elseif and define. It also doesn’t well explain how to spot the end of a DO etc. which is marked by a reduction in indentation. The definition of .wld files is so simple a recursive descent parser (and hence grammar) is not required.
Exercise 12.2 • (25%) To implement a recursive descent parser – this says whether or not the given .gvr and .wld follow the formal grammar or not. The input files are specified via argv[1] (.gvr) and argv[2] (.wld) .
• (25%) To implement an interpreter, so that the instructions are executed. Printing the world and robot to screen using simple characters is fine, but many will wish to use SDL.
• (25%) To show a testing strategy on the above – you should give details of white-box and black-box testing done on your code. Describe any test-harnesses used. Give examples of the output of many different programs. Convince me that every line of your C code has been tested.
• (25%) To show an extension to the project in a direction of your choice. It should demonstrate your understanding of some aspect of programming or S/W engineering. If you extend the formal grammar make sure that you show the new, full grammar. Submit the program(s) and a Makefile so that I can:
• Compile the parser by typing ‘make parse’.
• Compile the interpreter by typing ‘make interp’.
• Compile the extension by typing ‘make extension’.
• Submit a test strategy report called test.txt. This will include sample outputs, any code
written especially for testing etc.
• Submit an extension report called ‘extend.txt’. This is quite brief and explains the
extension attempted.
• You need to be able to load a world file and a .gvr and say if they are valid of not.
• Don’t try to write the entire program in one go. Try a cut down version of the grammar
first, e.g.:
::=
::= “turnoff” |
“move”
“turnleft”
“pickbeeper”
“putbeeper”
::= “do”
• Some issues, such as what happens if you hit a wall are not clear from the formal grammar. In this case, use your common sense, or do what the real program does.
12.3 Turtle Graphics 79
12.3 Turtle Graphics History
• Many attempts have been made to create programming languages which are intuitive and easy to learn.
• One of the best of these was LOGO which allowed children as young as 3 to learn a computer language.
• A subset of this language involved a “turtle” which could be driven around the screen using simple instructions. The turtle, when viewed from above, was represented by a triangle.
An Example
{
FD 30 LT 45
FD 30 LT 45 FD 30 LT 45 FD 30 LT 45
FD 30 LT 45 FD 30 LT 45 FD 30 LT 45
FD 30
LT 45 }
Adding Loops
{
DO A FROM 1 TO 8 {
FD 30
80 Chapter 12. Parsing Data
LT 45 }
}
Using Variables
{
DO A FROM 1 TO 100 {
SET C := A 1.5 * ;
FD C
RT 62 }
}
Nested Loops
{
DO A FROM 1 TO 50 {
FD A
RT 30
DO B FROM 1 TO 8 {
SET C := A 5 / ; FD C
RT 45
} }
}
12.3 Turtle Graphics 81
The Formal Grammar
“}”
82 Chapter 12. Parsing Data
not write a new program for this, simply extend your existing parser. Output is via SDL. You may find the function call SDL_RenderDrawLine useful.
Show a testing strategy on the above – you should give details of unit testing, white/black- box testing done on your code. Describe any test-harnesses used. In addition, give examples of the output of many different turtle programs. Convince me that every line of your C code has been tested.
Show an extension to the project in a direction of your choice. It should demonstrate your understanding of some aspect of programming or S/W engineering. If you extend the formal grammar make sure that you show the new, full grammar.
Hints
• •
• •
•
•
All four sections above are equally weighted.
Don’t try to write the entire program in one go. Try a cut down version of the grammar first, e.g.:
“}”
The language is simply a sequence of words (even the semi-colons), so use fscanf(). Some issues, such as what happens if you use an undefined variable, or if you use a variable before it is set, are not explained by the formal grammar. Use your own common-sense, and explain what you have done.
Once your parser works, extend it to become an interpreter. DO NOT aim to parse the program first and then interpret it separately. Interpreting and parsing are inseparably bound together.
Start testing very early – this is a complex beast to test and trying to do it near the end won’t work.
Submission
Your testing strategy will be explained in testing.txt, and your extension as extension.txt. For the parser, interpreter and extension sections, make sure there’s a Makefile, so that I can easily build the code using make parse, make interp and make extension. Submit
a single turtle.zip file.
12.4 The UNIX awk program
Sometimes handling files containing numerical data in C may be somewhat arduous. A ‘simple’ program to swap the first and second columns of a file is quite long in C.
For this reason, there is a simple language called awk which allows simple manipulation to be done on a line by line basis. For example:
{
print $2, $1;
}
12.4 The UNIX awk program 83
swaps the first and second fields in a file. The whole program is applied to every line of the input file in turn.
Other examples of awk would include printing fields in reverse order:
{ for (i = NF; i > 0; –i) print $i }
Adding up first column, print sum and average
{ s += $1 }
END{print “sum is”,s,” average is”,s/NR}
Notice the ‘special’ variables NR and NF. Other special variables include FS which defines how fields are separated (default space and tab). See ‘man awk’ for more information.
The CAWK formal grammar
This assignment involves writing a recursive descent parser for a simple, cut-down version of awk called ‘cawk’. It is based on the following formal grammar:
:= “{”
“{”
:=
:=
:= “LET”
:=
:= “$#” | “$@”
:= number |
:=
:=
:= “IF” “(”
:=
:= “PRINT”
Note, you may assume that the program consists of a list of words separated by white-space.
$# is the number of records.
$@ is the number of fields in the current record.
$[ 0 ] is the entire line
$[ i ] is the ith field of the record
Examples of CAWK
{
PRINT $[ 2 ]
PRINT $[ 1 ]
PRINT “\n”
}
Print fields in reverse order:
84
Chapter 12. Parsing Data
{
FOR $I = $@ TO 1 STEP -1 {
PRINT $[ $I ] }
PRINT “\n” }
Exercise 12.4 Write a C program to implement the above formal grammar. Your program should read in a cawk program (argv[1]) and expect the data file to be read from standard input (or from argv[2] if specified).
The marks are split as follows:
• (25%) To implement a recursive descent parser – this says whether or not a given
CAWK program follows the formal grammar or not.
• (25%) To implement an interpreter, so that the instructions are executed.
• (25%) To show a testing strategy on the above – you should give details of white-box
and black-box testing done on your code. Describe any test-harnesses used. Give
examples of the output of many different cawk programs.
• (25%) To show an extension to the project in a direction of your choice. It should
demonstrate your understanding of some aspect of programming or S/W engineering.
If you extend the formal grammar make sure that you show the new, full grammar. Submit the program(s) and a Makefile so that I can:
• Compile the parser by typing ‘make parse’.
• Compile the interpreter by typing ‘make interp’.
• Compile the extension by typing ‘make extension’.
In addition:
• Submit a test strategy report called test.txt. This will include sample outputs, any code
written especially for testing etc.
• Submit an extension report called ‘extend.txt’. This is quite brief and explains the
extension attempted.
12.5 Neill’s Adventure Language
Some of the very earliest computer games were text-based adventures e.g. Colossal Cave
https://en.wikipedia.org/wiki/Colossal_Cave_Adventure
Here we write a simple language (NAL) which allows us to write (simplified) versions of such games, focussing on setting variables and printing. The grammar is as follows:
www
% Execute the instructions in file, then return here e.g. : % FILE “test1.nal”
% Halt/abort all execution right now !
12.5 Neill’s Adventure Language 85
% Fill a number−variable with a number, or 2 string−variables with string :
% IN2STR ( $C, $ZER ) or INNUM ( %NV )
:= “IN2STR” “(”
% Jump to the nth word in this file (the first word is number zero!)
% Brackets count as one word, “things in quotes” count as one word, e.g. : % JUMP 5
% Output the value of variable, or constant, to screen with (without a linefeed)
% Set a variable to a random number in the range 0 − 99 e.g. :
% RND ( %N )
% Number should be seeded via the clock to be different on successive executions
% If condition/test is true, execute INSTRS after brace, else skip braces
% Add 1 to a number−variable e.g. : % INC ( %ABC )
% Set a variable. All variables are GLOBAL, and persist across the use of FILE etc.
% $A = “Hello” or %B = 17.6
% Some helpful variable/constant rules
% (Here ROT18 is ROT13 for letters and rot5 for digits)
:=
or a ROT18 string in hashes e.g. #URYYB.GKG#
www
Note that string constants can be entered via the use of double quotes, or using hashes to encode strings using ROT18
https://en.wikipedia.org/wiki/ROT13
in which characters are encoded/decoded according to:
Plain: ABCDEFGHIJKLMNOPQRSTUVWXYZ
ROT13: NOPQRSTUVWXYZABCDEFGHIJKLM Plain: abcdefghijklmnopqrstuvwxyz ROT13: nopqrstuvwxyzabcdefghijklm Plain: 0123456789
ROT5 : 5678901234
86 Chapter 12. Parsing Data
the algorithm allows obvious ‘spoilers’ to be hidden from users browsing through .nal files, but is simple to apply.
A simple program showing the use of assignment, conditionals and ROT18 is shown:
Here string-variables ($) and number-variables (%) are initialised.
The use of ‘JUMP’ is shown next – this allows execution to move a chosen word in the
program. Strings inside quotes count as one word, so the following contains six words:
‘FILE’ allows another file to be executed (as
‘RND’ sets a variable to a number in the range 0 − 99, while ‘INC’ adds one to a number- variable.
{
$A = “Neill”
%E = 12.4
IFEQUAL ( $A , #Arvyy# ) {
PRINT #Uryyb Jbeyq!# }
}
{
PRINT “Warning : Infinite Loop !” JUMP 1
}
{
PRINT “In test4, before”
FILE “test1.nal”
PRINT “In test4, after” }
{
%C = 0
RND ( %A ) PRINT %A
INC ( %C ) IFGREATER ( %C , 9 ) {
ABORT }
JUMP 4 }
‘ABORT’ halts execution instantly.
A simple guessing game is shown below:
{
PRINT “I’m thinking of a number (0−99).\nCan you guess it?”
RND ( %MINE ) %CNT = 0
INC ( %CNT )
PRINT “Type in your guess” INNUM ( %GUESS )
IFGREATER ( %CNT , 7 ) { PRINT #Gbb znal gevrf :−(# ABORT
12.5 Neill’s Adventure Language 87
}
IFGREATER ( %GUESS , %MINE ) {
PRINT “Too Big ! Try again … ”
JUMP 10 }
IFGREATER ( %MINE , %GUESS ) { PRINT “Too Small ! Try again … ” JUMP 10
}
IFEQUAL ( %MINE , %GUESS ) {
PRINT #Lbh thrffrq pbeerpgyl, lbh jva :−)#
PRINTN “Number of goes = ” PRINT %CNT
ABORT
} }
‘INNUM’ gets a number (float) from the user, and the use of ‘PRINT’ (with a newline after), and ‘PRINTN’ (without a newline after) is demonstrated.
Exercise 12.5
• (40%) Implement a parser. The .nal file should be read in using argv[1]. If the file is parsed correctly, the only output should be:
• (30%) Implement an interpreter, building on top of the parser in the manner described in the lectures. Do not write a brand new program – interpretation will be done alongside parsing.
• (20%) Submit the testing you have undertaken, including examples and a description of your strategies. This should convince us that you have tested every line of code, and that it works correctly. If there are still issues/bugs state them clearly. Also, point out any bugs that you have successfully found using these approaches. Submit a file named testing.txt, along with any other files you feel necessary. Due to the recursive nature of this assignment testing is non-trivial – simply submitting many .nal files that
‘work’ is not sufficient. No particular strategy is mandated. You may wish to explore a couple and briefly discuss strengths and weaknesses.
• (10%) Undertake an extension of your choosing. Remember, that the assessment is based on the quality of your coding, so choose something to demonstrate an aspect of programming or software engineering that you haven’t had a chance to use in the main assignment. Submit a file named extension.txt outlining, in brief, your contribution.
Hints
first. Build-up from the 01s example given in lectures.
• Some issues, such as what happens if you use an undefined variable, or if you use
a variable before it is set, are not explained by the formal grammar. Use your own
common-sense, and explain what you have done.
• Once your parser works, extend it to become an interpreter. DO NOT aim to parse the
program first and then interpret it separately. Interpreting and parsing are inseparably bound together.
Don’t try to write the entire program in one go. Try a cut down version of the grammar
Parsed OK
88 Chapter 12. Parsing Data Submission
Your testing strategy will be explained in testing.txt, and your extension as extension.txt. For the parser, interpreter and extension sections , make sure there’s one Makefile, so that
I can easily build the code using make parse, make interp and make extension. I’ve given an example makefile in the usual place, but this is an example only – yours may be different. I wrote only one program nal.c and built the two different version by setting
a #define via the makefile with -DINTERP. Inside the code itself #ifdef INTERP and #endif are used. Also make sure that basic testing is available using make testparse and make testinterp.
Place all the files required for your submission in a single .zip file called nal.zip – this file will not contain other zipped files.
A. House Style
A.1 Correctness
These style rules ensure your code is as-correct-as-can-be with the aid of the compiler and other tools:
FLAGS Having no warnings (or errors!) when compiling and executing with the flags:
For array bounds checking, NULL pointers being dereferenced etc:
For memory leaks:
then run:
valgrind –leak-check=full ./myexec
For ‘final’ production-ready code:
You can use more flags than this, obviously, but these will make sure a few of the essential warnings that commonly indicate the presence of bugs and leaks are checked. These guidelines are meant to be independent of the particular compiler used though. Sometimes it is helpful to use many compilers too, e.g. gcc and clang.
If you have unused variables (for example) in your code, it doesn’t matter whether your
compiler happened to tell you about it or not – it’s still wrong !
BRACE Always brace all functions, fors, whiles, if/else etc. Somewhat contraversial, this
ensures that ‘extra’ lines tagged onto loops are dealt with correctly. For instance:
-Wall -Wextra -Wfloat -equal -pedantic -std=c90 -fsanitize=undefined -fsanitize=address -g3
-Wall -Wextra -Wfloat -equal -pedantic -std=c90 -g3
-Wall -Wextra -Wfloat -equal -pedantic -std=c90 -O3
while(i < 10) printf("%d\n", i); i++;
Correctness Prettifying Readability
90 Chapter A. House Style looks like it should print out i 10 times, but instead runs infinitely. The programmer
probably meant:
GOTO You do not use any of the keywords continue, goto or break. The one exception is inside switch, where break is allowed because it is essential ! These keywords usually lead to tangled and complex ‘spaghetti’ coding style. I often recommend that you rewrite the offending code uing functions, which can have multiple returns in them.
NAMES Meaningful identifiers. Make sure that functions names and variables having meaning- ful, but succinct, names.
REPC Repetitive code. If you’ve cut-and-paste large chunks of code, and made minor changes to it, you’ve done it wrong. Make it a function, and pass parameters that make the changes required.
while(i < 10){ printf("%d\n", i); i++;
}
int inbounds1(int i){
if(i >=0 && i < MAX){ return 1;
} else{
return 0; }
}
int inbounds2(int i){
if(i >=0 && i < LEN){
return 1; }
else{
return 0;
} }
might make more sense as:
GLOB No global variables. Global variables are declared ‘above’ main(), are in scope of all functions, and can be altered anywhere in the code. This makes it rather unclear which functions should be reading or writing them. You can make a case for saying that occasionally they could be useful (or better) than the alternatives, but for now, they are banned !
RETV Any functions that returns a value, should have it used: scanf("%d", &i);
is incorrect. It returns a value that is ignored. Instead do:
int inbounds2(int i, int mx){
if(i >=0 && i < mx){ return 1;
} else{
return 0; }
}
A.2 Prettifying 91
if(scanf("%d", &i) != 1{ /* PANIC */
The only exceptions are printf and putchar which do return values but which are
typically ignored.
MATCH For every fopen there should be a matching fclose. For every malloc there should
be a free. This helps avoid memory leaks, when your program or functions are later used
in a larger project.
STDERR When exiting your program in an error state, make sure that you fprintf the error
on stderr and not stdout. Use exit, e.g.
A.2 Prettifying
These rules are about making your code easier to read and having a consistent style in a form that others are expecting to see.
LLEN Line length. Many people use terminal and editors that are of a fixed-width. Having
excessively long lines may cause the viewer to scroll to off the screen. Keep lines short, perhaps < 60 characters. However, in a similar way to the FLEN rule below, it’s really about the complexity of the line that’s the issue, not its absolute length. A programmer would generally find:
bool arrcleanse(cell oldarr[HEIGHT][WIDTH], cell newarr[HEIGHT][WIDTH], int h, int w) a great deal easier to read than:
if(a < b && j++ >= szpar(e ? true : false) || h==4){
despite it being twice as long.
TABS Don’t use tabs to indent your code. Every editor views these differently, so you have no
guarentee that I’m seeing the same layout as you do. Use spaces. This also prevents issues
when cuttig-and-pasting from one source to another.
INDENT Indentation: choose a style for indentation and keep to it. I happen to use 3 spaces,
put opening braces for functions on a new line, but at the end of if,else, for, while etc, then close them on a new line, underneath the ‘i’ of the if:
You can use any style you like, as long as it’s consistent.
MAIN The code should have function prototypes/definitions first, then main(), followed by
the function implementation. This means the reader always know where to find main(),
since it will be near the top of the file.
CAPS Constants are #defined, and use all CAPITALS. For instance:
if(argc != 2){
fprintf(stderr, “Usage : %s
}
int smallest(int a, int b) {
if(a < b){ return a;
} else{
return b; }
}
#define WEEKS 52
#define MAX(a,b) (a < b ? b:a)
92 Chapter A. House Style
FLEN Short functions. All functions are short. It’s quite difficult to put a maximum number of lines on this, but use 20 as a starting point. Exceptions include a function that simply prints a list of instructions. There would be no benefit in splitting it into smaller functions. Short functions are easier to plan, write and test.
I find it more useful to think about how hard the function is to understand, rather than its length. Therefore, a 30 line, simple function is fine, but an extremely complex and dense 15 line function might need to be split up, or more self-documentation added.
A.3 Readability
Your code should be self-documenting. Comments will be written when there is something complex to explain, and only read when something has gone catastrophically wrong. In many cases clever use of coding will avoid the need for them. The compiler never sees them, so cannot check them. If you change your code, but not your comments, they can be highly misleading.
As Kevlin Henney said :
A common fallacy is to assume authors of incomprehensible code will somehow be able to express themselves lucidly and clearly in comments.
MAGIC No magic numbers. There should be no inexplicable numbers in your code, such as: if(i < 36){
It’s probably unclear to the reader where the 36 has come from, or what it means, even if it is obvious to the programmer at the time of writing the code. Instead, #define them with a meaningful name. Array overruns are often cured by being consistent with #defines.
BRIEF Comments are brief, and non-trivial. Worthless commenting often looks something like:
The programmer extracts no additional information from it. However, for more difficult edge cases, a comment might be useful.
TYPE You should use typedefs, enums and structs to increase readability.
INFIN No loops should be infinite. I’ll never ask you to write a program that is meant to run
forever. Therefore statements such as
while(1){ or
for(;;;){
are to be avoided.
2DINDEX 2D Arrays in C are indexed [row][col]. Sometimes it may still work correctly,
especially if you’ve consistently confused the two. Therefore, if you write code that indexes it [col][row], or [x][y] it will confuse anyone else trying to understand (or reuse) your code. If you were to sketch a graph using (i,j) you’d almost certainly make i the horizontal axis, and j the vertical. Therfore, for any two variables it makes more sense to write [b][a] or [j][i].
/* Set the variable i to zero */
int i = 0;
/* Have we reached the end of the list ? */
if(t1−>h == NULL){
B. Lab Tests
Examination Rules
This is an exam. Please:
• Wait to be seated by a Lab. supervisor.
• Do not talk to anyone else.
• Do not use a Web browser, unless instructed to do so (e.g. to submit files). • Do not use an email connection.
• Do not use other electronic devices such as laptops, tablets or phones.
• Do not look in other peoples directories.
• Do not use code written in ‘groups’ with others students.
• DO use text books, lecture notes, man pages etc.
If you do not obey the above rules you will be asked to leave.
People arriving late may not be allowed into the Lab.
You have one hour. There is additional time in case a server crashes or some other problem
occurs.
Submitting Your Work
As you complete each part, submit it online using the Blackboard submission system. Do not submit files that don’t work.
For multi-part questions, if you complete Part 1, submit a file called part1.c. If you complete Part 2, submit a file called part2.c and so on.
No partial marks are available. The only possible scores for each sub-part are pass (100%) or fail (0%).
Code Style
• Your code bf must compile without warnings using the gcc flags:
-pedantic -Wall -Wextra -Wfloat-equal -ansi -O2
• Do NOT use any global variables.
Anagrams Isograms Mutating Boards
94 Chapter B. Lab Tests B.1 Anagrams
Part 1 (60%)
An anagram is a rearrangement of a word, using all the letters. Two words are said to be anagrams if they have the same characters, but in a different order. For instance the words
‘parsley’, ‘players’ and ‘replays’ are all anagrams of each other.
Since you need to rearrange the words, two identical words, by definition, are not anagrams. Using the following template, fill in the function anagram(), so that the program runs
successfully :
#include
#include
#include
#include
int anagram(char s1[], char s2[]);
int main(void)
{
assert(anagram(“elvis”, “lives”) == 1);
assert(anagram(“dreads”, “sadder”) == 1);
assert(anagram(“replays”, “parsley”) == 1);
assert(anagram(“listen”, “silent”) == 1);
assert(anagram(“orchestra”, “carthorse”) == 1);
/* Two identical words are not anagrams */
assert(anagram(“elvis”, “elvis”) == 0);
assert(anagram(“neill”, “neil”) == 0);
assert(anagram(“neil”, “neill”) == 0);
assert(anagram(“horse”, “short”) == 0);
return 0; }
int anagram(char s1[], char s2[])
{
}
Part 2 (40%)
A deranged anagram has two words with the same characters, but the same character does not appear in the same position.
The words ‘elvis’ and ‘lives’ are not a derangement since the ‘s’ is in the same position in both words. However, ‘dreads’ and ‘sadder’ are, since no letter appears in the same position between the two words.
Extend the program above so that the following assertions, inside main() are correct:
Obviously, your program will need to run for other, unseen but similar, test cases.
B.2 Isograms 95
assert(derange(“elvis”, “lives”) == 0);
assert(derange(“dreads”, “sadder”) == 1);
assert(derange(“replays”, “parsley”) == 1);
assert(derange(“listen”, “silent”) == 0);
assert(derange(“orchestra”, “carthorse”) == 1);
B.2 Isograms Part 1 (60%)
An isogram is a word that has no repeating letters. For instance the words ‘graciously’, ‘disgrace- ful’ and ‘productively’ are all isograms. However, the word ‘dazzlingly’ is not (it contains the letters ‘z’ and ‘l’ twice).
Using the following template, fill in the function isogram(), so that the program runs successfully :
#include
#include
#include
#include
int isogram(char *s);
int main(void)
{
assert(isogram(“programming”) == 0);
assert(isogram(“housewarmings”) == 0);
assert(isogram(“abductions”) == 1);
assert(isogram(“housewarming”) == 1);
assert(isogram(“hydromagnetics”) == 1);
assert(isogram(“uncopyrightable”) == 1);
return 0;
}
int isogram(char *s)
{
}
Part 2 (40%)
Using the isogram() function written above, write a program which finds the longest isogram in a file of words. The name of the file is provided via the use of argv. On success, the program simply outputs the longest word and its length, nothing else. For example :
$ ./parttwo eowl_shuffle.txt
waveringly (10)
The file may contain many isograms of (equal) longest length. In this case, outputting any one of them will do.
96 Chapter B. Lab Tests B.3 Mutating Boards
Part 1 (60%)
Write a function that fills up a square board, randomly, with an integer 0…9. Use a:
#define N 20
to define the size of the board. Write another function to print the board. The board may look something like:
36753562912709360626
18792023759228973612
93194784503610632061
55476569374525474430
78688431492068926649
50487172722610615949
09177115977673656394
81293908850963856115
98481030444476317596
21785741859753883189
64333860488897764303
09254059469224775481
28936802105110850646
25862847240629908131
10340391969338056640
04626756987282996027
61321599149107587048
04296104222055290283
80409196254499360502
94351743146942264128
Write a function to ‘mutate’ the board. Mutating is done like this:
• Choose two random locations which are horizontally adjacent (next to each other left-
right).
• Swap these two numbers on the board if the left one is greater than the right one, numeric-
ally.
• Choose two random locations which are vertically adjacent (next to each other up-down).
• Swap these two numbers on the board if the upper one is greater than the lower one,
numerically.
• Repeat the above steps (N*N*N) times.
Now print out the board. It should look something like :
00000000000001111224
00000001111111233456
00000111112222244456
00001122222333445666
00112222223333555667
01112223333334556678
01112223334445556779
01122333344445556789
01223334444455666789
01223344445556666789
01223344455666667789
01224444456666777889
01234455566677777889
01234555666677788899
01234555666778888899
B.3 Mutating Boards 97
12234566677788888999
12345567777888889999
12445677788888899999
34446678889999999999
46678899999999999999
Ensure that if you change the size of your array, by changing your #define that the program still operates correctly.
Part 2 (40%)
Adapt the code above, so that the algorithm is:
• Choose two numbers at random locations on the board.
• Check that of these two numbers, the one closest to the centre of the array is numerically
less than the number furthest away from the centre. If not, swap them. • Repeat the above steps (N*N*N*N) times.
Once again randomise the array initially, and ensure that after changing your #define the program still works correctly.
When N = 21, the array may look something like:
999998887777788899999
999987666656666788999
998876554444555678899
998665443333444566889
987654333222233456789
876543322112223345678
865443211111112334568
765432111000011234568
765422100000001234567
764321100000001223467
764321100000001123457
764322100000001223467
765422100000001224567
865432110000011234568
865432211111122334568
876543322212223345678
987654333222233456789
988665444333344567889
998876555444455678899
999887666656666789999
999998887777788899999