1 Introduction
Assignment 6 Huffman Coding
Prof. Darrell Long CSE 13S – Spring 2021
First DESIGN.pdf draft due: May 13th at 11:59 pm PST Assignment due: May 23rd at 11:59 pm PST
You’re trying to take something that can be described in many, many sentences and pages of prose, but you can convert it into a couple lines of poetry and you still get the essence, so that’s compression. The best code is poetry.
When David Huffman was a graduate student in a class at MIT, the professor gave the class an unsolved problem: How to construct an optimal static encoding of information. The young Huffman came back a few days later with his solu- tion, and that solution changed the world. Data compression is now used in all aspects of communication. David Huffman joined the faculty of MIT in 1953, and in 1967 he joined the faculty of University of California, Santa Cruz as one of its earliest members and helped to found its Computer Science Department, where he served as chairman from 1970 to 1973. He retired in 1994, and passed away in 1999.
The key idea is called entropy, originally defined by Claude Shannon in 1948. Entropy is a measure of the amount of information in a, say, set of symbols. If we define I(x) = log2 Pr[x] to be the information content of a symbol, then the entropyofthesetX ={x1,…,xn}is
∑n ∑n
H(X)= Pr[xi]I(xi)=− Pr[xi]log2Pr[xi].
—Satya Nadella
David A. Huffman
i=1 i=1
It should be easy to see that the optimal static encoding will assign the least number of bits to the most
common symbol, and the greatest number of bits to the least common symbol.
2 The Encoder
Your first task for this assignment is to implement a Huffman encoder. This encoder will read in an input file, find the Huffman encoding of its contents, and use the encoding to compress the file. Your encoder program, named encode, must support any combination of the following command-line options:
© 2021 Darrell Long 1
• -h : Prints out a help message describing the purpose of the program and the command-line op- tions it accepts, exiting the program afterwards. Refer to the reference program in the resources repo for an idea of what to print.
• -i infile:SpecifiestheinputfiletoencodeusingHuffmancoding.Thedefaultinputshouldbe set as stdin.
• -o outfile:Specifiestheoutputfiletowritethecompressedinputto.Thedefaultoutputshould be set as stdout.
• -v : Prints compression statistics to stderr. These statistics include the uncompressed file size, the compressed file size, and space saving. The formula for calculating space saving is:
100 × (1 − (compressed size/uncompressed size)). Refer to the reference program in the resources repository for the exact output.
The algorithm to encode a file, or to compress it, is as follows:
3
1. Computeahistogramofthefile.Inotherwords,countthenumberofoccurrencesofeachunique symbol in the file.
2. ConstructtheHuffmantreeusingthecomputedhistogram.Thiswillrequireapriorityqueue.
3. Constructacodetable.Eachindexofthetablerepresentsasymbolandthevalueatthatindexthe
symbol’s code. You will need to use a stack of bits and perform a traversal of the Huffman tree.
4. Emit an encoding of the Huffman tree to a file. This will be done through a post-order traversal of
the Huffman tree. The encoding of the Huffman tree will be referred to as a tree dump.
5. Stepthrougheachsymboloftheinputfileagain.Foreachsymbol,emititscodetotheoutputfile.
The Decoder
The second task for this assignment is to implement a Huffman decoder. This decoder will read in a com- pressed input file and decompress it, expanding it back to its original, uncompressed size. Your decoder program, named decode, must support any combination of the following command-line options.
• -h : Prints out a help message describing the purpose of the program and the command-line op- tions it accepts, exiting the program afterwards. Refer to the reference program in the resources repo for an idea of what to print.
• -i infile:SpecifiestheinputfiletodecodeusingHuffmancoding.Thedefaultinputshouldbe set as stdin.
• -o outfile : Specifies the output file to write the decompressed input to. The default output should be set as stdout.
© 2021 Darrell Long 2
• -v : Prints decompression statistics to stderr. These statistics include the compressed file size, the decompressed file size, and space saving. The formula for calculating space saving is:
100 × (1 − (compressed size/decompressed size)).
Refer to the reference program in the resources repository for the exact output. The algorithm to decode a file, or to decompress it, is as follows:
1. Read the emitted (dumped) tree from the input file. A stack of nodes is needed in order to recon- struct the Huffman tree.
2. Read in the rest of the input file bit-by-bit, traversing down the Huffman tree one link at a time. Reading a 0 means walking down the left link, and reading a 1 means walking down the right link. Whenever a leaf node is reached, its symbol is emitted and you start traversing again from the root.
4 Nodes
The first ADT that we will cover is a node. Huffman trees are composed of nodes, with each node con- taining a pointer to its left child, a pointer to its right child, a symbol, and the frequency of that symbol. The node’s frequency is only needed for the encoder.
1 2 3 4 5 6 7 8
typedef struct Node Node;
struct Node { Node *left;
};
// Pointer to left child. // Pointer to right child. // Node’s symbol.
Node *right;
uint8_t symbol;
uint64_t frequency; // Frequency of symbol.
Immediately, we notice that a symbol is a uint8_t, and not a char. This is because we want to interpret the input file as raw bytes, not as a string. The following subsections define the interface for a Node and will be supplied in node.h. The definition of a Node will be made transparent in order to simplify things.
4.1 Node *node_create(uint8_t symbol, uint64_t frequency)
The constructor for a node. Sets the node’s symbol as symbol and its frequency as frequency.
4.2 void node_delete(Node **n)
The destructor for a node. Make sure to set the pointer to NULL after freeing the memory for a node.
© 2021 Darrell Long 3
4.3 Node *node_join(Node *left, Node *right)
Joins a left child node and right child node, returning a pointer to a created parent node. The parent node’s left child will be left and its right child will be right. The parent node’s symbol will be ‘$’ and its frequency the sum of its left child’s frequency and its right child’s frequency.
4.4 void node_print(Node *n)
A debug function to verify that your nodes are created and joined correctly.
5 Priority Queues
As stated in the encoding algorithm, the encoder will make use of a priority queue of nodes. A priority queue functions like a regular queue, but assigns each of its elements a priority, such that elements with a high priority are dequeued before elements with a low priority. Assuming that elements are enqueued at the tail and dequeued from the head, this implies that the enqueue() operation does not simply add the element at the tail. Of course, the dequeue() operation could search for the highest priority element each time, but that is a bad idea.
How you implement your priority queue is up to you. There are a couple choices: 1) mimicking an insertion sort when enqueuing a node, finding the correct position for the node and shifting everything back, or 2) using a min heap to serve as the priority queue. Why a min heap? Because we want nodes with lower frequencies to be dequeued first. The lower the frequency of a node, the higher its priority. Your priority queue, no matter the implementation, must fulfill the interface that will be supplied to you in pq.h.
5.1 PriorityQueue *pq_create(uint32_t capacity)
The constructor for a priority queue. The priority queue’s maximum capacity is specified by capacity.
5.2 void pq_delete(PriorityQueue **q)
The destructor for a priority queue. Make sure to set the pointer to NULL after freeing the memory for a
priority queue.
5.3 bool pq_empty(PriorityQueue *q)
Returns true if the priority queue is empty and false otherwise.
5.4 bool pq_full(PriorityQueue *q)
Returns true if the priority queue is full and false otherwise.
5.5 uint32_t pq_size(PriorityQueue *q) Returns the number of items currently in the priority queue.
© 2021 Darrell Long 4
5.6 bool enqueue(PriorityQueue *q, Node *n)
Enqueues a node into the priority queue. Returns false if the priority queue is full prior to enqueuing
the node and true otherwise to indicate the successful enqueuing of the node. 5.7 bool dequeue(PriorityQueue *q, Node **n)
Dequeues a node from the priority queue, passing it back through the double pointer n. The node de- queued should have the highest priority over all the nodes in the priority queue. Returns false if the priority queue is empty prior to dequeuing a node and true otherwise to indicate the successful de- queuing of a node.
5.8 void pq_print(PriorityQueue *q)
A debug function to print a priority queue. This function will be significantly easier to implement if your enqueue() function always ensures a total ordering over all nodes in the priority queue. Enqueuing nodes in a insertion-sort-like fashion will provide such an ordering. Implementing your priority queue as a heap, however, will only provide a partial ordering, and thus will require more work in printing to assure you that your priority queue functions as expected (you will be displaying a tree).
6 Codes
After constructing a Huffman tree, you will need to maintain a stack of bits while traversing the tree in order to create a code for each symbol. We will create a new ADT, a Code, that represents a stack of bits.
1 typedef struct Code {
2 uint32_t top;
3 uint8_t bits[MAX_CODE_SIZE];
4 } Code;
The struct definition of a Code will be made transparent. This is done for the sole purpose of being able to pass a struct by value. The macro MAX_CODE_SIZE reflects the maximum number of bytes needed to store any valid code. The definition of this macro — and other macros — will be given in defines.h. You will need to combine your knowledge of bit vectors and stacks in order to implement this ADT. The interface, given in code.h, is defined in the the following subsections.
© 2021 Darrell Long 5
Macros defined in defines.h
1 // 4KB blocks.
2 #define BLOCK 4096
3
4 // ASCII + Extended ASCII.
5 #define ALPHABET 256
6
7 // 32-bit magic number.
8 #define MAGIC 0xDEADBEEF
9
10 // Bytes for a maximum, 256-bit code.
11 #define MAX_CODE_SIZE (ALPHABET / 8)
12
13 // Maximum Huffman tree dump size.
14 #define MAX_TREE_SIZE (3 * ALPHABET – 1)
6.1 Code code_init(void)
You will immediately notice that this “constructor” function is unlike any of the other constructor func- tions you have implemented in the past. You may also have noticed, if you glanced slightly ahead, that there is no corresponding destructor function. This is an engineering decision that was made when con- sidering the constraints of the Huffman coding algorithm.
This function will not require any dynamic memory allocation. You will simply create a new Code on the stack, setting top to 0, and zeroing out the array of bits, bits. The initialized Code is then returned.
6.2 uint32_t code_size(Code *c)
Returns the size of the Code, which is exactly the number of bits pushed onto the Code.
6.3 bool code_empty(Code *c)
Returns true if the Code is empty and false otherwise.
6.4 bool code_full(Code *c)
Returns true if the Code is full and false otherwise. The maximum length of a code in bits is 256, which we have defined using the macro ALPHABET. Why 256? Because there are exactly 256 ASCII characters (including the extended ASCII).
6.5 bool code_push_bit(Code *c, uint8_t bit)
Pushes a bit onto the Code. The value of the bit to push is given by bit. Returns false if the Code is full
prior to pushing a bit and true otherwise to indicate the successful pushing of a bit.
© 2021 Darrell Long 6
6.6 bool code_pop_bit(Code *c, uint8_t *bit)
Pops a bit off the Code. The value of the popped bit is passed back with the pointer bit. Returns false if
the Code is empty prior to popping a bit and true otherwise to indicate the successful popping of a bit. 6.7 void code_print(Code *c)
A debug function to help you verify whether or not bits are pushed onto and popped off a Code correctly. 7 I/O
Now that we have covered all the essential ADTs necessary for the encoder, we will discuss I/O. Instead of the buffered I/O functions from
7.1 int read_bytes(int infile, uint8_t *buf, int nbytes)
This will be a useful wrapper function to perform reads. As you may know, the read() syscall does not always guarantee that it will read all the bytes specified (as is the case with pipes). For example, a call could be issued to read a a block of bytes, but it might only read part of a block. So, we write a wrapper function to loop calls to read() until we have either read all the bytes that were specified (nbytes) into the byte buffer buf, or there are no more bytes to read. The number of bytes that were read from the input file descriptor, infile, is returned. You should use this function whenever you need to perform a read.
7.2 int write_bytes(int outfile, uint8_t *buf, int nbytes)
This functions is very much the same as read_bytes(), except that it is for looping calls to write(). As you may imagine, write() is not guaranteed to write out all the specified bytes (nbytes), and so we must loop until we have either written out all the bytes specified from the byte buffer buf, or no bytes were written. The number of bytes written out to the output file descriptor, outfile, is returned. You should use this function whenever you need to perform a write.
7.3 bool read_bit(int infile, uint8_t *bit)
You should all know by now that it is not possible to read a single bit from a file. What you can do, however, is read in a block of bytes into a buffer and dole out bits one at a time. Whenever all the bits in the buffer have been doled out, you can simply fill the buffer back up again with bytes from infile. This is exactly what you will do in this function. You will maintain a static buffer of bytes and an index into the buffer that tracks which bit to return through the pointer bit. The buffer will store BLOCK number of bytes, where BLOCK is yet another macro defined in defines.h. This function returns false if there are no more bits that can be read and true if there are still bits to read. It may help to treat the buffer as a bit vector.
© 2021 Darrell Long 7
7.4 void write_code(int outfile, Code *c)
The same bit-buffering logic used in read_bit() will be used in here as well. This function will also make use of a static buffer (we recommend this buffer to be static to the file, not just this function) and an index. Each bit in the code c will be buffered into the buffer. The bits will be buffered starting from the 0th bit in c. When the buffer of BLOCK bytes is filled with bits, write the contents of the buffer to outfile.
7.5 void flush_codes(int outfile)
It is not always guaranteed that the buffered codes will align nicely with a block, which means that it is possible to have bits leftover in the buffer used by write_code() after the input file has been completely encoded. The sole purpose of this function is to write out any leftover, buffered bits. Make sure that any extra bits in the last byte are zeroed before flushing the codes.
8 Stacks
You will need to use a stack in your decoder to reconstruct a Huffman tree. The interface of the stack should be familiar from assignments 3 and 4. The difference is that the stack this time around will store nodes. The interface for the stack is defined in stack.h.
1 2 3 4 5
8.1
struct Stack { uint32_t top;
uint32_t capacity;
Node **items; };
Stack *stack_create(uint32_t capacity)
The constructor for a stack. The maximum number of nodes the stack can hold is specified by capacity. 8.2 void stack_delete(Stack **s)
The destructor for a stack. Remember to set the pointer to NULL after you free the memory allocated by the stack.
8.3 bool stack_empty(Stack *s)
Returns true if the stack is empty and false otherwise.
8.4 bool stack_full(Stack *s)
Returns true if the stack is full and false otherwise.
8.5 uint32_t stack_size(Stack *s) Returns the number of nodes in the stack.
© 2021 Darrell Long 8
8.6 bool stack_push(Stack *s, Node *n)
Pushes a node onto the stack. Returns false if the stack is full prior to pushing the node and true
otherwise to indicate the successful pushing of a node.
8.7 bool stack_pop(Stack *s, Node **n)
Pops a node off the stack, passing it back through the double pointer n. Returns false if the stack is
empty prior to popping a node and true otherwise to indicate the succesuccessful popping of a node. 8.8 void stack_print(Stack *s)
A debug function to print the contents of a stack.
9 A Huffman Coding Module
A interface for a Huffman coding module that you will need to implement will be given in huffman.h. Do not worry if you do not initially understand the exact purpose of each function, as they will be clar- ified in §10 and §11. The interface is just given now as a reference for which functions are used in the aforementioned sections.
9.1 Node *build_tree(uint64_t hist[static ALPHABET])
Constructs a Huffman tree given a computed histogram. The histogram will have ALPHABET indices, one index for each possible symbol. Returns the root node of the constructed tree. The use of static array indices in parameter declaration is a C99 addition. In this case, it informs the compiler that the histogram hist should have at least ALPHABET number of indices.
9.2 void build_codes(Node *root, Code table[static ALPHABET])
Populates a code table, building the code for each symbols in the Huffman tree. The constructed codes
are copied to the code table, table, which has ALPHABET indices, one index for each possible symbol. 9.3 Node *rebuild_tree(uint16_t nbytes, uint8_t tree_dump[static nbytes])
Reconstructs a Huffman tree given its post-order tree dump stored in the array tree_dump. The length in bytes of tree_dump is given by nbytes. Returns the root node of the reconstructed tree.
9.4 void delete_tree(Node **root)
The destructor for a Huffman tree. This will require a post-order traversal of the tree to free all the nodes.
Remember to set the pointer to NULL after you are finished freeing all the allocated memory.
© 2021 Darrell Long 9
10 Specifics for the Encoder
For this section, the input file to compress will be referred to as infile and the compressed output file as outfile. Your encoder, after parsing command-line options with getopt(), must perform the following steps exactly to produce the correct Huffman encoding:
1. Read through infile to construct a histogram. Your histogram should be a simple array of 256 (ALPHABET) uint64_ts.
2. Increment the count of element 0 and element 255 by one in the histogram. This is so that at the very minimum, the histogram will have two elements present. Do this regardless of what you read in. While doing this may result in a slightly sub-optimal Huffman tree later on, it is a quick and clean solution to handling the case when a file has no bytes or contains only one unique symbol.
3. ConstructtheHuffmantreeusingapriorityqueue.Thiswillbedoneusingbuild_tree().
(a) Create a priority queue. For each symbol histogram where its frequency is greater than 0 (there should be at minimum two elements because of step 2), create a corresponding Node and insert it into the priority queue.
(b) While there are two or more nodes in the priority queue, dequeue two nodes. The first de- queued node will be the left child node. The second dequeued node will be the right child node. Join these nodes together using node_join() and enqueue the joined parent node. The frequency of the parent node is the sum of its left child’s frequency and its right child’s frequency.
(c) Eventually, there will only be one node left in the priority queue. This node is the root of the constructed Huffman tree.
4. Construct a code table by traversing the Huffman tree. This will be done using build_codes(). The code table is a simple array of 256 (ALPHABET) Codes.
(a) CreateanewCodecusingcode_init().StartingattherootoftheHuffmantree,performa post-order traversal.
(b) If the current node is a leaf, the current code c represents the path to the node, and thus is the code for the node’s symbol. Save this code into code table.
(c) Else,thecurrentnodemustbeaninteriornode.Pusha0tocandrecursedowntheleftlink.
(d) Afteryoureturnfromtheleftlink,popabitfromc,pusha1tocandrecursedowntheright
link. Remember to pop a bit from c when you return from the right link.
5. Constructaheader.Aheaderisdefinedbythefollowingstructdefinition,whichwillbesupplied
to you in header.h:
1 typedef struct Header {
2 uint32_t magic; // 32-bit magic number.
3 uint16_t permissions; // Input file permissions.
4 uint16_t tree_size;
5 uint64_t file_size;
6 } Header;
© 2021 Darrell Long
// Emitted tree size in bytes. // Input file size.
10
The header’s magic number field, magic, should be set to the macro MAGIC, as defined in defines.h. This magic number identifies a file as one which has been compressed using your encoder. It is crucial that you use this magic number and nothing else.
The header’s permissions field stores the original permission bits of infile. You can get these permissions by using fstat(). You should also set the permissions of outfile to match the per- missions of infile using fchmod().
The header’s tree_size field represents the number of bytes that make up the Huffman tree dump. This size is calculated as (3 × unique symbols) − 1.
Finally, the header’s file_size field is the size in bytes of the file to compress, infile. You get this size using fstat() as well.
6. Writetheconstructedheadertooutfile.
7. Perform a post-order traversal of the Huffman tree to write the tree to the outfile. This should write an ‘L’ followed by the byte of the symbol for each leaf, and an ‘I’ for interior nodes. You should not write a symbol for an interior node.
8. Startingatthebeginningofinfile,writethecorrespondingcodeforeachsymboltooutfilewith write_code(). When finished with all the symbols, make sure to flush any remaining buffered codes with flush_codes().
9. Closeinfileandoutfile.
11 Specifics for the Decoder
For this section, the input file to decompress will be referred to as infile and the compressed output file as outfile. Your decoder, after parsing command-line options with getopt(), must perform the following steps exactly to decode the file:
1. Readintheheaderfrominfileandverifythemagicnumber.Ifthemagicnumberdoesnotmatch 0xDEADBEEF (defined as MAGIC in defines.h), then an invalid file was passed to your program. Display a helpful error message and quit.
2. Thepermissionsfieldintheheaderindicatesthepermissionsthatoutfileshouldbesetto.Set the permissions using fchmod().
3. The size of the dumped tree is given by the tree_size field in the header. Read the dumped tree from infile into an array that is tree_size bytes long. Then, reconstruct the Huffman tree using rebuild_tree().
(a) The array containing the dumped tree will be referred to as tree_dump. The length of this array will be nbytes. A stack of nodes will be needed to reconstruct the tree.
(b) Iterateoverthecontentstree_dumpfrom0tonbytes.
(c) If the element of the array is an ‘L’, then the next element will be the symbol for the leaf node. Use that symbol to create a new node with node_create(). Push the created node onto the stack.
© 2021 Darrell Long 11
(d) If the element of the array is an ‘I’, then you have encountered an interior node. Pop the stack once to get the right child of the interior node, then pop again to get the left child of the interior node. Note: the pop order is important. Join the left and right nodes with node_join() and push the joined parent node on the stack.
(e) Therewillbeonenodeleftinthestackafteryoufinishiteratingoverthecontentstree_dump. This node is the root of the Huffman tree.
4. Read infile one bit at a time using read_bit(). You will be traversing down the tree one link at a time for each bit that is read.
(a) Begin at the root of the Huffman tree. If a bit of value 0 is read, then walk down to the left child of the current node. Else, if a bit of value 1 is read, then walk down to the right child of the current node.
(b) If you find yourself at a leaf node, then write the leaf node’s symbol to outfile. Note: you may alternatively buffer these symbols and write out the buffer whenever it is filled (this will be more efficient). After writing the symbol, reset the current node back to the root of the tree.
(c) Repeatuntilthenumberofdecodedsymbolsmatchestheoriginalfilesize,whichisgivenby the file_size field in the header that was read from infile.
5. Closeinfileandoutfile.
12 Deliverables
You will need to turn in:
1. encode.c:ThisfilewillcontainyourimplementationoftheHuffmanencoder.
2. decode.c:ThisfilewillcontainyourimplementationoftheHuffmandecoder.
3. entropy.c: This file will be provided in the resources repository, but should be included in your repo as well. You must not modify this file.
4. defines.h:Thisfilewillcontainthemacrodefinitionsusedthroughouttheassignment.Youmay not modify this file.
5. header.h: This will will contain the struct definition for a file header. You may not modify this file.
6. node.h: This file will contain the node ADT interface. This file will be provided. You may not modify this file.
7. node.c:ThisfilewillcontainyourimplementationofthenodeADT.
8. pq.h: This file will contain the priority queue ADT interface. This file will be provided. You may
not modify this file.
© 2021 Darrell Long 12
9. pq.c: This file will contain your implementation of the priority queue ADT. You must define your priority queue struct in this file.
10. code.h: This file will contain the code ADT interface. This file will be provided. You may not modify this file.
11. code.c:ThisfilewillcontainyourimplementationofthecodeADT.
12. io.h:ThisfilewillcontaintheI/Omoduleinterface.Thisfilewillbeprovided.Youmaynotmodify
this file.
13. io.c:ThisfilewillcontainyourimplementationoftheI/Omodule.
14. stack.h: This file will contain the stack ADT interface. This file will be provided. You may not
modify this file.
15. stack.c:ThisfilewillcontainyourimplementationofthestackADT.Youmustdefineyourstack struct in this file.
16. huffman.h:ThisfilewillcontaintheHuffmancodingmoduleinterface.Thisfilewillbeprovided. You may not modify this file.
17. huffman.c:ThisfilewillcontainyourimplementationoftheHuffmancodingmoduleinterface.
18. Makefile:Thisisafilethatwillallowthegradertotypemaketocompileyourprograms.
• CC=clangmustbespecified.
• CFLAGS=-Wall -Wextra -Werror -Wpedanticmustbeincluded.
• makeshouldbuildtheencoder,thedecoder,andthesuppliedentropy-measureprogram,as shouldmake all.
• make encodeshouldbuildjusttheencoder.
• make decodeshouldbuildjustthedecoder.
• make entropyshouldbuildjustthesuppliedentropy-measureprogram.
• make cleanmustremoveallfilesthatarecompilergenerated.
• make formatshouldformatallyoursourcecode,includingtheheaderfiles.
19. Your code must pass scan-build cleanly. If there are any bugs or errors that are false positives, document them and explain why they are false positives in your README.md.
20. README.md:ThismustbeinMarkdown.Thismustdescribehowtobuildandrunyourprogram.
21. DESIGN.pdf: This must be a PDF. The design document should answer the pre-lab questions, describe the purpose of your program, and communicate its overall design with enough detail such that a sufficiently knowledgeable programmer would be able to replicate your implementation. This does not mean copying your entire program in verbatim. You should instead describe how your program works with supporting pseudocode. C code is not considered pseudocode.
© 2021 Darrell Long 13
13 Submission
To submit your assignment through git, refer to the steps shown in asgn0 Remember: add, commit, and push! Your assignment is turned in only after you have pushed and submitted the commit ID to Canvas. Your design document is turned in only after you have pushed and submitted the commit ID to Canvas. If you forget to push, you have not turned in your assignment and you will get a zero. “I forgot to push” is not a valid excuse. It is highly recommended to commit and push your changes often.
14
Supplemental Readings
15
– Chapter2§2.1
– Chapter6§6.5
– Chapter10§10.1 – Chapter12§12.1 – Chapter16§16.3
Strategy
•
•
TheCProgrammingLanguagebyKernighan&Ritchie – Chapter8
IntroductiontoAlgorithmsbyT.Cormen,C.Leiserson,R.Rivest,&C.Stein
The more that you read, the more things you will know. The more that you learn, the more places you’ll go.
I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.
—Linus Torvalds
—Dr. Seuss
Let’s talk strategy.
First, develop the data structures that you will need. Draw pictures, work them out, implement them,
and test them. Test them again. A program called printtree will be supplied in the resources reposi- tory. It will help you determine whether or not you are constructing and dumping your Huffman trees correctly.
Do things in small, incremental steps. Make a histogram. Test it with a small input file. Create a node using a symbol in the histogram. Does that work? Good, now try putting a tree together. Do the same for the rest of the data structures. You’ll thank yourself later down the road if you do this.
© 2021 Darrell Long 14
As with all assignments, a working encoder and decoder will be supplied in the resources repository. Test if you can decode what the reference encoder encodes. Test if the reference decoder and decode what your encoder encodes.
Build your toolkit. Build components. Test them.
Mafbalrrpyb py Z ph kpdn bpcpyb l rfydnx l zilpyhlj.
© 2021 Darrell Long 15