CSCI 1130A/B Introduction to Computing Using Java
2018-2019 First Term
Department of Computer Science and Engineering
The Chinese University of Hong Kong
Assignment 4: Encoding ASCII Art
Due date: 7 November 2018 (Wed)
Full mark: 100
Expected normal time spent on Tasks 1 and 2: 8-10 hours
Aims:
1. To encode and decode pictures of ASCII art using run-length encoding.
2. Practise reading/writing text files.
3. Practise the use of String and related methods.
BACKGROUND
ASCII Art
ASCII stands for American Standard Code for Information Interchange. It is a character encoding standard for electronic communication. The character set consists of 95 printable characters together with other special non-printable characters, making up a total of 128 characters in the standard 7-bit ASCII code.
ASCII art is a graphic design technique that uses printable ASCII characters to create pictures. It is generally regarded as text-based visual art. ASCII art was invented around 1966, mainly because early printers often lacked graphics ability, so characters were used in place of graphic marks. Below is a picture of a building made up of ASCII characters.
[Figure: ASCII art picture of a building]
Source: www.asciiworld.com
Run-length Encoding
Run-length encoding (RLE) is one of the simplest forms of lossless data compression. A sequence in which the same data value occurs in many consecutive data elements is stored as a single pair of data value and count instead of in its original form. It is quite effective in compressing data that contains frequent repetitions and repeated patterns, for example, the ASCII art described above.
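To make the idea of "a single pair of data value and count" concrete, here is a minimal sketch that lists the runs of a string as (value,count) pairs. The class name RunPairsSketch and the method toPairs are hypothetical, and the pair notation is only for illustration; it is not the file format required in this assignment.

```java
class RunPairsSketch {

    /** Returns the runs of text as (value,count) pairs, e.g. "aab" gives "(a,2)(b,1)". */
    static String toPairs(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char value = text.charAt(i);
            int count = 1;
            // Extend the run while the next character equals the current value.
            while (i + count < text.length() && text.charAt(i + count) == value) {
                count++;
            }
            out.append('(').append(value).append(',').append(count).append(')');
            i += count;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPairs("@@@@|--")); // prints (@,4)(|,1)(-,2)
    }
}
```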
Expected normal time spent on Task 3: ? hours
PROBLEM DEFINITION
In this assignment, you are required to write some classes to encode ASCII art and decode the compressed ASCII art file.
Task 1: Encoding a text file containing a picture of ASCII art (50%)
To encode the file, you need to open and read the text file line by line. Then you need to figure out whether there are any runs of consecutive identical characters on each single line.
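Reading a text file line by line and writing one output line for each input line could be sketched as follows. This is only one possible approach, assuming Java 8 or later; the class LineByLineSketch, the method transformFile and the perLine parameter are hypothetical names, and the per-line processing is passed in as a function so that the same loop can serve both encoding and decoding.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

class LineByLineSketch {

    /** Applies perLine to every line of the input file and writes the results to the output file. */
    static void transformFile(Path in, Path out, UnaryOperator<String> perLine) {
        // try-with-resources closes both files even if an exception is thrown.
        try (BufferedReader reader = Files.newBufferedReader(in);
             BufferedWriter writer = Files.newBufferedWriter(out)) {
            String line;
            while ((line = reader.readLine()) != null) { // null marks the end of the file
                writer.write(perLine.apply(line));
                writer.newLine();                         // one output line per input line
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, transformFile(Paths.get("testcase1.txt"), Paths.get("testcase1_e.txt"), s -> s) would copy the file unchanged; replacing the last argument with an encoding method would write the encoded lines instead.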
Example 1:
The original line:
@@@@@@@@@@@@@@@@@|-----|_|_|_|_|-----|@@@@@@@@@@@@@@@@@
The encoded line:
17 @ 1 | 5 - 1 |_|_|_|_| 5 - 1 | 17 @
Syntax of the encoded line:
<positive integer> <ASCII character(s)> <positive integer> <ASCII character(s)> <positive integer> <ASCII character(s)> … repeats until the end of line.
The encoded lines are written to a new text file for output. The positive integer denotes the number of times that the succeeding character(s) is repeated in the original line. There is a space between the positive integer and the character pattern. The line continues until the end of the input line is reached.
If the positive integer equals 1, the length of the succeeding ASCII character pattern can be larger than or equal to 1. If the positive integer is greater than 1, the length of the succeeding pattern should be 1 in this task.
As spaces are used as the delimiter in the encoded file, we here use negative integers to encode spaces in the original file. Whenever you encounter spaces in the original line, you are required to write a negative integer in the encoded file. For N spaces, the integer written is -N.
Example 2:
The original line:
          .           (((((()))))
The encoded line:
-10 1 . -11 6 ( 5 )
Syntax of the encoded line:
<negative integer> <positive integer> <ASCII character(s)> <negative integer> <positive integer> <ASCII character(s)> <positive integer> <ASCII character(s)>
There are 10 spaces at the beginning of the original line. Then a '.' appears, followed by 11 spaces. The line ends with 6 '(' characters and 5 ')' characters.
The suggested algorithm:
1. Read a single line of text from the file containing a picture of ASCII art.
2. Process this line by first determining whether there is a space.
   - If there are some spaces, count the number of spaces and write a negative integer to the encoded file.
   - If it is not a space, determine whether there is a repetition of a single character.
     - If repetition exists, write a positive integer to represent the number of the repeated character. Also write the repeated character to the file.
     - If there is no repetition, write 1 and extract the character pattern up to a position where a space is present OR repetition of a single character starts to occur.
3. Repeat Step 2 to process the remaining characters in the same line.
4. The encoding process ends when all lines in the original file are encoded in the output file.
For an original file that contains M lines of text, the corresponding encoded file must also contain M lines of text.
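The suggested algorithm for a single line can be sketched as below. This is a minimal sketch rather than a model answer: the class RunLengthEncoderSketch and the methods encodeLine and runLength are hypothetical names, and the sketch assumes that the only whitespace in a line is the space character (no tabs).

```java
class RunLengthEncoderSketch {

    /** Encodes one line of ASCII art in the Task 1 format. */
    static String encodeLine(String line) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            int run = runLength(line, i);
            if (out.length() > 0) {
                out.append(' ');                    // delimiter between encoded items
            }
            if (c == ' ') {
                out.append(-run);                   // N spaces are written as -N
                i += run;
            } else if (run > 1) {
                out.append(run).append(' ').append(c);
                i += run;
            } else {
                // No repetition: collect characters until a space
                // or the start of a repeated character is met.
                int j = i + 1;
                while (j < line.length() && line.charAt(j) != ' ' && runLength(line, j) == 1) {
                    j++;
                }
                out.append("1 ").append(line, i, j);
                i = j;
            }
        }
        return out.toString();
    }

    /** Counts the consecutive copies of the character at position start. */
    static int runLength(String s, int start) {
        int end = start + 1;
        while (end < s.length() && s.charAt(end) == s.charAt(start)) {
            end++;
        }
        return end - start;
    }

    public static void main(String[] args) {
        System.out.println(encodeLine("  .   ((()))")); // prints -2 1 . -3 3 ( 3 )
    }
}
```

An empty input line gives an empty encoded line, so the number of lines is preserved.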
Task 2: Decoding the compressed ASCII art file (40%)
In this task, you are asked to reverse the above process. Given the run-length encoded file, you are required to decode it to the original ASCII picture.
You should also open and read the encoded file line-by-line. For each line, you need to scan for integers and strings in this single line so that you can write an appropriate number of the repeated characters to the decoded ASCII picture file. Once a negative integer is read, you should convert it to the correct number of spaces in the decoded file.
The suggested algorithm:
1. Read a line of characters from the encoded ASCII art file.
2. Scan this line for an integer.
   - If it is negative, write a corresponding number of spaces in the output line of the decoded file.
   - If it is positive, scan for a string that follows in the input line. Write an appropriate number of copies of this character pattern in the decoded file.
3. Repeat Step 2 until the end of line is reached.
4. The decoding process ends when all lines in the encoded file are read.
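Decoding a single encoded line could look like the sketch below. The class RunLengthDecoderSketch and the method decodeLine are hypothetical names, and the input is assumed to be well formed. The sketch splits the line at spaces and uses Integer.parseInt instead of a Scanner; this is a design choice that keeps the parsing of negative numbers independent of the locale, and a Scanner-based version is equally possible.

```java
class RunLengthDecoderSketch {

    /** Decodes one encoded line back to the original line of ASCII art. */
    static String decodeLine(String encoded) {
        StringBuilder out = new StringBuilder();
        if (encoded.isEmpty()) {
            return "";                              // an empty line decodes to an empty line
        }
        String[] tokens = encoded.split(" ");
        int k = 0;
        while (k < tokens.length) {
            int count = Integer.parseInt(tokens[k++]);
            if (count < 0) {
                // A negative integer -N stands for N spaces.
                for (int r = 0; r < -count; r++) {
                    out.append(' ');
                }
            } else {
                // A positive integer is followed by the pattern to repeat.
                String pattern = tokens[k++];
                for (int r = 0; r < count; r++) {
                    out.append(pattern);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + decodeLine("-2 1 . -3 3 ( 3 )") + "]"); // prints [  .   ((()))]
    }
}
```

Because the token after a positive integer is always taken as a pattern, a pattern that happens to look like a number (for example a digit in the picture) is still decoded correctly.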
Task 3: Increasing the efficiency of the encoded file (10%)
In Task 1, we only consider the repetition of a single character in the original line. We can actually look for repeated string patterns to increase the efficiency in terms of storage space.
Example 3:
The original line:
@@@@@@@@@@@@@@@@@|-----|_|_|_|_|-----|@@@@@@@@@@@@@@@@@
The encoded line:
17 @ 1 | 5 - 4 |_ 1 | 5 - 1 | 17 @
If we make use of the pattern “|_” in the original line, the encoded line becomes shorter than its counterpart in Example 1.
Example 4:
The original line:
  I_I_I_I_I_I_I_I|I_I_I_I_I_I_I_I_I_I|I_I_I_I_I_I_I_|
The encoded line:
-2 7 I_ 1 I| 9 I_ 1 I| 7 I_ 1 |
Indeed, the repeated string pattern may be longer than 2 characters. The longer the pattern that you are able to take into account, the smaller the size of the encoded file.
Please note that a space in the original file breaks the pattern, because the <ASCII character(s)> pattern cannot contain any spaces in the encoded file. As we are writing our encoded file in text format and keeping it decodable with a simple algorithm, the resulting file may have a size larger than the original one.
Example 5:
The original line:
||@| |@| |@| |@| |@||
The encoded line:
2 | 1 @| -1 1 |@| -1 1 |@| -1 1 |@| -1 1 |@ 2 |
In this task, your marks are given with reference to the size of your encoded file. The smaller the file size, the higher your mark. To make sure your encoding scheme is correct, you should verify the resulting file by decoding it once with the method(s) written in Task 2.
For all the tasks above, you can assume that the dimension of the original ASCII art picture is smaller than 500-by-500 characters, including newline characters.
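One possible way to look for repeated string patterns is a greedy search: at each position, take the shortest pattern that repeats at least twice before the next space, and otherwise add the character to a pending non-repeating pattern. The sketch below is a heuristic under that assumption, not the required solution; the class RunLengthEncoderAdvancedSketch and its methods are hypothetical names. It happens to reproduce the encoded lines of Examples 3 to 5, but it does not always give the smallest possible output.

```java
import java.util.ArrayList;
import java.util.List;

class RunLengthEncoderAdvancedSketch {

    /** Encodes one line, allowing repeated patterns longer than one character. */
    static String encodeLine(String line) {
        List<String> items = new ArrayList<>();
        StringBuilder literal = new StringBuilder();    // pending non-repeating characters
        int i = 0;
        int n = line.length();
        while (i < n) {
            if (line.charAt(i) == ' ') {
                flush(literal, items);
                int j = i;
                while (j < n && line.charAt(j) == ' ') {
                    j++;
                }
                items.add(String.valueOf(i - j));       // N spaces are written as -N
                i = j;
                continue;
            }
            int segEnd = line.indexOf(' ', i);          // a space breaks the pattern
            if (segEnd < 0) {
                segEnd = n;
            }
            int bestLen = 0;
            int bestCount = 0;
            // Try pattern lengths from short to long and keep the first that repeats.
            for (int p = 1; p <= (segEnd - i) / 2; p++) {
                int count = countRepeats(line, i, p, segEnd);
                if (count >= 2) {
                    bestLen = p;
                    bestCount = count;
                    break;
                }
            }
            if (bestCount >= 2) {
                flush(literal, items);
                items.add(bestCount + " " + line.substring(i, i + bestLen));
                i += bestLen * bestCount;
            } else {
                literal.append(line.charAt(i));
                i++;
            }
        }
        flush(literal, items);
        return String.join(" ", items);
    }

    /** Counts how many times the block s[start, start + len) repeats back to back before limit. */
    static int countRepeats(String s, int start, int len, int limit) {
        int count = 1;
        while (start + (count + 1) * len <= limit
                && s.regionMatches(start, s, start + count * len, len)) {
            count++;
        }
        return count;
    }

    /** Writes the pending non-repeating pattern, if any, as "1 pattern". */
    static void flush(StringBuilder literal, List<String> items) {
        if (literal.length() > 0) {
            items.add("1 " + literal);
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeLine("||@| |@| |@| |@| |@||"));
    }
}
```

As stated in the task, the output should still be verified by decoding it with the Task 2 method(s) and comparing the result with the original file.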
PROCEDURE
1. Create a new project named Assignment4 in folder Assignment4. There are four source files named Assignment4.java, RunLengthEncoder.java, RunLengthDecoder.java and RunLengthEncoderAdvanced.java that contain the classes Assignment4, RunLengthEncoder (for Task 1), RunLengthDecoder (for Task 2) and RunLengthEncoderAdvanced (for Task 3) respectively. You shall define them in one package named assignment4.
2. In the main method of class Assignment4, you are required to read a file name entered by the user from the standard input as follows:
The original ASCII art picture file: testcase1
If the input name is “testcase1” as shown above, the program should read the ASCII art file “testcase1.txt”. Then, your program generates the following four files:
(1) “testcase1_e.txt”: the encoding results from Task 1;
(2) “testcase1_d.txt”: the results from decoding (1);
(3) “testcase1_ae.txt”: the encoding results from Task 3;
(4) “testcase1_ad.txt”: the results from decoding (3).
In the main method, you also need to make use of the other classes defined to perform encoding and decoding.
3. If you have completed writing the classes, try building the project (press the function key [F11] on the keyboard). If there are errors, don’t panic. Double-click on the first error message in the Output window. Check the error, correct it and re-compile. Feel tired? Take a rest.
4. If you have many opened projects, close others or click menu [Run] [Set Main Project].
5. You may insert println() statements in your work to inspect variables and intermediate results.
6. When you finish and there are no more errors, you are ready to try out the program by pressing the function key [F6] on the keyboard. Then you can type the input file name in the standard input. Enjoy your work.
SUBMISSION
1. Locate your NetBeans project folder, e.g. H:\Assignment4\.
2. ZIP the project folder Assignment4 and submit the file Assignment4.zip via our Online Assignment Collection Box on Blackboard https://blackboard.cuhk.edu.hk
MARKING SCHEME AND NOTES
1. The submitted program should be free of any typing mistakes, compilation errors and warnings.
2. Comments/remarks, indentation and style are assessed in every programming assignment unless specified otherwise. Variable naming, proper indentation for code blocks and adequate comments are important. Insert your name, SID, section, date as well as a declaration statement on academic honesty in a header comment block in the source file.
3. Test your work using different sets of inputs.
4. For Task 3, the smaller the size of the encoded file, the higher is the mark that you can get.
5. Remember to submit before 6:00 p.m. on the due date. No late submission will be accepted.
6. If you submit multiple times, ONLY the content and time-stamp of the latest one will be counted. You may delete (i.e. take back) your attached file and re-submit. We ONLY take into account the last submission.
UNIVERSITY GUIDELINE FOR PLAGIARISM
Attention is drawn to University policy and regulations on honesty in academic work, and to the disciplinary guidelines and procedures applicable to breaches of such policy and regulations. Details may be found at http://www.cuhk.edu.hk/policy/academichonesty/. With each assignment, students are required to submit a statement that they are aware of these policies, regulations, guidelines and procedures, in a header comment block.
FACULTY OF ENGINEERING GUIDELINES TO ACADEMIC HONESTY
MUST read: https://www.erg.cuhk.edu.hk/erg/AcademicHonesty
(you may need to access via CUHK campus network/ CUHK1x/ CUHK VPN)