Instructions for processing and submitting Exercise 1: run length coding (8 points)
• Please use MARS to simulate your solution. Make sure your submission is in MARS can be run.
• You will receive a separate file for each task, consisting of a specification and solution section consists. Please add your name and matriculation number in the space provided.
To solve the problem, only work on the part of the solution below the mark:
# + Solution section
# + —————–
• Your solution must also work with input values other than those specified. For your
To test code with other inputs, you can change the sample data in the solution section. • Please do not make any modifications to the specification section and leave the defaults
Markings (lines beginning with # +) unchanged.
• Even if the simulation provides correct results for your solution, the solution can fail
contain. A correct solution must also adhere to the known register conventions! • Please make your assembler code comprehensible and use detailed com-
mentare to demonstrate how your assembly code works.
Exercise 1: run length coding
Run-length encoding is a simple, lossless data compression algorithm
mus. A byte string is to be translated into a sequence of byte pairs: The first byte of a
Pair indicates the character; the second byte the number of repetitions of this character. A row of several identical consecutive characters can thus save space as a single byte pair
be coded.1 examples:
https://translate.googleusercontent.com/translate_f[2021/1/29 5:48:43]
Instructions for processing and submitting Exercise 1: run length coding (8 points)
input
“aaaaaaaaaa” “Hello” “BBBBZZABB”
“abBBBBBBBB”
output
(‘a’ , 10)
(‘H’ , 1) (‘a’ , 1) (‘l’ , 2) (‘o’ , 1) (‘B’ , 4) (‘Z’ , 2) (‘A’ , 1) (‘B’ , 2)
(‘a’ , 1) (‘b’ , 1) (‘B’ , 8)
Implement the function rle, which encodes an input string str_in run-length in
transmits an output buffer buf_out. str_in is a zero-terminated ASCII character sequence (string). The compression result is to be stored in buf_out in the form of consecutive byte pairs
1 See the Wikipedia article for more information: https://de.wikipedia.org/wiki/Lauflängenkodierung#Mehrwerte_Symbollösungen
1
are: The ASCII character is to be saved in the first byte, the input in the second byte.
number of repetitions as a number (not ASCII-coded). You can assume no sequences
of more than 127 identical consecutive characters appear in the input, so that overlays
fe of the repetition value do not have to be taken into account. After successfully processing the The input string should return the number of byte pairs written.
For the input “BBBBZZABB”, for example, the following byte sequence should be entered in the output buffer be written:
Byte sequence hexadecimal: 42 04 5A 02 41 01 42 02 (ASCII characters: ‘B’ – ‘Z’ – ‘A’ – ‘B’ -)
Below you can see the C signature of the required function and the MIPS registers for parameters and return value, which result from the MIPS register convention:
int rle (char * str_in, char * buf_out);
$ v0 $ a0 $ a1
Assistance
• Implement a loop that iterates over the string. Load the individual Characters using lb (or lbu).
• If a zero is read, the end of the character string has been reached.
• To determine the repetition value, the same consecutive characters must
will count.
• As an intermediate step, you can note the function as pseudocode or in C / Java.
https://translate.googleusercontent.com/translate_f[2021/1/29 5:48:43]
Instructions for processing and submitting Exercise 1: run length coding (8 points)
Task 2: Animal identification using a bloom filter
The bloom filter is a probabilistic data structure that is used to test quickly whether a given element is included in a recognition set. It can be too false positive results, but not false negative results.2
The basis of our bloom filter is a bit map with a length of 256 bits. Originally all are bits set to 0 in the row.
To add a new value to the detection set of the Bloom filter, the hash function
hash_str executed k times on the input value. The start values (seed) 0 , 1 , … k – 1
used. The return values of the hash function are values in the range 0 to 255. These can be saved as Bit index can be interpreted in the bit sequence. Every time the hash function is executed, this is Return value as bit index corresponding bit of the bit sequence set to 1.
In order to check within the scope of an evaluation function whether an entry belongs to the detection set, is just like when adding the hash function hash_str executed k times on the input value. There
the start values(seed) 0 , 1 , … k− 1 are used. If after each execution of the hash function
the bit of the bit sequence corresponding to the return value as a bit index is set (1), is the result
positive / true (input belongs to the recognition set with high probability). Is at least
2 Further explanations and a detailed example can also be found here in the Wikipedia article: https://de.wikipedia.org/wiki/Bloomfilter
2
if one of the bits is not set (0), the result is negative / false (input is definitely not an option to the detection amount).
Use the given function hash_str with the following signature:
int hash_str (char * str_in, int seed);
$ v0 $ a0 $ a1
In addition, the following two functions are available for reading out (bitmap_getbit) and set (bitmap_setbit) of individual bits in the bitmap:
int bitmap_getbit (char * bitmap, int index);
$ v0 $ a0 $ a1
void bitmap_setbit (char * bitmap, int index);
https://translate.googleusercontent.com/translate_f[2021/1/29 5:48:43]
Instructions for processing and submitting Exercise 1: run length coding
$ a0 $ a1
Page 4
Part 2.1: Evaluation function
In this function you should create the bloom filter evaluation function bf_evaluate. The argument
str_in contains the input to be checked as a zero-terminated character string. The argument
bitmap points to the bit sequence of the Bloom filter, which encodes the detection amount. The argument k indicates the number of hash function calls. If the recognition result is positive (true), the
Function returns the value 1, with a negative recognition result (false) the value 0.
int bf_evaluate (char * str_in, char * bitmap, int k);
$ v0 $ a0 $ a1 $ a2
In the default file there is already a bit sequence which, as a recognition set, is a series of
Coded animal names. The main function of the default performs your evaluation function for a number of Inputs from. In our example, the inputs “cat”, “dog”, “mole”, “mouse” and
“Goldfish” give a positive result. The entries “tree”, “lamp”, “lake” and “boat”
should deliver a negative result.
Part 2.2: Extending the recognition set
Unfortunately, the detection set does not yet include all animals: “Earthworm”, for example, delivers one negative result. You should therefore now implement the function bf_add, which requires an input str_in to add to the detection set. The bitmap argument continues to point to the bit sequence of the
Bloom filters. The argument k specifies the number of hash function calls.
void bf_add (char * str_in, char * bitmap, int k);
$ a0 $ a1 $ a2
To test the function, the function bf_add is executed with “earthworm” as default. Once added, the new word should be recognized as positive by bf_evaluate. “Tree”, “Lamp” etc. should still be recognized as negative.
3
Assistance
• A loop must be used for both subtasks to use the hash function hash_str
execute k times. Make sure that the correct values for the Function parameter seed (start value).
https://translate.googleusercontent.com/translate_f[2021/1/29 5:48:43]
Instructions for processing and submitting Exercise 1: run length coding
• To work on subtask 2.1 leave the default empty function bf_add first
unchanged. This is called by the main function of the default, but still has empty no effect. To work on subtask 2.2 you then add the functionality of
bf_add.
• After completing subtask 2.1, all animals except
correctly recognized by the earthworm and no false positive results occur. After Completion of subtask 2.2, all animals should be recognized correctly and still none false positive results occur.
• If you add more test words that are not animals to test_words
gen (this is not required), there is a high probability that your function should be negative Return result (false). Since it is a probabilistic algorithm, the
Possibility of false positives.
• Despite the probabilistic properties of the bloom filter, we made sure that the given test words must not give false positive results!
• Do not access the bitmap bitmap directly with load / store commands, but use it The functions bitmap_getbit and bitmap_setbit.
• As an intermediate step, you can note the functions as pseudocode or in C / Java.
4th
https://translate.googleusercontent.com/translate_f[2021/1/29 5:48:43]