python模式识别代写 ENGG1811 Assignment 1: Pattern recognition

ENGG1811 Assignment 1: Pattern recognition

Due date: 5pm, Friday 14 September 2018 (week 8). Late submissions will be penalised at the rate of 10% per day. The penalty applies to the maximum available mark. Submissions will generally not be accepted after 5pm, Wednesday 19 September 2018.

Updates

(Corrections issued on 3 Sept 2018) Line #55 of the file test_find_matching_patterns_alt.py currently reads

# dissim_threshold = 1 # expected [1,2,4]

This is incorrect. You should replace this line by:

# dissim_threshold = 1 # expected [1,4]
# dissim_threshold = 1.02 # expected [1,2,4]

Pattern recognition

We are born with the ability to recognise patterns. We recognise faces, places, alphabets, objects etc. Some computer programs also have the ability to recognise patterns. For example, the following is a picture taken from the Apple website on its software on recognising faces in photos.

Pattern recognition is also very useful for engineering. For example, pattern recognition has been used to help to monitor the health of infrastructure such as bridges. As an other example, in biomedical engineering, pattern recognition is applied to electrocardiogram (ECG) in order to tell whether a person may have congestive heart failure (CHF) or not. The picture below, which is taken from a research paper, shows two specimens of normal ECGs and one specimen of the ECG coming from a person with congestive heart failure.

You can visually see that the ECG from the person with congestive heart failure is missing the sharp positive peaks that are found in the normal ECGs. We use our visual senses to do pattern recognition but have you ever wondered how software does it. In this assignment, you will use your knowledge of Python

to program a pattern recognition algorithm.

Learning objectives

By completing this assignment, you will learn:

  1. To apply basic programming concepts of variable declaration, assignment, conditional, functions and loops.
  2. To use the Python data types: list, list of lists and Boolean
  3. To translate an algorithm described in a natural language to a computer language.
  4. To organize programs into modules by using functions
  5. To use good program style including comments and documentation
  6. To get a practice on software development, which includes incremental development, testing and

    debugging.

Prohibition
You are not allowed to use numpy for this assignment. This is an individual assignment, so no group

work.

Algorithm for pattern recognition

We will describe the pattern recognition algorithm that you will program in this assignment. The algorithm has been adapted so that you only need the Python skills that you have learnt in the first six weeks’ of the lectures in this course. This also means that the algorithm is a very simple minded one compared to those that people really use nowadays.

We will begin by describing a real-life analogy which will help you to understand the more abstract algorithm description later on. Let us assume that you are given the photos of 3 people, who we will call Angela, Beatrice and Catherine. There are 2 photos of Angela, 2 photos of Beatrice and 3 photos of Catherine. You are then presented with a new photo (which is different from those 7 that you are given) and this photo does come from one of these three ladies. You are then asked to tell whether the person in the new photo is Angela, Beatrice or Catherine. In this situation, you will compare the new photo with those 7 that are given and find the closest match. For the moment, let us assume that these three ladies look very different, then you will be able to identify the person in the new photo and you give one answer. However, this may not always be the case. Let us say Beatrice and Catherine are identical twins and the new photo belongs to Beatrice, then you may not be able to say definitely whether the photo is Beatrice’s or Catherine’s; in this case you may give two answers: Beatrice or Catherine, because you cannot tell them apart.

We now describe the data that the algorithm will operate on. Consider the following Python code:

# specimens of Pattern 1 pattern0_specimen0 = [1.1, 0, 0, 0, 1.1] pattern0_specimen1 = [1.1, 0, 0, 0, 0.9]

# specimens of Pattern 2
pattern1_specimen0 = [0.01, 1.0, 1.05, 0.02, 0.03] pattern1_specimen1 = [0.02, 0.9, 1.02, 0.01, -0.02]

# specimens of Pattern 3 pattern2_specimen0 = [0, 1.0, 1.1, 0.2, 0] pattern2_specimen1 = [0, 0.9, 1.1, 0.1, 0]

The above code defines three lists called pattern0, pattern1 and pattern2. The list pattern0 is a list of lists; it consists of two lists: pattern0_specimen0 and pattern0_specimen1. The lists pattern1 and pattern2 have similar structure. There is also a list called test_specimen. You may find this setup rather abstract so we will use the real-life analogy that we discuss earlier to help you to understand it. A “specimen” in the code corresponds to a photo in the real-life analogy, and a “pattern” in the code corresponds to a person. The following table shows you how you can interpret the variables in the Python code using the real-life analogy.

Variables in the Python code

Real-life analogy

pattern0_specimen0

First photo of Angela

pattern0_specimen1

Second photo of Angela

pattern0 (which has 2 specimens)

Angela (who has 2 photos)

pattern1 (which has 2 specimens)

Beatrice (who has 2 photos)

pattern2 (which has 3 specimens)

Catherine (who has 3 photos)

test_specimen

The new photo

Note that the variables pattern0_specimen0, pattern0_specimen1 etc. are a list of numbers. You may wonder why we use a list of numbers to store a specimen. You may recall that we counted the number of heart beats in Week 4’s lecture using a list of voltage values. You can view that list of voltage values as a specimen. So, we are doing something similar here. As a remark, we want to point out that when photos, videos, music etc. are stored on the computers, their main component is a sequence of numbers.

Now, let us get back to the overall aim. In the real-life analogy, the aim is to determine the person in the new photo. In the pattern recognition algorithm, the aim is to determine the pattern that is the most similar to test_specimen. Instead of using the concept of the most similar, the algorithm uses the equivalent concept of the least dissimilar (or the least unlike, the least difference). In other words, the aim of the algorithm is to determine the pattern that is the least dissimilar to test_specimen. In order to achieve this, the algorithm compares test_specimen against each of the given specimens, and determines the level of dissimilarity between test_specimen and each of the given specimens. (In the real-life analogy, you would compare the new photo against each of the given photos.) We will now specify how the algorithm computes the level of dissimilarity between two specimens.

pattern2_specimen2 = [0, 1.1, 1.1, 0.1, 0]

# Assemble all the specimens from all the patterns into a list of lists pattern0 = [pattern0_specimen0,pattern0_specimen1]
pattern1 = [pattern1_specimen0,pattern1_specimen1]
pattern2 = [pattern2_specimen0,pattern2_specimen1,pattern2_specimen2] #

all_patterns = [pattern0, pattern1, pattern2]

# Specimen to be tested test_specimen = [0, 0.9, 0.8, 0, 0]

# threshold dissim_threshold = 0.2

# Call the function (which you will write in the assignment) to determine the matching patterns pattern_list = find_matching_patterns(test_specimen,all_patterns,dissim_threshold)

(Level of dissimilarity between two specimens) We will use an example to explain how the level of dissimilarity between 2 specimens is computed. In this example, the two specimens are the variables test_specimen and pattern1_specimen0. Both of these variables are a list of the same length. The steps are:

1. Compute the difference of the corresponding entries in the two lists. 2. Take the absolute value of the results in Step 1.
3. The level of dissimilarity is the sum of the absolute values in Step 2.

Let us work this through using the numbers in the variables test_specimen and pattern0_specimen0. From the code listing above,

test_specimen is [0, 0.9, 0.8, 0, 0] pattern1_specimen0 is [0.01, 1.0, 1.05, 0.02, 0.03]

The following table summarises the computation in Steps 1 and 2. The differences between the corresponding entries are in the third column and the absolute differences are in the fourth column.

Entries of test_specimen

The corresponding entry in pattern1_specimen0

Difference of the corresponding entries

Absolute value

0

0.01

0 – 0.01= -0.01

0.01

0.9

1.0

0.9 – 1.0 = -0.1

0.1

0.8

1.05

0.8 – 1.05 = -0.25

0.25

0

0.02

0 – 0.02 = -0.02

0.02

0

0.03

0 – 0.03 = -0.03

0.03

The level of dissimilarity is the sum of the absolute values shown in the fourth column. Therefore, the level of dissimilarity between test_specimen and pattern1_specimen0 is 0.01 + 0.1 + 0.25 + 0.02 + 0.03 = 0.41. If the level of dissimilarity between test_specimen and a specimen is small, then there is little dissimilarity between them and vice versa.

If you perform these calculations for all the 7 specimens from the 3 patterns, you should obtain the results in the second column in the table below.

Specimen

Level of dissimilarity between test_specimen and the specimen shown in the first column

Level of dissimilarity between test_specimen and a pattern

pattern0_specimen0

3.90

Level of dissimilarity between test_specimen and pattern0
= mean of

level of dissimilarity between test_specimen and pattern0_specimen0 and

level of dissimilarity between test_specimen and pattern0_specimen1
= (3.90 + 3.70) / 2

= 3.8

pattern0_specimen1

3.70

pattern1_specimen0

0.41

Level of dissimilarity between test_specimen and pattern1
= mean of

Now that we have calculated all the levels of dissimilarity between test_specimen and all the specimens, the next step is to calculate the level of dissimilarity between test_specimen and a pattern. This is best illustrated by examples and you can find them in the third column in the table above.

(Determining the least dissimilar patterns) Now that we have calculated the level of dissimilarity between test_specimen and each pattern, we are now ready to determine the patterns that best match test_specimen. This is the same as finding the patterns which have little dissimilarity to test_specimen.

We can see from the third column of the table above that test_specimen has the least dissimilarity with pattern1. So pattern1 is certainly a choice. Recall from the real-life analogy that it may be possible that the new photo is similar to those from more than one person, so the pattern recognition algorithm may return one or more answers. In order for the algorithm to return all the least dissimilar patterns, it will make use of the variable dissim_threshold in the Python code. The steps to determine all the least dissimilar patterns are:

  1. Assuming that you have calculated the levels of dissimilarity between test_specimen and all the patterns, you determine the smallest level of dissimilarity and let us denote this by smallest_dissimilarity.
  2. Compute the sum of smallest_dissimilarity and dissim_threshold, and we will denote the result by acceptable_dissimilarity_limit.
  3. If the level of dissimilarity between test_specimen and a pattern is less than acceptable_dissimilarity_limit, then the pattern is considered to be a least dissimilar pattern.

Let us continue with the example. The levels of dissimilarity for pattern0, pattern1 and pattern2 have been calculated to be, respectively, 3.8, 0.34 and 0.53. According to Step 1, we determine the smallest of these three numbers, which is 0.34. In the Python listing above, the variable dissim_threshold is 0.2, so the acceptable_dissimilarity_limit is 0.34 + 0.2 = 0.54; this completes Step 2. Finally, in Step 3, we determine the patterns whose level of dissimilarity is less than the acceptable_dissimilarity_limit, so both pattern1 and pattern2 are least dissimilar patterns. Note that the number of least dissimilar patterns depends on the variable dissim_threshold. If dissim_threshold is 0.1 instead, then pattern1 is the only least dissimilar pattern.

pattern1_specimen1

0.27

level of dissimilarity between test_specimen and pattern1_specimen0 and

level of dissimilarity between test_specimen and pattern1_specimen1
= (0.41 + 0.27) / 2

= 0.34

pattern2_specimen0

0.60

Level of dissimilarity between test_specimen and pattern2
= mean of

level of dissimilarity between test_specimen and pattern2_specimen0 and

level of dissimilarity between test_specimen and pattern2_specimen1 and

level of dissimilarity between test_specimen and pattern2_specimen2
= (0.60 + 0.40 + 0.60) / 3

= 0.53

pattern2_specimen1

0.40

pattern2_specimen2

0.60

Remark: You may ask why we do not consider the possibility that there are no matching patterns at all. After all, in the real-life analogy, one can use a photo of Doreen (who look very different from the other three) as the new test photo. We remark that we have not included this possibility to keep the project simple.

Assumptions on data and data checking

The Python code above shows a template on how data are organised. You can assume that

Each specimen (e.g. pattern0_specimen0, test_specimen) is a list. These lists are always non-empty, and their entries are either int or float.
Each pattern (e.g. pattern0, pattern1) is a list containing at least one specimen.
The variable all_patterns is a list containing all the patterns. You can assume that the variable all_patterns contains at least one pattern.

The variable dissim_threshold is a float and is always strictly positive, i.e. it cannot be zero.

In the Python code shown above, all the specimens are lists of the same length. However, in real-life data processing, sometimes the data are corrupted. In order to imitate the possibility of data corruption, we will also give you data sets where not all specimens have the same length. We will explain what you need to do for the data corruption scenario under implementation requirements.

Implementation requirements

You need to implement the following five functions. All these five functions working together will implement the the pattern recognition algorithm and deal with the data corruption.

The requirement is that you implement each function in a separate file. This is so that we can test them independently and we will explain this point more later on at here. We have provided template files, see the Getting Started.

1. def dissim_two_specimens(specimen0,specimen1):

The aim of this function is to compute the level of dissimilarity between two specimens. The algorithm is described earlier at here.
The function has 2 inputs. Each input is a list. Both lists should have the same length.
The function should return one output which is the level of dissimilarity between two lists. For example, following on the earlier example which computes the level of dissimilarity between test_specimen and pattern1_specimen0 (which is stored in all_patterns[1][0]), the function call dissim_two_specimens(test_specimen,all_patterns[1][0]) should return the value of 0.41.

This function can be tested using the file test_dissim_two_specimens.py. 2. def dissim_specimen_pattern(specimen,pattern):

The aim of this function is to compute the level of dissimilarity between a specimen and a pattern. The algorithm is described earlier at here where we explain how to compute the level of dissimilarity between test_specimen and pattern0, pattern1 and pattern2.
The function has 2 inputs. The first input is a list. The second input is a list of lists.
The function should return one output which is the level of dissimilarity between a specimen and a pattern.
For example, following on the earlier example which computes the level of dissimilarity between test_specimen and pattern2 (which is stored in all_patterns[2]), the function call dissim_specimen_pattern(test_specimen,all_patterns[2]) should return the value of 0.53.

This function can be tested using the file test_dissim_specimen_pattern.py.

3. def find_least_dissim_patterns(test_specimen,all_patterns,dissim_threshold):

The aim of this function is to determine the least dissimilar patterns. The algorithm is described earlier at here.
The function has 3 inputs. The names of the input in the def line has been chosen to match their roles in the example earlier.
The function should return one output which is a Python list containing the indices of the patterns (in the variable all_patterns) that are least dissimilar to the input test_specimen. We will explain this using two examples which are based on the algorithm description earlier at here.

The function call find_least_dissim_patterns(test_specimen,all_patterns,0.2) should return the list [1,2] (or [2,1]) because pattern1 and pattern2 are both least dissimilar patterns. Note that pattern1 has an index of 1 in the variable all_patterns.
The function call find_least_dissim_patterns(test_specimen,all_patterns,0.1) should return the list [1] because pattern1 is the only least dissimilar patterns

Note we do not require you to order the indices for the least dissimilar patterns in any way. For example, if the indices of the least dissimilar patterns are 1, 5 and 6, then any of the following lists is acceptable: [1,5,6], [1,6,5], [6,1,5], [6,5,1], [5,1,6] and [5,6,1].
This function can be tested using the file test_find_least_dissim_patterns.py.

4. def are_all_lengths_the_same(test_specimen,all_patterns):

The aim of this function is to determine whether there is data corruption
The function has 2 inputs. The names of the input in the def line has been chosen to match their roles in the example earlier.
The function should return one output whose data type is Boolean.

If the length of the list test_specimen and the lengths of all the specimens in all_patterns are the same, then the function call are_all_lengths_the_same(test_specimen,all_patterns) should return True; otherwise, it should return False.

This function can be tested using the file test_are_all_lengths_the_same.py.

5. def find_matching_patterns(test_specimen,all_patterns,dissim_threshold):

This function is the interface between the data, see the last line in the example Python code

here

The function has 3 inputs. The names of the input in the def line has been chosen to match their roles in the example earlier.
The function should return one output whose datatype can be either a list or a string depending on the situation

The expected steps within the function find_matching_patterns() are:
The function should first check whether there is data corruption.
If there is no data corruption, the function should determine the least dissimilar patterns and return the indexes of the least dissimilar patterns in a list. Using our earlier examples:

The function call find_matching_patterns(test_specimen,all_patterns,0.2) should return the list [1,2] or [2,1]
The function call find_matching_patterns(test_specimen,all_patterns,0.1) should

return the list [1]
If there is data corruption, then the function should not perform the process of determining the matching patterns. It should return the string ‘Corrupted data’.

Note we do not require you to order the indices for the least dissimilar patterns in any way. For the case where data are not corrupted, this function can be tested using the files test_find_matching_patterns.py and test_find_matching_patterns_alt.py. These two test files use two different data sets.

For the case where data are corrupted, this function can be tested using the file test_find_matching_patterns_corrupted.py. The data is this file are the same as those in the file test_are_all_lengths_the_same.py.

Additional requirements: In order to facilitate testing, you need to make sure that within each submitted file, you only have the code required for that function. Do not include test code in your submitted file.

Getting Started

  1. Download the zip file ass1_prelim.zip, and unzip it. This will create the directory (folder) named ‘ass1_prelim’.
  2. Rename/move the directory (folder) you just created named ‘ass1_prelim’ to ‘ass1’. The name is different to avoid possibly overwriting your work if you were to download the ‘ass1_prelim.zip’ file again later.
  3. First browse through all the files provided including the test files.
  4. (Incremental development) Do not try to implement too much at once, just one function at a time

    and test that it is working before moving on.

  5. Start implementing the first function, properly test it using the given testing file, and once you are

    happy, move on to the the second function, and so on.

  6. Please do not use ‘print’ or ‘input’ statements. We won’t be able to assess your program properly if

    you do. Remember, all the required values are part of the parameters, and your function needs to return the required answer. Do not ‘print’ your answers.

Testing

Test your functions thoroughly before submission.

Test your functions thoroughly before submission. You can use the provided Python programs (files like test_dissim_two_specimens.py etc.) to test your functions. Please note that each file covers only one test case. We have purposely not included all the cases because we want you to think about how you should be testing your code. You are welcome to use the forum to discuss additional tests that you should use to test your code.

We will test each of your files independently. Let us give you an example. Let us assume we are testing three files: prog_a.py, prog_b.py and prog_c.py. These files contain one function each and they are: prog_a(), prog_b() and prog_c(). Let us say prog_b() calls prog_a(); and prog_c() calls both prog_b() and prog_a(). We will test your files as follows:

We will first test your prog_a().
When we test your prog_b(), we will test your prog_b() together with our working version of prog_a(). In this way, if your prog_a() does not work for some reason, there is a chance that your prog_b() may work and you may still receive marks for prog_b().
When we test your prog_c(), we will test your prog_c() together with our working version of prog_a() and prog_b().

Submission

You need to submit the following five files. Do not submit any other files. For example, you do not need to submit your modified test files.

dissim_two_specimens.py dissim_specimen_pattern.py find_least_dissim_patterns.py are_all_lengths_the_same.py find_matching_patterns.py

To submit this assignment, go to the Assignment 1 page and click the tab named “Make Submission”.

Assessment Criteria

We will test your program thoroughly and objectively. This assignment will be marked out of 25 where 20 marks are for correctness and 5 marks are for style.

Correctness

The 20 marks for correctness are awarded according to these criteria.

Criteria

Nominal marks

Function dissim_two_specimens.py

4

Function dissim_specimen_pattern.py

4

Function find_least_dissim_patterns.py

5

Function are_all_lengths_the_same.py

5

Function find_matching_patterns.py (The no data corruption case)

1

Function find_matching_patterns.py (The data corruption case)

1

Style

Five (5) marks are awarded by your tutor for style and complexity of your solution. The style assessment includes the following, in no particular order:

Use of meaningful variable names where applicable
Use of sensible comments to explain what you’re doing
Use of docstring for documentation to identify purpose, author, date , data dictionary, parameters, return value(s) and program description at the top of the file

Assignment Originality

You are reminded that work submitted for assessment must be your own. It’s OK to discuss approaches to solutions with other students, and to get help from tutors and consultants, but you must write the Python code yourself. Sophisticated software is used to identify submissions that are unreasonably similar, and marks will be reduced or removed in such cases.

Further Information

We will run Help Sessions for this assignment during Weeks 6-8. These are face-to-face consultation in the lab on a first-come-first-serve basis. The timetable for the Help Sessions can be

found on the course website.

Use the forum to ask general questions about the assignment, but take specific ones to Help Sessions.
Keep an eye on the course webpage notice board for updates and responses.

Reference:

Sudarshan et al. Automated diagnosis of congestive heart failure using dual tree complex wavelet

transform and statistical features extracted from 2s of ECG signals. Computers in Biology and Medicine (2017)