COMP 204 – Assignment 1 Yue Li
Important instructions:
• Release date: January 17, 2020 at 12:00 AM
• Due date: February 1, 2020 at 23:59
• Submit each of your Python programs on MyCourses. Do not zip multiple files together. Submit each file separately
• Write your name and student ID at the top of each program
• For each question, start off from the Python file given to you. Do not change its name.
• Do not use any modules or any pieces of code you did not write yourself
• For each question, your program will be automatically tested on a variety of test cases, which will account for 75% of the mark. To be considered correct, your program should print out exactly and only what the question specifies. Do not add extra text, extra spaces or extra punctuation. Do not ask the user to enter any other information than what is needed.
• For each question, 25% of the mark will be assigned by the TA based on (i) your appro- priate naming of variables; (ii) commenting of your program; (iii) simplicity of your program (simpler = better).
1
1 Decision tree risk assessment for Deficiency of Adenosine Deaminase 2 (DADA2) (20 points)
Required knowledge: User prompt, conditionals and nested conditionals. Download the q1 dada2DecisionTree.py Python program on MyCourses.
Livedoid skin rash
Neurological disorder
No
>= 38
ADA2 pathogenic mutations
Yes
CRP (mg/dL)
Body temp (°C)
No
Yes
< 38
>= 38
low-risk Body temp (°C)
< 38
< 5
low-risk
>= 5
CRP (mg/dL)
ADA2 pathogenic mutations
ADA2 pathogenic mutations
< 5
low-risk
low-risk
>= 5
012 012 012
medium-risk high-risk low-risk medium-risk high-risk low-risk medium-risk high-risk
ADA2 pathogenic mutations
low-risk
012
medium-risk high-risk
Figure 1: Decision tree for genetic diagnosis of DADA2. The schematics is adapted from Figure 3 presented by Rama et al., (2018) [1].
Question
Livedoid skin rash:
Body temp:
Neurological disorder:
CRP (mg/dL):
ADA2 pathogenic mutations:
Valid answers Yes or No numeric float Yes or No numeric float 0, 1, or 2
Table 1: Valid questions and answer from the program
Deficiency of adenosine deaminase 2 (DADA2) is a recently discovered rare autosomal re- cessive systemic autoinflammatory disorder [2]. It is due to mutations in the ADA2 gene, which encodes the adenosine deaminase 2 protein. The disease onset usually occurs prior to 24 years of age. The clinical spectrum ranges from single cutaneous lesions to severe systemic inflam- matory disease with cerebral complications.
To help diagnosing DADA2, we will implement a decision tree (Figure 1) adapted from [1]. Based on this decision tree, write a Python program that asks a series of questions to the user
2
to determine their level of risk. Your program should only ask the questions that are necessary. Specifically, use exactly the text of the questions and only the user answers listed in Table 1 are valid answers:
If the answer provided by the user is invalid (see below), the program should print invalid, stop asking further questions, and print nothing else. At the end, your program should print either low-risk, medium-risk, or high-risk, based on the user’s answers, or invalid.
Below are a few examples of the way your program should behave. The text in blue are the user’s answers.
Livedoid skin rash (Yes or No)? No
Neurological disorder (Yes or No)? No
low-risk
Livedoid skin rash (Yes or No)? No
Neurological disorder (Yes or No)? Yes
Body temp: 39
ADA2 pathogenic mutations? 2
high-risk
Livedoid skin rash? Yes
Body temp: 37.5
CRP (mg/dL)? 5
ADA2 pathogenic mutations: 0
low-risk
Livedoid skin rash: Maybe
invalid
Livedoid skin rash: Yes
Body temp: 37
CRP (mg/dL): high
invalid
Livedoid skin rash: Yes
Body temp: high
invalid
Livedoid skin rash? Yes
Body temp: 38.5.1
invalid
3
2 Palindromic sequence (20 points)
Required knowledge: conditionals, loops, strings
Download the q2 palindrome.py Python program on MyCourses.
A palindromic sequence is a nucleic acid sequence on double-stranded DNA or RNA wherein reading 5’ (five-prime) to 3’ (three prime) forward on one strand matches the sequence read- ing 5’ to 3’ on the complementary strand with which it forms a double helix. For example, the DNA sequence ACCTAGGT is palindromic because its nucleotide-by-nucleotide complement is TGGATCCA, and reversing the order of the nucleotides in the complement gives the original sequence. Palindromic sequences play an important role in molecular biology. Many restriction endonucleases (restriction enzymes) recognize specific palindromic sequences and cut them. Learn more from Wikipedia (https://en.wikipedia.org/wiki/Palindromic_sequence). Write a program that checks whether a user-input DNA sequence is a palindromic sequence.
For example, if user inputs GAATTC, then your program should first convert GAATTC to the complementary sequence CTTAAG. It will then reverse the sequence to GAATTC. Finally, it will compare GAATTC with the original sequence to see whether the two sequences are identical. It will then prints to the screen “GAATTC is a palindromic sequence”. If user inputs AGCGAT, your program should output “AGCGAT is not a palindromic sequence”. If user inputs a sequence containing characters other than A, C, T, and G, your program should output “Invalid sequence” and nothing else.
Here are the above examples shown during the runtime of the program (user input is coloured in blue):
Enter a DNA sequence: GAATTC
GAATTC is a palindromic sequence
Enter a DNA sequence: AGCGAT
AGCGAT is not a palindromic sequence
Enter a DNA sequence: A*CGAUGXYZ
Invalid sequence
4
3 Counting CpG sites (20 points)
Required knowledge: Loops, Strings, Conditionals. Download the q3 CpG.py Python program on MyCourses.
The CpG sites or CG sites are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5’ → 3’ direction. Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosine. Enzymes that add a methyl group are called DNA methyltransferases. In mammals, 70% to 80% of CpG cytosines are methylated. Methylating the cytosine within a gene can change its expression, a mechanism that is part of a larger field of science studying gene regulation that is called epigenetics.
Write a program that outputs the positions of all of the CpG site harboured within a given sequence. We will use zero-based system with the first nucleotide position indexes at 0. If the DNA sequence contains no CpG, your program should output “No CpG site is detected in the input sequence”. If the user-input sequence contains letters other than A, C, G, and T, your program should print “Invalid sequence” and nothing else.
For example, your program should behave as follows (user inputs are coloured in blue):
Enter a DNA sequence: ACGTGCGCT
CpG site is detected at position 1 of the sequence
CpG site is detected at position 5 of the sequence
Enter a DNA sequence: AGCTGGCCT
No CpG site is detected in the input sequence.
Enter a DNA sequence: A*CGAUGXYZ
Invalid sequence
5
4 Turing Machine Part 1 (20 points)
Required knowledge: Loops, Strings, Conditionals.
Download the q4 turing machine part1.py Python program on MyCourses.
start
A
0/P,L 0/P,R
C
1/P,R
B
H
1/P,L
0/P,L 1/P,R
Figure 2: The 3-state Busy Beaver Turing machine
Write a Python program to simulate the 3-state Busy Beaver Turing Machine (Figure 2) we covered in Lecture 2. The Turing Machine (TM) will start with an initial tape of all zeros. The tape has infinite length. The head of the TM is centred at the tape. TM will print, shift tape, and transition from one state to another state as defined in Figure 2. The first 3 states are shown in Figure 3.
Upon successfully running it, your program should print the states visited by the TM to the IPython Console and nothing else:
1 current state: A
2 current state: B
3 current state: A
4 current state: C
5 current state: B
6 current state: A
7 current state: B
8 current state: B
9 current state: B
10 current state: B
11 current state: B
12 current state: A
13 current state: C
14 current state: H
6
5 Turing Machine Part 2 (20 points)
Required knowledge: Loops, Strings, Conditionals.
Download the q5 turing machine part2.py Python program on MyCourses.
Recall that the tape is infinitely long and initially set to zeros at every position of the tape. Therefore, we are only viewing at a small fixed size window of the tape centering at the head of the TM. Let’s assume the window size is 21-letter. Shifting the tape to the left or to the right changes our view of the tape via the lens of the 21-letter window.
Improve the output of the TM by printing the head centered at the middle of the 21-letter window as shown below. Here _0_ means that the head of the TM scans a 0 on the tape, and _1_ means that the head of the TM scans a 1 on the tape.
You may reuse the code you have written for turing machine part1.py to complete this question of the assignment.
1 current state: A; shift: ; current tape: 0000000000_0_0000000000
2 current state: B; shift: R; current tape: 0000000000_0_1000000000
3 current state: A; shift: L; current tape: 0000000001_1_0000000000
4 current state: C; shift: L; current tape: 0000000011_0_0000000000
5 current state: B; shift: L; current tape: 0000000111_0_0000000000
6 current state: A; shift: L; current tape: 0000001111_0_0000000000
7 current state: B; shift: R; current tape: 0000000111_1_1000000000
8 current state: B; shift: R; current tape: 0000000011_1_1100000000
9 current state: B; shift: R; current tape: 0000000001_1_1110000000
10 current state: B; shift: R; current tape: 0000000000_1_1111000000
11 current state: B; shift: R; current tape: 0000000000_0_1111100000
12 current state: A; shift: L; current tape: 0000000001_1_1111000000
13 current state: C; shift: L; current tape: 0000000011_1_1110000000
14 current state: H; shift: R; current tape: 0000000001_1_1111000000
7
start
A
0/P,L 0/P,R
C
0/P,L
1/P,R
B 0/P,L
head
0
0
0
0
0
0
0
0
0
0
0
1/P,L
… …
1/P,R
1/P,R
H
0/P,R AB
0
0
0
0
0
1
0
0
0
0
0
1/P,L
0/P,L 1/P,R
C
0/P,L
H 1/P,R
0/P,R AB
0
0
0
0
0
1
0
0
0
0
0
1/P,L
0/P,L 1/P,R
C
0/P,L
H
1/P,R
0/P,R AB
0
0
0
0
0
0
1
0
0
0
0
1/P,L
0/P,L
C
0/P,L 1/P,R
H 1/P,R
0/P,R
AB
0
0
0
0
0
0
1
0
0
0
0
1/P,L
0/P,L 1/P,R
C
0/P,L
H 1/P,R
0/P,R AB
0
0
0
0
0
1
1
0
0
0
0
1/P,L
0/P,L 1/P,R
C
0/P,L
H 1/P,R
0/P,R AB
0
0
0
0
0
1
1
0
0
0
0
1/P,L
0/P,L 1/P,R
C
0/P,L
H
1/P,R
0/P,R AB
0
0
0
0
1
1
0
0
0
0
0
1/P,L
0/P,L 1/P,R
C
0/P,L
H 1/P,R
0/P,R
AB
0
0
0
0
1
1
0
0
0
0
0
1/P,L
0/P,L 1/P,R
C
0/P,L
H 1/P,R
0/P,R AB
0
0
0
0
1
1
0
0
0
0
0
1/P,L
0/P,L 1/P,R
C
0/P,L
H 1/P,R
0/P,R AB
0
0
0
0
1
1
0
0
0
0
0
1/P,L
0/P,L 1/P,R
C
0/P,L
H
1/P,R
0/P,R AB
0
0
0
1
1
0
0
0
0
0
0
1/P,L
0/P,L 1/P,R
C
H
Figure 3: Demonstration of the 3-state TM. See Lecture 2 for more details.
8
References
[1] Me ́lanie Rama, Claire Duflos, Isabelle Melki, Didier Bessis, Axelle Bonhomme, He ́le`ne Mar- tin, Diane Doummar, Ste ́phanie Valence, Diana Rodriguez, Emilie Carme, David Genevieve, Ketil Heimdal, Antonella Insalaco, Nathalie Franck, Viviane Queyrel-Moranne, Nathalie Tieulie, Jonathan London, Florence Uettwiller, Sophie Georgin-Lavialle, Alexandre Belot, Is- abelle Kone ́-Paut, Ve ́ronique Hentgen, Guilaine Boursier, Isabelle Touitou, and Guillaume Sarrabay. A decision tree for the genetic diagnosis of deficiency of adenosine deaminase 2 (DADA2): a French reference centres experience. European Journal of Human Genetics, 26(7):960–971, 2018.
[2] Qing Zhou, Dan Yang, Amanda K Ombrello, Andrey V Zavialov, Camilo Toro, Anton V Za- vialov, Deborah L Stone, Jae Jin Chae, Sergio D Rosenzweig, Kevin Bishop, et al. Early- onset stroke and vasculopathy associated with mutations in ada2. New England Journal of Medicine, 370(10):911–920, 2014.
9