Exercise sheet 2 (Python)
Submission deadline: 09.01.2019, by 10:30 am.
Submit to: cklos@uni-bonn.de
How to submit:
• Change the notebook name to include your last name, e.g. “Exercise sheet 2 (Python) Memmesheimer”
• Download your completed notebook (File -> Download as -> Notebook (.ipynb))
• Send it as an email attachment and set the email subject to “Exercise sheet 2 (Python)”
Exercise 1
Write a function called is_valid_DNA() that returns True if a DNA sequence is valid and False otherwise. For this exercise, a DNA sequence is considered valid if it contains only the uppercase characters ‘A’, ‘T’, ‘C’ and ‘G’ and no others.
Test your function on these two sequences:
• AGTCGTACTAGA
• GACGTACCGXAGGTAA
(2 points)
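One possible approach to the validity check, given here only as a sketch, is to compare the set of characters in the sequence against the set of allowed bases. Note one assumption that the exercise does not settle: with this approach an empty string counts as valid.

```python
def is_valid_DNA(sequence):
    """Return True if sequence contains only uppercase A, T, C and G."""
    allowed = set("ATCG")
    # set(sequence) holds every distinct character in the sequence;
    # the sequence is valid if all of them are allowed bases.
    # Assumption: an empty string is treated as valid.
    return set(sequence) <= allowed


print(is_valid_DNA("AGTCGTACTAGA"))      # True
print(is_valid_DNA("GACGTACCGXAGGTAA"))  # False, because of the 'X'
```

Lowercase input such as "acgt" is rejected, which matches the requirement that only uppercase characters are valid.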
Exercise 2
Write a function called relam that calculates the content (relative amount) of a particular amino acid in a protein:
$$\mathrm{relam}(\text{protein}, \text{amino acid}) = \frac{\text{number of occurrences of amino acid in protein}}{\text{total number of amino acids in protein}}$$
(1 point)
Test your function on the protein below and find the relative amount of ‘V’:
PCWVPTISNVERANFSNGTSCSYAHEDCKPQPTIGACYYCCWFLIHEIQVMHAVDCNEADNGLRPWWRWANDNEVPEKARYKHPAVWERSIIMIHWGLWE
(1 point)
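The definition of relam translates almost directly into Python. The following is a minimal sketch that uses a short made-up sequence instead of the protein above; it assumes the protein is a non-empty string of one-letter codes (an empty string would raise a ZeroDivisionError).

```python
def relam(protein, amino_acid):
    """Relative amount of amino_acid in protein, a value between 0 and 1."""
    # str.count gives the number of occurrences of the one-letter code;
    # dividing by the sequence length turns it into a fraction.
    return protein.count(amino_acid) / len(protein)


# Toy sequence (made up for illustration): 2 of its 8 residues are 'V'
print(relam("AVGVLKST", "V"))  # 0.25
```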
You are given two protein sequences:
• PCWVPTISNVERANFSNGTSCSYAHEDCKPQPTIGACYYCCWFLIHEIQVMHAVDCNEADNGLRPWWRWANDNEVPEKARYKHPAVWERSIIMIHWGLWE
• LMLITGPEAEAWNRVISQTHCQQSNIFNVGFCLLCSGNFPTWFQSQSWSLMGRDLIDHFNMTKSNMDYFKKMSQTAVAEQENYYEECDST
Calculate the difference in content of amino acid ‘N’ between the two proteins:
$$|\mathrm{relam}(\text{protein 1}, \text{amino acid}) - \mathrm{relam}(\text{protein 2}, \text{amino acid})|$$
where the vertical bars denote the absolute value. You can use Python's abs() function to calculate it.
(1 point)
#Answer
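As a sketch of how abs() fits in, the difference for a single amino acid could be wrapped in a small helper. The name content_difference is hypothetical (it does not appear in the exercise), and the sequences are toy examples rather than the proteins above.

```python
def relam(protein, amino_acid):
    """Relative amount of amino_acid in protein."""
    return protein.count(amino_acid) / len(protein)


def content_difference(protein_1, protein_2, amino_acid):
    """Absolute difference in the relative amount of one amino acid.

    Hypothetical helper name, used here only for illustration.
    """
    return abs(relam(protein_1, amino_acid) - relam(protein_2, amino_acid))


# Toy sequences: 'N' makes up 2/4 of the first and 1/4 of the second
print(content_difference("NANA", "NAAA", "N"))  # 0.25
```

Because of abs(), swapping the two proteins gives the same result.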
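Write a function called as_diff that calculates the total difference in content over all amino acids:
$$\mathrm{as\_diff}(\text{protein 1}, \text{protein 2}) = \sum_{i=1}^{20} \left|\mathrm{relam}(\text{protein 1}, \text{amino acid}_i) - \mathrm{relam}(\text{protein 2}, \text{amino acid}_i)\right|$$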
(2 points)
# You may find the following commented-out code useful
# to initialize a list of all unique amino acids that are present in our proteins
#unique_acids = ['A','R','N','D','C','Q','E','G','H','I','L','K','M','F','P','S','T','W','Y','V']
#Answer
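The sum over the 20 amino acids can be written as a generator expression inside sum(). This is one possible sketch, not the only way to do it; it stores the 20 one-letter codes in a string (UNIQUE_ACIDS, a name chosen here) and is demonstrated on toy sequences.

```python
# The 20 one-letter amino acid codes, in the order used in the exercise
UNIQUE_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def relam(protein, amino_acid):
    """Relative amount of amino_acid in protein."""
    return protein.count(amino_acid) / len(protein)


def as_diff(protein_1, protein_2):
    """Sum over all 20 amino acids of |relam(p1, a) - relam(p2, a)|."""
    return sum(
        abs(relam(protein_1, acid) - relam(protein_2, acid))
        for acid in UNIQUE_ACIDS
    )


print(as_diff("AAVV", "AAVV"))  # 0.0, identical composition
print(as_diff("AAAA", "VVVV"))  # 2.0, no amino acid in common
```

The two toy calls show the extremes: 0 for identical compositions and 2 for proteins that share no amino acid.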
Using the as_diff function you have written, determine which two of the three proteins given below are most similar:
• PCWVPTISNVERANFSNGTSCSYAHEDCKPQPTIGACYYCCWFLIHEIQVMHAVDCNEADNGLRPWWRWANDNEVPEKARYKHPAVWERSIIMIHWGLWE
• SDRQSFPTRFDTVTIEARVVIDWSPTLWPKYTHSSSGYTIKEMIAKKGDIPEGQVDHKKVQNSWTSTPQFHPMAHC
• LMLITGPEAEAWNRVISQTHCQQSNIFNVGFCLLCSGNFPTWFQSQSWSLMGRDLIDHFNMTKSNMDYFKKMSQTAVAEQENYYEECDST
(2 points)
#Answer
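To compare every pair of proteins, one option is itertools.combinations together with min(). The sketch below assumes the proteins are kept in a dictionary that maps a label to a sequence; the labels, the helper name most_similar_pair and the toy sequences are all made up for illustration.

```python
from itertools import combinations

UNIQUE_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def as_diff(protein_1, protein_2):
    """Total difference in amino acid content between two proteins."""
    return sum(
        abs(protein_1.count(a) / len(protein_1) - protein_2.count(a) / len(protein_2))
        for a in UNIQUE_ACIDS
    )


def most_similar_pair(proteins):
    """Return the pair of labels whose sequences have the smallest as_diff.

    proteins: dict mapping a label to a protein sequence (hypothetical layout).
    """
    return min(
        combinations(proteins, 2),
        key=lambda pair: as_diff(proteins[pair[0]], proteins[pair[1]]),
    )


toy_proteins = {"p1": "AAAV", "p2": "AAVV", "p3": "WWWW"}
print(most_similar_pair(toy_proteins))  # ('p1', 'p2')
```

A smaller as_diff value means more similar composition, so the pair with the minimum is reported.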
Exercise 3
One of the useful skills for programming (and for research) is being able to figure things out by reading and searching on the internet! Read this page of the Python for biologists tutorial:
https://pythonforbiologists.com/working-with-files/
Figure out how to write FASTA files, and then do the two sub-exercises below:
Using Python, write the DNA sequence given below to a file named oneseq.fasta in FASTA format. See the tutorial page above or https://en.wikipedia.org/wiki/FASTA_format for a description of the format.
Sequence header: ABC123
Sequence: ATCGTACGATCGATCGATCGCTAGACGTATCG
Open the file using Python and print its contents to confirm that you have written it correctly.
(4 points)
#Answer
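# Open the file and print its contents; the file must already exist
# in the notebook's working directory, otherwise open() raises FileNotFoundError.
file = open("example.fasta")
print(file.read())
file.close()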
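Writing a single record and reading it back could be sketched as follows. A FASTA record consists of a header line that starts with ‘>’ followed by the sequence; the helper name write_fasta is an assumption made for this sketch, and the file is written to the current working directory.

```python
def write_fasta(path, header, sequence):
    """Write one FASTA record: a '>' header line, then the sequence line."""
    # The with statement closes the file automatically, even on errors
    with open(path, "w") as fasta_file:
        fasta_file.write(">" + header + "\n")
        fasta_file.write(sequence + "\n")


write_fasta("oneseq.fasta", "ABC123", "ATCGTACGATCGATCGATCGCTAGACGTATCG")

# Read the file back to check what was written
with open("oneseq.fasta") as fasta_file:
    contents = fasta_file.read()
print(contents)
```

Opening the file in "w" mode creates it if it does not exist and overwrites it if it does.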
Using Python, write the sequences below to a single file called threeseq.fasta in FASTA format.
Always convert the sequences to uppercase and, for this exercise, remove any ‘-’ (dashes) from them.
Sequence header | DNA sequence
ABC123 | ATCGTACGATCGATCGATCGCTAGACGTATCG
DEF456 | actgatcgacgatcgatcgatcacgact
HIJ789 | ACTGAC-ACTGT--ACTGTA----CATGTG
Open the file using Python and print its contents to confirm that you have written it correctly.
(2 points)
#Answer
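For several records, the cleaning step (uppercase, no dashes) can be separated from the writing step. The following sketch assumes the dashes are plain hyphens (‘-’) and stores the records as a list of (header, sequence) tuples; the helper name clean_sequence is made up for illustration.

```python
records = [
    ("ABC123", "ATCGTACGATCGATCGATCGCTAGACGTATCG"),
    ("DEF456", "actgatcgacgatcgatcgatcacgact"),
    ("HIJ789", "ACTGAC-ACTGT--ACTGTA----CATGTG"),
]


def clean_sequence(sequence):
    """Convert to uppercase and remove all '-' characters."""
    return sequence.upper().replace("-", "")


# Write all records into one file, one header line and one sequence line each
with open("threeseq.fasta", "w") as fasta_file:
    for header, sequence in records:
        fasta_file.write(">" + header + "\n" + clean_sequence(sequence) + "\n")

# Read the file back to check what was written
with open("threeseq.fasta") as fasta_file:
    contents = fasta_file.read()
print(contents)
```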
Exercise 4
Read in the DNA sequence from the example.fasta file (the reading direction is from left to right). Search for the start codon ‘ATG’. Starting from the start codon, search left to right in steps of three for any of the end (stop) codons ‘TAA’, ‘TAG’ or ‘TGA’. Store this open reading frame in a list (including the start codon and excluding the end codon) and print the list. Then search for the next start codon and print the next reading frame. Continue until you have found all reading frames. (4 points)
#Answer
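The search described above could be sketched as below. The exercise leaves two points open, so the sketch makes assumptions that are marked in the comments: the search for the next start codon resumes after the end codon of the previous frame, and a start codon with no in-frame end codon is skipped. Reading the sequence from example.fasta is left out; the function takes the DNA as a string, and the toy sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna):
    """Return a list of open reading frames, each one a list of codons."""
    orfs = []
    position = dna.find("ATG")
    while position != -1:
        codons = []
        stop_found = False
        # Walk in steps of three, starting at the start codon
        for i in range(position, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if codon in STOP_CODONS:
                stop_found = True
                break
            codons.append(codon)
        if stop_found:
            orfs.append(codons)
            # Assumption: resume the search right after the end codon
            next_search = i + 3
        else:
            # Assumption: a start codon without an in-frame end codon is skipped
            next_search = position + 1
        position = dna.find("ATG", next_search)
    return orfs


toy_dna = "CCATGAAATTTTAGGGATGCCCTGA"
for orf in find_orfs(toy_dna):
    print(orf)
# ['ATG', 'AAA', 'TTT']
# ['ATG', 'CCC']
```

If overlapping reading frames should also be reported, next_search could be set to position + 1 in both branches instead.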
Find the complement of the DNA sequence (you can and should reuse previously written functions). The complement is read in the reverse direction. Find all open reading frames of the complement. (1 point)
#Answer
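Building the complement and reversing it could be sketched with str.translate and slicing. The function names are chosen here for illustration, and the sketch assumes the input contains only the uppercase bases A, T, C and G (other characters are left unchanged).

```python
def complement(dna):
    """Swap each base for its pairing partner: A<->T and C<->G."""
    return dna.translate(str.maketrans("ATCG", "TAGC"))


def reverse_complement(dna):
    """Complement of dna, written in the direction in which it is read."""
    # The complementary strand is read in the opposite direction,
    # so the complemented string is reversed with a slice.
    return complement(dna)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

The reversed complement can then be passed to whatever open reading frame search was written for the first part of this exercise.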