1213-1 COMPSCI 369 (28/06/2021 17:30) Computational Biology (Exam)
By submitting this assessment, I agree to the following declaration:
As a member of the University’s student body, I will complete this assessment in a fair, honest, responsible and trustworthy manner. This means that:
· I will not seek out any unauthorised help in completing this assessment. (NB. Unauthorised help includes a tutorial or answer service whether in person or online, friends or family, etc.)
Copyright By PowCoder代写 加微信 powcoder
· I will not discuss or share the content of the assessment with anyone else in any form, including but not limited to, Canvas, Piazza, Chegg, Facebook, Twitter, Discord or any other social media within the assessment period.
· I will not reproduce and/or share the content of this assessment in any domain or in any form where it may be accessed by a third party.
· I am aware the University of Auckland may use Turnitin or any other plagiarism detecting methods to check my content.
· I declare that this assessment is my own work, except where acknowledged appropriately (e.g., use of referencing).
· I declare that this work has not been submitted for academic credit in another University of Auckland course, or elsewhere.
Note: Include as applicable e.g., STEM subjects (Science, Technology, Engineering and Mathematics)
• I declare that I generated the calculations and data in this assessment independently, using only the tools and resources defined for use in this assessment.
• I declare that I composed the writing and/or translations in this assessment independently, using only the tools and resources defined for use in this assessment.
I understand the University expects all students to complete coursework with integrity and honesty. I promise to complete all online assessment with the same academic integrity standards and values.
Any identified form of poor academic practice or academic misconduct will be followed up and may result in disciplinary action.
1213-1 COMPSCI 369 (28/06/2021 17:30) Computational Biology (Exam)
Download the full exam from here
Answer all questions in a single document and upload it as a PDF file to Inspera. The following file types are allowed: .pdf Maximum file size is 1 GB
Select file to upload
Maximum marks: 120
Question 1
THE UNIVERSITY OF AUCKLAND
FIRST SEMESTER, 2021 Campus: Central City
COMPUTER SCIENCE
Computational Biology
(Time allowed: 3 hours + 30 minutes additional time) (Allowable Materials: Open Book)
This assessment has 10 questions, and it is worth 120 marks in total.
Answer all questions in a single document and upload it as a PDF file to Inspera.
You can do the whole exam in hand-writing, or you can embed scanned hand-written parts in an electronic text document. Write as clearly as possible, and clearly indicate the question you are answering. Cross out notes and other work you do not wish to be assessed.
The work must be your own. You are not allowed to copy text or code or solutions. Use your own words when answering the questions.
You are given additional 30 minutes for file preparation and submission.
Please don’t leave it to the last minute to submit your exam answers. It is your re- sponsibility to ensure your exam file is successfully submitted on time, and that the submitted file is your exam file. Please download the submission to verify this.
COMPSCI 369
QUESTION SHEET 2 COMPSCI 369
Section A: Integrity Statement
By completing this assessment, I agree to the following declaration:
I understand the University expects all students to complete coursework with integrity and honesty. I promise to complete all online assessment with the same academic integrity standards and values. Any identified form of poor academic practice or academic misconduct will be followed up and may result in disciplinary action.
As a member of the University’s student body, I will complete this assessment in a fair, honest, responsi- ble and trustworthy manner. This means that:
• I declare that this assessment is my own work.
• I will not seek out any unauthorised help in completing this assessment.
• I am aware the University of Auckland may use plagiarism detection tools to check my content.
• I will not discuss the content of the assessment with anyone else in any form, including Canvas, Piazza, Facebook, Twitter or any other social media or online platform within the assessment period.
• I will not reproduce the content of this assessment anywhere in any form at any time.
• I declare that I generated the calculations and data in this assessment independently, using only the
tools and resources defined for use in this assessment.
• I will not share or distribute any tools or resources I developed for completing this assessment.
QUESTION SHEET 3 COMPSCI 369
Section B: Computational Biology and Numerical Integration
1. Your research assistant generated the bifurcation plot below but excluded some important details. Aside from the plot you know that:
i) The difference equation of the system being studied is: xt+1 = exp(−ax2t ) + b ii) In this plot, the parameter a = 4.9
iii) When b = 0, there is a periodic attractor.
Based on the plot and the information you have, answer the following questions.
(a) What would be an appropriate label for the horizontal axis?
(b) What would be an appropriate label for the vertical axis?
[1 mark] [1 mark]
(c) Why is it impossible for someone who only has access to this plot to identify an exact set of
parameter values where the system’s dynamics are certain to be chaotic? [2 marks]
(d) Given the information provided above, describe an approximate region in parameter space
where this system has a periodic attractor that involves 4 different states. [1 mark]
(e) Describe the technique presented in lectures for creating a diagram like this one. You may
use pseudo-code to do so, but are not required to. [10 marks]
(f) The technique you have just described has two important parameters that must be tuned to make the bifurcation diagram legible and accurate. Identify these parameters and explain why the value selected for each should not be too high nor too low. [7 marks]
QUESTION SHEET 4 COMPSCI 369
Section C: Sequence Alignments and Assembly
2. You are given a function global(A, B, match, mismatch, gap) that solves the global alignment problem. It returns the optimal score and the optimal alignment when given match/mismatch scores, a linear gap penalty, and a pair of sequences.
(a) Canyouusetheglobalfunctiontoaligntwosequencesinsuchawaythatthealignmenthas only matches and gaps? If so explain how, otherwise suggest a modification of global(A, B, match, mismatch, gap)thatwouldenablethis.Inthelattercase,youdonothave to use pseudo-code; just explain your idea in one sentence. [2 mark]
(b) You want to calculate the optimal overlap alignment of two sequences of equal length. Can this problem be solved using the global function, while not modifying global? Explain your answer. You may use pseudo-code to do so, but are not required to. [6 marks]
3. Theglobalalignmentfunctionglobal(A, B, match=2, mismatch=-3, gap =-1)pro- duced the optimal path traced here:
− B1 B2 B3 B4 B5 −X
Answer the following questions and explain your answers.
[2 marks] optimal score. If not, explain why not. [3 marks]
(a) What will the optimal alignment look like?
(b) Is it possible with the given information to calculate the optimal score? If so, write down the
QUESTION SHEET 5
COMPSCI 369
4. You are given the following profile matrix: A 14 0 14 0
G 12 0 14 0 T0111
– 14 0 0 0
(a) Write down a multiple sequence alignment that corresponds to this profile matrix. [2 marks]
(b) How many different alignments are there that correspond to this profile matrix? Are all multiple sequence alignments corresponding to this profile the same size (in terms of number of sequences and alignment length)? Explain your answers. [2 marks]
5. You are given a set of error-free reads S = {AGC, CGC, CTT, GCG, GCT} and you want to reconstruct the DNA sequence X from which the reads originated. Show step-by-step how you will solve this sequence reconstruction problem using the following two approaches:
(a) For the following, use the greedy algorithm described in lectures.
i. Write down the reconstructed sequence and explain how you obtained it. That is, explain the order in which the reads were considered and why you selected that order. [4 marks]
ii. Is the obtained sequence unique? If it is not unique, write down an alternative solution as an ordered sequence of reads. [2 mark]
(b) For the following, use the approach based on the Hamiltonian path. Work with the 2-mer
composition of the provided reads.
i. Show the graph and explain how you constructed it. [3 marks]
ii. Explain how you read the sequence from the graph. Provide the reconstructed sequence, if it exists, and explain if the reconstructed sequence is unique. [3 marks]
QUESTION SHEET 6 COMPSCI 369
Section D: Simulation, HMMs and Trees
What is the distribution of the random variable that the randomVar pseudo-code below generates given that randomUniform produces a uniformly distributed random variable between 0 and 1? Explain your answer.
(b) GivenamethodPoissPro1(t)thatreturnstheeventtimesofaPoissonprocesswithfixed rate 1 over an arbitrary time period, t, and randomUniform as in part (a), write a pseudo- codemethodPoissPro(rate, t)thatreturnstheeventtimesofaPoissonprocesswith arbitrary rate over an arbitrary time period, t. Describe one experiment you could run to check the accuracy of your implementation. [4 marks]
In the genomes of some phages, the coding regions are richer in purines (bases A and G) than pyrimidines (bases C and T). An HMM with states Y for pyrimindine and U for purine correspond to coding/non-coding regions in these sequences. In the model, the probability that state U is followed by state Y is 3%. The probability that state Y is followed by state U is 1%. The sequence starts in state U 80% of the time.
Each state can emit any of the 4 bases A, C, G, or T. Emission probabilities are given by
ACGT U 0.35 0.1 0.45 0.1 Y 0.2 0.25 0.2 0.35
(a) Sketch a diagram of the HMM, showing all states, possible transitions and transition proba- bilities. Include the begin state but no end state. [3 marks]
(b) What is the joint probability P(x,π) of the state sequence π = UUUY and the symbol sequence x = ACCC? Leave your answer as a product or sum of numbers. [3 marks]
(c) Suppose you have multiple sequences of the same length x1 , . . . , xn . How could you use P(xj) for j = 1,…,n to decide whether or not the sequences came from the model de- scribed here? [3 marks]
(d) Themodelwastrainedusingobservedsequences.Describehowyoucouldtrainthealgorithm in the case where you knew where coding and non-coding regions were to start with, and in the case you didn’t have this information. What are the strengths and weaknesses of the two training approaches? [7 marks]
randomVar(rate)
x = – log(randomUniform())/rate
while x < 1
var = var + 1
x = x - log(randomUniform())/rate
return var
QUESTION SHEET 7 COMPSCI 369
(e) Completetheentriesa,b,andcintheViterbimatrixbelowanddrawinthetracebackpointers. Remember to show your working.
8. Let the symmetric matrix
0GC 0100 U0ac
0 11 0
D=1340
21 12 10 0
specify the pairwise distances, Dij , between the four sequences x1 , . . . , x4 . (a) Construct a UPGMA tree from D showing your working.
(b) Perform the first step in the neighbour-joining algorithm on D (up to where the first pair of neighbours i, j are joined and the distances of i and j to the new node are calculated). [4 marks]
(c) PGMA or neighbour-joining (or both or neither) reconstruct the correct tree in this
case? Explain your answer.
9. Consider the four aligned sequences, W,X,Y, and Z:
(a) Explain parsimony informative means, and identify the parsimony informative sites in the alignment. [3 marks]
(b) By calculating the parsimony score for each possible tree topology for these four taxa, find the maximum parsimony tree. [4 marks]
(c) Add two more columns to this alignment so that the new alignment has a parsimony tree which is topologically different to the parsimony tree for the original alignment. [3 marks]
(d) What typically prevents us from being able to find the maximum parsimony tree and why? [2 marks]
(e) Under what circumstances might we opt to use a parsimony method over a likelihood based method? [2 marks]
QUESTION SHEET 8 COMPSCI 369
Describe a heuristic which can be used to try to find the maximum likelihood tree. Will this heuristic typically find the tree that maximises the likelihood? Explain your answer. [4 marks]
(b) Given mutation rate parameter μ and normalised rate matrix Q, how do you calculate the probability that an A mutates to a C along a lineage of length t = 2? (Recall we denote, for example, the (A, A)th entry of a matrix B by BAA .) [3 marks]
(c) Let X and Y be sequences of length L. How can you use the calculation in part (b) to calculate the probability that X mutates into Y over a lineage of length t = 2? Explain any assumptions you are making. [3 marks]
(d) Explain why it is not always possible to estimate the height of a tree and the mutation rate at the same time. What data can be used to overcome this problem? [4 marks]
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com