Name:
I followed the University’s Honor Code.
Student ID:
CMSC 423 Fall 2021 : Midterm 2
Solve each of the following problems. There are 5 problems and 100 total points. You may use NO notes,
calculators, books, phones etc.. If you need additional space, use the back of the exam pages (and note
that you’ve written there). If you need still more space, use the blank paper provided. If you think a
problem cannot be solved as stated, explain why. For the algorithm design questions, you should
describe your algorithm either in clear and unambiguous prose, or using clearly-written pseudocode. Please,
write as neatly as you can. Partial credit is granted, but we cannot grade what we cannot interpret. Good
luck!
Problem 1. (30 points)
i. True or False (circle the correct one) : For a pair of strings X, Y, global pairwise sequence alignment is
asymptotically more e�cient than local pairwise sequence alignment because you need not determine
where the alignment must start or end.
ii. Given a pair of strings X of length n and Y of length m, the asymptotic time complexity for finding the
optimal global alignment between X and Y under a a�ne gap penalty function is O( ).
iii. Given the fm-index over a string T of length n, determining if pattern P of length m exists in the text
requires O( ) time.
iv. Given strings X and Y of length |X| and |Y | respectively, finding the optimal global alignment requires
at most what amount of space?
(a) O(|X| |Y |)
(b) O(|X| log |Y |)
(c) O(|X|+ |Y |)
(d) O(|X|� |Y |)
v. Describe (in one sentence) a reason for the prevalence of reads multi-mapping among transcripts (i.e.
reads aligning equally well to multiple transcripts) in an RNA-seq experiment.
1
0
nm
:
Alternative splicing of isoforms leads to
shared
sequence .
( Also
, gene families share extensive sequence)
vi. True or False (circle the correct one) : The asymptotic time complexity for building the FM-index is
larger than that of building the su�x array.
vii. Given a pair of strings X, Y , both of length n, the asymptotic time complexity for finding the optimal
global alignment between X and Y under a general gap penalty function is O( ).
viii. Given the observations x = {H,H, T,H,H, T,H, T, T,H} for a series of 10 flips of a coin. Given that
the outcomes follow a Bernoulli distribution with parameter ✓ (the probabiliy of observing heads on a
random flip of the coin), the maximum likelihood estimator for ✓ is ✓̂ = .
ix. True or False (circle the correct one) : The Sanko↵ algorithm to solve the small phylogeny problem
can only be applied when the character state transition cost is symmetric (i.e. when the cost of c ! c0
= the cost of c0 ! c) along a tree edge.
x. True or False (circle the correct one) : Given a likelihood function L(X; ✓), the value of ✓ that maximizes
logL(X; ✓) also maximizes L(X; ✓).
xi. True or False (circle the correct one) : The splits S1 = {A,B,C | D,E, F} and S2 = {A,D,C | B,E, F}
are incompatible.
xii. Write down a pair S1 and S2 of (non-identical) compatible splits over the set of taxa {A,B,C,D,E, F}:
S1 =
S2 =
xiii. The asymptotic time complexity for computing the “fitting” alignment of string X of length n against
string Y of length m is O( ).
xiv. Given a weighted graph G with positive edge weights, the problem of finding the longest simple path
(the longest path that does not have any repeated vertices) in the graph is NP-hard. Given a weighted
directed acyclic graph D = (V,E) with positive edge weights, the problem of finding the longest simple
path in the graph can be solved in O( ) time.
xv. We discussed two distinct approaches to solving the small phylogeny problem, one approach is maxi-
mum likelihood, and the other approach is .
2
0
m3
640
:
0
{ A ,B
,
c l D
,
E
,
F} Note : there are many correct
{ A ,B / C , D ,E ,F}
answers
. Just need that one of
Suh Sze
,
S
, pnszc , 5 , , nszr is,pMSzR
is 0
.
nm
Vt F-
maximum parsimony
Problem 2. (15 points) NOTE: This problem has 2 parts
i. Consider the string S = “wanafanta“. What is BWT(S) (i.e., the last column of the BW matrix)? Show
your work . reminder: don’t forget to add the sentinel $ to S.
ii. You are given the BWT of a string T such that BWT(T ) = “G$TCTAAAGATGG”. What was the
original string, T (show your work)?
3
7.5 points
atnwfaaanls
If not correct
,
Yz credit for drawing out
the rows cyclic permutations , 74 credit for
correctly sorting them into the DWM .
7.5 points
ACAGGTAGTTAG $
If not correct
,
42 credit for drawing out the BWM
Problem 3. (15 points) NOTE: This problem has 3 parts
Consider the strings x = “ALIGNMENT” and y = “CONFINEMENT”. Below is a partial matrix for the
global alignment between x and y, where we are attempting to maximize a given score function.
A
B
C
D
E
i. Knowing that the matrix was created as part of the algorithm for global alignment with linear gap
scores, what are the gap, match, and mismatch scores?
4
no
partial
credit gap
=
-g
Frein
match = 2
mismatch = -1
ii. Fill in the 5 missing entries (the boxes outlined with a dotted line) of the matrix above and include
the backtracking arrows for those entries. Note: The boxes have been labeled so that, if it is easier,
you can write down the solution as such A: VAL, arrow : DIR where VAL is the numeric value and
DIR is one of LEFT, DOWN, DIAGONAL.
iii. Describe (in words, not pseudocode) how to find the optimal global alignment from the matrix, and
write the optimal alignment you find. You should write the alignment in the format we discussed in
class; e.g., as follows for some alignment of the strings “CAR” and “CPRT”:
C A R –
| X |
C P R T
5
5 pts 1 point for each correct cell
A : -9
,
arrow : DIAGONAL ( DOWN is also valid)
B : -15
,
arrow : DOWN
C : – 9
, arrow :
LEFT
D:
– 15
,
arrow : DOWN
E : -4
,
arrow : DIAGONAL
2. 5 points for correct alignment
–
– ALIGNMENT
✗ ✗ I ✗ ✗ 1 1
11
CONFINEMENT
2. 5 points for correct description of
backtracing procedure
Problem 4. (25 points) Note: This problem has 2 parts.
i. Consider the problem of global alignment of a pair of strings under a linear gap scoring function (as
above). Likewise, assume you are seeking to maximize the score of the resulting alignment. Provide
a dynamic program that will compute the number of optimal alignments between the strings. Please
write your algorithm in pseudocode (not prose). Assuming that arithmetic operations on numbers
can be done in constant time (regardless of the size of the numbers), your algorithm should run in
quadratic time.
6
5 pts for using 2nd matrix to track counts
5 pts
for properly combining counts from predecessors
(when co – optimal)
12.5 points
match
mismatch
/
✓
✓gap
def count
–
Sol ( X ,Y
,
M
, X , g) {
n= lenlx)
m= lenly)
M= matrix (nm) 11 initialized to 0
MIO 🙂 :] = [ i. g
for i from 0 to m]
ME : ,O] = Ei • g
for i from 0 to n]
( =matrix (nm) 11 initialized to 0
,
def Sla,b,m , ×){
CEO
,
:] = 1
“” ” ”
J
” “” ” £
return M
for i =\ to n {
} else {
for j= 1 to m { return ✗
gy.ci/–M-Li-l,j7+g,CEi-l,j]
gx,cx= Mci , j
–
Btg , c[ i , j – i]
}
}
d) Cd = Mci-1
, j – I]+sCXEi],Y[j],m,×) , CEI -1 , j – I]
5- Max lgy , gx , d)
for v. f in zip ([ gy ,gx,d] ,
[ cy , ex , cd
]){
if v==s {
M[i,j]=v
C. [ i,j]t=f
}
}
}
}
return [ [mm]
}
ii. Consider the problem of global alignment of a pair of strings under a linear gap scoring function (as
above). Likewise, assume you are seeking to maximize the score of the resulting alignment. Provide
a dynamic program that will compute the score of the best sub-optimal alignment between the strings.
That is, if the best alignments have score 100, and the second best alignments have score 95, and the
third best alignments have score 90, your program should return 95. If there are many alignments
that have the optimal score (e.g. 100), your program should still return the score of the second best
alignment (i.e. 95). Please write your algorithm in pseudocode (not prose). Your algorithm should
run in quadratic time.
7
5 points for tracking 2nd -best solutions for each cell
5 points for properly determining
2
” ”
-best solution for a
cell given te sub
–
problems .
match
mismatch
v1 / ✓gap
def svbopt – Sol ( X ,Y , M , X , g) {
n= lenlx)
m= lenly)
M= matrix (nm) 11 initialized to 0
MIO 🙂 :] = [ i. g
for i from 0 to m]
ME : ,O] = Ei • g
for i from 0 to n]
M2= matrix (nm) 11 initialized to 0
M2Eoj :] = [ i.
g
for i from 0 to m]
M2[ : ,0] = Ei • g
for i from 0 to n]
for i =\ to n {
for j=l to m {
scores__ [MEI -1 ,j]+g
,
Mci
, j
– 1) +g) Mci-1
, j – ☐ + SCXEI],Y[j] ,mix]
Scores t=[M2Ei -1 ,j]+g,M2ei , j -☐ +g)M2[i- i , j – ☐ + SCXEI],Y[j] ,mix]
scores = set ( scores) 11 remove duplicates
scores = sorted ( scores
,
reverse = true) 11 best to worst
if len / scores )== I {
M[i ,j ] = M2[ i,j] = scores [ 0 ]
} else {
M[ i ,j ] = scores to]
M2[ i,j] = Scores [ I]
}
}
}
return M2[mm]
}
Problem 5. (15 points) Given the phylogenetic tree below, and the nucleotide-to-nucleotide transition
cost matrix, find the minimum cost for this tree. Write this optimal cost above the root node. Also, write
in each internal node the state of the character that is used in an optimal solution, and along each edge the
cost of the transition from the parent to the child node. Note: The nodes of the tree are labeled so that, if
it is easier, you can write down your scores / states for a node by referring to it directly e.g. X : VAL, and
for an edge by referring to its endpoints (X ! Z) : VAL2.
G G
T T
A C G T
A 0 1 1 3
C 3 0 2 3
G 3 2 0 3
T 2 3 3 0
A C
V
W X
Y Z
8
6
A
0
0
A A
1 0 I 3
A T
→lot 0 I •1-1010 o o
01-1-10 -101-10 state state
If not correct answer
• 7.5 points for
” self consistent ” tree . . .
if the stated cost is achieved by the
node states & transition
costs