CS计算机代考程序代写 algorithm Name:

Name:

I followed the University’s Honor Code.

Student ID:

CMSC 423 Fall 2021 : Midterm 2

Solve each of the following problems. There are 5 problems and 100 total points. You may use NO notes,

calculators, books, phones etc.. If you need additional space, use the back of the exam pages (and note

that you’ve written there). If you need still more space, use the blank paper provided. If you think a

problem cannot be solved as stated, explain why. For the algorithm design questions, you should

describe your algorithm either in clear and unambiguous prose, or using clearly-written pseudocode. Please,

write as neatly as you can. Partial credit is granted, but we cannot grade what we cannot interpret. Good

luck!

Problem 1. (30 points)

i. True or False (circle the correct one) : For a pair of strings X, Y, global pairwise sequence alignment is

asymptotically more e�cient than local pairwise sequence alignment because you need not determine

where the alignment must start or end.

ii. Given a pair of strings X of length n and Y of length m, the asymptotic time complexity for finding the

optimal global alignment between X and Y under a a�ne gap penalty function is O( ).

iii. Given the fm-index over a string T of length n, determining if pattern P of length m exists in the text

requires O( ) time.

iv. Given strings X and Y of length |X| and |Y | respectively, finding the optimal global alignment requires

at most what amount of space?

(a) O(|X| |Y |)

(b) O(|X| log |Y |)

(c) O(|X|+ |Y |)

(d) O(|X|� |Y |)

v. Describe (in one sentence) a reason for the prevalence of reads multi-mapping among transcripts (i.e.

reads aligning equally well to multiple transcripts) in an RNA-seq experiment.

1

0

nm

:
Alternative splicing of isoforms leads to

shared

sequence .

( Also
, gene families share extensive sequence)

vi. True or False (circle the correct one) : The asymptotic time complexity for building the FM-index is

larger than that of building the su�x array.

vii. Given a pair of strings X, Y , both of length n, the asymptotic time complexity for finding the optimal

global alignment between X and Y under a general gap penalty function is O( ).

viii. Given the observations x = {H,H, T,H,H, T,H, T, T,H} for a series of 10 flips of a coin. Given that

the outcomes follow a Bernoulli distribution with parameter ✓ (the probabiliy of observing heads on a

random flip of the coin), the maximum likelihood estimator for ✓ is ✓̂ = .

ix. True or False (circle the correct one) : The Sanko↵ algorithm to solve the small phylogeny problem

can only be applied when the character state transition cost is symmetric (i.e. when the cost of c ! c0

= the cost of c0 ! c) along a tree edge.

x. True or False (circle the correct one) : Given a likelihood function L(X; ✓), the value of ✓ that maximizes

logL(X; ✓) also maximizes L(X; ✓).

xi. True or False (circle the correct one) : The splits S1 = {A,B,C | D,E, F} and S2 = {A,D,C | B,E, F}

are incompatible.

xii. Write down a pair S1 and S2 of (non-identical) compatible splits over the set of taxa {A,B,C,D,E, F}:

S1 =

S2 =

xiii. The asymptotic time complexity for computing the “fitting” alignment of string X of length n against

string Y of length m is O( ).

xiv. Given a weighted graph G with positive edge weights, the problem of finding the longest simple path

(the longest path that does not have any repeated vertices) in the graph is NP-hard. Given a weighted

directed acyclic graph D = (V,E) with positive edge weights, the problem of finding the longest simple

path in the graph can be solved in O( ) time.

xv. We discussed two distinct approaches to solving the small phylogeny problem, one approach is maxi-

mum likelihood, and the other approach is .

2

0

m3

640

:
0

{ A ,B
,
c l D

,
E
,
F} Note : there are many correct

{ A ,B / C , D ,E ,F}
answers

. Just need that one of

Suh Sze
,
S
, pnszc , 5 , , nszr is,pMSzR

is 0
.

nm

Vt F-

maximum parsimony

Problem 2. (15 points) NOTE: This problem has 2 parts

i. Consider the string S = “wanafanta“. What is BWT(S) (i.e., the last column of the BW matrix)? Show
your work . reminder: don’t forget to add the sentinel $ to S.

ii. You are given the BWT of a string T such that BWT(T ) = “G$TCTAAAGATGG”. What was the
original string, T (show your work)?

3

7.5 points

atnwfaaanls

If not correct
,
Yz credit for drawing out

the rows cyclic permutations , 74 credit for

correctly sorting them into the DWM .

7.5 points

ACAGGTAGTTAG $

If not correct
,

42 credit for drawing out the BWM

Problem 3. (15 points) NOTE: This problem has 3 parts

Consider the strings x = “ALIGNMENT” and y = “CONFINEMENT”. Below is a partial matrix for the
global alignment between x and y, where we are attempting to maximize a given score function.

A

B

C

D

E

i. Knowing that the matrix was created as part of the algorithm for global alignment with linear gap

scores, what are the gap, match, and mismatch scores?

4

no
partial
credit gap

=
-g

Frein
match = 2

mismatch = -1

ii. Fill in the 5 missing entries (the boxes outlined with a dotted line) of the matrix above and include

the backtracking arrows for those entries. Note: The boxes have been labeled so that, if it is easier,

you can write down the solution as such A: VAL, arrow : DIR where VAL is the numeric value and

DIR is one of LEFT, DOWN, DIAGONAL.

iii. Describe (in words, not pseudocode) how to find the optimal global alignment from the matrix, and

write the optimal alignment you find. You should write the alignment in the format we discussed in
class; e.g., as follows for some alignment of the strings “CAR” and “CPRT”:

C A R –

| X |

C P R T

5

5 pts 1 point for each correct cell

A : -9
,
arrow : DIAGONAL ( DOWN is also valid)

B : -15
,
arrow : DOWN

C : – 9
, arrow :

LEFT

D:
– 15

,
arrow : DOWN

E : -4
,
arrow : DIAGONAL

2. 5 points for correct alignment


– ALIGNMENT

✗ ✗ I ✗ ✗ 1 1
11

CONFINEMENT

2. 5 points for correct description of

backtracing procedure

Problem 4. (25 points) Note: This problem has 2 parts.

i. Consider the problem of global alignment of a pair of strings under a linear gap scoring function (as

above). Likewise, assume you are seeking to maximize the score of the resulting alignment. Provide

a dynamic program that will compute the number of optimal alignments between the strings. Please
write your algorithm in pseudocode (not prose). Assuming that arithmetic operations on numbers
can be done in constant time (regardless of the size of the numbers), your algorithm should run in

quadratic time.

6

5 pts for using 2nd matrix to track counts

5 pts
for properly combining counts from predecessors

(when co – optimal)

12.5 points

match
mismatch

/

✓gap

def count

Sol ( X ,Y
,
M
, X , g) {

n= lenlx)
m= lenly)
M= matrix (nm) 11 initialized to 0

MIO 🙂 :] = [ i. g
for i from 0 to m]

ME : ,O] = Ei • g
for i from 0 to n]

( =matrix (nm) 11 initialized to 0
,
def Sla,b,m , ×){

CEO
,
:] = 1

“” ” ”

J
” “” ” £
return M

for i =\ to n {
} else {

for j= 1 to m { return ✗

gy.ci/–M-Li-l,j7+g,CEi-l,j]
gx,cx= Mci , j

Btg , c[ i , j – i]
}
}

d) Cd = Mci-1
, j – I]+sCXEi],Y[j],m,×) , CEI -1 , j – I]

5- Max lgy , gx , d)
for v. f in zip ([ gy ,gx,d] ,

[ cy , ex , cd
]){

if v==s {
M[i,j]=v
C. [ i,j]t=f

}
}
}
}
return [ [mm]

}

ii. Consider the problem of global alignment of a pair of strings under a linear gap scoring function (as

above). Likewise, assume you are seeking to maximize the score of the resulting alignment. Provide

a dynamic program that will compute the score of the best sub-optimal alignment between the strings.
That is, if the best alignments have score 100, and the second best alignments have score 95, and the

third best alignments have score 90, your program should return 95. If there are many alignments

that have the optimal score (e.g. 100), your program should still return the score of the second best

alignment (i.e. 95). Please write your algorithm in pseudocode (not prose). Your algorithm should
run in quadratic time.

7

5 points for tracking 2nd -best solutions for each cell

5 points for properly determining
2
” ”
-best solution for a

cell given te sub

problems .

match
mismatch

v1 / ✓gap
def svbopt – Sol ( X ,Y , M , X , g) {

n= lenlx)
m= lenly)
M= matrix (nm) 11 initialized to 0

MIO 🙂 :] = [ i. g
for i from 0 to m]

ME : ,O] = Ei • g
for i from 0 to n]

M2= matrix (nm) 11 initialized to 0

M2Eoj :] = [ i.
g
for i from 0 to m]

M2[ : ,0] = Ei • g
for i from 0 to n]

for i =\ to n {
for j=l to m {
scores__ [MEI -1 ,j]+g

,
Mci

, j
– 1) +g) Mci-1

, j – ☐ + SCXEI],Y[j] ,mix]
Scores t=[M2Ei -1 ,j]+g,M2ei , j -☐ +g)M2[i- i , j – ☐ + SCXEI],Y[j] ,mix]
scores = set ( scores) 11 remove duplicates
scores = sorted ( scores

,
reverse = true) 11 best to worst

if len / scores )== I {

M[i ,j ] = M2[ i,j] = scores [ 0 ]
} else {
M[ i ,j ] = scores to]
M2[ i,j] = Scores [ I]

}
}

}

return M2[mm]

}

Problem 5. (15 points) Given the phylogenetic tree below, and the nucleotide-to-nucleotide transition
cost matrix, find the minimum cost for this tree. Write this optimal cost above the root node. Also, write

in each internal node the state of the character that is used in an optimal solution, and along each edge the
cost of the transition from the parent to the child node. Note: The nodes of the tree are labeled so that, if
it is easier, you can write down your scores / states for a node by referring to it directly e.g. X : VAL, and

for an edge by referring to its endpoints (X ! Z) : VAL2.

G G

T T

A C G T
A 0 1 1 3
C 3 0 2 3
G 3 2 0 3
T 2 3 3 0

A C

V

W X

Y Z

8

6
A

0
0

A A
1 0 I 3

A T

→lot 0 I •1-1010 o o

01-1-10 -101-10 state state

If not correct answer

• 7.5 points for
” self consistent ” tree . . .

if the stated cost is achieved by the
node states & transition

costs