代写 data structure algorithm parallel database CSE 4502/5717 Big Data Analytics Fall 2019 Exam 2 Helpsheet

CSE 4502/5717 Big Data Analytics Fall 2019 Exam 2 Helpsheet
1. In a Parallel Disks Model (PDM) there are D disks. In one parallel I/O we can
bring a block (of size B) of elements from each of the disks. We typically assume
that M is a constant multiple of DB. We briefly described the DSM and SRM al-
gorithms for sorting on the PDM. We then introduced the (l,m)-merge sort (LMM)
algorithm and showed that it can be used to sort N given elements in no more than
number of passes through the data.
2. Suffix tree is a powerful data structure that can be used to perform a variety of op- erations on strings and much more. We showed the following results: 1) Given a text T andapatternP wecansearchforP inT inO(m+n)timewherem=|T|and n = |P|; 2) Given a text T and a set P = {P1,P2,…,Pq} of patterns, we can find all the occurrences of all the patterns in T in O(m + N + K) time where m = |T|, N is the total size of all the patterns and K is the total number of occurrences of all the patterns in T; 3) Given a database DB of texts {T1,T2,…,Tk} and a set of patterns P = {P1,P2,…,Pq}, we can find occurrences of all the patterns in DB in O(M +N +K) time where M is the total size of all the texts in DB, N is the total size of all the patterns, and K is the total number of occurrences of all the patterns in DB; 4) Given two strings S1 and S2, we can find the longest common substring between them in O(|S1| + |S2|) time; 5) Given two strings S1 and S2 and an integer l, we can find all the substrings of S2 of length ≥ l that occur in S1 in O(|S1| + |S2|) time; 6) Given a string S1, a collection of strings C1, C2, . . . , Cq and an integer l, we can find all the occurrences of Ci of length ≥ l in S1 (for 1 ≤ i ≤ q) in O(|S1| + 􏰆qi=1 |Ci|) time; 7) Given a set of strings S1,S2,…,Sn we can compute l[2 : n] such that l(i) = the length of the longest common substring that occurs in at least i strings (for 2 ≤ i ≤ n) in O(Mn) time where M is the total length of the n strings; and 8) Given n strings of total length M, we can solve the all pairs suffix-prefix problem in O(M + n2) time.
3. We showed that a suffix array on a given string of length m can be constructed in O(m) time. We can use the suffix array and the longest common prefix (LCP) array to search for a pattern P in a text T in O(n + log m) character comparisons, where m = |T| and n = |P|. We also pointed out that we can compute the LCP array (for pairs of interest in string matching) in O(m) time.
􏰈 log( N ) 􏰉2 M + 1
log(min{√M,M }) B
1