程序代写代做代考 algorithm information retrieval cache Can you use a hashtable to implement skipTo()?

Can you use a hashtable to implement skipTo()?
Better than next()
• What’stheworstcaseforsequentialmerge-based intersection?
• {52, 1}èmove k2’s cursor
– To the position whose id is at least 52èskipTo(52)
– Essentially, asking the first i, such that K2[i] >= 52 (K2’s list is sorted).
– Takes many sequential call of next()
– Could use binary search in the rest of the list
– Cost: ⎡log2(Nremainder) ⎤
K2: K1:
1
1
3
5




79
52
54
56
58

skipTo(id)
• Galloping search (gambler’s strategy)
– [Stage 1] Doubling the search range until you overshoot – [Stage 2] Perform binary search in the last range
• Performanceanalysis(worstcase)
– Let the destination position be n positions away.
– ≈log2nprobesinStage1+≈log2 nprobesinStage2 – Total = 2⎡ log2 (n+1) ⎤= O(log2 n)
Inc=1 Inc=2
Inc=4
Binary search
1
3
5
7
9
11
13
79
K1:
52?
2

Total Cost
• Galloping search (gambler’s strategy)
– Cost of the i-th probe: ≈ 2 log2(ni)
– Total cost of K1 probes: ≈ 2 log2(∏1|K1| ni)
≤ 2 log2( ((∑1|K1| ni) / |K1|)|K1| ) ≤ 2|K1|*log2(|K2|/|K1|)
• Asymptotically, resembles linear merge when |K2|/|K1| = O(1), resembles binary search when |K1| = O(1)
What about list intersection using binary search?
3

Multiple Term Conjunctive Queries
• K1ANDK2AND…ANDKn
• SvS does not perform well if none of the
associated lists are short
• In addition, it is blocking
• Can you design non-blocking multiple sorted array intersection algorithm?
4

Generalization
• Generalize the 2-way intersection
algorithm
1
3
K1: K2: K3:
• 3-way:
– {1, 2, 3}èmove k1,k2’s cursor
• 2-way:
– {1, 2}èmove k1’s cursor – skipTo(2)
2
4
6
3
9
27
81
– skipTo(3)
eliminator = Max1<=i<=n(ki.cursor) 5 Optimization • Mismatch found even before accessing K3’s cursor • Choice 1: continue to get cursors of other list • Choice 2: settle the K1: K2: K3: 1 3 2 4 6 3 9 27 81 dispute within the first two listsèmax algorithm [Culpepper & Moffat: Efficient set intersection for inverted indexing] – Better locality of access è fewer cache misses – Similar to SvS 6 (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) Pseudo-Code for the Max Algorithm (Wrong) • Input: K1, K2, ..., Kn in increasing size x := K1[1]; startAt := 2 //x is the eliminator while x is defined do for i = startAt to n do y := Ki.skipTo(x) if y > x then //mismatch
x := K1.next() //restart_1 //restart_2
if y > x then startAt := 1; x := y else startAt := 2 end if
break elsifi=nthen
Output x
x := K1.next() end if
end for end while
//match in all lists
//y = x
7

Pseudo-Code for the Max Algorithm (Fixed)
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)
x := K1[1]; startAt := 2 while x is defined do
for i = startAt to n do y := Ki.skipTo(x)
ify>xthen
x := K1.next()
(4.1)
(4.2)
(4.3)
(4.4)
(4.5)
(4.6)
if i = 1 then
if y > x then
x := y
break end if
end if
• Input: K1, K2, …, Kn in increasing size
if y > x then startAt := 1; x := y else startAt := 2 end if
break elsifi=nthen
Output x
x := K1.next() end if
end for end while
8
The original code has a bug when in restart_1 cases

References
• J. Shane Culpepper, Alistair Moffat: Efficient set intersection for inverted indexing. ACM Trans. Inf. Syst. 29(1): 1 (2010)
• F.K. Hwang and S. Lin, A simple algorithm for merging two disjoint linearly ordered sets. SIAM J. Comput. 1 1 (1972), pp. 31–39.
• Stefan Buettcher, Charles L. A. Clarke, Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, 2010 [Chapter 5]
9