程序代写代做代考 cache information retrieval algorithm Better than next()

Better than next()
•  What’s the worst case for sequential merge-based

intersection?
•  {52, 1} è move k2’s cursor

–  To the position whose id is at least 52 è skipTo(52)
–  Essentially, asking the first i, such that K2[i] >= 52 (K2’s list

is sorted).
–  Takes many sequential call of next()
–  Could use binary search in the rest of the list
–  Cost: ⎡log2(Nremainder) ⎤

… … … 79

52 54 56 58

1 3 5 … K2:

K1: 1

Can you use a hashtable to implement skipTo()?

skipTo(id)
•  Galloping search (gambler’s strategy)

–  [Stage 1] Doubling the search range until you overshoot
–  [Stage 2] Perform binary search in the last range

•  Performance analysis (worst case)
–  Let the destination position be n positions away.
–  ≈ log2 n probes in Stage 1 + ≈ log2 n probes in Stage 2
–  Total = 2⎡ log2 (n+1) ⎤= O(log2 n)

9 11 13 79 1 3 5 7 K1:

52?

Inc=1 Inc=2 Inc=4

Binary
search

2

Total Cost
•  Galloping search (gambler’s strategy)

–  Cost of the i-th probe: ≈ 2 log2(ni)
–  Total cost of K1 probes: ≈ 2 log2(∏1|K1| ni)
≤ 2 log2( ((∑1|K1| ni) / |K1|)|K1| ) ≤ 2|K1|*log2(|K2|/|K1|)

•  Asymptotically, resembles linear merge when
|K2|/|K1| = O(1), resembles binary search
when |K1| = O(1)

What about list intersection using binary search?
3

Multiple Term Conjunctive
Queries

•  K1 AND K2 AND … AND Kn
•  SvS does not perform well if none of the

associated lists are short
•  In addition, it is blocking
•  Can you design non-blocking multiple

sorted array intersection algorithm?

4

Generalization

•  Generalize the 2-way intersection
algorithm

•  2-way:
–  {1, 2} è move k1’s cursor
– skipTo(2)

•  3-way:
–  {1, 2, 3} è move k1,k2’s cursor
– skipTo(3)

1 3

2 4 6

3 9 27 81

K1:

K2:

K3:

eliminator = Max1<=i<=n(ki.cursor) 5 Optimization •  Mismatch found even before accessing K3’s cursor •  Choice 1: continue to get cursors of other list •  Choice 2: settle the dispute within the first two lists è max algorithm [Culpepper & Moffat: Efficient set intersection for inverted indexing] –  Better locality of access è fewer cache misses –  Similar to SvS 1 3 2 4 6 3 9 27 81 K1: K2: K3: 6 Pseudo-Code for the Max Algorithm (Wrong) •  Input: K1, K2, …, Kn in increasing size x := K1[1]; startAt := 2 while x is defined do for i = startAt to n do y := Ki.skipTo(x) if y > x then
x := K1.next()

if y > x then startAt := 1; x := y else startAt := 2 end if
break

elsif i = n then
Output x
x := K1.next()

end if
end for

end while

//mismatch
 

//match
 in
 all
 
 lists
 

//restart_1
  //restart_2
 

//y
 =
 x
 

//x
 is
 the
 eliminator
 

7

(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)

(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)

Pseudo-Code for the Max Algorithm (Fixed)
•  Input: K1, K2, …, Kn in increasing size
x := K1[1]; startAt := 2
while x is defined do

for i = startAt to n do
y := Ki.skipTo(x)
if y > x then
x := K1.next()

if y > x then startAt := 1; x := y else startAt := 2 end if
break

elsif i = n then
Output x
x := K1.next()

end if
end for

end while

8

The original code has a bug when in restart_1 cases

(4.1) if i = 1 then
(4.2) if y > x then
(4.3) x := y
(4.4) break
(4.5) end if
(4.6) end if

References

•  J. Shane Culpepper, Alistair Moffat: Efficient
set intersection for inverted indexing. ACM
Trans. Inf. Syst. 29(1): 1 (2010)

•  F.K. Hwang and S. Lin, A simple algorithm for
merging two disjoint linearly ordered sets.
SIAM J. Comput. 1 1 (1972), pp. 31–39.

•  Stefan Buettcher, Charles L. A. Clarke,
Gordon V. Cormack, Information Retrieval:
Implementing and Evaluating Search
Engines, 2010 [Chapter 5] 9