UNIS Template
*
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
WARNING
This material has been reproduced and communicated to you by or on behalf of the University of Sydney pursuant to Part VB of the Copyright Act 1968 (the Act). The material in this communication may be subject to copyright under the Act. Any further copying or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice.
*
Orthogonal range searching
Orthogonal range searching
Joachim Gudmundsson
*
Orthogonal range searching
*
Range queries
Input: A set of n points S={s1, s2, … , sn} in d-dimensional space.
Aim: Preprocess S such that orthogonal range queries can be handled efficiently.
*
Range queries
The data is usually processed once while queries are performed many times.
Preprocessing: O(n log n)?
Space: O(n)?
Query time: O(polylog n+k)?
*
1D-range queries
S={12,3,29,20,2,11,30,31,24,22,5,9,26}
Q=[10,25]
*
1D-range queries
S={12,3,29,20,2,11,30,31,24,22,5,9,26}
What if we allow no preprocessing?
Try every point separately – O(n) time
Q=[10,25]
*
1D-range queries
S={12,3,29,20,2,11,30,31,24,22,5,9,26}
What if we allow preprocessing?
Sort the points
S={2,3,5,9,11,12,20,22,24,26,29,30,31}
Query: Search for the boundary values.
Report everything in-between.
Q=[10,25]
*
1D-range queries using BST
S={2,3,5,9,11,12,20,22,24,26,29,30,31}
Q=[10,25]
*
1D-range queries using BST
S={2,3,5,9,11,12,20,22,24,26,29,30,31}
Observation 1: A balanced binary search tree can be constructed in linear time if the input is sorted.
2 3 5 9 11 12 20 22 24 26 29 30 31
5
The largest value in the left subtree
All elements in the leaves
*
1D-range queries using BST
S={2,3,5,9,11,12,20,22,24,26,29,30,31}
Query: Search for the boundary values.
Report everything in-between.
Q=[10,25]
2 3 5 9 11 12 20 22 24 26 29 30 31
*
1D-range queries using BST
S={2,3,5,9,11,12,20,22,24,26,29,30,31}
Query: Search for the boundary values.
Report everything in-between.
Q=[10,25]
2 3 5 9 11 12 20 22 24 26 29 30 31
*
1D-range queries using BST
S={2,3,5,9,11,12,20,22,24,26,29,30,31}
Query: Search for the boundary values.
Report everything in-between.
Q=[10,25]
2 3 5 9 11 12 20 22 24 26 29 30 31
*
1D-range queries using BST
S={2,3,5,9,11,12,20,22,24,26,29,30,31}
Query: Search for the boundary values.
Report everything in-between.
Q=[10,25]
2 3 5 9 11 12 20 22 24 26 29 30 31
*
1D-range queries using BST
S={2,3,5,9,11,12,20,22,24,26,29,30,31}
Query: Search for the boundary values.
Report everything in-between.
Q=[10,25]
2 3 5 9 11 12 20 22 24 26 29 30 31
*
1D-range queries using BST
S={2,3,5,9,11,12,20,22,24,26,29,30,31}
Query time:
Q=[10,25]
2 3 5 9 11 12 20 22 24 26 29 30 31
O(log n+reporting points)
log n
*
1D-range queries using BST
How to report the points?
Q=[10,25]
2 3 5 9 11 12 20 22 24 26 29 30 31
split point (search paths split)
Follow left path.
*
1D-range queries using BST
How to report the points?
Q=[10,25]
2 3 5 9 11 12 20 22 24 26 29 30 31
split point (search paths split)
Follow left path.
*
1D-range queries using BST
How to report the points?
Q=[10,25]
2 3 5 9 11 12 20 22 24 26 29 30 31
split point (search paths split)
Follow left path.
No point in sub-tree lies inside Q
*
1D-range queries using BST
How to report the points?
Q=[10,25]
split point (search paths split)
Follow left path.
Every time the left path turns left report the leaves in the right subtree!
2 3 5 9 11 12 20 22 24 26 29 30 31
All points in sub-tree lies inside Q
*
1D-range queries using BST
How to report the points?
Q=[10,25]
split point (search paths split)
Follow right path.
Every time the right path turns right report the leaves in the left subtree!
2 3 5 9 11 12 20 22 24 26 29 30 31
Time: O(size of subtree)
All points in sub-tree lies inside Q
*
1D-range queries using BST
Algorithm 1D-RangeQuery(T,[a,b])
SplitNode FindSplitNode(T,a,b)
v lc(split)
While v is not a leaf do
If value(v) a then
ReportSubtree(rc(v))
v lc(v)
else v rc(v)
If value(v) in [a,b] then report(value(v))
% Do the same for the right path
…
SplitNode
v
rc(v)
*
1D-range queries using BST
Correctness proof
Prove that every reported point p lies in [a,b]
If p is a leaf of the search path then p is tested explicitly.
Otherwise p must be reported in call to ReportSubtree(rc(v)).
Assume this happened when searching for a.
SplitNode
v
rc(v)
Search for b goes right at split node b > SplitNode p
Search for a goes left at v a v < p Finally prove that every point in the range is reported. p [a,b] * 1D-range queries using BST How long does it take to report all the leaves of a bbs-tree? 2 3 5 9 11 12 20 22 24 26 29 30 31 Query time: O(log n+reporting points) n leaves ≤ 2n-1 nodes * 1D-range queries using BST How long does it take to report all the leaves of a bbs-tree? 2 3 5 9 11 12 20 22 24 26 29 30 31 Query time: O(log n+k) Preprocessing: O(n log n) Space: O(n) Query time: O(log n+k) * 2D-range queries Input: A set of n points S={s1, s2, … , sn} in the Euclidean plane. Aim: Preprocess S such that orthogonal range queries can be handled efficiently. * kD-trees How can we generalise the 1D structure to 2D? What if we split both wrt x-coordinates and y-coordinates? * kD-trees How can we generalise the 1D structure to 2D? What if we split both wrt x-coordinates and y-coordinates? A A * kD-trees How can we generalise the 1D structure to 2D? What if we split both wrt x-coordinates and y-coordinates? A B C A B C * kD-trees How can we generalise the 1D structure to 2D? What if we split both wrt x-coordinates and y-coordinates? A B D E F G C B C A D E F G * kD-trees How can we generalise the 1D structure to 2D? What if we split both wrt x-coordinates and y-coordinates? B C A B D E F G C H I J K L M N O A D E F G H L N K M I J O * kD-trees How can we generalise the 1D structure to 2D? What if we split both wrt x-coordinates and y-coordinates? A B D E F G C H I J K L M N O kD-trees How can we generalise the 1D structure to 2D? What if we split both wrt x-coordinates and y-coordinates? A B D E F G C H I J K L M N O * kd-tree pseudocode Algorithm kd-tree(P,depth) if #P=1 then return leaf containing P else if even(depth) then split P with vertical line L through Xmedian(P) P1 subset of P on or to the left of L, P2 subset of P to the right of L else if odd(depth) then split P with horizontal line L through Ymedian(P.y) P1 subset of P on or below L, P2 subset of P above L create node v storing L v.left kd-tree(P1,depth+1) v.right kd-tree(P2,depth+1) RETURN v Construction time? * kd-tree preprocessing Median can be computed in O(n) time [Blum et al. 1973] or… * kd-tree preprocessing Median can be computed in O(n) time [Blum et al. 1973] or… 1 2 3 4 5 6 sorted wrt x-coord. sorted wrt y-coord. Time = O(n log n) 1 2 3 4 5 6 Time T(n) = O(1) if n=1 O(n) + 2T(n/2) otherwise * kd-tree space Space = size of a balanced binary tree = O(n) * kd-tree query A node v in the kd-tree corresponds to a rectangular region in the plane (region(v)). V 1 2 3 1 2 3 All points in region(v) are leaves in the subtree with root at v. v * kd-tree query Query algorithm: Given query rectangle R, visit all nodes in T that intersect R. 2 R 1 1 2 3 3 v kd-tree query Query algorithm: Given query rectangle R, visit all nodes in T that intersect R. 2 R All points in the subtree of T rooted at v should be reported! 4 1 1 2 3 3 4 v v * kd-tree query Algorithm SearchTree(v – root of T, R – query range) if v is a leaf then RETURN v if v in R else if region(lc(v)) in R then ReportSubtree(lc(v)) else if region(lc(v)) intersects R then SearchTree(lc(v),R) else if region(rc(v)) in R then ReportSubtree(rc(v)) else if region(rc(v)) intersects R then SearchTree(lc(v),R) Query time? O(k) + number of nodes tested in T that do not contain any points to report. 2 1 R * kd-tree query Algorithm SearchTree(v – root of T, R – query range) if v is a leaf then RETURN v if v in R else if region(lc(v)) in R then ReportSubtree(lc(v)) else if region(lc(v)) intersects R then SearchTree(lc(v),R) else if region(rc(v)) in R then ReportSubtree(rc(v)) else if region(rc(v)) intersects R then SearchTree(lc(v),R) Query time? O(k) + number of nodes tested in T that do not contain any points to report. How many such regions is there? 2 1 R * kd-tree query The number of such regions is at most the number of regions that can intersect a vertical line (×4). Consider a vertical line L and a kd-tree T R A L A L * kd-tree query Consider a vertical line L and a kd-tree T A L B A Number of visited nodes? Level 1: Q(n) = 1 + Q(n/2) Level 2: Q(n) = 1 + 2Q(n/4) B Q(n) = 2 + 2Q(n/4) = O(n ) * Summary: kd-trees Preprocessing: O(n log n) Space: O(n) Query time O(n1/2+k) * kd-trees: higher dimensions 3D: Consider a side L of the query range (×6) L B A A * kd-trees: higher dimensions 3D: Consider a side L of the query range L B B A A * kd-trees: higher dimensions Query: Q(n) = 4 + 4Q(n/8) = O(n2/3) 3D: Consider a side of the query range L A B A L B * Summary kd-trees: higher dimensions Preprocessomg: O(dn log n) Space: O(dn) Query: O(n1-1/d + k) dD: Consider a side of the query range Can we do better? If space = O(n) then query time = (n1-1/d + k) * 2D-range queries Range Trees kd-trees are optimal if we only allow for O(n) space. What if we allow for more space, can we do better? * 2D-range queries Observation: A 2D-query is two 1D-queries. * 2D-range queries Observation: A 2D-query is two 1D-queries. Idea: Build a balanced binary search tree w.r.t. the x-coordinates of the points in S. * 2D-range queries Observation: A 2D-query is two 1D-queries. Idea: Build a balanced binary search tree w.r.t. the x-coordinates of the points in S. T * 2D-range queries For each internal node v in T construct an associated data structure A(v), for the points in the subtree of T rooted at v. v T * 2D-range queries For each internal node v in T construct an associated data structure A(v), for the points in the subtree of T rooted at v. A(v) is a balanced binary search tree w.r.t. the y-coordinates. v A(v) v A(v) * 2D-range queries For each internal node v in T construct an associated data structure A(v), for the points in the subtree of T rooted at v. A(v) is a balanced binary search tree w.r.t. the y-coordinates. Range tree: multilevel tree structure T – main tree (first-level tree) v A(v) – associated data structure (secondary tree) * 2D-range queries Query: [x:x’] [y;y’] x x’ * 2D-range queries Query: [x:x’] [y;y’] x x’ x x’ * 2D-range queries Query: [x:x’] [y;y’] x x’ C B A x x’ A B C * 2D-range queries Query: [x:x’] [y;y’] x x’ C B A x x’ A B C y’ y Query time: Left and right search paths + Searching the associated structures 2 x O(log n) * 2D-range queries Each 1D-range tree takes O(log n+k’) to query. How many associated structures do we need to query? * 2D-range queries Query time: Querying a 1D-tree requires O(log n+k’) time. How many 1D trees (associated structures) do we need to query? At most 2 height of T = 2 log n Each 1D query requires O(log n+k’) time. Query time = O(log2 n + k) Query: [x,x’] x x’ * Range trees Space: - Number of nodes in T is O(n) - |A(v)| = ? Consider one leaf v. How many associated structures does v belong to? Which ones does it belong to? vT v * Range trees Space: - Number of nodes in T is O(n) - |A(v)| = ? Consider one leaf v. How many associated structures does v belong to? Which ones does it belong to? vT v * Range trees Space: - Number of nodes in T is O(n) - |A(v)| = ? Consider one leaf v. How many associated structures does v belong to? Which ones does it belong to? vT v log n * Range trees Space: - Number of nodes in T is O(n) - |A(v)| = O(n log n) Consider one leaf v. How many associated structures does v belong to? Which ones does it belong to? vT v log n * Range trees Construction time: - T can be constructed in O(n log n) time - All the associated structures can be constructed in time O(n log n) + |A(v)| = O(n log n). Why? Observation 1! vT v log n * 2D-range queries Summary Preprocessing: O(n log n) Space: O(n log n) Query time: O(log2 n + k) * dD-range queries Summary Preprocessing: O(n logd-1 n) Space: O(n logd-1 n) Query time: O(logd n + k) first-level tree second-level tree d-level tree v * Lower bound Query time and preprocessing is almost optimal. Can we improve the space requirement? Theorem: (Chazelle 1990) If query time is O(logc n + k) then one needs at least (n log n/loglog n) space. * Summary: Range trees kd-trees Space: O(dn) Preprocessomg: O(dn log n) Query: O(n1-1/d + k) Range trees Space: O(n logd-1 n) Preprocessing: O(n logd-1 n) Query time: O(logd n) O(logd-1 n) with some tricks