程序代写代做代考 Excel data structure algorithm hp0

hp0

HIERARCHICAL REPRESENTATIONS OF

POINT DATA

Hanan Samet
Computer Science Department and

Institute for Advanced Computer Studies and
Center for Automation Research

University of Maryland

College Park, MD 20742 USA
e-mail: hjs@umiacs.umd.edu

These notes may not be reproduced by any means (mechanical or
electronic or any other) without the express written permission of Hanan
Samet

hp1
PRELIMINARIES

• File ≡ collection of records (N)
• Each record contains several attributes or keys (k)

Queries:
1. Point query
2. Range query (includes partial match)

3. Boolean query ≡ combine 1 and 2 with AND, OR, NOT

Search methods

1. Organize data to be stored
• boundaries of regions in the search space are

determined by the data
• e.g., binary search tree

2. Organize the embedding space from which the data is
drawn
• region boundaries in the search space are fixed
• e.g., address computation methods such as digital

searching
3. Hybrid

• use 1 for some attributes and 2 for others

Extreme solution:
• Bitmap representation where one bit is reserved for

every possible record in the multidimensional point
space whether or not it is present

• Problems:
1. large number of attributes
2. continuous data (non-discrete)

hp2

SIMPLE NON-HIERARCHICAL DATA STRUCTURES

1. Sequential list

NAME X Y

Chicago 35 42
Mobile 52 10
Toronto 62 77
Buffalo 82 65
Denver 5 45
Omaha 27 35
Atlanta 85 15
Miami 90 5

2. Inverted List

X Y

Denver Miami
Omaha Mobile
Chicago Atlanta
Mobile Omaha
Toronto Chicago
Buffalo Denver
Atlanta Buffalo
Miami Toronto

• 2 sorted lists

• Data is pointers

• Enables pruning the search with respect to one key

hp3

GRID METHOD

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto (82,65)

Buffalo

(85,15)
Atlanta

(90,5)
Miami

• Divide space into squares of width equal to the search
region

• Each cell contains a list of all points within it

• Assume L∞ distance metric (i.e., chessboard)

• Assume C = uniform distribution of points per cell

• Average search time for k-dimensional space is O(F•2k)

F = number of records found = C since query region has
the width of a cell

2k = number of cells examined

POINT QUADTREE (Finkel/Bentley)
1 hp4
b

• Marriage between a uniform grid and a binary search tree

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

POINT QUADTREE (Finkel/Bentley)
1 hp4
b

• Marriage between a uniform grid and a binary search tree

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp4

(52,10)
Mobile

Mobile

POINT QUADTREE (Finkel/Bentley)
1 hp4
b

• Marriage between a uniform grid and a binary search tree

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp4

(52,10)
Mobile

Mobile

3
z

hp4

(62,77)
Toronto

Toronto

POINT QUADTREE (Finkel/Bentley)
1 hp4
b

• Marriage between a uniform grid and a binary search tree

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp4

(52,10)
Mobile

Mobile

3
z

hp4

(62,77)
Toronto

Toronto

4
g

hp4

(82,65)
Buffalo

Buffalo

POINT QUADTREE (Finkel/Bentley)
1 hp4
b

• Marriage between a uniform grid and a binary search tree

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp4

(52,10)
Mobile

Mobile

3
z

hp4

(62,77)
Toronto

Toronto

4
g

hp4

(82,65)
Buffalo

Buffalo

5
v

hp4

(5,45)
Denver

Denver

POINT QUADTREE (Finkel/Bentley)
1 hp4
b

• Marriage between a uniform grid and a binary search tree

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp4

(52,10)
Mobile

Mobile

3
z

hp4

(62,77)
Toronto

Toronto

4
g

hp4

(82,65)
Buffalo

Buffalo

5
v

hp4

(5,45)
Denver

Denver

6
g

hp4

(27,35)
Omaha

Omaha

POINT QUADTREE (Finkel/Bentley)
1 hp4
b

• Marriage between a uniform grid and a binary search tree

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp4

(52,10)
Mobile

Mobile

3
z

hp4

(62,77)
Toronto

Toronto

4
g

hp4

(82,65)
Buffalo

Buffalo

5
v

hp4

(5,45)
Denver

Denver

6
g

hp4

(27,35)
Omaha

Omaha

7
v

hp4

(85,15)
Atlanta

Atlanta

POINT QUADTREE (Finkel/Bentley)
1 hp4
b

• Marriage between a uniform grid and a binary search tree

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp4

(52,10)
Mobile

Mobile

3
z

hp4

(62,77)
Toronto

Toronto

4
g

hp4

(82,65)
Buffalo

Buffalo

5
v

hp4

(5,45)
Denver

Denver

6
g

hp4

(27,35)
Omaha

Omaha

7
v

hp4

(85,15)
Atlanta

Atlanta

8
z

hp4

(90,5)
Miami

Miami

hp5

PROBLEM OF DELETION IN POINT QUADTREES

1
b

I
G

B
C

• Delete node A

• Conventional algorithm takes one son as the new root
and reinserts the remaining subtrees

hp5

1. B is the new root

2
r

PROBLEM OF DELETION IN POINT QUADTREES

1
b

I
G

B
C

• Delete node A

• Conventional algorithm takes one son as the new root
and reinserts the remaining subtrees

hp5

2. D is the new root

2
r

PROBLEM OF DELETION IN POINT QUADTREES

1
b

I
G

B
C

• Delete node A

• Conventional algorithm takes one son as the new root
and reinserts the remaining subtrees

hp5

3. G is the new root

2
r

PROBLEM OF DELETION IN POINT QUADTREES

1
b

I
G

B
C

• Delete node A

• Conventional algorithm takes one son as the new root
and reinserts the remaining subtrees

hp52
r

4. I is the new root

PROBLEM OF DELETION IN POINT QUADTREES

1
b

I
G

B
C

• Delete node A

• Conventional algorithm takes one son as the new root
and reinserts the remaining subtrees

hp53
z

• Optimal solution is to use H as the new root since the
shaded region is empty

PROBLEM OF DELETION IN POINT QUADTREES

1
b

I
G

B
C

• Delete node A

• Conventional algorithm takes one son as the new root
and reinserts the remaining subtrees

hp53
z

• Optimal solution is to use H as the new root since the
shaded region is empty

4
g

• Problems:

1. must search for H

2. a node such as H may possibly not exist as is the
case if node J is present

PROBLEM OF DELETION IN POINT QUADTREES

1
b

I
G

B
C

• Delete node A

• Conventional algorithm takes one son as the new root
and reinserts the remaining subtrees

MECHANICS OF DELETION IN POINT QUADTREES

• Ideally, want to replace deleted
node (A) with a node (B) that
leaves an empty shaded region

hp61
b

MECHANICS OF DELETION IN POINT QUADTREES

• Ideally, want to replace deleted
node (A) with a node (B) that
leaves an empty shaded region

hp61
b

2
z

hp6

D A

• Involves search and instead settle on a set of
candidate nodes obtained using a method analogous
to searching a binary search tree

MECHANICS OF DELETION IN POINT QUADTREES

• Ideally, want to replace deleted
node (A) with a node (B) that
leaves an empty shaded region

hp61
b

2
z

hp6

D A

• Involves search and instead settle on a set of
candidate nodes obtained using a method analogous
to searching a binary search tree

3
r

hp6

D is the closest node

MECHANICS OF DELETION IN POINT QUADTREES

• Ideally, want to replace deleted
node (A) with a node (B) that
leaves an empty shaded region

hp61
b

2
z

hp6

D A

• Involves search and instead settle on a set of
candidate nodes obtained using a method analogous
to searching a binary search tree

3
r

hp6

D is the closest node

4
v

hp6

• Set of candidates = “closest” node in each quadrant
1. choose the candidate node that is closer to each of

its bordering axes than any other candidate node
which is on the same side of those axes

MECHANICS OF DELETION IN POINT QUADTREES

• Ideally, want to replace deleted
node (A) with a node (B) that
leaves an empty shaded region

hp61
b

2
z

hp6

D A

• Involves search and instead settle on a set of
candidate nodes obtained using a method analogous
to searching a binary search tree

3
r

hp6

D is the closest node

4
v

hp6

• Set of candidates = “closest” node in each quadrant
1. choose the candidate node that is closer to each of

its bordering axes than any other candidate node
which is on the same side of those axes

5
g

hp6

E
D

condition does not always hold

MECHANICS OF DELETION IN POINT QUADTREES

• Ideally, want to replace deleted
node (A) with a node (B) that
leaves an empty shaded region

hp61
b

2
z

hp6

D A

• Involves search and instead settle on a set of
candidate nodes obtained using a method analogous
to searching a binary search tree

3
r

hp6

D is the closest node

4
v

hp6

• Set of candidates = “closest” node in each quadrant
1. choose the candidate node that is closer to each of

its bordering axes than any other candidate node
which is on the same side of those axes

5
g

hp6

E
D

condition does not always hold

6
z

hp6

more than one node satisfies it

MECHANICS OF DELETION IN POINT QUADTREES

• Ideally, want to replace deleted
node (A) with a node (B) that
leaves an empty shaded region

hp61
b

2
z

hp6

D A

• Involves search and instead settle on a set of
candidate nodes obtained using a method analogous
to searching a binary search tree

3
r

hp6

D is the closest node

4
v

hp6

• Set of candidates = “closest” node in each quadrant
1. choose the candidate node that is closer to each of

its bordering axes than any other candidate node
which is on the same side of those axes

5
g

hp6

E
D

condition does not always hold

6
z

hp6

more than one node satisfies it

7
r

hp6

2. Use the L1 metric to break ties or deadlocks
• L1 metric is the sum of the displacements from the

bordering x and y axes
• rationale: area of shaded region is Lx•dy+Ly•dx-dx•dy

which can be approximated by 2dx•(Lx+Ly) assuming
dx=dy and dx being very small

MECHANICS OF DELETION IN POINT QUADTREES

• Ideally, want to replace deleted
node (A) with a node (B) that
leaves an empty shaded region

hp61
b

2
z

hp6

D A

• Involves search and instead settle on a set of
candidate nodes obtained using a method analogous
to searching a binary search tree

3
r

hp6

D is the closest node

4
v

hp6

• Set of candidates = “closest” node in each quadrant
1. choose the candidate node that is closer to each of

its bordering axes than any other candidate node
which is on the same side of those axes

5
g

hp6

E
D

condition does not always hold

6
z

hp6

more than one node satisfies it

7
r

hp6

2. Use the L1 metric to break ties or deadlocks
• L1 metric is the sum of the displacements from the

bordering x and y axes
• rationale: area of shaded region is Lx•dy+Ly•dx-dx•dy

which can be approximated by 2dx•(Lx+Ly) assuming
dx=dy and dx being very small

8
v

hp6

• at most one of the remaining candidates will be in
the shaded region

ALGORITHM FOR DELETION IN POINT QUADTREES

1 hp7
b

JK D

B E

H
F

• Select a node satisfying the “closest” criteria to the
deleted node A to serve as the new root (B in the NE
quadrant

ALGORITHM FOR DELETION IN POINT QUADTREES

1 hp7
b

JK D

B E

H
F

• Select a node satisfying the “closest” criteria to the
deleted node A to serve as the new root (B in the NE
quadrant

2
z

hp7

ALGORITHM FOR DELETION IN POINT QUADTREES

1 hp7
b

JK D

B E

H
F

• Select a node satisfying the “closest” criteria to the
deleted node A to serve as the new root (B in the NE
quadrant

2
z

hp7

3
v

hp7

1. no reinsertion in opposite quadrant (SW)

ALGORITHM FOR DELETION IN POINT QUADTREES

1 hp7
b

JK D

B E

H
F

• Select a node satisfying the “closest” criteria to the
deleted node A to serve as the new root (B in the NE
quadrant

2
z

hp7

3
v

hp7

1. no reinsertion in opposite quadrant (SW)

4
r

hp7

2. ADJ: apply to adjacent quadrants (NW,SE):
if root remains in the quadrant (J) then
• no reinsertion in 2 subquadrants (NW,NE)
• apply ADJ to remaining 2 subquadrants (SW,SE)
else reinsert entire quadrant

ALGORITHM FOR DELETION IN POINT QUADTREES

1 hp7
b

JK D

B E

H
F

• Select a node satisfying the “closest” criteria to the
deleted node A to serve as the new root (B in the NE
quadrant

2
z

hp7

3
v

hp7

1. no reinsertion in opposite quadrant (SW)

4
r

hp7

5
g

hp7

3. NEWROOT: apply to quadrant containing replacement
node (NE)
• same subquadrant (NE): no reinsertion
• adjacent subquadrants (NW,SE): ADJ
• opposite subquadrant (SW): NEWROOT

ALGORITHM FOR DELETION IN POINT QUADTREES

1 hp7
b

JK D

B E

H
F

• Select a node satisfying the “closest” criteria to the
deleted node A to serve as the new root (B in the NE
quadrant

2
z

hp7

3
v

hp7

1. no reinsertion in opposite quadrant (SW)

4
r

hp7

5
g

hp7

3. NEWROOT: apply to quadrant containing replacement
node (NE)
• same subquadrant (NE): no reinsertion
• adjacent subquadrants (NW,SE): ADJ
• opposite subquadrant (SW): NEWROOT

6
b

hp7

• Comparison with conventional reinsertion algorithm
1. 5/6 reduction in number of nodes requiring

reinsertion (2/3 if pick a candidate at random)
2. number of comparisons during reinsertion was

observed to be ~log4n vs. a much larger factor
3. (average total path length)/(optimal total path length)

was observed to be constant vs. an increase
Copyright © 1998 by Hanan Samet

hp8hp8hp8hp8hp8hp8hp8hp8hp8
MX QUADTREE (Hunter)

• Points are like BLACK pixels in a region quadtree
• Useful for raster to vector conversion
• Empty cells are merged to form larger empty cells
• Only good for discrete data
• Good for sparse matrix applications
• Assume that the point is associated with the lower

left corner of each cell
• Ex: assume an 8 x 8 array
divide coordinate values by 12.5

1
b

(0,8) (8,8)

(8,0)(0,0)

hp8hp8hp8hp8hp8hp8hp8hp8hp8
MX QUADTREE (Hunter)

left corner of each cell
• Ex: assume an 8 x 8 array
divide coordinate values by 12.5

1
b

(0,8) (8,8)

(8,0)(0,0)

hp8hp8hp8hp8hp8hp8hp8hp8hp82
r

(2,3)
Chicago

hp8hp8hp8hp8hp8hp8hp8hp8hp8
MX QUADTREE (Hunter)

left corner of each cell
• Ex: assume an 8 x 8 array
divide coordinate values by 12.5

1
b

(0,8) (8,8)

(8,0)(0,0)

hp8hp8hp8hp8hp8hp8hp8hp8hp82
r

(2,3)
Chicago

hp8hp8hp8hp8hp8hp8hp8hp8hp83
z

(4,0)
Mobile

hp8hp8hp8hp8hp8hp8hp8hp8hp8
MX QUADTREE (Hunter)

left corner of each cell
• Ex: assume an 8 x 8 array
divide coordinate values by 12.5

1
b

(0,8) (8,8)

(8,0)(0,0)

hp8hp8hp8hp8hp8hp8hp8hp8hp82
r

(2,3)
Chicago

hp8hp8hp8hp8hp8hp8hp8hp8hp83
z

(4,0)
Mobile

hp8hp8hp8hp8hp8hp8hp8hp8hp84
g

(4,6)
Toronto

hp8hp8hp8hp8hp8hp8hp8hp8hp8
MX QUADTREE (Hunter)

left corner of each cell
• Ex: assume an 8 x 8 array
divide coordinate values by 12.5

1
b

(0,8) (8,8)

(8,0)(0,0)

hp8hp8hp8hp8hp8hp8hp8hp8hp82
r

(2,3)
Chicago

hp8hp8hp8hp8hp8hp8hp8hp8hp83
z

(4,0)
Mobile

hp8hp8hp8hp8hp8hp8hp8hp8hp84
g

(4,6)
Toronto

hp8hp8hp8hp8hp8hp8hp8hp8hp85
v

(6,5)
Buffalo

hp8hp8hp8hp8hp8hp8hp8hp8hp8
MX QUADTREE (Hunter)

left corner of each cell
• Ex: assume an 8 x 8 array
divide coordinate values by 12.5

1
b

(0,8) (8,8)

(8,0)(0,0)

hp8hp8hp8hp8hp8hp8hp8hp8hp82
r

(2,3)
Chicago

hp8hp8hp8hp8hp8hp8hp8hp8hp83
z

(4,0)
Mobile

hp8hp8hp8hp8hp8hp8hp8hp8hp84
g

(4,6)
Toronto

hp8hp8hp8hp8hp8hp8hp8hp8hp85
v

(6,5)
Buffalo

hp8hp8hp8hp8hp8hp8hp8hp8hp86
z

(0,3)
Denver

hp8hp8hp8hp8hp8hp8hp8hp8hp8
MX QUADTREE (Hunter)

left corner of each cell
• Ex: assume an 8 x 8 array
divide coordinate values by 12.5

1
b

(0,8) (8,8)

(8,0)(0,0)

hp8hp8hp8hp8hp8hp8hp8hp8hp82
r

(2,3)
Chicago

hp8hp8hp8hp8hp8hp8hp8hp8hp83
z

(4,0)
Mobile

hp8hp8hp8hp8hp8hp8hp8hp8hp84
g

(4,6)
Toronto

hp8hp8hp8hp8hp8hp8hp8hp8hp85
v

(6,5)
Buffalo

hp8hp8hp8hp8hp8hp8hp8hp8hp86
z

(0,3)
Denver

hp8hp8hp8hp8hp8hp8hp8hp8hp87
g

(2,2)
Omaha

hp8hp8hp8hp8hp8hp8hp8hp8hp8
MX QUADTREE (Hunter)

left corner of each cell
• Ex: assume an 8 x 8 array
divide coordinate values by 12.5

1
b

(0,8) (8,8)

(8,0)(0,0)

hp8hp8hp8hp8hp8hp8hp8hp8hp82
r

(2,3)
Chicago

hp8hp8hp8hp8hp8hp8hp8hp8hp83
z

(4,0)
Mobile

hp8hp8hp8hp8hp8hp8hp8hp8hp84
g

(4,6)
Toronto

hp8hp8hp8hp8hp8hp8hp8hp8hp85
v

(6,5)
Buffalo

hp8hp8hp8hp8hp8hp8hp8hp8hp86
z

(0,3)
Denver

hp8hp8hp8hp8hp8hp8hp8hp8hp87
g

(2,2)
Omaha

hp8hp8hp8hp8hp8hp8hp8hp8hp88
r

(6,1)
Atlanta

hp8hp8hp8hp8hp8hp8hp8hp8hp8
MX QUADTREE (Hunter)

left corner of each cell
• Ex: assume an 8 x 8 array
divide coordinate values by 12.5

1
b

(0,8) (8,8)

(8,0)(0,0)

hp8hp8hp8hp8hp8hp8hp8hp8hp82
r

(2,3)
Chicago

hp8hp8hp8hp8hp8hp8hp8hp8hp83
z

(4,0)
Mobile

hp8hp8hp8hp8hp8hp8hp8hp8hp84
g

(4,6)
Toronto

hp8hp8hp8hp8hp8hp8hp8hp8hp85
v

(6,5)
Buffalo

hp8hp8hp8hp8hp8hp8hp8hp8hp86
z

(0,3)
Denver

hp8hp8hp8hp8hp8hp8hp8hp8hp87
g

(2,2)
Omaha

hp8hp8hp8hp8hp8hp8hp8hp8hp88
r

(6,1)
Atlanta

hp8hp8hp8hp8hp8hp8hp8hp8hp8

(7,0)
Miami

9
v

PR QUADTREE (Orenstein)
1 hp9
b

Regular decomposition point representation

Decomposition occurs whenever a block contains more
than one point
Useful when the domain of data points is not discrete
but finite

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Maximum level of decomposition depends on the
minimum separation between two points

• if two points are very close, then decomposition can be
very deep

• can be overcome by viewing blocks as buckets with
capacity c and only decomposing the block when it
contains more than c points

Ex: c = 1

PR QUADTREE (Orenstein)
1 hp9
b

Regular decomposition point representation

Decomposition occurs whenever a block contains more
than one point
Useful when the domain of data points is not discrete
but finite

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Maximum level of decomposition depends on the
minimum separation between two points

• if two points are very close, then decomposition can be
very deep

• can be overcome by viewing blocks as buckets with
capacity c and only decomposing the block when it
contains more than c points

Ex: c = 1

2
r

hp9

(52,10)
Mobile

PR QUADTREE (Orenstein)
1 hp9
b

Regular decomposition point representation

Decomposition occurs whenever a block contains more
than one point
Useful when the domain of data points is not discrete
but finite

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Maximum level of decomposition depends on the
minimum separation between two points

• if two points are very close, then decomposition can be
very deep

• can be overcome by viewing blocks as buckets with
capacity c and only decomposing the block when it
contains more than c points

Ex: c = 1

2
r

hp9

(52,10)
Mobile

3
z

hp9

(62,77)
Toronto

PR QUADTREE (Orenstein)
1 hp9
b

Regular decomposition point representation

Decomposition occurs whenever a block contains more
than one point
Useful when the domain of data points is not discrete
but finite

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Maximum level of decomposition depends on the
minimum separation between two points

• if two points are very close, then decomposition can be
very deep

• can be overcome by viewing blocks as buckets with
capacity c and only decomposing the block when it
contains more than c points

Ex: c = 1

2
r

hp9

(52,10)
Mobile

3
z

hp9

(62,77)
Toronto

4
g

hp9

(82,65)
Buffalo

PR QUADTREE (Orenstein)
1 hp9
b

Regular decomposition point representation

Decomposition occurs whenever a block contains more
than one point
Useful when the domain of data points is not discrete
but finite

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Maximum level of decomposition depends on the
minimum separation between two points

• if two points are very close, then decomposition can be
very deep

• can be overcome by viewing blocks as buckets with
capacity c and only decomposing the block when it
contains more than c points

Ex: c = 1

2
r

hp9

(52,10)
Mobile

3
z

hp9

(62,77)
Toronto

4
g

hp9

(82,65)
Buffalo

5
v

hp9

(5,45)
Denver

PR QUADTREE (Orenstein)
1 hp9
b

Regular decomposition point representation

Decomposition occurs whenever a block contains more
than one point
Useful when the domain of data points is not discrete
but finite

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Maximum level of decomposition depends on the
minimum separation between two points

• if two points are very close, then decomposition can be
very deep

• can be overcome by viewing blocks as buckets with
capacity c and only decomposing the block when it
contains more than c points

Ex: c = 1

2
r

hp9

(52,10)
Mobile

3
z

hp9

(62,77)
Toronto

4
g

hp9

(82,65)
Buffalo

5
v

hp9

(5,45)
Denver

6
g

hp9

(27,35)
Omaha

PR QUADTREE (Orenstein)
1 hp9
b

Regular decomposition point representation

Decomposition occurs whenever a block contains more
than one point
Useful when the domain of data points is not discrete
but finite

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Maximum level of decomposition depends on the
minimum separation between two points

• if two points are very close, then decomposition can be
very deep

• can be overcome by viewing blocks as buckets with
capacity c and only decomposing the block when it
contains more than c points

Ex: c = 1

2
r

hp9

(52,10)
Mobile

3
z

hp9

(62,77)
Toronto

4
g

hp9

(82,65)
Buffalo

5
v

hp9

(5,45)
Denver

6
g

hp9

(27,35)
Omaha

7
z

hp9

(85,15)
Atlanta

PR QUADTREE (Orenstein)
1 hp9
b

Regular decomposition point representation

Decomposition occurs whenever a block contains more
than one point
Useful when the domain of data points is not discrete
but finite

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Maximum level of decomposition depends on the
minimum separation between two points

• if two points are very close, then decomposition can be
very deep

• can be overcome by viewing blocks as buckets with
capacity c and only decomposing the block when it
contains more than c points

Ex: c = 1

2
r

hp9

(52,10)
Mobile

3
z

hp9

(62,77)
Toronto

4
g

hp9

(82,65)
Buffalo

5
v

hp9

(5,45)
Denver

6
g

hp9

(27,35)
Omaha

7
z

hp9

(85,15)
Atlanta

8
r

hp9

(90,5)
Miami

REGION SEARCH
1 hp10
b

Use of quadtree results in pruning the search space

Ex: Find all points within radius r of point A

REGION SEARCH
1 hp10
b

Use of quadtree results in pruning the search space

Ex: Find all points within radius r of point A

hp10

If a quadrant subdivision point p lies in a region l, then
search the quadrants of p specified by l

1. SE 6. NE 11. All but SW
2. SE, SW 7. NE, NW 12. All but SE
3. SW 8. NW 13. All
4. SE, NE 9. All but NW
5. SW, NW 10. All but NE

1 2 3
9 10

1211

4
5

876

2
r

REGION SEARCH
1 hp10
b

Use of quadtree results in pruning the search space

Ex: Find all points within radius r of point A

hp10

If a quadrant subdivision point p lies in a region l, then
search the quadrants of p specified by l

1. SE 6. NE 11. All but SW
2. SE, SW 7. NE, NW 12. All but SE
3. SW 8. NW 13. All
4. SE, NE 9. All but NW
5. SW, NW 10. All but NE

1 2 3
9 10

1211

4
5

876

2
r

hp103
z

hp11
FINDING THE NEAREST OBJECT

• Ex: find the nearest object to P

1
b

12 8 7 6

13 9 1 4 5

2 3

10 11

E C

• Assume PR quadtree for points (i.e., at most one point
per block)
• Search neighbors of block 1 in counterclockwise order
• Points are sorted with respect to the space they occupy
which enables pruning the search space
• Algorithm:

hp11
FINDING THE NEAREST OBJECT

• Ex: find the nearest object to P

1
b

12 8 7 6

13 9 1 4 5

2 3

10 11

E C

hp112
r

1. start at block 2 and compute distance to P from A

hp11
FINDING THE NEAREST OBJECT

• Ex: find the nearest object to P

1
b

12 8 7 6

13 9 1 4 5

2 3

10 11

E C

hp112
r

1. start at block 2 and compute distance to P from A

hp113
z

2. ignore block 3 whether or not it is empty as A is closer
to P than any point in 3

hp11
FINDING THE NEAREST OBJECT

• Ex: find the nearest object to P

1
b

12 8 7 6

13 9 1 4 5

2 3

10 11

E C

hp112
r

1. start at block 2 and compute distance to P from A

hp113
z

2. ignore block 3 whether or not it is empty as A is closer
to P than any point in 3

hp114
g

3. examine block 4 as distance to SW corner is shorter
than the distance from P to A; however, reject B as it is
further from P than A

hp11
FINDING THE NEAREST OBJECT

• Ex: find the nearest object to P

1
b

12 8 7 6

13 9 1 4 5

2 3

10 11

E C

hp112
r

1. start at block 2 and compute distance to P from A

hp113
z

2. ignore block 3 whether or not it is empty as A is closer
to P than any point in 3

hp114
g

3. examine block 4 as distance to SW corner is shorter
than the distance from P to A; however, reject B as it is
further from P than A

hp115
v

4. ignore blocks 6, 7, 8, 9, and 10 as the minimum
distance to them from P is greater than the distance
from P to A

hp11
FINDING THE NEAREST OBJECT

• Ex: find the nearest object to P

1
b

12 8 7 6

13 9 1 4 5

2 3

10 11

E C

hp112
r

1. start at block 2 and compute distance to P from A

hp113
z

2. ignore block 3 whether or not it is empty as A is closer
to P than any point in 3

hp114
g

3. examine block 4 as distance to SW corner is shorter
than the distance from P to A; however, reject B as it is
further from P than A

hp115
v

4. ignore blocks 6, 7, 8, 9, and 10 as the minimum
distance to them from P is greater than the distance
from P to A

hp116
z

5. examine block 11 as the distance from P to the southern
border of 1 is shorter than the distance from P to A;
however, reject F as it is further from P than A

hp11
FINDING THE NEAREST OBJECT

• Ex: find the nearest object to P

1
b

12 8 7 6

13 9 1 4 5

2 3

10 11

E C

hp112
r

1. start at block 2 and compute distance to P from A

hp113
z

2. ignore block 3 whether or not it is empty as A is closer
to P than any point in 3

hp114
g

3. examine block 4 as distance to SW corner is shorter
than the distance from P to A; however, reject B as it is
further from P than A

hp115
v

4. ignore blocks 6, 7, 8, 9, and 10 as the minimum
distance to them from P is greater than the distance
from P to A

hp116
z

5. examine block 11 as the distance from P to the southern
border of 1 is shorter than the distance from P to A;
however, reject F as it is further from P than A

hp117
r

• If F was moved, a better order would have started with
block 11, the southern neighbor of 1, as it is closest

new F

hp12
COMPARISON OF POINT, MX, AND PR QUADTREES

FEATURE MX PR POINT

Regular
decomposition

Yes Yes No

Type of nodes Data Stored in
leaf nodes and
non-leaf nodes
are for control

Data stored in leaf
nodes and non-
leaf nodes are for
control

Data stored in
leaf nodes and
non-leaf nodes

Shape of tree
depends on order
of inserting nodes

No No Yes

Deletion Simple but may
have to collapse
WHITE nodes

Simple but may
have to collapse
WHITE nodes

Complex

Size of space
represented

Finite Finite Unbounded

Type of data
represented

Discrete Continuous Continuous

Shape of space
represented

Square Rectangle Unbounded

Stores
coordinates

No Yes Yes

Depth of tree (d);
assume M points

All nodes are at
the same depth,
n for a 2n by 2n
region

For square region
with side length L
and minimum
separation S
between two
points,
log4(M-1) ≤ d
and
d≤log2((L/S)2.5)

log4(3M) ≤ d
and d ≤ M-1

APPLICATION OF THE MX QUADTREE (Hunter)

• Represent the boundary as a sequence of BLACK
pixels in a region quadtree

• Useful for a simple digitized polygon (i.e., non-
intersecting edges)

• Three types of nodes

1. interior – treat like WHITE nodes

2. exterior – treat like WHITE nodes

3. boundary – the edge of the polygon passes
through them and treated like BLACK nodes

• Disadvantages

1. a thickness is associated with the line segments

2. no more than 4 lines can meet at a point

hp131
b

APPLICATION OF THE MX QUADTREE (Hunter)

• Represent the boundary as a sequence of BLACK
pixels in a region quadtree

• Useful for a simple digitized polygon (i.e., non-
intersecting edges)

• Three types of nodes

1. interior – treat like WHITE nodes

2. exterior – treat like WHITE nodes

3. boundary – the edge of the polygon passes
through them and treated like BLACK nodes

• Disadvantages

1. a thickness is associated with the line segments

2. no more than 4 lines can meet at a point

hp131
b

hp132
r

hp14
MX-CIF QUADTREE (Kedem)

1
b

Collections of small rectangles for VLSI applications

Each rectangle is associated with its minimum
enclosing quadtree block
Like hashing: quadtree blocks serve as hash buckets

4
5

10 11

D {11}

F {12}E {3,4,5}

B {1}

A {2,6,7,8,9,10}

C {}

B
C

hp14
MX-CIF QUADTREE (Kedem)

1
b

Collections of small rectangles for VLSI applications

Each rectangle is associated with its minimum
enclosing quadtree block
Like hashing: quadtree blocks serve as hash buckets

4
5

10 11

D {11}

F {12}E {3,4,5}

B {1}

A {2,6,7,8,9,10}

C {}

B
C

hp142
r

Collision = more than one rectangle in a block
resolve by using two one-dimensional MX-CIF trees to
store the rectangle intersecting the lines passing
through each subdivision point

hp14
MX-CIF QUADTREE (Kedem)

1
b

Collections of small rectangles for VLSI applications

Each rectangle is associated with its minimum
enclosing quadtree block
Like hashing: quadtree blocks serve as hash buckets

4
5

10 11

D {11}

F {12}E {3,4,5}

B {1}

A {2,6,7,8,9,10}

C {}

B
C

hp142
r

Collision = more than one rectangle in a block
resolve by using two one-dimensional MX-CIF trees to
store the rectangle intersecting the lines passing
through each subdivision point

hp14

one for y-axis

Binary tree for y-
axis through A

Y2
10
Y4

6
Y7

3
g

hp14
MX-CIF QUADTREE (Kedem)

1
b

Collections of small rectangles for VLSI applications

Each rectangle is associated with its minimum
enclosing quadtree block
Like hashing: quadtree blocks serve as hash buckets

4
5

10 11

D {11}

F {12}E {3,4,5}

B {1}

A {2,6,7,8,9,10}

C {}

B
C

hp142
r

Collision = more than one rectangle in a block
resolve by using two one-dimensional MX-CIF trees to
store the rectangle intersecting the lines passing
through each subdivision point

hp14

one for y-axis

Binary tree for y-
axis through A

Y2
10
Y4

6
Y7

3
g

hp14

if a rectangle intersects both x and y axes, then
associate it with the y axis

4
v

hp14
MX-CIF QUADTREE (Kedem)

1
b

Collections of small rectangles for VLSI applications

Each rectangle is associated with its minimum
enclosing quadtree block
Like hashing: quadtree blocks serve as hash buckets

4
5

10 11

D {11}

F {12}E {3,4,5}

B {1}

A {2,6,7,8,9,10}

C {}

B
C

hp142
r

Collision = more than one rectangle in a block
resolve by using two one-dimensional MX-CIF trees to
store the rectangle intersecting the lines passing
through each subdivision point

hp14

one for y-axis

Binary tree for y-
axis through A

Y2
10
Y4

6
Y7

3
g

hp14

if a rectangle intersects both x and y axes, then
associate it with the y axis

4
v

hp145
z

one for x-axis

Binary tree for x-
axis through A

X3
9

K-D TREE (Bentley)

• Test one attribute at a time instead of all simultaneously
as in the point quadtree

• Usually cycle through all the attributes

• Shape of the tree depends on the order in which the
data is encountered

1 hp15
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

K-D TREE (Bentley)

• Test one attribute at a time instead of all simultaneously
as in the point quadtree

• Usually cycle through all the attributes

• Shape of the tree depends on the order in which the
data is encountered

1 hp15
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp15

(52,10)
Mobile

Mobile

x test

K-D TREE (Bentley)

• Test one attribute at a time instead of all simultaneously
as in the point quadtree

• Usually cycle through all the attributes

• Shape of the tree depends on the order in which the
data is encountered

1 hp15
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp15

(52,10)
Mobile

Mobile

x test

3
z

hp15

(62,77)
Toronto

Toronto

y test

K-D TREE (Bentley)

• Test one attribute at a time instead of all simultaneously
as in the point quadtree

• Usually cycle through all the attributes

• Shape of the tree depends on the order in which the
data is encountered

1 hp15
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp15

(52,10)
Mobile

Mobile

x test

3
z

hp15

(62,77)
Toronto

Toronto

y test

4
g

hp15

(82,65)
Buffalo

Buffalo

x test

K-D TREE (Bentley)

• Test one attribute at a time instead of all simultaneously
as in the point quadtree

• Usually cycle through all the attributes

• Shape of the tree depends on the order in which the
data is encountered

1 hp15
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp15

(52,10)
Mobile

Mobile

x test

3
z

hp15

(62,77)
Toronto

Toronto

y test

4
g

hp15

(82,65)
Buffalo

Buffalo

x test

5
v

hp15

(5,45)
Denver

Denver

K-D TREE (Bentley)

• Test one attribute at a time instead of all simultaneously
as in the point quadtree

• Usually cycle through all the attributes

• Shape of the tree depends on the order in which the
data is encountered

1 hp15
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp15

(52,10)
Mobile

Mobile

x test

3
z

hp15

(62,77)
Toronto

Toronto

y test

4
g

hp15

(82,65)
Buffalo

Buffalo

x test

5
v

hp15

(5,45)
Denver

Denver

6
g

hp15

(27,35)
Omaha

Omaha

K-D TREE (Bentley)

• Test one attribute at a time instead of all simultaneously
as in the point quadtree

• Usually cycle through all the attributes

• Shape of the tree depends on the order in which the
data is encountered

1 hp15
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp15

(52,10)
Mobile

Mobile

x test

3
z

hp15

(62,77)
Toronto

Toronto

y test

4
g

hp15

(82,65)
Buffalo

Buffalo

x test

5
v

hp15

(5,45)
Denver

Denver

6
g

hp15

(27,35)
Omaha

Omaha

7
r

hp15

(85,15)
Atlanta

Atlanta

y test

K-D TREE (Bentley)

• Test one attribute at a time instead of all simultaneously
as in the point quadtree

• Usually cycle through all the attributes

• Shape of the tree depends on the order in which the
data is encountered

1 hp15
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

Chicago

2
r

hp15

(52,10)
Mobile

Mobile

x test

3
z

hp15

(62,77)
Toronto

Toronto

y test

4
g

hp15

(82,65)
Buffalo

Buffalo

x test

5
v

hp15

(5,45)
Denver

Denver

6
g

hp15

(27,35)
Omaha

Omaha

7
r

hp15

(85,15)
Atlanta

Atlanta

y test

8
v

hp15

(90,5)
Miami

Miami

hp16

K-D TREE DELETION

• Similar to deletion in binary search trees

• Assume branch to HISON if test value is ≥ root value

• Assume root discriminates on the x coordinate and
subsequent alternation with the y coordinate value

• Algorithm:

1. replace root by node in HISON with the minimum x
coordinate value

2. repeat the process for the position of the replacement
node using the x or y coordinate values depending on
whether the replacement node is an x or a y
discriminator

EXAMPLE OF K-D TREE DELETION

Ex:

hp171
b

J (28,45)

A (20,20)

B (10,30) D (40,50)

C (10,20) E (30,40) F (35,60)

G (25,30)

H (29,40)

I (27,35)

EXAMPLE OF K-D TREE DELETION

Ex:

hp171
b

J (28,45)

A (20,20)

B (10,30) D (40,50)

C (10,20) E (30,40) F (35,60)

G (25,30)

H (29,40)

I (27,35)

2
r

hp17

min x

EXAMPLE OF K-D TREE DELETION

Ex:

hp171
b

J (28,45)

A (20,20)

B (10,30) D (40,50)

C (10,20) E (30,40) F (35,60)

G (25,30)

H (29,40)

I (27,35)

2
r

hp17

min x

3
z

hp17

min y

EXAMPLE OF K-D TREE DELETION

Ex:

hp171
b

J (28,45)

A (20,20)

B (10,30) D (40,50)

C (10,20) E (30,40) F (35,60)

G (25,30)

H (29,40)

I (27,35)

2
r

hp17

min x

3
z

hp17

min y

4
g

hp17

min y

G (25,30)

B (10,30) D (40,50)

C (10,20) E (30,40) F (35,60)

I (27,35)

H (29,40)

J (28,45)

EXAMPLE OF K-D TREE DELETION

Ex:

hp171
b

J (28,45)

A (20,20)

B (10,30) D (40,50)

C (10,20) E (30,40) F (35,60)

G (25,30)

H (29,40)

I (27,35)

2
r

hp17

min x

3
z

hp17

min y

4
g

hp17

min y

G (25,30)

B (10,30) D (40,50)

C (10,20) E (30,40) F (35,60)

I (27,35)

H (29,40)

J (28,45)

5
v

hp17

• Analogy with binary search trees implies that we can
replace the root by the node in LOSON with the max x
coordinate value. However, if there is more than one
node with the same max value, then after replacement
there would be a node in LOSON which is not strictly less
than the new root (e.g., nodes B and C)

EXAMPLE OF K-D TREE DELETION

Ex:

hp171
b

J (28,45)

A (20,20)

B (10,30) D (40,50)

C (10,20) E (30,40) F (35,60)

G (25,30)

H (29,40)

I (27,35)

2
r

hp17

min x

3
z

hp17

min y

4
g

hp17

min y

G (25,30)

B (10,30) D (40,50)

C (10,20) E (30,40) F (35,60)

I (27,35)

H (29,40)

J (28,45)

5
v

hp17

6
r

hp17

• What if the HISON of the root is empty?

EXAMPLE OF K-D TREE DELETION

Ex:

hp171
b

J (28,45)

A (20,20)

B (10,30) D (40,50)

C (10,20) E (30,40) F (35,60)

G (25,30)

H (29,40)

I (27,35)

2
r

hp17

min x

3
z

hp17

min y

4
g

hp17

min y

G (25,30)

B (10,30) D (40,50)

C (10,20) E (30,40) F (35,60)

I (27,35)

H (29,40)

J (28,45)

5
v

hp17

6
r

hp17

• What if the HISON of the root is empty?

7
z

hp17

1. replace root node A with the node B in LOSON having
the minimum x coordinate value and set the HISON
pointer of A to be the old LOSON pointer of A

2. repeat process for the position of the replacement node

T’
B

B is the node with
the minimum x
coordinate value
and replaces A

K-D TREE SEARCH

• Search space can be pruned by testing if the search
region is completely contained in one of the partitions of
the node as now only one subtree must be examined

hp181
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto (82,65)

Buffalo

(85,15)
Atlanta

(90,5)
Miami

K-D TREE SEARCH

• Search space can be pruned by testing if the search
region is completely contained in one of the partitions of
the node as now only one subtree must be examined

hp181
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto (82,65)

Buffalo

(85,15)
Atlanta

(90,5)
Miami

2
r

hp18

• Ex: find all points within 10 of (20,30)

(20,30)

K-D TREE SEARCH

• Search space can be pruned by testing if the search
region is completely contained in one of the partitions of
the node as now only one subtree must be examined

hp181
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto (82,65)

Buffalo

(85,15)
Atlanta

(90,5)
Miami

2
r

hp18

• Ex: find all points within 10 of (20,30)

(20,30)

3
z

hp18

1. search the region entirely to the left of Chicago (x=35)

K-D TREE SEARCH

• Search space can be pruned by testing if the search
region is completely contained in one of the partitions of
the node as now only one subtree must be examined

hp181
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto (82,65)

Buffalo

(85,15)
Atlanta

(90,5)
Miami

2
r

hp18

• Ex: find all points within 10 of (20,30)

(20,30)

3
z

hp18

1. search the region entirely to the left of Chicago (x=35)

4
g

hp18

2. search the region entirely below Denver (y=45)

K-D TREE SEARCH

• Search space can be pruned by testing if the search
region is completely contained in one of the partitions of
the node as now only one subtree must be examined

hp181
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto (82,65)

Buffalo

(85,15)
Atlanta

(90,5)
Miami

2
r

hp18

• Ex: find all points within 10 of (20,30)

(20,30)

3
z

hp18

1. search the region entirely to the left of Chicago (x=35)

4
g

hp18

2. search the region entirely below Denver (y=45)

5
v

hp18

3. search yields Omaha

PR K-D TREE (Knowlton)

• A region contains at most one data point

• Analogous to EXCELL with bucket size of 1

1 hp19
b

(0,100) (100,100)

(100,0)(0,0)

PR K-D TREE (Knowlton)

• A region contains at most one data point

• Analogous to EXCELL with bucket size of 1

1 hp19
b

(0,100) (100,100)

(100,0)(0,0)

2
r

hp19

(35,42)
Chicago

Chicago

PR K-D TREE (Knowlton)

• A region contains at most one data point

• Analogous to EXCELL with bucket size of 1

1 hp19
b

(0,100) (100,100)

(100,0)(0,0)

2
r

hp19

(35,42)
Chicago

Chicago

3
z

hp19

(52,10)
Mobile

MobileChicago

PR K-D TREE (Knowlton)

• A region contains at most one data point

• Analogous to EXCELL with bucket size of 1

1 hp19
b

(0,100) (100,100)

(100,0)(0,0)

2
r

hp19

(35,42)
Chicago

Chicago

3
z

hp19

(52,10)
Mobile

MobileChicago

4
g

hp19

(62,77)
Toronto

Mobile Toronto

PR K-D TREE (Knowlton)

• A region contains at most one data point

• Analogous to EXCELL with bucket size of 1

1 hp19
b

(0,100) (100,100)

(100,0)(0,0)

2
r

hp19

(35,42)
Chicago

Chicago

3
z

hp19

(52,10)
Mobile

MobileChicago

4
g

hp19

(62,77)
Toronto

Mobile Toronto

5
v

hp19

(82,65)
Buffalo

BuffaloToronto

PR K-D TREE (Knowlton)

• A region contains at most one data point

• Analogous to EXCELL with bucket size of 1

1 hp19
b

(0,100) (100,100)

(100,0)(0,0)

2
r

hp19

(35,42)
Chicago

Chicago

3
z

hp19

(52,10)
Mobile

MobileChicago

4
g

hp19

(62,77)
Toronto

Mobile Toronto

5
v

hp19

(82,65)
Buffalo

BuffaloToronto

6
z

hp19

(5,45)
Denver

Chicago
Denver

PR K-D TREE (Knowlton)

• A region contains at most one data point

• Analogous to EXCELL with bucket size of 1

1 hp19
b

(0,100) (100,100)

(100,0)(0,0)

2
r

hp19

(35,42)
Chicago

Chicago

3
z

hp19

(52,10)
Mobile

MobileChicago

4
g

hp19

(62,77)
Toronto

Mobile Toronto

5
v

hp19

(82,65)
Buffalo

BuffaloToronto

6
z

hp19

(5,45)
Denver

Chicago
Denver

7
g

hp19

(27,35)
Omaha

ChicagoOmaha

PR K-D TREE (Knowlton)

• A region contains at most one data point

• Analogous to EXCELL with bucket size of 1

1 hp19
b

(0,100) (100,100)

(100,0)(0,0)

2
r

hp19

(35,42)
Chicago

Chicago

3
z

hp19

(52,10)
Mobile

MobileChicago

4
g

hp19

(62,77)
Toronto

Mobile Toronto

5
v

hp19

(82,65)
Buffalo

BuffaloToronto

6
z

hp19

(5,45)
Denver

Chicago
Denver

7
g

hp19

(27,35)
Omaha

ChicagoOmaha

8
r

hp19

(85,15)
Atlanta

Mobile
Atlanta

PR K-D TREE (Knowlton)

• A region contains at most one data point

• Analogous to EXCELL with bucket size of 1

1 hp19
b

(0,100) (100,100)

(100,0)(0,0)

2
r

hp19

(35,42)
Chicago

Chicago

3
z

hp19

(52,10)
Mobile

MobileChicago

4
g

hp19

(62,77)
Toronto

Mobile Toronto

5
v

hp19

(82,65)
Buffalo

BuffaloToronto

6
z

hp19

(5,45)
Denver

Chicago
Denver

7
g

hp19

(27,35)
Omaha

ChicagoOmaha

8
r

hp19

(85,15)
Atlanta

Mobile
Atlanta

(90,5)
Miami

9
v

hp19

MiamiAtlanta

ADAPTIVE K-D TREE

• Data is only stored in terminal nodes
• An interior node contains the median of the set as the
discriminator
• The discriminator key is the one for which the spread of
the values of the key is a maximum

hp201
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto

(82,65)
Buffalo

(85,15)
Atlanta

(90,5)
Miami

ADAPTIVE K-D TREE

hp201
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto

(82,65)
Buffalo

(85,15)
Atlanta

(90,5)
Miami

2
v

hp20

57,X

ADAPTIVE K-D TREE

hp201
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto

(82,65)
Buffalo

(85,15)
Atlanta

(90,5)
Miami

2
v

hp20

57,X

3
r

hp20

31,X

ADAPTIVE K-D TREE

hp201
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto

(82,65)
Buffalo

(85,15)
Atlanta

(90,5)
Miami

2
v

hp20

57,X

3
r

hp20

31,X

4
z

hp20

16,X

OmahaDenver

ADAPTIVE K-D TREE

hp201
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto

(82,65)
Buffalo

(85,15)
Atlanta

(90,5)
Miami

2
v

hp20

57,X

3
r

hp20

31,X

4
z

hp20

16,X

OmahaDenver

5
g

hp20

26,Y

ChicagoMobile

ADAPTIVE K-D TREE

hp201
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto

(82,65)
Buffalo

(85,15)
Atlanta

(90,5)
Miami

2
v

hp20

57,X

3
r

hp20

31,X

4
z

hp20

16,X

OmahaDenver

5
g

hp20

26,Y

ChicagoMobile

6
g

hp20

40,Y

ADAPTIVE K-D TREE

hp201
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto

(82,65)
Buffalo

(85,15)
Atlanta

(90,5)
Miami

2
v

hp20

57,X

3
r

hp20

31,X

4
z

hp20

16,X

OmahaDenver

5
g

hp20

26,Y

ChicagoMobile

6
g

hp20

40,Y

AtlantaMiami

7
v

hp20

10,Y10,Y

ADAPTIVE K-D TREE

hp201
b

(0,100) (100,100)

(100,0)(0,0)

(5,45)
Denver (35,42)

Chicago

(27,35)
Omaha

(52,10)
Mobile

(62,77)
Toronto

(82,65)
Buffalo

(85,15)
Atlanta

(90,5)
Miami

2
v

hp20

57,X

3
r

hp20

31,X

4
z

hp20

16,X

OmahaDenver

5
g

hp20

26,Y

ChicagoMobile

6
g

hp20

40,Y

AtlantaMiami

7
v

hp20

10,Y10,Y

8
r

hp20

72,X

BuffaloToronto

hp21
GRID FILE (Nievergelt,Hinterberger,Sevcik)

• Two level grid for storing points

• Uses a grid directory (a 2d array of grid blocks) on disk
that contains the address of the bucket (i.e., page) that
contains the data associated with the grid block

• Linear scales (a pair of 1d arrays) in core that access the
grid block in the grid directory (on disk) that is associated
with a particular point thereby enabling the
decomposition of the space to be arbitrary

• Guarantees access to any record with two disk
operations — one for each level of the grid

1. access the grid block

2. access the bucket

• Each bucket has finite capacity

• Partition upon overflow

1. bucket partition — overflowing bucket is associated
with more than one grid block

2. grid partition — overflowing bucket is associated with
just one grid block

• Splitting policies

1. split at midpoint and cycle through attributes

2. adaptive

• increases granularity of frequently queried attributes

• favors some attributes over others

GRID FILE EXAMPLE

• Assume bucket size = 2

hp221
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

Linear scales

1. Initially Chicago and Mobile in bucket A

x:
0 100

y:
0 100

GRID FILE EXAMPLE

• Assume bucket size = 2

hp221
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

Linear scales

1. Initially Chicago and Mobile in bucket A

x:
0 100

y:
0 100

2
z

hp22

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

x=45

GRID FILE EXAMPLE

• Assume bucket size = 2

hp221
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

Linear scales

1. Initially Chicago and Mobile in bucket A

x:
0 100

y:
0 100

2
z

hp22

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

x=45

3
r

hp22

(82,65)
Buffalo

3. Insert Buffalo causing a grid partition yielding bucket C

y=70

GRID FILE EXAMPLE

• Assume bucket size = 2

hp221
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

Linear scales

1. Initially Chicago and Mobile in bucket A

x:
0 100

y:
0 100

2
z

hp22

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

x=45

3
r

hp22

(82,65)
Buffalo

3. Insert Buffalo causing a grid partition yielding bucket C

y=70

4
v

hp22

(5,45)
Denver

4. Insert Denver causing no change

GRID FILE EXAMPLE

• Assume bucket size = 2

hp221
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

Linear scales

1. Initially Chicago and Mobile in bucket A

x:
0 100

y:
0 100

2
z

hp22

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

x=45

3
r

hp22

(82,65)
Buffalo

3. Insert Buffalo causing a grid partition yielding bucket C

y=70

4
v

hp22

(5,45)
Denver

4. Insert Denver causing no change

5
g

hp22

(27,35)
Omaha

5. Insert Omaha causing a grid partition yielding bucket D

y=43

GRID FILE EXAMPLE

• Assume bucket size = 2

hp221
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

Linear scales

1. Initially Chicago and Mobile in bucket A

x:
0 100

y:
0 100

2
z

hp22

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

x=45

3
r

hp22

(82,65)
Buffalo

3. Insert Buffalo causing a grid partition yielding bucket C

y=70

4
v

hp22

(5,45)
Denver

4. Insert Denver causing no change

5
g

hp22

(27,35)
Omaha

5. Insert Omaha causing a grid partition yielding bucket D

y=43

6
r

hp22

(85,15)
Atlanta

6. Insert Atlanta causing a bucket partition yielding bucket E

GRID FILE EXAMPLE

• Assume bucket size = 2

hp221
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

Linear scales

1. Initially Chicago and Mobile in bucket A

x:
0 100

y:
0 100

2
z

hp22

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

x=45

3
r

hp22

(82,65)
Buffalo

3. Insert Buffalo causing a grid partition yielding bucket C

y=70

4
v

hp22

(5,45)
Denver

4. Insert Denver causing no change

5
g

hp22

(27,35)
Omaha

5. Insert Omaha causing a grid partition yielding bucket D

y=43

6
r

hp22

(85,15)
Atlanta

6. Insert Atlanta causing a bucket partition yielding bucket E

7. Insert Miami causing a grid partition yielding bucket F

7
v

hp22

(90,5)
Miami

x=70

EXCELL (Tamminen)

• Uses regular decomposition
• Like grid file, guarantees access to any record with two

disk operations
• Differentiated from grid file by absence of linear scales

which enable decomposition of space to be arbitrary
• Grid partition results in doubling the size of grid directory
• Ex: bucket size = 2

hp231
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

1. Initially, Chicago and Mobile in bucket A

EXCELL (Tamminen)

• Uses regular decomposition
• Like grid file, guarantees access to any record with two

disk operations
• Differentiated from grid file by absence of linear scales

which enable decomposition of space to be arbitrary
• Grid partition results in doubling the size of grid directory
• Ex: bucket size = 2

hp231
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

1. Initially, Chicago and Mobile in bucket A

2
r

hp23

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

EXCELL (Tamminen)

• Uses regular decomposition
• Like grid file, guarantees access to any record with two

disk operations
• Differentiated from grid file by absence of linear scales

which enable decomposition of space to be arbitrary
• Grid partition results in doubling the size of grid directory
• Ex: bucket size = 2

hp231
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

1. Initially, Chicago and Mobile in bucket A

2
r

hp23

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

3
z

hp23

(82,65)
Buffalo

3. Insert Buffalo causing a grid partition yielding bucket C

EXCELL (Tamminen)

• Uses regular decomposition
• Like grid file, guarantees access to any record with two

disk operations
• Differentiated from grid file by absence of linear scales

which enable decomposition of space to be arbitrary
• Grid partition results in doubling the size of grid directory
• Ex: bucket size = 2

hp231
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

1. Initially, Chicago and Mobile in bucket A

2
r

hp23

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

3
z

hp23

(82,65)
Buffalo

3. Insert Buffalo causing a grid partition yielding bucket C

4
r

hp23

(5,45)
Denver

4. Insert Denver causing no change

EXCELL (Tamminen)

• Uses regular decomposition
• Like grid file, guarantees access to any record with two

disk operations
• Differentiated from grid file by absence of linear scales

which enable decomposition of space to be arbitrary
• Grid partition results in doubling the size of grid directory
• Ex: bucket size = 2

hp231
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

1. Initially, Chicago and Mobile in bucket A

2
r

hp23

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

3
z

hp23

(82,65)
Buffalo

3. Insert Buffalo causing a grid partition yielding bucket C

4
r

hp23

(5,45)
Denver

4. Insert Denver causing no change

5
v

hp23

(27,35)
Omaha

5. Insert Omaha; bucket A overflows; split A yielding bucket D

EXCELL (Tamminen)

• Uses regular decomposition
• Like grid file, guarantees access to any record with two

disk operations
• Differentiated from grid file by absence of linear scales

which enable decomposition of space to be arbitrary
• Grid partition results in doubling the size of grid directory
• Ex: bucket size = 2

hp231
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

1. Initially, Chicago and Mobile in bucket A

2
r

hp23

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

3
z

hp23

(82,65)
Buffalo

3. Insert Buffalo causing a grid partition yielding bucket C

4
r

hp23

(5,45)
Denver

4. Insert Denver causing no change

5
v

hp23

(27,35)
Omaha

5. Insert Omaha; bucket A overflows; split A yielding bucket D

6
g

hp23

6. Bucket A is still too full, so perform a grid partition

D C

EXCELL (Tamminen)

• Uses regular decomposition
• Like grid file, guarantees access to any record with two

disk operations
• Differentiated from grid file by absence of linear scales

which enable decomposition of space to be arbitrary
• Grid partition results in doubling the size of grid directory
• Ex: bucket size = 2

hp231
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

1. Initially, Chicago and Mobile in bucket A

2
r

hp23

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

3
z

hp23

(82,65)
Buffalo

3. Insert Buffalo causing a grid partition yielding bucket C

4
r

hp23

(5,45)
Denver

4. Insert Denver causing no change

5
v

hp23

(27,35)
Omaha

5. Insert Omaha; bucket A overflows; split A yielding bucket D

6
g

hp23

6. Bucket A is still too full, so perform a grid partition

D C

7
z

hp23

(85,15)
Atlanta

7. Insert Atlanta causing no change

EXCELL (Tamminen)

• Uses regular decomposition
• Like grid file, guarantees access to any record with two

disk operations
• Differentiated from grid file by absence of linear scales

which enable decomposition of space to be arbitrary
• Grid partition results in doubling the size of grid directory
• Ex: bucket size = 2

hp231
b

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(52,10)
Mobile

1. Initially, Chicago and Mobile in bucket A

2
r

hp23

(62,77)
Toronto

2. Insert Toronto causing a grid partition yielding bucket B

3
z

hp23

(82,65)
Buffalo

3. Insert Buffalo causing a grid partition yielding bucket C

4
r

hp23

(5,45)
Denver

4. Insert Denver causing no change

5
v

hp23

(27,35)
Omaha

5. Insert Omaha; bucket A overflows; split A yielding bucket D

6
g

hp23

6. Bucket A is still too full, so perform a grid partition

D C

7
z

hp23

(85,15)
Atlanta

7. Insert Atlanta causing no change

8
v

hp23

(90,5)
Miami

8. Insert Miami causing a bucket partition yielding bucket F

SUMMARY

• Data structures can be grouped:
N = No data organization

D = Organize data to be
stored (e.g., binary
search tree)

E = Organize the embedding
space from which the
data is drawn (e.g.,
digital searching)

H = Hybrid (combines at
least two of N, D, and E)

hp241
b

SUMMARY

• Data structures can be grouped:
N = No data organization

D = Organize data to be
stored (e.g., binary
search tree)

E = Organize the embedding
space from which the
data is drawn (e.g.,
digital searching)

H = Hybrid (combines at
least two of N, D, and E)

hp241
b

2
r

hp24

Sequential
List

0: no
organization

SUMMARY

• Data structures can be grouped:
N = No data organization

D = Organize data to be
stored (e.g., binary
search tree)

E = Organize the embedding
space from which the
data is drawn (e.g.,
digital searching)

H = Hybrid (combines at
least two of N, D, and E)

hp241
b

2
r

hp24

Sequential
List

0: no
organization

3
z

hp24

Inverted
List

1: one attribute

SUMMARY

• Data structures can be grouped:
N = No data organization

D = Organize data to be
stored (e.g., binary
search tree)

E = Organize the embedding
space from which the
data is drawn (e.g.,
digital searching)

H = Hybrid (combines at
least two of N, D, and E)

hp241
b

2
r

hp24

Sequential
List

0: no
organization

3
z

hp24

Inverted
List

1: one attribute

4
g

hp24

Grid
Method

2: all attributes
(fixed number
of cells)

SUMMARY

• Data structures can be grouped:
N = No data organization

D = Organize data to be
stored (e.g., binary
search tree)

E = Organize the embedding
space from which the
data is drawn (e.g.,
digital searching)

H = Hybrid (combines at
least two of N, D, and E)

hp241
b

2
r

hp24

Sequential
List

0: no
organization

3
z

hp24

Inverted
List

1: one attribute

4
g

hp24

Grid
Method

2: all attributes
(fixed number
of cells)

5
v

hp24

Grid
File

3: permit the
number of
cells to vary

Excell

E D

SUMMARY

• Data structures can be grouped:
N = No data organization

D = Organize data to be
stored (e.g., binary
search tree)

E = Organize the embedding
space from which the
data is drawn (e.g.,
digital searching)

H = Hybrid (combines at
least two of N, D, and E)

hp241
b

2
r

hp24

Sequential
List

0: no
organization

3
z

hp24

Inverted
List

1: one attribute

4
g

hp24

Grid
Method

2: all attributes
(fixed number
of cells)

5
v

hp24

Grid
File

3: permit the
number of
cells to vary

Excell

E D

6
r

hp24

Point
Quadtree

4: only partition
the
overflowing
cellsD

PR
Quadtree

MX
Quadtree

SUMMARY

• Data structures can be grouped:
N = No data organization

D = Organize data to be
stored (e.g., binary
search tree)

E = Organize the embedding
space from which the
data is drawn (e.g.,
digital searching)

H = Hybrid (combines at
least two of N, D, and E)

hp241
b

2
r

hp24

Sequential
List

0: no
organization

3
z

hp24

Inverted
List

1: one attribute

4
g

hp24

Grid
Method

2: all attributes
(fixed number
of cells)

5
v

hp24

Grid
File

3: permit the
number of
cells to vary

Excell

E D

6
r

hp24

Point
Quadtree

4: only partition
the
overflowing
cellsD

PR
Quadtree

MX
Quadtree

7
z

hp24

K-d tree

5: only partition
the
overflowing
cells into 2
cellsD

PR
K-d tree

SUMMARY

• Data structures can be grouped:
N = No data organization

D = Organize data to be
stored (e.g., binary
search tree)

E = Organize the embedding
space from which the
data is drawn (e.g.,
digital searching)

H = Hybrid (combines at
least two of N, D, and E)

hp241
b

2
r

hp24

Sequential
List

0: no
organization

3
z

hp24

Inverted
List

1: one attribute

4
g

hp24

Grid
Method

2: all attributes
(fixed number
of cells)

5
v

hp24

Grid
File

3: permit the
number of
cells to vary

Excell

E D

6
r

hp24

Point
Quadtree

4: only partition
the
overflowing
cellsD

PR
Quadtree

MX
Quadtree

7
z

hp24

K-d tree

5: only partition
the
overflowing
cells into 2
cellsD

PR
K-d tree

8
g

hp24

Adaptive
K-d tree

6: partition using
attribute
having
greatest
spread across
the cell

hp25

DRAWBACKS OF MOST HASHING METHODS

• Require rehashing all the data when the hash table
becomes too full

• Goal: only move a few records
• Solutions:

1. Knott
• use a trie in the form of a binary tree

2. extendible hashing
(Fagin, Nievergelt, Pippenger, Strong)

• like a trie except that all buckets are at same level
• buckets are accessed by use of a directory
• directory elements are NOT the same as buckets

3. Linear hashing (Litwin)
• provides for linear growth in the number of

buckets (i.e., the hash table grows at a rate of one
bucket at a time)

• does not make use of a directory

1
b

B C

hp25

DRAWBACKS OF MOST HASHING METHODS

• Require rehashing all the data when the hash table
becomes too full

• Goal: only move a few records
• Solutions:

1. Knott
• use a trie in the form of a binary tree

2. extendible hashing
(Fagin, Nievergelt, Pippenger, Strong)

• like a trie except that all buckets are at same level
• buckets are accessed by use of a directory
• directory elements are NOT the same as buckets

3. Linear hashing (Litwin)
• provides for linear growth in the number of

buckets (i.e., the hash table grows at a rate of one
bucket at a time)

• does not make use of a directory

1
b

B C

hp252
r

• bucket overflow is solved by splitting
• drawback: accessing a bucket at

level m requires m operations
E F

hp25

DRAWBACKS OF MOST HASHING METHODS

• Require rehashing all the data when the hash table
becomes too full

• Goal: only move a few records
• Solutions:

1. Knott
• use a trie in the form of a binary tree

2. extendible hashing
(Fagin, Nievergelt, Pippenger, Strong)

• like a trie except that all buckets are at same level
• buckets are accessed by use of a directory
• directory elements are NOT the same as buckets

3. Linear hashing (Litwin)
• provides for linear growth in the number of

buckets (i.e., the hash table grows at a rate of one
bucket at a time)

• does not make use of a directory

1
b

B C

hp252
r

• bucket overflow is solved by splitting
• drawback: accessing a bucket at

level m requires m operations
E F

hp253
z

• implemented as a directory of pointers to buckets
• accessing a bucket requires 1 operation

A A A A B C D D

hp25

DRAWBACKS OF MOST HASHING METHODS

• Require rehashing all the data when the hash table
becomes too full

• Goal: only move a few records
• Solutions:

1. Knott
• use a trie in the form of a binary tree

2. extendible hashing
(Fagin, Nievergelt, Pippenger, Strong)

• like a trie except that all buckets are at same level
• buckets are accessed by use of a directory
• directory elements are NOT the same as buckets

3. Linear hashing (Litwin)
• provides for linear growth in the number of

buckets (i.e., the hash table grows at a rate of one
bucket at a time)

• does not make use of a directory

1
b

B C

hp252
r

• bucket overflow is solved by splitting
• drawback: accessing a bucket at

level m requires m operations
E F

hp253
z

• implemented as a directory of pointers to buckets
• accessing a bucket requires 1 operation

A A A A B C D D

hp254
g

• bucket overflow may cause doubling the directory
• e.g., EXCELL

A A A A A A A A B B E F D D D D

hp26
MECHANICS OF LINEAR HASHING

• Assume a file with m buckets

• Use two hashing functions hi(k) = f(k) mod 2i+1 for i = n
and i = n+1

1. compute hn+1(k) = x and use the result if x < m 2. otherwise use hn(k) • Such a file is said to be at level n,n+1 • There exist primary and overflow buckets • When a record hashes to a full primary bucket, then it is inserted into an overflow bucket corresponding to the primary bucket • τ : storage utilization factor τ = number of records in file divided by the total of available slots in primary and overflow buckets • When τ > a given load α, then one of the buckets is split

• When bucket i is split, it and its overflow bucket’s records
are rehashed using hn+1 and distributed into buckets i
and i + 2n as is appropriate

hp27

LINEAR HASHING INSERTION ALGORITHM

• Let s denote the identity of the next bucket to be split and
cycles from 0 to 2n-1

• Insertion algorithm

1. compute bucket address i for record r

2. insert r in bucket i

3. if τ > α, then split bucket i creating bucket i + 2n and
reinsert in buckets i and i + 2n

4. if s = 2n, then all buckets have been split

• increment n

• reset s to 0

5. if buckets i or i+1 overflow, then allocate an overflow
bucket

6. if rehashing causes some overflow buckets to be
reclaimed, repeat steps 3-5

• Notes

1. a bucket split need not necessarily occur when a
record hashes to a full bucket, nor does the bucket
being split need to be full

2. key principle is that eventually every bucket will be
split and ideally all overflow buckets will be emptied
and reclaimed

3. if the storage utilization gets too low, then buckets
should be reclaimed

hp28

BIT INTERLEAVING HASHING FUNCTIONS

• Bit interleaving takes one bit from the binary
representation of the x coordinate value and one bit
from the binary representation of the y coordinate
value and alternates them

• Use city coordinate values and divide by 12.5 so that
each coordinate value requires just three binary digits

• Example with y being more significant than x

City x y f(z)=z div 12.5 Bit Interleaved
x y Value

Chicago 35 42 2 3 14

Mobile 52 10 4 0 16

Toronto 62 77 4 6 56

Buffalo 82 65 6 5 54

Denver 5 45 0 3 10

Omaha 27 35 2 2 12

Atlanta 85 15 6 1 22

Miami 90 5 7 0 21

1
b

hp28

BIT INTERLEAVING HASHING FUNCTIONS

• Bit interleaving takes one bit from the binary
representation of the x coordinate value and one bit
from the binary representation of the y coordinate
value and alternates them

• Use city coordinate values and divide by 12.5 so that
each coordinate value requires just three binary digits

• Example with y being more significant than x

City x y f(z)=z div 12.5 Bit Interleaved
x y Value

Chicago 35 42 2 3 14

Mobile 52 10 4 0 16

Toronto 62 77 4 6 56

Buffalo 82 65 6 5 54

Denver 5 45 0 3 10

Omaha 27 35 2 2 12

Atlanta 85 15 6 1 22

Miami 90 5 7 0 21

1
b

hp282
r

0 0 1

1 1 0

• Ex: Atlanta (6,1)

hp28

BIT INTERLEAVING HASHING FUNCTIONS

• Bit interleaving takes one bit from the binary
representation of the x coordinate value and one bit
from the binary representation of the y coordinate
value and alternates them

• Use city coordinate values and divide by 12.5 so that
each coordinate value requires just three binary digits

• Example with y being more significant than x

City x y f(z)=z div 12.5 Bit Interleaved
x y Value

Chicago 35 42 2 3 14

Mobile 52 10 4 0 16

Toronto 62 77 4 6 56

Buffalo 82 65 6 5 54

Denver 5 45 0 3 10

Omaha 27 35 2 2 12

Atlanta 85 15 6 1 22

Miami 90 5 7 0 21

1
b

hp282
r

0 0 1

1 1 0

• Ex: Atlanta (6,1)

hp283
z

0 1 0 1 1 0 = 22

hp29

EXAMPLE OF LINEAR HASHING

• Assume primary and overflow bucket capacity is 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 0 exists

1
b

hp29

EXAMPLE OF LINEAR HASHING

• Assume primary and overflow bucket capacity is 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 0 exists

1
b

hp292
r

• Insert Chicago (14) and Mobile (16):
τ=1 and split bucket 0 creating bucket 1

Chicago

0 1

Mobile

hp29

EXAMPLE OF LINEAR HASHING

• Assume primary and overflow bucket capacity is 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 0 exists

1
b

hp292
r

• Insert Chicago (14) and Mobile (16):
τ=1 and split bucket 0 creating bucket 1

Chicago

0 1

Mobile

hp29

• Insert Toronto (56) into bucket 0:
τ=0.75 and split bucket 0 creating bucket 2; move
Chicago to bucket 2

3
z

Toronto
Chicago

hp29

EXAMPLE OF LINEAR HASHING

• Assume primary and overflow bucket capacity is 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 0 exists

1
b

hp292
r

• Insert Chicago (14) and Mobile (16):
τ=1 and split bucket 0 creating bucket 1

Chicago

0 1

Mobile

hp29

• Insert Toronto (56) into bucket 0:
τ=0.75 and split bucket 0 creating bucket 2; move
Chicago to bucket 2

3
z

Toronto
Chicago

hp29

• Insert Buffalo (54) into bucket 2:
τ=0.67 and split bucket 1 creating bucket 3

4
g

Buffalo

0 3

hp29

EXAMPLE OF LINEAR HASHING

• Assume primary and overflow bucket capacity is 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 0 exists

1
b

hp292
r

• Insert Chicago (14) and Mobile (16):
τ=1 and split bucket 0 creating bucket 1

Chicago

0 1

Mobile

hp29

• Insert Toronto (56) into bucket 0:
τ=0.75 and split bucket 0 creating bucket 2; move
Chicago to bucket 2

3
z

Toronto
Chicago

hp29

• Insert Buffalo (54) into bucket 2:
τ=0.67 and split bucket 1 creating bucket 3

4
g

Buffalo

0 3

hp295
v

Denver

• Insert Denver (10) into bucket 2 which causes it to overflow

hp29

EXAMPLE OF LINEAR HASHING

• Assume primary and overflow bucket capacity is 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 0 exists

1
b

hp292
r

• Insert Chicago (14) and Mobile (16):
τ=1 and split bucket 0 creating bucket 1

Chicago

0 1

Mobile

hp29

• Insert Toronto (56) into bucket 0:
τ=0.75 and split bucket 0 creating bucket 2; move
Chicago to bucket 2

3
z

Toronto
Chicago

hp29

• Insert Buffalo (54) into bucket 2:
τ=0.67 and split bucket 1 creating bucket 3

4
g

Buffalo

0 3

hp295
v

Denver

• Insert Denver (10) into bucket 2 which causes it to overflow

hp296
g

Omaha

• Insert Omaha (12) into bucket 0 which causes it to overflow

hp29

EXAMPLE OF LINEAR HASHING

• Assume primary and overflow bucket capacity is 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 0 exists

1
b

hp292
r

• Insert Chicago (14) and Mobile (16):
τ=1 and split bucket 0 creating bucket 1

Chicago

0 1

Mobile

hp29

• Insert Toronto (56) into bucket 0:
τ=0.75 and split bucket 0 creating bucket 2; move
Chicago to bucket 2

3
z

Toronto
Chicago

hp29

• Insert Buffalo (54) into bucket 2:
τ=0.67 and split bucket 1 creating bucket 3

4
g

Buffalo

0 3

hp295
v

Denver

• Insert Denver (10) into bucket 2 which causes it to overflow

hp296
g

Omaha

• Insert Omaha (12) into bucket 0 which causes it to overflow

hp297
r

Atlanta

• Insert Atlanta (22) into bucket 2’s overflow area

hp29

EXAMPLE OF LINEAR HASHING

• Assume primary and overflow bucket capacity is 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 0 exists

1
b

hp292
r

• Insert Chicago (14) and Mobile (16):
τ=1 and split bucket 0 creating bucket 1

Chicago

0 1

Mobile

hp29

• Insert Toronto (56) into bucket 0:
τ=0.75 and split bucket 0 creating bucket 2; move
Chicago to bucket 2

3
z

Toronto
Chicago

hp29

• Insert Buffalo (54) into bucket 2:
τ=0.67 and split bucket 1 creating bucket 3

4
g

Buffalo

0 3

hp295
v

Denver

• Insert Denver (10) into bucket 2 which causes it to overflow

hp296
g

Omaha

• Insert Omaha (12) into bucket 0 which causes it to overflow

hp297
r

Atlanta

• Insert Atlanta (22) into bucket 2’s overflow area

hp298
z

Omaha

• Insert Miami (21) into bucket 1:
τ=0.67 and split bucket 0 creating bucket 4; move Omaha to
bucket 4

Miami

hp29

EXAMPLE OF LINEAR HASHING

• Assume primary and overflow bucket capacity is 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 0 exists

1
b

hp292
r

• Insert Chicago (14) and Mobile (16):
τ=1 and split bucket 0 creating bucket 1

Chicago

0 1

Mobile

hp29

• Insert Toronto (56) into bucket 0:
τ=0.75 and split bucket 0 creating bucket 2; move
Chicago to bucket 2

3
z

Toronto
Chicago

hp29

• Insert Buffalo (54) into bucket 2:
τ=0.67 and split bucket 1 creating bucket 3

4
g

Buffalo

0 3

hp295
v

Denver

• Insert Denver (10) into bucket 2 which causes it to overflow

hp296
g

Omaha

• Insert Omaha (12) into bucket 0 which causes it to overflow

hp297
r

Atlanta

• Insert Atlanta (22) into bucket 2’s overflow area

hp298
z

Omaha

• Insert Miami (21) into bucket 1:
τ=0.67 and split bucket 0 creating bucket 4; move Omaha to
bucket 4

Miami

hp299
v

Miami

• Reclaim the overflow area of bucket 0:
τ=0.67 again and split bucket 1 creating bucket 5; move
Miami to bucket 5

hp30

ORDER PRESERVING LINEAR HASHING (OPLH)

• Problem: the hash function hn(k) = k mod 2n implies that
all records in a given bucket agree in the least
n significant bits

1. OK for random access
2. unacceptable for sequential access as each record will

be in a different bucket
• Solution: use the hash function

hn(k) = (reverse(k)) mod 2n

1. tests the n most significant bits
2. all records in a bucket are within a given range

City x y f(z)=z mod 12.5 Bit Interleaved Value
x y x most sig y most sig

Chicago
35 42 2 3 44 28

Mobile 52 10 4 0 1 2
Toronto 62 77 4 6 11 7
Buffalo 82 65 6 5 39 27
Denver 5 45 0 3 40 20
Omaha 27 35 2 2 12 12
Atlanta 85 15 6 1 37 26
Miami 90 5 7 0 21 42

• Shortcomings
1. records may not be scattered too well

• overflow is much more common than with traditional
hashing methods

• random access is slower since several overflow
buckets may have to be examined

2. creates a large number of sparsely filled buckets
• sequential access may be slower as may have to

examine many empty buckets

hp31

EXAMPLE OF ORDER PRESERVING LINEAR HASHING
• Assume primary and overflow bucket capacity is 2
• A bucket is split whenever τ≥α=0.66
• Initially, only bucket 0 exists

(0,100) (100,100)

(100,0)(0,0)

1
b

hp31

EXAMPLE OF ORDER PRESERVING LINEAR HASHING
• Assume primary and overflow bucket capacity is 2
• A bucket is split whenever τ≥α=0.66
• Initially, only bucket 0 exists

(0,100) (100,100)

(100,0)(0,0)

1
b

hp312
r

(52,10)
Mobile

(35,42)
Chicago

• Insert Chicago (28) and Mobile (2): τ=1 and split bucket
0 creating bucket 1

hp31

EXAMPLE OF ORDER PRESERVING LINEAR HASHING
• Assume primary and overflow bucket capacity is 2
• A bucket is split whenever τ≥α=0.66
• Initially, only bucket 0 exists

(0,100) (100,100)

(100,0)(0,0)

1
b

hp312
r

(52,10)
Mobile

(35,42)
Chicago

• Insert Chicago (28) and Mobile (2): τ=1 and split bucket
0 creating bucket 1

hp313
z

(62,77)
Toronto

• Insert Toronto (7) into bucket 1: τ=0.75 and split bucket
0 creating bucket 2; move Mobile to bucket 2

hp31

EXAMPLE OF ORDER PRESERVING LINEAR HASHING
• Assume primary and overflow bucket capacity is 2
• A bucket is split whenever τ≥α=0.66
• Initially, only bucket 0 exists

(0,100) (100,100)

(100,0)(0,0)

1
b

hp312
r

(52,10)
Mobile

(35,42)
Chicago

• Insert Chicago (28) and Mobile (2): τ=1 and split bucket
0 creating bucket 1

hp313
z

(62,77)
Toronto

• Insert Toronto (7) into bucket 1: τ=0.75 and split bucket
0 creating bucket 2; move Mobile to bucket 2

hp314
g

(82,65)
Buffalo3

• Insert Buffalo (27) into 1: τ=0.67, split bucket 1 creating
bucket 3; move Toronto and Buffalo to bucket 3

hp31

EXAMPLE OF ORDER PRESERVING LINEAR HASHING
• Assume primary and overflow bucket capacity is 2
• A bucket is split whenever τ≥α=0.66
• Initially, only bucket 0 exists

(0,100) (100,100)

(100,0)(0,0)

1
b

hp312
r

(52,10)
Mobile

(35,42)
Chicago

• Insert Chicago (28) and Mobile (2): τ=1 and split bucket
0 creating bucket 1

hp313
z

(62,77)
Toronto

• Insert Toronto (7) into bucket 1: τ=0.75 and split bucket
0 creating bucket 2; move Mobile to bucket 2

hp314
g

(82,65)
Buffalo3

• Insert Buffalo (27) into 1: τ=0.67, split bucket 1 creating
bucket 3; move Toronto and Buffalo to bucket 3

hp315
v

(5,45)
Denver

• Insert Denver (20) into bucket 0

hp31

EXAMPLE OF ORDER PRESERVING LINEAR HASHING
• Assume primary and overflow bucket capacity is 2
• A bucket is split whenever τ≥α=0.66
• Initially, only bucket 0 exists

(0,100) (100,100)

(100,0)(0,0)

1
b

hp312
r

(52,10)
Mobile

(35,42)
Chicago

• Insert Chicago (28) and Mobile (2): τ=1 and split bucket
0 creating bucket 1

hp313
z

(62,77)
Toronto

• Insert Toronto (7) into bucket 1: τ=0.75 and split bucket
0 creating bucket 2; move Mobile to bucket 2

hp314
g

(82,65)
Buffalo3

• Insert Buffalo (27) into 1: τ=0.67, split bucket 1 creating
bucket 3; move Toronto and Buffalo to bucket 3

hp315
v

(5,45)
Denver

• Insert Denver (20) into bucket 0

hp316
g

(27,35)
Omaha4

• Insert Omaha (12) into 0: τ=0.75 and split bucket 0 creating
bucket 4; move Denver and Chicago to bucket 4

hp31

EXAMPLE OF ORDER PRESERVING LINEAR HASHING
• Assume primary and overflow bucket capacity is 2
• A bucket is split whenever τ≥α=0.66
• Initially, only bucket 0 exists

(0,100) (100,100)

(100,0)(0,0)

1
b

hp312
r

(52,10)
Mobile

(35,42)
Chicago

• Insert Chicago (28) and Mobile (2): τ=1 and split bucket
0 creating bucket 1

hp313
z

(62,77)
Toronto

• Insert Toronto (7) into bucket 1: τ=0.75 and split bucket
0 creating bucket 2; move Mobile to bucket 2

hp314
g

(82,65)
Buffalo3

• Insert Buffalo (27) into 1: τ=0.67, split bucket 1 creating
bucket 3; move Toronto and Buffalo to bucket 3

hp315
v

(5,45)
Denver

• Insert Denver (20) into bucket 0

hp316
g

(27,35)
Omaha4

• Insert Omaha (12) into 0: τ=0.75 and split bucket 0 creating
bucket 4; move Denver and Chicago to bucket 4

hp317
v

(85,15)
Atlanta

• Insert Atlanta (26) into bucket 2

hp31

EXAMPLE OF ORDER PRESERVING LINEAR HASHING
• Assume primary and overflow bucket capacity is 2
• A bucket is split whenever τ≥α=0.66
• Initially, only bucket 0 exists

(0,100) (100,100)

(100,0)(0,0)

1
b

hp312
r

(52,10)
Mobile

(35,42)
Chicago

• Insert Chicago (28) and Mobile (2): τ=1 and split bucket
0 creating bucket 1

hp313
z

(62,77)
Toronto

• Insert Toronto (7) into bucket 1: τ=0.75 and split bucket
0 creating bucket 2; move Mobile to bucket 2

hp314
g

(82,65)
Buffalo3

• Insert Buffalo (27) into 1: τ=0.67, split bucket 1 creating
bucket 3; move Toronto and Buffalo to bucket 3

hp315
v

(5,45)
Denver

• Insert Denver (20) into bucket 0

hp316
g

(27,35)
Omaha4

• Insert Omaha (12) into 0: τ=0.75 and split bucket 0 creating
bucket 4; move Denver and Chicago to bucket 4

hp317
v

(85,15)
Atlanta

• Insert Atlanta (26) into bucket 2

hp318
z

(90,5)
Miami

• Insert Miami (42) into bucket 2: τ=0.67 and split bucket 1
creating bucket 5; move Miami to bucket 2’s overflow area

hp32

COMPARISON OF OPLH WITH EXCELL

OPLH EXCELL
1. Implicit directory

• one directory element
per primary bucket

• all buckets stored at
one of two levels

• overflow buckets

1. Explicit directory
• set of primary buckets

2. Reverse bit interleaving 2. Reverse bit interleaving
3. Bucket overflow

• allocate at most two
additional buckets (one
for the bucket that has
been split and one for
the overflowing bucket)

3. Bucket overflow
• triggers bucket split or

directory doubling

4.Retrieval of a record
requires examining primary
and overflow buckets

4.Retrieve any record with
two disk accesses

• Summary: order preserving linear hashing (OPLH) yields
a more gradual growth in the size of the
directory at the expense of the loss of the
guarantee of retrieval of any record with just
two disk accesses

hp33
EXAMPLES OF COMPARISON OF OPLH WITH EXCELL
1. Reversed bit interleaving with the y coordinate value as

the most significant (i.e., y is split first)
OPLH: EXCELL:

• 6 primary buckets • 7 buckets
• 2 overflow buckets • 16 directory elements

2. Reversed bit interleaving with the x coordinate value as
the most significant (i.e., x is split first)
OPLH: EXCELL:

• 7 primary buckets • 6 buckets
• no overflow buckets • 8 directory elements

(52,10)
Mobile

(35,42)
Chicago

(62,77)
Toronto

(82,65)
Buffalo

(5,45)
Denver

(27,35)
Omaha (85,15)

Atlanta

(90,5)
Miami

(52,10)
Mobile

(5,45)
Denver

(85,15)
Atlanta

(27,35)
Omaha

(0,100) (100,100)

(100,0)(0,0)

(0,100) (100,100)

(100,0)(0,0)

(52,10)
Mobile

(35,42)
Chicago

(62,77)
Toronto

(82,65)
Buffalo

(5,45)
Denver

(27,35)
OmahaD

(85,15)
Atlanta

(90,5)
Miami

B B B B

B BB

A G

E
F F

(0,100) (100,100)

(100,0)(0,0)

(52,10)
Mobile

(35,42)
Chicago

(62,77)
Toronto

(82,65)
Buffalo

(5,45)
Denver

(27,35)
Omaha

(85,15)
Atlanta

(90,5)
Miami

D CC

E F

(0,100) (100,100)

(100,0)(0,0)

(35,42)
Chicago

(62,77)
Toronto

(82,65)
Buffalo

(90,5)
Miami

hp341
bSPIRAL HASHING (Martin)

• Drawbacks of linear hashing:
1. order in which buckets are split is unrelated to the

probability of the occurence of overflow
2. all buckets that are candidates for a split have the

same probability of overflowing

• Central idea is the existence of an ever-changing (and
growing) address space of active bucket addresses

1. records are distributed in the active buckets in an
uneven manner

2. split the bucket with the highest probability of
overflowing

• When a bucket s is split

1. create d new buckets
2. rehash the contents of s into the d new buckets
3. bucket s is no longer used

• If [s,t ] are the active buckets, then [s + 1,t + d ] are
the active buckets after the split

• Ex: assume that initially there are d – 1 active buckets
starting at address 1

• After s bucket splits, there are (s + 1)•(d + 1) active
buckets starting at address s + 1 (prove by induction)

1d = 2:

hp341
bSPIRAL HASHING (Martin)

• Drawbacks of linear hashing:
1. order in which buckets are split is unrelated to the

probability of the occurence of overflow
2. all buckets that are candidates for a split have the

same probability of overflowing

• Central idea is the existence of an ever-changing (and
growing) address space of active bucket addresses

1. records are distributed in the active buckets in an
uneven manner

2. split the bucket with the highest probability of
overflowing

• When a bucket s is split

1. create d new buckets
2. rehash the contents of s into the d new buckets
3. bucket s is no longer used

• If [s,t ] are the active buckets, then [s + 1,t + d ] are
the active buckets after the split

• Ex: assume that initially there are d – 1 active buckets
starting at address 1

• After s bucket splits, there are (s + 1)•(d + 1) active
buckets starting at address s + 1 (prove by induction)

1d = 2:

hp342
r

2 3

hp341
bSPIRAL HASHING (Martin)

• Drawbacks of linear hashing:
1. order in which buckets are split is unrelated to the

probability of the occurence of overflow
2. all buckets that are candidates for a split have the

same probability of overflowing

• Central idea is the existence of an ever-changing (and
growing) address space of active bucket addresses

1. records are distributed in the active buckets in an
uneven manner

2. split the bucket with the highest probability of
overflowing

• When a bucket s is split

1. create d new buckets
2. rehash the contents of s into the d new buckets
3. bucket s is no longer used

• If [s,t ] are the active buckets, then [s + 1,t + d ] are
the active buckets after the split

• Ex: assume that initially there are d – 1 active buckets
starting at address 1

• After s bucket splits, there are (s + 1)•(d + 1) active
buckets starting at address s + 1 (prove by induction)

1d = 2:

hp342
r

2 3

hp343
z

3 4 5

hp341
bSPIRAL HASHING (Martin)

• Drawbacks of linear hashing:
1. order in which buckets are split is unrelated to the

probability of the occurence of overflow
2. all buckets that are candidates for a split have the

same probability of overflowing

• Central idea is the existence of an ever-changing (and
growing) address space of active bucket addresses

1. records are distributed in the active buckets in an
uneven manner

2. split the bucket with the highest probability of
overflowing

• When a bucket s is split

1. create d new buckets
2. rehash the contents of s into the d new buckets
3. bucket s is no longer used

• If [s,t ] are the active buckets, then [s + 1,t + d ] are
the active buckets after the split

• Ex: assume that initially there are d – 1 active buckets
starting at address 1

• After s bucket splits, there are (s + 1)•(d + 1) active
buckets starting at address s + 1 (prove by induction)

1d = 2:

hp342
r

2 3

hp343
z

3 4 5

hp344
g

4 5 6 7

hp35

SPIRAL HASHING FUNCTION

• Assume that initially there are d −1 active buckets starting
at address 1

• Key idea is the behavior of the function y = dx

1. dx+1 − dx = dx • (d −1)
2. bucket s has just been split
3. let dx = s +1 = address of first active bucket
4. implies dx+1 − dx = (s +1) • (d −1) = number of active buckets
5. last active bucket is at address dx+1 −1
6. [dx,dx+1 ) is range of active buckets

• Use two hashing functions
1. h(k) maps key k uniformly into [0,1) which is the range of the

difference in exponent values of addresses iny = dx+1 − dx

2. y(k) maps h(k) into an address in [s+1,(s+1)+(s+1)•(d −1))

• y(k) = dx(k) with x(k) in range [logd (s+1),logd (s+1)+1)
• before split, active buckets lay in range [logd s,logd s+1)
• want to make sure that all key values previously in

bucket s − i.e., x(k) in [logds,logd (s+1)) are rehashed
into one of the new buckets with an x(k) value in
[logds+1,logd (s+1)+1)

• leave other key values in [logd (s+1),logd s +1) unchanged
• difficult to choose x(k)

a. x(k) = logd (s+1) + h(k)
• drawback: must rehash all keys when a bucket is split

b. x(k) = logd (s+1) − h(k)  + h(k)
• guarantees that if k is hashed into bucket b (≥ s + 1)

then it continues to hash there until bucket b is split
• implies that x(k) is a number in the range
[logd (s +1),logd (s+1) +1) whose fractional part is h(k)

hp36

BEHAVIOR OF THE SPIRAL HASHING FUNCTION

• Ex: d = 2

1
b

15
14
13
12
11
10
9
8
7
6
5
4
3
2
1

y = 2x

x
Bucket
Number

Relative
Load

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0.0

0.1

0.2

0.5

1.0

The function y = 2x Relative load of buckets 1-15

Bucket address Hash interval Relative load

1 [0.0000,1.0000) 1.0000
2 [0.0000,0.5849) 0.5849
3 [0.5849,1.0000) 0.4151
4 [0.0000,0.3219) 0.3219
5 [0.3219,0.5849) 0.2630
6 [0.5849,0.8073) 0.2224
7 [0.8073,1.0000) 0.1927
8 [0.0000,0.1699) 0.1699
9 [0.1699,0.3219) 0.1520
10 [0.3219,0.4594) 0.1375
11 [0.4594,0.5849) 0.1255
12 [0.5849,0.7004) 0.1155
13 [0.7004,0.8073) 0.1069
14 [0.8073,0.9068) 0.0995
15 [0.9068,1.0000) 0.0932

hp36

BEHAVIOR OF THE SPIRAL HASHING FUNCTION

• Ex: d = 2

1
b

15
14
13
12
11
10
9
8
7
6
5
4
3
2
1

y = 2x

x
Bucket
Number

Relative
Load

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0.0

0.1

0.2

0.5

1.0

The function y = 2x Relative load of buckets 1-15

Bucket address Hash interval Relative load

hp362
r

• Split bucket 3

hp36

BEHAVIOR OF THE SPIRAL HASHING FUNCTION

• Ex: d = 2

1
b

15
14
13
12
11
10
9
8
7
6
5
4
3
2
1

y = 2x

x
Bucket
Number

Relative
Load

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0.0

0.1

0.2

0.5

1.0

The function y = 2x Relative load of buckets 1-15

Bucket address Hash interval Relative load

hp362
r

• Split bucket 3

hp363
z

hp37

TABLES FOR EXAMPLE OF SPIRAL HASHING

• Use bit interleaving to form a value k – i.e., take one bit
from the binary representation of the x coordinate
value and one bit from the binary representation of the
y coordinate value and alternate them

• Use city coordinate values and divide by 12.5 so that
each coordinate value requires just three binary digits

• Use h(k) = k/64 which has same effect as reverse bit
interleaving and behavior is analogous to OPLH

• Example with y being more significant than x
City x y f(z)=z div 12.5
x y k/64

Chicago 35 42 2 3 14 .21875

Mobile 52 10 4 0 16 .25

Toronto 62 77 4 6 56 .875

Buffalo 82 65 6 5 54 .84375

Denver 5 45 0 3 10 .15625

Omaha 27 35 2 2 12 .1875

Atlanta 85 15 6 1 22 .34375

Miami 90 5 7 0 21 .328125

1
b

hp37

TABLES FOR EXAMPLE OF SPIRAL HASHING

• Use city coordinate values and divide by 12.5 so that
each coordinate value requires just three binary digits

• Use h(k) = k/64 which has same effect as reverse bit
interleaving and behavior is analogous to OPLH

• Example with y being more significant than x
City x y f(z)=z div 12.5
x y k/64

Chicago 35 42 2 3 14 .21875

Mobile 52 10 4 0 16 .25

Toronto 62 77 4 6 56 .875

Buffalo 82 65 6 5 54 .84375

Denver 5 45 0 3 10 .15625

Omaha 27 35 2 2 12 .1875

Atlanta 85 15 6 1 22 .34375

Miami 90 5 7 0 21 .328125

1
b

hp372
r

0 0 1

1 1 0

• Ex: Atlanta (6,1)

hp37

TABLES FOR EXAMPLE OF SPIRAL HASHING

• Use city coordinate values and divide by 12.5 so that
each coordinate value requires just three binary digits

• Use h(k) = k/64 which has same effect as reverse bit
interleaving and behavior is analogous to OPLH

• Example with y being more significant than x
City x y f(z)=z div 12.5
x y k/64

Chicago 35 42 2 3 14 .21875

Mobile 52 10 4 0 16 .25

Toronto 62 77 4 6 56 .875

Buffalo 82 65 6 5 54 .84375

Denver 5 45 0 3 10 .15625

Omaha 27 35 2 2 12 .1875

Atlanta 85 15 6 1 22 .34375

Miami 90 5 7 0 21 .328125

1
b

hp372
r

0 0 1

1 1 0

• Ex: Atlanta (6,1)

hp373
z

0 1 0 1 1 0 = 22

hp38

MECHANICS OF EXAMPLE OF SPIRAL HASHING

• Assume d = 2 and primary and overflow bucket capacity of 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 1 exists

1
b

hp38

MECHANICS OF EXAMPLE OF SPIRAL HASHING

• Assume d = 2 and primary and overflow bucket capacity of 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 1 exists

1
b

hp38

• Insert Chicago (.22) and Mobile (.25): τ=1 and split bucket 1
creating buckets 2 and 3; Chicago and Mobile are moved to
bucket 2

2
r

Mobile

Chicago

1 2

2 3

CHI
MOB

hp38

MECHANICS OF EXAMPLE OF SPIRAL HASHING

• Assume d = 2 and primary and overflow bucket capacity of 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 1 exists

1
b

hp38

• Insert Chicago (.22) and Mobile (.25): τ=1 and split bucket 1
creating buckets 2 and 3; Chicago and Mobile are moved to
bucket 2

2
r

Mobile

Chicago

1 2

2 3

CHI
MOB

hp38

• Insert Toronto (.87) into bucket 3: τ=0.75 and split bucket 2
creating buckets 3 and 4; Chicago and Mobile are moved to
bucket 4

3
z

Toronto

5
4 5

TOR CHI
MOB

hp38

MECHANICS OF EXAMPLE OF SPIRAL HASHING

• Assume d = 2 and primary and overflow bucket capacity of 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 1 exists

1
b

hp38

• Insert Chicago (.22) and Mobile (.25): τ=1 and split bucket 1
creating buckets 2 and 3; Chicago and Mobile are moved to
bucket 2

2
r

Mobile

Chicago

1 2

2 3

CHI
MOB

hp38

• Insert Toronto (.87) into bucket 3: τ=0.75 and split bucket 2
creating buckets 3 and 4; Chicago and Mobile are moved to
bucket 4

3
z

Toronto

5
4 5

TOR CHI
MOB

hp38

• Insert Buffalo (.84) into bucket 3: τ=0.67 and split bucket 3
creating buckets 6 and 7; Toronto and Buffalo are moved to
bucket 7

4
g

Buffalo

6 7

TOR
BUF

hp38

MECHANICS OF EXAMPLE OF SPIRAL HASHING

• Assume d = 2 and primary and overflow bucket capacity of 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 1 exists

1
b

hp38

• Insert Chicago (.22) and Mobile (.25): τ=1 and split bucket 1
creating buckets 2 and 3; Chicago and Mobile are moved to
bucket 2

2
r

Mobile

Chicago

1 2

2 3

CHI
MOB

hp38

• Insert Toronto (.87) into bucket 3: τ=0.75 and split bucket 2
creating buckets 3 and 4; Chicago and Mobile are moved to
bucket 4

3
z

Toronto

5
4 5

TOR CHI
MOB

hp38

• Insert Buffalo (.84) into bucket 3: τ=0.67 and split bucket 3
creating buckets 6 and 7; Toronto and Buffalo are moved to
bucket 7

4
g

Buffalo

6 7

TOR
BUF

hp385
g

• Insert Denver (.16) and Omaha (.19) into bucket 4’s overflow
area

Omaha

Denver
DEN
OMA

hp38

MECHANICS OF EXAMPLE OF SPIRAL HASHING

• Assume d = 2 and primary and overflow bucket capacity of 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 1 exists

1
b

hp38

• Insert Chicago (.22) and Mobile (.25): τ=1 and split bucket 1
creating buckets 2 and 3; Chicago and Mobile are moved to
bucket 2

2
r

Mobile

Chicago

1 2

2 3

CHI
MOB

hp38

• Insert Toronto (.87) into bucket 3: τ=0.75 and split bucket 2
creating buckets 3 and 4; Chicago and Mobile are moved to
bucket 4

3
z

Toronto

5
4 5

TOR CHI
MOB

hp38

• Insert Buffalo (.84) into bucket 3: τ=0.67 and split bucket 3
creating buckets 6 and 7; Toronto and Buffalo are moved to
bucket 7

4
g

Buffalo

6 7

TOR
BUF

hp385
g

• Insert Denver (.16) and Omaha (.19) into bucket 4’s overflow
area

Omaha

Denver
DEN
OMA

hp386
v

• Insert Atlanta (.34) into bucket 5: τ=0.7 and split bucket 4
creating buckets 8 and 9; Denver is moved to bucket 8,
while Chicago, Mobile, and Omaha are moved to bucket 9
which causes it to overflow

Atlanta
9

8 9

CHI
MOB

DEN

OMA

ATL

hp38

MECHANICS OF EXAMPLE OF SPIRAL HASHING

• Assume d = 2 and primary and overflow bucket capacity of 2

• A bucket is split whenever τ ≥ α = 0.66
• Initially, only bucket 1 exists

1
b

hp38

• Insert Chicago (.22) and Mobile (.25): τ=1 and split bucket 1
creating buckets 2 and 3; Chicago and Mobile are moved to
bucket 2

2
r

Mobile

Chicago

1 2

2 3

CHI
MOB

hp38

• Insert Toronto (.87) into bucket 3: τ=0.75 and split bucket 2
creating buckets 3 and 4; Chicago and Mobile are moved to
bucket 4

3
z

Toronto

5
4 5

TOR CHI
MOB

hp38

• Insert Buffalo (.84) into bucket 3: τ=0.67 and split bucket 3
creating buckets 6 and 7; Toronto and Buffalo are moved to
bucket 7

4
g

Buffalo

6 7

TOR
BUF

hp385
g

• Insert Denver (.16) and Omaha (.19) into bucket 4’s overflow
area

Omaha

Denver
DEN
OMA

hp386
v

Atlanta
9

8 9

CHI
MOB

DEN

OMA

ATL

hp387
v

• Insert Miami (.33) into bucket 5: τ=0.67 and split bucket 5
creating buckets 10 and 11; Move Atlanta and Miami to
bucket 10

10 11

ATL
MIA

Miami

1110
11

hp39

COMPARISON OF SPIRAL AND LINEAR HASHING

• Main advantage of spiral hashing over linear hashing is
that the bucket being split is the one most likely to
overflow

• Disadvantages of spiral hashing:

1. the buckets that have been split are not reused

• overcome by using a mapping between logical and
physical addresses

2. expensive to calculate function y = dx

• overcome by use of an approximation

hp40
WHY SPIRAL?

• Can rewrite y = dx  as ρ = e jθ  using polar coordinates
which yields the equation of a spiral

• Ex: ρ = e (ln 2)•θ/2π

1
b

-10 -5 5 10 15

-10

-5

7
15

• Polar coordinates mean that the active buckets are always
within one complete arc of the spiral − i.e., θ = 2π

• Mechanics of a bucket split
1. let first active bucket be at a = e j•b  (i.e., θ = b )
2. last active bucket is at c = ej•(b+2π) − 1
3. bucket split means that the contents of the active

bucket at ρ = a (i.e., θ = b) are distributed into buckets
c + 1 through g where g = ej•(b+2π+φ) and φ is the
solution of a + 1 = ej•(b+φ) − i.e., φ = (ln (a + 1))/j − b

4. buckets a + 1 through g are now the active buckets
• Ex: d = 2

8
16

hp40
WHY SPIRAL?

• Can rewrite y = dx  as ρ = e jθ  using polar coordinates
which yields the equation of a spiral

• Ex: ρ = e (ln 2)•θ/2π

1
b

-10 -5 5 10 15

-10

-5

7
15

• Polar coordinates mean that the active buckets are always
within one complete arc of the spiral − i.e., θ = 2π

bucket at ρ = a (i.e., θ = b) are distributed into buckets
c + 1 through g where g = ej•(b+2π+φ) and φ is the
solution of a + 1 = ej•(b+φ) − i.e., φ = (ln (a + 1))/j − b

4. buckets a + 1 through g are now the active buckets
• Ex: d = 2

8
16

hp402
r

1. active buckets 6 through 11

hp40
WHY SPIRAL?

• Can rewrite y = dx  as ρ = e jθ  using polar coordinates
which yields the equation of a spiral

• Ex: ρ = e (ln 2)•θ/2π

1
b

-10 -5 5 10 15

-10

-5

7
15

• Polar coordinates mean that the active buckets are always
within one complete arc of the spiral − i.e., θ = 2π

bucket at ρ = a (i.e., θ = b) are distributed into buckets
c + 1 through g where g = ej•(b+2π+φ) and φ is the
solution of a + 1 = ej•(b+φ) − i.e., φ = (ln (a + 1))/j − b

4. buckets a + 1 through g are now the active buckets
• Ex: d = 2

8
16

hp402
r

1. active buckets 6 through 11

hp403
z

2. split bucket 6 to yield buckets 12 and 13

hp40
WHY SPIRAL?

• Can rewrite y = dx  as ρ = e jθ  using polar coordinates
which yields the equation of a spiral

• Ex: ρ = e (ln 2)•θ/2π

1
b

-10 -5 5 10 15

-10

-5

7
15

• Polar coordinates mean that the active buckets are always
within one complete arc of the spiral − i.e., θ = 2π

bucket at ρ = a (i.e., θ = b) are distributed into buckets
c + 1 through g where g = ej•(b+2π+φ) and φ is the
solution of a + 1 = ej•(b+φ) − i.e., φ = (ln (a + 1))/j − b

4. buckets a + 1 through g are now the active buckets
• Ex: d = 2

8
16

hp402
r

1. active buckets 6 through 11

hp403
z

2. split bucket 6 to yield buckets 12 and 13

hp404
g

3. active buckets 7 through 13

hp40
WHY SPIRAL?

• Can rewrite y = dx  as ρ = e jθ  using polar coordinates
which yields the equation of a spiral

• Ex: ρ = e (ln 2)•θ/2π

1
b

-10 -5 5 10 15

-10

-5

7
15

• Polar coordinates mean that the active buckets are always
within one complete arc of the spiral − i.e., θ = 2π

bucket at ρ = a (i.e., θ = b) are distributed into buckets
c + 1 through g where g = ej•(b+2π+φ) and φ is the
solution of a + 1 = ej•(b+φ) − i.e., φ = (ln (a + 1))/j − b

4. buckets a + 1 through g are now the active buckets
• Ex: d = 2

8
16

hp402
r

1. active buckets 6 through 11

hp403
z

2. split bucket 6 to yield buckets 12 and 13

hp404
g

3. active buckets 7 through 13

hp405
v

• Observations
1. insead of h(k) uniformly mapping key k into [0,1), use

hθ(k) to uniformly map k into [0,2π)
2. length of arc has constant value between successive

Related Posts