CS计算机代考程序代写 Java data structure algorithm CS61B

CS61B
Lecture 19: Multi-Dimensional Data
¡ñ Range-Finding and Nearest (1D Data)
¡ñ Multi-Dimensional Data Example
¡ñ QuadTrees for 2D Data
¡ñ KdTrees for Higher Dimensional Data
¡ñ Uniform Partitioning
datastructur.es

Range-Finding and Nearest
datastructur.es

Search Trees
So far we¡¯ve seen three different efficient implementations of a Map (or Set):
¡ñ Binary Search Tree.
¡ñ 2-3 Tree / 2-3-4 Tree / B-Tree.
¡ñ Red Black Tree.
These ¡°search tree¡± data structures support very fast insert, remove, and delete operations for arbitrary amounts of data.
¡ñ Requires that data can be compared to each other with some total order.
¡ñ We used the ¡°Comparable¡± interface as our comparison engine.
datastructur.es

Expanding the Power of our Set
There are other operations we might want to include in a Set. For example, we might add a select operation:
¡ñ select(int i): Returns the ith smallest item in the set. ¡ð select(0): 1
¡ð select(3): 6
{1, 4, 5, 6, 9, 11, 14, 17, 20}
datastructur.es

Expanding the Power of our Set
There are other operations we might want to include in a Set. For example, we might add a rank operation:
¡ñ rank(T x): Returns the ¡°rank¡± of x in the set (opposite of select). ¡ð rank(1): 0
¡ð rank(6): 3
{1, 4, 5, 6, 9, 11, 14, 17, 20}
datastructur.es

Expanding the Power of our Set
There are other operations we might want to include in a Set. For example, we might add a subSet operation:
¡ñ subSet(T from, T to): Returns all items between from and to.
¡ð subSet(4, 9): Returns {4, 5, 6, 9}.
¡ð subSet(3, 12): Returns {4, 5, 6, 9, 11}.
¡ð subSet(12, 13): Returns {}.
{1, 4, 5, 6, 9, 11, 14, 17, 20}
datastructur.es

Expanding the Power of our Set
There are other operations we might want to include in a Set.
If we have a notion of distance between items, we might add a nearest operation:
¡ñ nearest(T x): Returns the value closest to x. ¡ð nearest(6): Returns 6.
¡ð nearest(8): Returns 9.
¡ð nearest(10): Returns 9 or 11.
{1, 4, 5, 6, 9, 11, 14, 17, 20}
datastructur.es

Implementing Fancier Set Operations with a BST
It turns out that a BST can efficiently support the select, rank, subSet, and nearest operations.
nearest returns the item that is closest. Examples:
¡ñ nearest(6): Returns 6.
¡ñ nearest(8): Returns 9.
¡ñ nearest(10): Returns 9 or 11.
Challenge: How would you find nearest(N)?
¡ñ Just search for N and record closest item seen!
15
11
6 17
4 9 14 20
datastructur.es

Implementing Fancier Set Operations with a BST
It turns out that a BST can efficiently support the select, rank, subSet, and nearest operations.
nearest returns the item that is closest. Examples:
¡ñ nearest(6): Returns 6.
¡ñ nearest(8): Returns 9.
¡ñ nearest(10): Returns 9 or 11.
Challenge: How would you find nearest(N)?
11
6 17
4 9 14 20
15
¡ñ Just search for N and record closest item seen!
¡ñ Example: nearest(5.4): 11 is best seen so far.
datastructur.es

Implementing Fancier Set Operations with a BST
It turns out that a BST can efficiently support the select, rank, subSet, and nearest operations.
nearest returns the item that is closest. Examples:
¡ñ nearest(6): Returns 6.
¡ñ nearest(8): Returns 9.
¡ñ nearest(10): Returns 9 or 11.
Challenge: How would you find nearest(N)?
11
6 17
4 9 14 20
15
¡ñ Just search for N and record closest item seen!
¡ñ Example: nearest(5.4): 6 is best seen so far.
datastructur.es

Implementing Fancier Set Operations with a BST
It turns out that a BST can efficiently support the select, rank, subSet, and nearest operations.
nearest returns the item that is closest. Examples:
¡ñ nearest(6): Returns 6.
¡ñ nearest(8): Returns 9.
¡ñ nearest(10): Returns 9 or 11.
Challenge: How would you find nearest(N)?
11
6 17
4 9 14 20
15
¡ñ Just search for N and record closest item seen!
¡ñ Example: nearest(5.4): 6 is still best so far.
datastructur.es

Implementing Fancier Set Operations with a BST
It turns out that a BST can efficiently support the select, rank, subSet, and nearest operations.
Similar ideas can be used for select, rank, subSet, and many more. ¡ñ Won¡¯t discuss in lecture.
11
6 17
4 9 14 20
15
datastructur.es

Sets and Maps on 2D Data
So far we¡¯ve only discussed ¡°one dimensional data¡±. That is, all data could be compared under some total order. Examples:
¡ñ 1 < 3 < 6 < 9 < 15 ¡ñ ¡°cat¡± < ¡°doghouse¡± < ¡°if¡± < ¡°zebra¡± [using alphabetical order] ¡ñ ¡°if¡± < ¡°cat¡± < ¡°zebra¡± < ¡°doghouse¡± [using string length] But not all data can be compared along a single dimension. ¡ñ We¡¯ll see that search trees require some design tweaks to function efficiently on multi-dimensional data. datastructur.es Multi-dimensional Data datastructur.es Motivation: 2D Range Finding and Nearest Neighbors Suppose we want to perform operations on a set of Body objects in 2D space? datastructur.es Motivation: 2D Range Finding and Nearest Neighbors Suppose we want to perform operations on a set of Body objects in 2D space? ¡ñ 2D Range Searching: How many objects are in the highlighted rectangle? Note: 2D range search is a generalization of the ¡°subSet¡± method from earlier. datastructur.es Motivation: 2D Range Finding and Nearest Neighbors Suppose we want to perform operations on a set of Body objects in 2D space? ¡ñ 2D Range Searching: How many objects are in the highlighted rectangle? ¡ñ Nearest: What is the closest object to the space horse? Ideally, we¡¯d like to store our data in a format (like a BST) that allows more efficient approaches than just iterating over all objects. Let¡¯s see what goes wrong if we try to build a BST of 2 dimensional data. Note: 2D range search is a generalization of the ¡°subSet¡± method from earlier. datastructur.es Building Trees of Two Dimensional Data Consider trying to build a BST of Body objects in 2D space. ¡ñ earth.xPos = 1.5, earth.yPos = 1.6 ¡ñ mars.xPos = 1.0, mars.yPos = 2.8 For a BST, we need some notion of ¡°less than¡±: ¡ñ In xPos, Mars < Earth (tree on left). ¡ñ In yPos, Mars > Earth (tree on right).
1.5, 1.6
1.0, 2.8
(1.0, 2.8)
(1.5, 1.6)
1.5, 1.6
1.0, 2.8
X-Based Tree
Y-Based Tree
datastructur.es

Building Trees of Two Dimensional Data
Consider trying to build a BST of Body objects in 2D space.
¡ñ earth.xPos = 1.5, earth.yPos = 1.6
¡ñ mars.xPos = 1.0, mars.yPos = 2.8
For a BST, we need some notion of ¡°less than¡±:
¡ñ In xPos, Mars < Earth (tree on left). ¡ñ In yPos, Mars > Earth (tree on right).
(1.0, 2.8)
(1.5, 1.6)
Could just pick one, but you¡¯re losing some of your information about ordering.
¡ñ Will make nearest/2D range finding slow.
1.5, 1.6
1.0, 2.8
1.5, 1.6
1.0, 2.8
X-Based Tree
Y-Based Tree
datastructur.es

Example of a 2D Range-Finding Query
Suppose we want to know ¡°What are all the points with x-coordinate less than -1.5? ¡ñ Equivalent to ¡°What¡¯s in the green rectangle?¡±
As humans, we can easily visually identify them (E and F).
(-3, 2.5)
B
(2, 2)
A
(0, 1)
C
D
(1, 0) (-1, -1)
E
(-2, -2)
F
datastructur.es

Example of a 2D Range-Finding Query
If we have the points stored in a list, then we¡¯d have to iterate over the entire list. ¡ñ Slow! Requires looking at every single point.
Let¡¯s see what happens if we try to store them in a BST ordered by X, then Y coordinate.
(-3, 2.5)
B
(2, 2)
A
(0, 1)
C
D
(1, 0) (-1, -1)
E
(-2, -2)
F
datastructur.es

Building Trees of Two Dimensional Data
Example: Suppose we put points into a BST map ordered by x-coordinate.
¡ñ put((-1, -1), A)
¡ñ put(( 2, 2), B)
¡ñ put(( 0, 1), C)
¡ñ put(( 1, 0), D)
¡ñ put((-2, -2), E)
¡ñ put((-3, 2.5), F)
A (-1, -1)
E (-2, -2) B (2, 2)
F (-3, 2.5) C (0, 1)
(-3, 2.5)
B
(2, 2)
A
(0, 1)
C
D
(1, 0) (-1, -1)
E
(-2, -2)
F
D (1, 0)
datastructur.es

Building Trees of Two Dimensional Data
Challenge: Given the BST below, identify all the points with x-coordinate less than -1.5.
How can you avoid looking at every node?
A (-1, -1)
E (-2, -2) B (2, 2)
F (-3, 2.5) C (0, 1)
D (1, 0)
datastructur.es

Building Trees of Two Dimensional Data
Challenge: Given the BST below, identify all the points with x-coordinate less than -1.5.
¡ñ We can simply work our way down the tree, ignoring pursuing any options that are impossible.
¡ñ For example, no need to go down the right child of A because no point over there could possible have x coordinate < -1.5. A (-1, -1) No need to go here. This process of cutting off a tree search early is called ¡°Pruning¡±. E (-2, -2) F (-3, 2.5) B (2, 2) C (0, 1) D (1, 0) datastructur.es Building Trees of Two Dimensional Data Challenge: Given the BST below, identify all the points with x-coordinate less than -1.5. ¡ñ We can simply work our way down the tree, ignoring pursuing any options that are impossible. ¡ñ For example, no need to go down the right child of A because no point over there could possible have x coordinate < -1.5. A (-1, -1) No need to go here. This process of cutting off a tree search early is called ¡°Pruning¡±. E (-2, -2) F (-3, 2.5) B (2, 2) C (0, 1) D (1, 0) datastructur.es Building Trees of Two Dimensional Data If we¡¯d used y coordinate for comparisons, we¡¯d get a slightly different BST. ¡ñ put((-1, ¡ñ put(( 2, ¡ñ put(( 0, ¡ñ put(( 1, ¡ñ put((-2, ¡ñ put((-3, -1), A) 2), B) 1), C) 0), D) -2), E) 2.5), F) (-3, 2.5) B (2, 2) A (0, 1) C D (1, 0) (-1, -1) E (-2, -2) F A (-1, -1) E (-2, -2) B (2, 2) C (0, 1) F (-3, 2.5) D (1, 0) datastructur.es Building Trees of Two Dimensional Data Challenge: Given the BST below, identify all the points with x-coordinate less than -1.5. How can you avoid looking at every node? A (-1, -1) E (-2, -2) B (2, 2) C (0, 1) F (-3, 2.5) D (1, 0) datastructur.es Building Trees of Two Dimensional Data Challenge: Given the BST below, identify all the points with x-coordinate less than -1.5. In this case, pruning was impossible! A (-1, -1) E (-2, -2) B (2, 2) C (0, 1) F (-3, 2.5) D (1, 0) datastructur.es Spatial Partitioning / Rectangle Intersection Interpretation In the X-oriented tree, the ¡°A¡± node partitioned the universe into a left and right side. ¡ñ Left: All points with X < -1. ¡ñ Right: All points with X > -1.
Right side doesn¡¯t intersect query rectangle.
(-3, 2.5)
E
(-2, -2)
F
B
(2, 2)
(0, 1)
C
D
(1, 0) (-1, -1)
A
A (-1, -1)
No need to go here.
E (-2, -2)
F (-3, 2.5)
B (2, 2)
C (0, 1)
D (1, 0)
datastructur.es

Spatial Partitioning / Rectangle Intersection Interpretation
In the Y-oriented tree, the ¡°A¡± node partitioned the universe into a top and bottom side.
¡ñ Bottom (left): All points with Y < -1. ¡ñ Top (right): All points with Y > -1.
Both sides intersect query rectangle.
(-3, 2.5)
F
(0, 1)
C
D
B
(2, 2)
(1, 0)
A
E
(-1, -1) (-2, -2)
A (-1, -1)
E (-2, -2)
Can¡¯t prune either of these!
B (2, 2)
C (0, 1)
D (1, 0)
F (-3, 2.5)
datastructur.es

Spatial Partitioning / Rectangle Intersection Interpretation
In the Y-oriented tree, the ¡°A¡± node partitioned the universe into a top and bottom side.
¡ñ Bottom (left): All points with Y < -1. ¡ñ Top (right): All points with Y > -1.
Both sides intersect query rectangle.
F
(-3, 2.5)
(0, 1)
C
D
(1, 0) (-1, -1)
(2, 2)
B
A
E
(-2, -2)
A (-1, -1)
E (-2, -2)
Can¡¯t prune either of these!
B (2, 2)
C (0, 1)
D (1, 0)
F (-3, 2.5)
datastructur.es

Building Trees of Two Dimensional Data
Consider trying to build a BST of Body objects in 2D space.
¡ñ earth.xPos = 1.5, earth.yPos = 1.6
¡ñ mars.xPos = 1.0, mars.yPos = 2.8
For a BST, we need some notion of ¡°less than¡±:
¡ñ In xPos, Mars < Earth (tree on left). ¡ñ In yPos, Mars > Earth (tree on right).
(1.0, 2.8)
(1.5, 1.6)
Could just pick one, but you¡¯re losing some of your information about ordering. ¡ñ Results in suboptimal pruning.
Coming up next, we¡¯ll see two efficient approaches to storing 2D data in a tree.
datastructur.es

QuadTrees
datastructur.es

The QuadTree
A QuadTree is the simplest solution conceptually. ¡ñ Every Node has four children:
¡ð Top left, a.k.a. northwest.
¡ð Top right, a.k.a. northeast.
¡ð Bottom left, a.k.a. southwest.
¡ð Bottom right, a.k.a. southeast.
So if we insert Earth, then Mars, we have the unique tree below:
1.5, 1.6
(1.0, 2.8)
(1.5, 1.6)
NW
NE
1.0, 2.8
SW SE
datastructur.es

QuadTree Insertion Demo
Below: Quadtree Representation of 5 objects in 2D space.
¡ñ
Insertion Demo: Link
(2, 2)
(-1, -1) (-2, -2)
A
(0, 1)
D
(1, 0)
B
C
E
NW
A
B
SE
SW
E
C
D
NE
SW
SE

QuadTrees
Quadtrees are a form of ¡°spatial partitioning¡±.
¡ñ
Quadtrees: Each node ¡°owns¡± 4 subspaces.
¡ð ¡ð
NW
B
A
Space is more finely divided in regions where there are more points. Results in better runtime in many circumstances.
E
(-2, -2)
C
(0, 1)
(-1, -1)
D
(1, 0)
B
(2, 2)
A
NE
SE
SW
E
C
D
SW
D ¡°owns¡± the 4 blue subspaces.
SE
16 subspaces of varying sizes.

QuadTree Range Search Demo
Quadtrees allow us to prune when performing a rectangle search.
¡ñ ¡ñ
Simple idea: Prune (i.e. don¡¯t explore) subspaces that don¡¯t intersect the query rectangle.
Range Search Demo: Link
B
(0, 1)
(2, 2)
E
A
C
(-1, -1)
D
(1, 0)
(-2, -2)
NW
A
B
SE
SW
E
C
D
NE
SW
SE
Only item that in the rectangle is B.

QuadTree Range Search Demo
Quadtrees allow us to prune when performing a rectangle search.
¡ñ
For the example below, we were able to prune all of the dotted links.
B
(0, 1)
(2, 2)
E
A
C
(-1, -1)
D
(1, 0)
(-2, -2)
NW
A
B
SE
SW
E
C
D
NE
SW
SE
results = [B]

Higher Dimensional Data
datastructur.es

3D Data
Suppose we want to store objects in 3D space.
¡ñ Quadtrees only have four directions, but in 3D, there are 8.
One approach: Use an Oct-tree or Octree. ¡ñ Very widely used in practice.
datastructur.es

Even Higher Dimensional Space
You may want to organize data on a large number of dimensions.
¡ñ Example: Want to find songs with the following features:
¡ð Length between 3 minutes and 6 minutes.
¡ð Between 1000 and 20,000 listens.
¡ð Between 120 to 150 BPM.
¡ð Were recorded after 2004.
In these cases, one somewhat common solution is a k-d tree.
¡ñ Fascinating data structure that handles arbitrary numbers of dimensions. ¡ð k-d means ¡°k dimensional¡±.
¡ñ For the sake of simplicity in lecture, we¡¯ll use 2D data, but the idea generalizes naturally.
datastructur.es

K-d Trees
k-d tree example for 2-d:
¡ñ Basic idea, root node partitions entire space into left and right (by x).
¡ñ All depth 1 nodes partition subspace into up and down (by y).
¡ñ All depth 2 nodes partition subspace into left and right (by x).
Let¡¯s see an example of how to build a k-d tree. ¡ñ K-d tree insertion demo.
(1, 5)
(4, 5)
(4, 4) (3, 3)
E
C
F
A
(2, 3)
D
B
(4, 2)
datastructur.es

K-d Trees
k-d tree example for 2-d:
¡ñ Basic idea, root node partitions entire space into left and right (by x).
¡ñ All depth 1 nodes partition subspace into up and down (by y).
¡ñ All depth 2 nodes partition subspace into left and right (by x).
Let¡¯s see an example of how to build a k-d tree. ¡ñ K-d tree insertion demo.
Each point owns 2 subspaces.
¡ñ Similar to a quadtree.
¡ñ Example: D owns the two subspaces shown.
¡ð The top subspace is infinitely large.
(1, 5)
(4, 5)
(4, 4)
E
A
(2, 3)
(3, 3)
D
C
F
B
(4, 2)
datastruct
ur.es

K-d Trees and Nearest Neighbor
Just like spatial partitioning and quadtrees, k-d trees support an efficient nearest method.
¡ñ Optimization: Do not explore subspaces that can¡¯t possible have a better answer than your current best.
Example: Find the nearest point to (0, 7).
¡ñ Answer is (1, 5).
¡ñ K-d tree nearest demo.
(1, 5) (0, 7)
(4, 5)
(4, 4) (3, 3)
E
C
F
A
(2, 3)
D
B
(4, 2)
datastructur.es

Nearest Pseudocode
nearest(Node n, Point goal, Node best):
¡ñ If n is null, return best
¡ñ If n.distance(goal) < best.distance(goal), best = n ¡ñ If goal < n (according to n¡¯s comparator): ¡ö goodSide = n.¡±left¡±Child ¡ö badSide = n.¡±right¡±Child ¡ð else: ¡ö goodSide = n.¡±right¡±Child ¡ö badSide = n.¡±left¡±Child ¡ñ best = nearest(goodSide, goal, best) ¡ñ If bad side could still have something useful ¡ð best = nearest(badSide, goal, best) ¡ñ return best Nearest is a helper method that returns whichever is closer to goal out of the following two choices: 1. best 2. all items in the subtree starting at n This is our pruning rule. datastructur.es Inefficient Nearest Pseudocode nearest(Node n, Point goal, Node best): ¡ñ If n is null, return best ¡ñ If n.distance(goal) < best.distance(goal), best = n ¡ñ best = nearest(n.leftChild, goal, best) ¡ñ best = nearest(n.rightChild, goal, best) ¡ñ return best Consider implementing this inefficient version first. ¡ñ Easy to implement. ¡ñ Warning: Much slower than the NaivePointSet! ¡ñ Once you¡¯ve verified this is working, you can implement the more efficient version on the previous slide. datastructur.es Uniform Partitioning datastructur.es Uniform Partitioning To end today, let¡¯s discuss a totally different approach called ¡°Uniform partitioning¡±. Unlike other approaches, isn¡¯t based on a tree at all. Simplest idea: Partition space into uniform rectangular buckets (also called ¡°bins¡±). ¡ñ Example on right for 4x4 grid of such buckets. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 datastructur.es Uniform Partitioning Questions: ¡ñ Which buckets do we need to iterate over to find all points in the green range? 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 datastructur.es Uniform Partitioning: Range Finding Questions: ¡ñ Which buckets do we need to iterate over to find all points in the green range? ¡ð Only boxes 5, 6, 9, and 10. Note: Algorithm is still ¦È(N), but it¡¯s faster than iterating over all points. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 datastructur.es Uniform Partitioning Question: ¡ñ Which buckets do we need to look at to solve nearest( )? Assume we¡¯re looking at distance to the red dot in the middle. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 datastructur.es Uniform Partitioning: Nearest Question: ¡ñ Which buckets do we need to look at to solve nearest( )? After looking at bucket 1, we can prune 0, 4, 5, and 6 because there are no possible points in these boxes that could be closer than the closest in 1. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 datastructur.es Uniform Partitioning: Nearest Question: ¡ñ Which buckets do we need to look at to solve nearest( )? After looking at bucket 1, we can prune 0, 4, 5, and 6 because there are no possible points in these boxes that could be closer than the closest in 1. Overall runtime is still ¦¨(N). ¡ñ ... but maybe up to 16 times faster than iterating over all points. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 datastructur.es Uniform vs. Hierarchical Partitioning All of our approaches today boil down to spatial partitioning. ¡ñ Uniform partitioning (perfect grid of rectangles). ¡ñ Quadtrees and KdTrees: Hierarchical partitioning. ¡ð Each node ¡°owns¡± 4 and 2 subspaces, respectively. ¡ð Space is more finely divided into subspaces where there are more points. E A C B D E (-2, -2) C (0, 1) (-1, -1) D (1, 0) B (2, 2) A D ¡°owns¡± the 4 blue subspaces. 25 subspaces of uniform sizes. 16 subspaces of varying sizes. Uniform Partitioning vs. Quadtrees and Kd-Trees Uniform partitioning is easier to implement than a Quadtree or Kd-Tree. ¡ñ May be good enough for many applications. In our next lecture we¡¯ll see a very clever 1D version of this idea called a ¡°hash table¡±. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 datastructur.es Summary and Applications datastructur.es Multi-Dimensional Data Summary Set operations can be more complex than just contains, e.g.: ¡ñ Range Finding: What are all the objects inside this (rectangular) subspace? ¡ñ Nearest: What is the closest object to a specific point? ¡ð Note: Can be generalized to k-nearest. The most common approach is spatial partitioning: ¡ñ Quadtree: Generalized 2D BST where each node ¡°owns¡± 4 subspaces. ¡ñ K-d Tree: Generalized k-d BST where each node ¡°owns¡± 2 subspaces. ¡ð Dimension of ownership cycles with each level of depth in tree. ¡ñ Uniform Partitioning: Partition space into uniform chunks. Spatial partitioning allows for pruning of the search space. datastructur.es Application #1: Murmuration You may not know this, but starlings murmurate: ¡ñ Real starlings: https://www.youtube.com/watch?v=eakKfY5aHmY ¡ñ Simulated starlings: https://www.youtube.com/watch?v=nbbd5uby0sY ¡ñ Efficient simulation means ¡°birds¡± have to find their nearest neighbors efficiently. datastructur.es Application #2: NBody Simulation Your NBody simulation from Project 0 was a quadratic algorithm. ¡ñ Every object has to calculate the force from every other object. ¡ñ This is very silly if the force exerted is small, e.g. individual star in the Andromeda galaxy pulling on the sun. One optimization is called Barnes-Hut. Basic idea: ¡ñ Represent universe as a quadtree (or octree) of objects. ¡ñ If a node is sufficiently far away from X, then treat that node and all of its children as a single object. ¡ð Example: Andromeda is very far away, so treat it as a single object for the purposes of force calculation on our sun. ¡ñ Visualization: https://www.youtube.com/watch?v=ouO6wmKqycc datastructur.es Citations Title figure: A thing I made (one of the first Java programs I wrote during my teaching career) datastructur.es

Related Posts