COMP2100/COMP6442
Data Structures Part I
– Lecture 2]
Kin Chau [
Sid Chi
1
Why Data Structures
• No matter how efficient the programming language is, if the chosen data structure is not appropriate, the performance still suffers
• Data Structures are universal!
• What is the purpose of data structures?
• Data structures facilitate data management and retrieval • What is the best data structure?
• It can be a linked list or a tree, depends on applications
• Most common data structures in Java are available in the Java Collections Framework
• java.util package: Collection, List, Hashtable, TreeMap
2
Types of Data Structures
• Primitive
• Atomic/simplest form to store
data/predefined by the language
• Integer, float, char, string
• Non-primitive
• Non-atomic/multidimensional • Linear
• Array, linked list, stack, hashtable • Non-linear
• Heap, tree, octree, graph • Approximate
• Possible errors but space/time-efficient
Data Structure
Primitive
Linear
Non-Primitive
Non-Linear
Static
Array
Linked List
Dynamic
Tree
Octree
Bloom Filter
Count-min Sketch
Stack
Approximate
Hashtable
3
Goals of This Lecture
• Learn basic data structures • Linear data structures
• Linked list, stack, hash table, • Approximate data structures
• Bloom filter, count-min sketch • Non-linear data structures
• Binary search tree
• Search, traversing, successor and predecessor, insertion, deletion
• Something else to learn
• Applications of different data structures
4
Linked List
• Items are attached adjacently
• Traversing needs to be sequential
• Items are arranged in sorted sequence
• Linked list can grow dynamically
• Use a pointer to identify the next and previous items in the list
• Arrays need one contiguous memory block
• Do not need a contiguous memory block
• Inefficient searching
• Need to at most look at every item
Array
d1 d2 d3 ……. dn
Linked List
d1 d2 d3 … dn
NIL
5
Stack
• Items are stored in a stack
• Items are popped in and pushed out of the
stack in a last-in-first-out manner
• Use a pointer to the next item in the stack • Do not need a contiguous memory block • Inefficient searching
• Applications:
• Syntax parsing
• Backtracking (for depth-first search)
d2
d1 d1 d1
d2
6
Demo
https://www.cs.usfca.edu/~galles/visualization/StackLL.html
7
HashTable
• Items are mapped to indexes by a hash function deterministically
• The number of indexes is a constant, independent of the number of items
• Items with the same index are stored together (e.g., in a linked list)
• Increase searching efficiency
• Common hash functions
• Universal hash, one-way hash (SHA-2 (SHA-256)) • Collision-resistant hash function
• Each item has a uniform probability to be mapped to any one of the indexes
Index01 02 03 04 05
8
Hash Function
• Hash function h to map the universe U of all keys into {0, 1, …, m-1}: • Collision: When an item to be inserted maps to an already occupied slot
• Resolving collisions by chaining using listed links
k1 k
K k 5 h(49)=h(86)=h(52)=i
0
0
i
0
0
m-1
0
h(k1) h(k4) 0
Collision
0
h(k3) 0
m-1
k4
2 k3 h(k2) = h(k5)
U
49
86
52
NIL
9
Hash Function
• Assume all keys are integers, and define hash function:
h(k) = k mod m
• Caveat: If m = 2r, then the hash doesn’t even depend on all the bits of k:
• If k = 1011000111011010 and r = 6, then h(k) = 011010 • Randomized strategy
• Let m be prime. Decompose key k into r + 1 integers, each with value in the set {0, 1, …, m-1}. That is, let k = k0, k1, …, kr, where 0 ki < m
• Forexample,ki=kmodmi,forsome0mi
Else
BST-Search[x.right, Key] Return null
26
24 27 found
200
18
28
190
213
22
Traversing BST
• How to enumerate all the nodes (traverse tree from the root)?
• It is used to print out the data in a tree in a certain order
• Or apply some operations to each node • Recursive traversing methods:
• In-order traversing
• Pre-order traversing • Post-order traversing
56
26
200
18
28
190
213
24 27
23
In
–
order Traversal
Inorder-BST-Walk[x] If x null Then
Inorder-BST-Walk[x.left] Print x.key Inorder-BST-Walk[x.right]
Output:
12, 18, 24, 26, 27, 28, 56, 190, 200, 213
56
• Inorder-BST-Walk[56]; //key=56,xnull
• Inorder-BST-Walk[26]; //left=26,xnull 26 • Inorder-BST-Walk[18]; //left=18,xnull
• Inorder-BST-Walk[12]; //left=12,xnull
• Inorder-BST-Walk[null]; //left=null
• Print 12; //output
• Inorder-BST-Walk[null]; //right=null
• ……
200
18 28 12 24 27
190
213
24
Pre
–
order Traversal
Preorder-BST-Walk[x] If x null Then
Print x.key Preorder-BST-Walk[x.left] Preorder-BST-Walk[x.right]
Output:
56, 26, 18, 12, 24, 28, 27, 200, 190, 213
56
• Preorder-BST-Walk[56]; //key=56,xnull
• Print 56; //output 26 • Preorder-BST-Walk[26]; //left=26,xnull
• Print 26; //output
• Preorder-BST-Walk[18]; //left=18,xnull
• Print 18; //output
• Preorder-BST-Walk[12]; //left=12,xnull
• ……
200
18 28 12 24 27
190
213
25
Post
–
order Traversal
Postorder-BST-Walk[x] If x null Then
Postorder-BST-Walk[x.left] Postorder-BST-Walk[x.right] Print x.key
Output:
12, 24, 18, 27, 28, 26, 190, 213, 200, 56
56
• Postorder-BST-Walk[56]; //key=56,xnull
• Postorder-BST-Walk[26]; //left=26,xnull 26 • Postorder-BST-Walk[18]; //left=18,xnull
• Postorder-BST-Walk[12]; //left=12,xnull
• Postorder-BST-Walk[null]; //left=null
• Postorder-BST-Walk[null]; //right=null
• Print 12; //output
• Postorder-BST-Walk[24]; //right:24,xnull ……
200
18 28 12 24 27
190
213
26
Successor and Predecessor
• Successor
• Node y is the successor of node x, if y.key is the smallest key greater than x.key
• The successor of the largest key is NIL • Predecessor
• Node y is the predecessor of node
x, if y.key is the largest key smaller than x.key 12
• The predecessor of the smallest key is NIL
56
predecessor
26
200
18
28
190
213
24
27
x
successor
27
Finding Successor
• Finding the successor consists of two cases
Case 1: If node x has a non-empty right subtree, then
• x’s successor is the minimum in the right subtree of x
Case 2: If node x has an empty right subtree, then
• As long as we move up the tree, we can find bigger keys
• In other words, x’s successor y, is the lowest ancestor of x whose left child is also an ancestor of x
28
Finding Successor
• If x.right null, the successor is the smallest in x’s right subtree
• Otherwise, the successor is an ancestor
x
56
26 successor 200
BST-Successor[x]
If x.right null Then
Return BST-Minimum[x.right]
y x.parent
While y null and x = y.right do
xy
y y.parent Return y
18
28 190
213
12
24 27
29
Finding Successor
• If x.right null, the successor is the smallest in x’s right subtree
• Otherwise, the successor is an ancestor
x
56
26 successor 200
BST-Successor[x]
If Then BST-
]
x.right
null
Return
Minimum[x.right
y x.parent
While y null and x = y.right do
xy
y y.parent Return y
18
28 190
213
12
24 27
30
Finding Successor
• If x.right null, the successor is the smallest in x’s right subtree
• Otherwise, the successor is an ancestor
18
successor
56
26 x 200 28 190
213
BST-Successor[x]
If x.right null Then
Return BST-Minimum[x.right]
y x.parent
While y null and x = y.right do
xy
y y.parent Return y
12
24 27
31
Finding Successor
• If x.right null, the successor is the smallest in x’s right subtree
• Otherwise, the successor is an ancestor
18
successor
56
26 x 200 28 190
213
BST-Successor[x]
If x.right null Then
Return BST-Minimum[x.right]
y x.parent
While y null and x = y.right do
xy
y y.parent Return y
x
y
28
26
12
24 27
32
Finding Successor
• If x.right null, the successor is the smallest in x’s right subtree
• Otherwise, the successor is an ancestor
18
successor
56
26 x 200 28 190
213
BST-Successor[x]
If x.right null Then
Return BST-Minimum[x.right]
y x.parent
While y null and x = y.right do
xy
y y.parent Return y
x
y
28
26
26
56
12
24 27
33
Finding Predecessor
• If x.left null, the predecessor is the largest in x’s left subtree
• Otherwise, the predecessor is an ancestor
18
x
56
26 predecessor 28 190
24 27
200
213
BST-Predecessor[x]
If x.left null Then
Return BST-Maximum[x.left]
y x.parent
While y null and x = y.left do
xy
y y.parent Return y
12
34
Finding Predecessor
• If x.left null, the predecessor is the largest in x’s left subtree
• Otherwise, the predecessor is an ancestor
18
predecessor
56
26 x200 28 190
213
BST-Predecessor[x]
If x.left null Then
Return BST-Maximum[x.left]
y x.parent
While y null and x = y.left do
xy
y y.parent Return y
12
24 27
35
BST Insertion
• Ensure the binary-search-tree property holds after insertion • A new key is always inserted at the leaf node
56
56
26
18 28 24 27
200
26
200
190
213
18
28
190 30
213
12
12
24 27
36
BST Insertion
BST-Insert[T,z] y null
x T.root
// Find a suitable leave
While x null Do yx
If z.key < x.key Then x x.left
Else x x.right
z.parent y
If y = null Then T.root z
// Update parent’s child pointer
Else If z.key < y.key Then y.left z
Else y.right z
56
26
200
18
28
190
30
213
12
24 27
37
BST Insertion
z
x
y
30
56
null
BST-Insert[T,z] y null
x T.root
// Find a suitable leave
While x null Do yx
If z.key < x.key Then x x.left
Else x x.right
z.parent y
If y = null Then T.root z
// Update parent’s child pointer
Else If z.key < y.key Then y.left z
Else y.right z
56
26
200
18
28
190
30
213
12
24 27
38
BST Insertion
z
x
y
30
56
null
26
56
BST-Insert[T,z] y null
x T.root
// Find a suitable leave
While x null Do yx
If z.key < x.key Then x x.left
Else x x.right
z.parent y
If y = null Then T.root z
// Update parent’s child pointer
Else If z.key < y.key Then y.left z
Else y.right z
56
26
200
18
28
190
30
213
12
24 27
39
BST Insertion
z
x
y
30
56
null
26
56
28
26
BST-Insert[T,z] y null
x T.root
// Find a suitable leave
While x null Do yx
If z.key < x.key Then x x.left
Else x x.right
z.parent y
If y = null Then T.root z
// Update parent’s child pointer
Else If z.key < y.key Then y.left z
Else y.right z
56
26
200
18
28
190
30
213
12
24 27
40
BST Insertion
z
x
y
30
56
null
26
56
28
26
28
BST-Insert[T,z] y null
x T.root
// Find a suitable leave
While x null Do yx
If z.key < x.key Then x x.left
Else x x.right
z.parent y
If y = null Then T.root z
// Update parent’s child pointer
Else If z.key < y.key Then y.left z
Else y.right z
56
26
200
18
28
190
30
213
12
24 27
41
BST Deletion
• BST-Delete[T,x] • Case 0:
• If x has no child • Case 1:
• If x has one child • Case 2:
• If x has two children (i.e., substree)
42
BST Deletion
• Example • Case 0
• Remove 12 • Case 1
• Remove 28
• Set [27].parent 26 • Set [26].right 27
• Case 2
• Remove 26
• Replace 26 by 27
• Remove 27 from subtree
56
26
18 28 24 27
200
190
213
12
43
BST Deletion
• Example
• Case 2:
• Remove 26
• Replace 26 with 28
• Remove 28 from subtree (Case 1)
56
26
200
28
12 24 30
18
190
213
44
BST Deletion
• Basic Idea:
• Case 0:
• If x has no children • Then remove x
• Case 1:
• If x has one child
• Then make x.parent point to the child
• Case 2:
• If x has two children (subtrees)
• Then replace x with its successor
• Delete the successor in subtree, which gives Case 0 or 1
45
Exercise
• Consider the following operations to binary search tree • Insert 12, 43, 34, 11, 44, 1, 4
• Delete 44, 1
• Insert 55, 23, 11, 13
• Delete 12, 43
• What is the tree height of the final tree?
• What is the key of the root of the final tree?
46
Implement a BST
Inner Class Node
public class BST {
public class Node{ Integer key;
// Node key // Left child // Right child // Parent node
}
Node root;
Node left;
Node right;
Node parent;
Node(Integer key){ . . . }
public Node find(Integer key) { . . . } public Node insert(Integer key) { . . . } public void delete(Integer key) { . . . }
}
47
Implement a BST
public abstract class BinaryTree
public abstract boolean isEmpty(); // check if the tree is empty public abstract T biggest(); // find the biggest element in the tree public abstract T smallest(); // find the smallest element in the
public abstract boolean find(T d); // check if the element is in the tree public abstract BinaryTree
// add an element to the tree, this returns the new/modified tree
public abstract BinaryTree
// remove an element from the tree, this return the new/modified tree
}
48
Demo
https://www.cs.usfca.edu/~galles/visualization/BST.html
49
Octree for 3D Space Data
• How to record 3D spatial data?
• e.g., from autonomous vehicle laser
sensors
• Octree is tree-based data structure for 3D point data
• 3D space is sub-dividing into 8 spatial cells recursively
• Each cell can be represented by a node in a tree
• There are 8 child nodes of each node, which are the sub-divided cells
autonomous vehicle laser sensor data
50
Octree for 3D Space Data
• Consider a set of 3D points • Each point occupies in the
3D space
• Octree is a tree-based data
structure for 3D points
• A node in Octree has a binary value {0, 1}
• An occupied cell contains a point or a set of points, and is labeled by ‘1’, otherwise is labeled by ‘0’
Unoccupied
(𝑥1,𝑦1,𝑧1)
Geometric representation
1
01000000
cell
Occupied cell
10000100
01000100100001
Octree representation
00
51
Summary
• Linear data structures
• Linked list, stack, hashtable
• Approximate data structures • Bloom filter, count-min sketch
• Non-linear data structures • Binary search tree
• Search and traversing
• Successor and predecessor • Insertion
• Deletion
• Heap, octree
52
Reference
• Visualizations
• https://www.cs.usfca.edu/~galles/visualization/StackLL.html
• https://www.cs.usfca.edu/~galles/visualization/HashTable.html • https://www.cs.usfca.edu/~galles/visualization/BST.html
• Reference:
• Chapters 12-13 in Introduction to Algorithms (by Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest)
53
54