Slide 1
Data Structures and Abstractions
Binary Search Trees
Lecture 29
*
Introduction to ADS Sorted Data Stores
As pointed out in the earlier lecture, trees are used for problem solving, game playing, virtual reality and data storage, amongst other things.
When used for data storage they are always built so that the data is sorted as it is inserted.
We will be looking at several different sorted trees including Binary Search Trees, AVL Trees, Multiway Trees, B-Trees and B+ trees. [1]
In later lectures we will also consider non-sorted trees used to store information during graph processing.
*
[1] On completion of this topic, you are also encourage to examine red-black trees.
To go further after examining red-black trees, examine van Emde Boas Trees.
Find out about a tree that many people have been using for a while now, possibly without knowing it. These are radix trees (trie).
All of the above can be found in our reference book “Introduction to Algorithms” by Cormen et al.
A basic understanding of trie is at https://en.wikipedia.org/wiki/Trie
*
The Data to be Stored
The data stored in the tree can either be the actual data or a pointer/index to the actual data.
The actual data stored will almost always contain a key plus other data.
The key is used to place (order) the data in the container.
Examples of keys are account numbers, membership numbers, names, or keys calculated from some part of the data.
The key should be unique to enable the BST to be more efficient.
It also possible to have secondary keys, where a list, array or tree is ‘overlayed’ on the first structure giving a different sorted order.
*
Binary Search Trees
Binary trees are trees where
every node has 1 piece of data and two pointers (left, right)
every node, except root, can have a parent pointer [1]
therefore every node has 0:2 children
Binary search trees are binary trees where
every node has data that is greater than the data in all nodes to the left of it.
every node has data that is less than the data in all nodes to the right of it.
Note that this contrasts with the heap (see later), where a node’s data was always guaranteed to be less than (for a min-heap) or greater than (for a max-heap) all data in its subtree. [2]
Since the data sorting is based on a unique key, there is normally no two identical sets of data. [3]
*
[1] In the examples we work with, the parent pointer is not used. The parent pointer would be needed for the “predecessor” operation.
[2] A heap is a binary tree which satisfies certain properties which allow it to be stored efficiently **as an array**. See reference by Mark Weiss for diagram and code. (“Algorithms, Data Structures, and Problem solving using C++” ). Or see video https://www.youtube.com/watch?v=B7hVxCmfPtM&list=PLUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb&index=4
[3] But this should not preclude variations such as found in another data structure, multimaps. Think of how you may have to cater for repeat key-value data.
BST Algorithms
Almost all BST algorithms are recursive as this makes them very simple.
However, the root node might be treated differently because it has no parent but should it? [1]
All the methods require that root has been set to NULL in the constructor.
Traversal of a BST is almost always done either in-order or pre-order. [2]
*
[1] left as an exercise for you to think about.
[2] of course post-order traversals can also be done.
BST Insert Animation (for integer key)
*
root
Insert(234)
newNode
234
BST Insert Animation (for integer key)
*
124
root
Insert(124)
newNode
234
compare
BST Insert Animation (for integer key)
*
124
root
Insert(124)
234
BST Insert Animation (for integer key)
*
124
root
Insert(180)
234
180
newNode
compare
compare
180
BST Insert Animation (for integer key)
*
124
root
Insert(300)
234
300
newNode
compare
180
300
BST Insert Animation (for integer key)
*
124
root
Insert(80)
234
80
newNode
compare
180
300
compare
80
BST Insert Animation (for integer key)
*
124
root
Insert(100)
234
100
newNode
compare
180
300
compare
80
compare
100
BST Insert Animation (for integer key)
*
124
root
Insert(111)
234
111
newNode
compare
180
300
compare
80
compare
100
compare
111
BST Insert Animation (for integer key)
*
124
root
Insert(153)
234
153
newNode
compare
180
300
compare
80
compare
100
111
153
BST Insert Animation (for integer key)
*
124
root
Insert(310)
234
310
newNode
compare
180
300
80
compare
100
111
153
310
END
BST Insert
Insert (newdata) [1]
Get memory for a newNode
Place new data in the newNode
IF root is NULL
root = newNode [2]
ELSE
Insert (newNode, root) [3]
ENDIF
END Insert
*
[1] just for the root, but think about it – does it have to be special?
[2] this is where the actual insertion happens.
[3] is this sepeartion of different insertions really needed?
Insert (newNode, parent) [1]
IF newNode’s data < parent.data
IF parent has no left child
parent.leftLink = newNode
ELSE
Insert (newNode, parent.leftLink)
ENDIF
ELSE
IF parent has no right child
parent.rightLink = newNode
ELSE
Insert (newNode, parent.rightLink)
ENDIF
ENDIF
END Insert
*
[1] For the non root nodes. Can this algorithm be simplified further?
BST Problem
The trouble with the ordinary BST, is that if ordered data is inserted, you end up with a linked list.
This means that searching a BST is O(log n) on average at best, but has a worst case complexity of O(n).
This problem is solved by using a balanced BST instead of the simple BST.
However, the solution comes at the cost of more difficult algorithms.
Which in turn means that programming, testing, debugging and maintaining becomes more time consuming.
*
AVL Trees
Invented by Adelson-Velski and Landis (so AVL) [1]
It is a height balanced tree. [2] – see separate diagram.
In other words the height of the left and right subtrees is never allowed to differ by more than 1.
This ensures that the complexity of a search remains at O(log n).
The height of a subtree is defined recursively as:
*
IF the tree is empty
height = -1
ELSE
height = 1 + max(height(leftLink), height(rightLink)
ENDIF
[1]
See https://www.youtube.com/watch?v=FNeL18KsWPc&index=6&list=PLUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb
[2]
The height of a node in a tree is the length of the path from that node to the lowest level leaf.
Height of a tree = height of the root node
There is also something called depth. The depth of a node is determined by how far it is from the root.
Depth of the root is 0 and depth of any node = 1 + depth of that node’s parent
*
Insertion into an AVL Tree
Insertion is done the same as for an ordinary BST.
But if the height is unbalanced, the insertion is followed with a rebalance:
*
Insert (newNode, parent) [1]
IF newNode’s data < parent.data
IF parent has no left child
parent.leftLink = newNode
ELSE
Insert (newNode, parent.leftLink)
RebalanceBelowLeftOf (parent)
ENDIF
ELSE
IF parent has no right child
parent.rightLink = newNode
ELSE
Insert (newNode, parent.rightLink)
RebalanceBelowRightOf (parent)
ENDIF
ENDIF
END Insert
[1] use the combined insert – ie. Combine the root node insert into this one.
AVL Tree Insert Animation
*
20
10
40
30
50
60
The tree is now unbalanced
So we do a ‘rotation’ around the [40]
temp
root
END
AVL Tree Insert Animation
*
20
10
40
30
50
60
The tree is now unbalanced
So we do a ‘rotation’ around the [40]
root
Rearranged, to look good
AVL Tree Insert Animation
*
20
10
40
30
50
60
The tree is again unbalanced
But this more complicated situation requires a double rotation
root
25
35
28
temp
AVL Tree Insert Animation
*
20
10
40
30
50
60
root
25
35
28
Rearranged, to look good
Rebalancing and Rotations
Inserting into an AVL tree can result in the tree becoming unbalanced.
The part of the tree that is unbalanced is going to be somewhere on the tree from the insertion point to the root of the tree as only these subtrees are affected by the insertion.
Rebalancing needs to be carried out to maintain the AVL property.
The rebalancing is done by rotation operations.
For example a single rotation swaps the role of the parent and child maintaining the search order. For a number of cases, single rotation doesn’t work so double rotations are used. You are encouraged to find out how these operations work on your own. It is not examinable this semester. [1]
What is examinable is the ability to draw the tree after the rebalancing, so that is what you need to be able to do.
You are also encouraged to find out more about Red-Black trees. These are good alternatives to AVL trees.
*
*
[1] The library has previous textbooks that we have used in the past for this unit. The book “Data Structures and Problem Solving Using C++” by Mark Weiss explains rebalancing and rotations using diagrams and C++ code.
The Programs [1]
BTreeSolver shows the resulting BST, AVL Tree, Max Heap and/or Min Heap after insertions (which can be randomly generated or chosen).
HeapSort shows the steps involved in a heap sort.
MTreeSolver shows the resulting Multiway tree, BTree and/or B Plus Tree after insertions. These are covered in the next lecture.
Graphs allows you to build graphs and then view information about the graphs (future lectures).
*
[1] provided by Nicola Ritter
*
Readings
Textbook: Chapter on Binary Trees, particularly the section on Binary Search Trees.
Should go through the programming example at the end of the chapter.
Textbook: Chapter on Recursion
In the lab/assignment, you would normally be asked to provide a rational for your data structures. In this video link below (from an MIT unit on Introduction to Algorithms) for BST justification one particular example is used.
MIT Lecture:
https://www.youtube.com/watch?v=9Jry5-82I68&index=5&list=PLUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb. In the video, the tree algorithm is modified to cater for a new requirement. This approach shouldn’t be used– see Open Closed Principle. Think of a better solution. Other than that, the video explains the BST and its use very **well**.
MIT Tutorial (called Recitation – similar to tutorial – no computers, labs are separate – have computers)
https://www.youtube.com/watch?v=r5pXu1PAUkI&list=PLUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb&index=28
Further exploration
AVL trees .. from an MIT unit Introduction to Algorithms
MIT Lecture:
https://www.youtube.com/watch?v=FNeL18KsWPc&index=6&list=PLUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb
MIT Tutorial (different to a lab, no computers)
https://www.youtube.com/watch?v=IWzYoXKaRIc&list=PLUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb&index=29
Reference book, Introduction to Algorithms. For further study, there are many tree and tree algorithms described in the reference book. For this unit, the lecture notes, practical work and the textbook is sufficient.
Optional – recurrence trees https://www.youtube.com/watch?v=8F2OvQIlGiU
An earlier textbook used in this unit (some years ago) is a better reference to some of the more interesting Tree (and graph) data structures like AVL trees, Red Black trees and AA trees. The book is available in the library. It is “Algorithms, Data Structures, and Problem solving using C++” by Mark Weiss.