Skiplists, Bitmap Indices, kd Trees
Skiplist
Bitmap Indexes kd Trees
5.1
B and B+ trees are the most commonly used ordered DBMS index structure
▪ Tree nodes are mapped to disk pages
In-memory DBs allows the use of other index structures
▪ That do not have to be optimized for disk access One such structure is the skiplist
▪ Relatively simple to implement
▪ Fast insert and delete algorithms
▪ Thatavoidcomplexre-balancing
▪ Used by MemSQL
▪ Has advantages for concurrency
Sorted linked lists are inefficient ▪ Dynamic but all operations are O(n)
Skiplist – multiple levels of linked lists
▪ Lowest level contains all keys in order
▪ Higher levels contain 1⁄2 the number of keys from the previous level ▪ Acting as an index to the next level down
▪ Allow operations to be performed in O(log n), including range search
head
13 45 93 13 29 45 67 93 13 21 29 42 45 53 67 88 93
Skiplist search visits at most two nodes per level Start at top level
▪ If current node == target go to bottom level and return record ▪ If target < next key go down level and repeat
▪ If target next key go right and repeat
▪ If at lowest level linear search until target found or passed
Use doubly linked lists to implement reverse range searches
Find 88
13 45 93
88 < 93
13 29 45 67 93 13 21 29 42 45 53 67 88 93
88 < 93
head
Insertion and removal from a skiplist is problematic
▪ Expensive to ensure that levels have 1⁄2 the elements of lower levels ▪ The entire list may have to be rearranged
Solution – relax the requirement
▪ Each level is expected to have 1⁄2 the nodes of the previous level
▪ On insertion a node is copied to the higher level with probability of 0.5 ▪ A randomized data structure – two skiplists with the same data may differ
Example – assume maximum height of 3 ▪ Values inserted: 42, 88, 93, 21, 53, 13, 45, 29, 67
head
Insert 42
head
head
“rolled” < 0.5
42
Insert 88
head
head 88
< 0.5 > 0.5
head
42 88
Insert into lowest level first
Then roll dice to see if value is inserted into higher levels
Keep track of path to lowest levels (the visited nodes) to insert higher level nodes
head
Insert 93 head
head 42 88 93
88
head
Remember: new nodes can be spliced into a list in constant time
Insert 21 head
head 21 42 88 93
88
53
Insert 53 head 53 88
head 21 42 53 88 93
head
Insert 13, 45, 29, 67
head 29
head
head
53
29 45 53 88
13 21 29 42 45 53 67 88 93
Removal is straightforward
▪ If the entry has a tower of nodes remove the entire tower
▪ Like insertion, the nodes in the search path need to be retained in the process
Remove 53
head 29
head
head
53
29 45 53 88
13 21 29 42 45 53 67 88 93
5.2
There are a number 0f specialized indexes
▪ Often related to satisfying queries with complex conditions ▪ Where clauses with conjunctions and / or disjunctions
▪ Whichmayinvolvemultipleattributes
Geographic information systems ▪ Partial match queries
▪ Range queries
▪ Nearest neighbour queries
OLAP databases
▪ Queries on multidimensional data
Queries with complex conditions can be satisfied using a conventional index
▪ By creating an index with a composite search key
▪ Or by using multiple indices, retrieving the RIDs and selecting their intersection
▪ Or union for a disjunction
An alternative is to create a multiple key index
▪ An index on one attribute is built above an index on a second ▪ The first index refers to index pages on the second attribute
▪ Search key values in the lower index may be repeated for different search key values from the first index
This can be generalized to more than two attributes
bob
50
…
kate
33
…
Multiple key indices work well for range queries
▪ If the individual indices support range queries
But do not support queries where data for the first attribute is missing
▪ Similar to composite search keys
An alternative is a kd tree
30
40
50
ann
bob
dave
kim
ann
50
…
dave
40
…
35
50
ann
40
…
ann
30
…
lee
…
40
75
bob
35
…
dave
75
…
name index
age index file
A k-dimensional search tree is an in-memory structure used for multidimensional data
They generalize a binary search tree
▪ Values less than the node’s value are in its left subtree
▪ Values greater than the node’s value are in its right subtree
kd tree nodes contain an attribute name and an associated value {salary, 40000} {name, kate}
Attributes at different levels of the tree are different
▪ The levels rotate through the attributes of the tree
▪ With two attributes, the levels alternate between the attributes
name kate
age 60 age 47
joe
65
kat
60
sue
30
kim
40
name bob
name sue
ned
49
ren
52
sue
47
zak
60
age 30
joe
30
hil
40
ann
25
art
22
ben
ada
37
40
This structure can be adapted to disk storage
Nodes should be mapped to blocks Interior nodes made similar to B tree nodes
Bitmap indices are often used in databases for data mining and OLAP
▪ Which often have low cardinality attributes ▪ And change relatively infrequently
A bitmap index consists of multiple vectors of bits ▪ With one vector for each possible value of the attribute
▪ A bitmap to record if a patient was a smoker would require two bit vectors
▪ A bitmap on birth year might require 100 bit vectors
▪ The ith bit of the index is set to 1 if the ith row of the table has the vector’s value for the attribute
A bitmap index can speed up queries on sparse columns, that have few possible values
▪ One bit is allocated for each possible value Use indices to answer queries
▪ How many single customers live in BC?
▪ AND the S and BC columns and count the 1s
marital status index
province index
Use Boolean operators on vectors
AND OR NOT …
S
M
D
id
name marital province
AB
BC
MB
NB
…
1
0
0
112
Sam S BC
0
1
0
1
0
0
1
0
113 Sue M MB
0
0
0
1
121
Ann S BC
0
1
0
1
0
0
0
0
1
0
0
0
131 Bob S BC
0
0
0
1
212
Kate D AB
1
0
0
0