程序代写代做代考 concurrency graph database data structure ada data mining go algorithm B tree Skiplists, Bitmap Indices, kd Trees

Skiplists, Bitmap Indices, kd Trees

 Skiplist
 Bitmap Indexes  kd Trees

5.1

 B and B+ trees are the most commonly used ordered DBMS index structure
▪ Tree nodes are mapped to disk pages
 In-memory DBs allows the use of other index structures
▪ That do not have to be optimized for disk access  One such structure is the skiplist
▪ Relatively simple to implement
▪ Fast insert and delete algorithms
▪ Thatavoidcomplexre-balancing
▪ Used by MemSQL
▪ Has advantages for concurrency

 Sorted linked lists are inefficient ▪ Dynamic but all operations are O(n)
 Skiplist – multiple levels of linked lists
▪ Lowest level contains all keys in order
▪ Higher levels contain 1⁄2 the number of keys from the previous level ▪ Acting as an index to the next level down
▪ Allow operations to be performed in O(log n), including range search
head
13 45 93 13 29 45 67 93 13 21 29 42 45 53 67 88 93

 Skiplist search visits at most two nodes per level  Start at top level
▪ If current node == target go to bottom level and return record ▪ If target < next key go down level and repeat ▪ If target  next key go right and repeat ▪ If at lowest level linear search until target found or passed Use doubly linked lists to implement reverse range searches Find 88 13 45 93 88 < 93 13 29 45 67 93 13 21 29 42 45 53 67 88 93 88 < 93 head  Insertion and removal from a skiplist is problematic ▪ Expensive to ensure that levels have 1⁄2 the elements of lower levels ▪ The entire list may have to be rearranged  Solution – relax the requirement ▪ Each level is expected to have 1⁄2 the nodes of the previous level ▪ On insertion a node is copied to the higher level with probability of 0.5 ▪ A randomized data structure – two skiplists with the same data may differ  Example – assume maximum height of 3 ▪ Values inserted: 42, 88, 93, 21, 53, 13, 45, 29, 67 head Insert 42 head head “rolled” < 0.5 42 Insert 88 head head 88 < 0.5 > 0.5
head
42 88
Insert into lowest level first
Then roll dice to see if value is inserted into higher levels
Keep track of path to lowest levels (the visited nodes) to insert higher level nodes

head
Insert 93 head
head 42 88 93
88

head
Remember: new nodes can be spliced into a list in constant time
Insert 21 head
head 21 42 88 93
88

53
Insert 53 head 53 88
head 21 42 53 88 93
head

Insert 13, 45, 29, 67
head 29
head
head
53
29 45 53 88
13 21 29 42 45 53 67 88 93

 Removal is straightforward
▪ If the entry has a tower of nodes remove the entire tower
▪ Like insertion, the nodes in the search path need to be retained in the process
Remove 53
head 29
head
head
53
29 45 53 88
13 21 29 42 45 53 67 88 93

5.2

 There are a number 0f specialized indexes
▪ Often related to satisfying queries with complex conditions ▪ Where clauses with conjunctions and / or disjunctions
▪ Whichmayinvolvemultipleattributes
 Geographic information systems ▪ Partial match queries
▪ Range queries
▪ Nearest neighbour queries
 OLAP databases
▪ Queries on multidimensional data

 Queries with complex conditions can be satisfied using a conventional index
▪ By creating an index with a composite search key
▪ Or by using multiple indices, retrieving the RIDs and selecting their intersection
▪ Or union for a disjunction
 An alternative is to create a multiple key index
▪ An index on one attribute is built above an index on a second ▪ The first index refers to index pages on the second attribute
▪ Search key values in the lower index may be repeated for different search key values from the first index
 This can be generalized to more than two attributes

bob
50

kate
33

 Multiple key indices work well for range queries
▪ If the individual indices support range queries
 But do not support queries where data for the first attribute is missing
▪ Similar to composite search keys
 An alternative is a kd tree
30
40
50
ann
bob
dave
kim
ann
50

dave
40

35
50
ann
40

ann
30

lee

40
75
bob
35

dave
75

name index
age index file

 A k-dimensional search tree is an in-memory structure used for multidimensional data
 They generalize a binary search tree
▪ Values less than the node’s value are in its left subtree
▪ Values greater than the node’s value are in its right subtree
 kd tree nodes contain an attribute name and an associated value {salary, 40000} {name, kate}
 Attributes at different levels of the tree are different
▪ The levels rotate through the attributes of the tree
▪ With two attributes, the levels alternate between the attributes

name kate
age 60 age 47
joe
65
kat
60
sue
30
kim
40
name bob
name sue
ned
49
ren
52
sue
47
zak
60
age 30
joe
30
hil
40
ann
25
art
22
ben
ada
37
40
This structure can be adapted to disk storage
Nodes should be mapped to blocks Interior nodes made similar to B tree nodes

 Bitmap indices are often used in databases for data mining and OLAP
▪ Which often have low cardinality attributes ▪ And change relatively infrequently
 A bitmap index consists of multiple vectors of bits ▪ With one vector for each possible value of the attribute
▪ A bitmap to record if a patient was a smoker would require two bit vectors
▪ A bitmap on birth year might require 100 bit vectors
▪ The ith bit of the index is set to 1 if the ith row of the table has the vector’s value for the attribute

 A bitmap index can speed up queries on sparse columns, that have few possible values
▪ One bit is allocated for each possible value  Use indices to answer queries
▪ How many single customers live in BC?
▪ AND the S and BC columns and count the 1s
marital status index
province index
Use Boolean operators on vectors
AND OR NOT …
S
M
D
id
name marital province
AB
BC
MB
NB

1
0
0
112
Sam S BC
0
1
0
1
0
0
1
0
113 Sue M MB
0
0
0
1
121
Ann S BC
0
1
0
1
0
0
0
0
1
0
0
0
131 Bob S BC
0
0
0
1
212
Kate D AB
1
0
0
0