CS计算机代考程序代写 scheme AI algorithm 2021/4/28 Multi-dimensional Hashing

2021/4/28 Multi-dimensional Hashing
Multi-dimensional Hashing
Hashing and pmr MA.Hashing Example MA.Hashing Hash Functions Queries with MA.Hashing MA.Hashing Query Algorithm Query Cost for MA.Hashing Optimising MA.Hashing Cost
>>
COMP9315 21T1 ♢ Multi-d Hashing ♢ [0/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 1/16

2021/4/28 Multi-dimensional Hashing
❖ Hashing and pmr For a pmr query like
select * from R where a1 = C1 and … and an = Cn
if one ai is the hash key, query is very efcient if no ai is the hash key, need to use linear scan
Can be alleviated using multi-attribute hashing (mah) form a composite hash value involving all attributes
at query time, some components of composite hash are known
(allows us to limit the number of data pages which need to be checked)
MA.hashing works in conjunction with any dynamic hashing scheme.
COMP9315 21T1 ♢ Multi-d Hashing ♢ [1/14]
∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 2/16

2021/4/28 Multi-dimensional Hashing
❖ Hashing and pmr (cont) Multi-attribute hashing parameters:
lesize=b=2dpages ⇒ used-bithashvalues relationhasnattributes: a1,a2,…an
attributeai hashashfunctionhi
attributeai contributesdi bits(tothecombinedhashvalue) total bits d = ∑i=1..n di
a choice vector (cv) species for all k ∈ 0..d-1
bit j from hi(ai) contributes bit k in combined hash
value
COMP9315 21T1 ♢ Multi-d Hashing ♢ [2/14]
<< ∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 3/16

2021/4/28 Multi-dimensional Hashing
<< ∧ >>
❖ MA.Hashing Example
Consider relation Deposit(branch,acctNo,name,amount)
Assume a small data le with 8 main data pages (plus overows).
Hash parameters: d=3 d1=1 d2=1 d3=1 d4=0 Note that we ignore the amount attribute (d4=0) Assumes that nobody will want to ask queries like
select * from Deposit where amount=533
Choice vector is designed taking expected queries into account.
COMP9315 21T1 ♢ Multi-d Hashing ♢ [3/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 4/16

2021/4/28 Multi-dimensional Hashing
❖ MA.Hashing Example (cont) Choice vector:
This choice vector tells us:
bit0inhashcomesfrombit0ofhash1(a1) (b1,0) bit1inhashcomesfrombit0ofhash2(a2) (b2,0) bit2inhashcomesfrombit0ofhash3(a3) (b3,0) bit3inhashcomesfrombit1ofhash1(a1) (b1,1) etc.etc.etc. (uptoasmanybitsofhashingasrequired,e.g.32)
COMP9315 21T1 ♢ Multi-d Hashing ♢ [4/14]
<< ∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 5/16

2021/4/28 Multi-dimensional Hashing
❖ MA.Hashing Example (cont) Consider the tuple:
Hash value (page address) is computed by:
<< ∧ >>
branch
acctNo
name
amount
Downtown
101
Johnston
512
COMP9315 21T1 ♢ Multi-d Hashing ♢ [5/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 6/16

2021/4/28 Multi-dimensional Hashing
❖ MA.Hashing Hash Functions Auxiliary denitions:
#define MaxHashSize 32
typedef unsigned int HashVal;
// extracts i’th bit from hash value
#define bit(i,h) (((h) & (1 << (i))) >> (i))
// choice vector elems
typedef struct { int attr, int bit } CVelem;
typedef CVelem ChoiceVec[MaxHashSize];
// hash function for individual attributes
HashVal hash_any(char *val) { … }
COMP9315 21T1 ♢ Multi-d Hashing ♢ [6/14]
<< ∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 7/16

2021/4/28 Multi-dimensional Hashing
<< ∧ >> ❖ MA.Hashing Hash Functions (cont)
Produce combined d-bit hash value for tuple t :
HashVal hash(Tuple t, ChoiceVec cv, int d)
{
HashVal h[nAttr(t)+1]; // hash for each attr HashVal res = 0, oneBit;
int i, a, b;
for (i = 1; i <= nAttr(t); i++) h[i] = hash_any(attrVal(t,i)); for (i = 0; i < d; i++) { a = cv[i].attr; b = cv[i].bit; oneBit = bit(b, h[a]); res = res | (oneBit << i); } return res; } COMP9315 21T1 ♢ Multi-d Hashing ♢ [7/14] https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 8/16 2021/4/28 Multi-dimensional Hashing ❖ Queries with MA.Hashing In a partial match query: values of some attributes are known values of other attributes are unknown E.g. select amount from Deposit where branch = 'Brighton' and name = 'Green' forwhichweusetheshorthand (Brighton, ?, Green, ?) COMP9315 21T1 ♢ Multi-d Hashing ♢ [8/14] << ∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 9/16

2021/4/28 Multi-dimensional Hashing
<< ∧ >>
❖ Queries with MA.Hashing (cont) Consider query: select amount from Deposit where
name=’Green’
Matching tuples must be in pages: 100, 101, 110, 111. COMP9315 21T1 ♢ Multi-d Hashing ♢ [9/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 10/16

2021/4/28 Multi-dimensional Hashing
❖ MA.Hashing Query Algorithm // Builds the partial hash value (e.g. 10*0*1)
// Treats query like tuple with some attr values missing
ns=0; //#unknownbits=#stars for each attribute i in query Q {
if (hasValue(Q,i)) {
set d[i] bits in composite hash
using choice vector and hash(Q,i)
} else {
set d[i] *’s in composite hash
using choice vector
ns += d[i] }
} …
COMP9315 21T1 ♢ Multi-d Hashing ♢ [10/14]
<< ∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 11/16

2021/4/28 Multi-dimensional Hashing
<< ∧ >> ❖ MA.Hashing Query Algorithm (cont)

// Use the partial hash to find candidate pages
r = openRelation(“R”,READ); for (i = 0; i < 2ns; i++) { P = composite hash replace *'s in P using i and choice vector Buf = getPage(fileOf(r), P); for each tuple T in Buf { if (T satisfies pmr query) add T to results } } COMP9315 21T1 ♢ Multi-d Hashing ♢ [11/14] https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 12/16 2021/4/28 Multi-dimensional Hashing << ∧ >>
❖ Query Cost for MA.Hashing
Multi-attribute hashing handles a range of query types, e.g.
select * from R where a=1
select * from R where d=2
select * from R where b=3 and c=4
select * from R where a=5 and b=6 and c=7
A relation with n attributes has 2n different query types. Differentquerytypeshavedifferentcosts (differentno.of
*’s)
Cost(Q) = 2s where s = ∑i ∉ Q di (alternatively Cost(Q) = ∏i ∉ Q
2di)
Query distribution gives probability pQ of asking each query type Q.
COMP9315 21T1 ♢ Multi-d Hashing ♢ [12/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 13/16

2021/4/28 Multi-dimensional Hashing
<< ∧ >> ❖ Query Cost for MA.Hashing (cont)
Min query cost occurs when all attributes are used in query
Min Costpmr = 1
Max query cost occurs when no attributes are specied
MaxCostpmr = 2d = b
Average cost is given by weighted sum over all query types:
AvgCostpmr = ∑QpQ∏i∉Q2di
Aim to minimise the weighted average query cost over possible query
types
COMP9315 21T1 ♢ Multi-d Hashing ♢ [13/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 14/16

2021/4/28 Multi-dimensional Hashing
❖ Optimising MA.Hashing Cost
For a given application, useful to minimise Costpmr.
Can be achieved by choosing appropriate values for di (cv)
Heuristics:
distribution of query types (more bits to frequently used attributes)
size of attribute domain (≤ #bits to represent all values in domain)
discriminatory power (more bits to highly discriminating attributes)
Trade-off: making Qj more efcient makes Qk less efcient.
This is a combinatorial optimisation problem
(solve via standard optimisation techniques e.g. simulated annealing)
COMP9315 21T1 ♢ Multi-d Hashing ♢ [14/14]
<< ∧ https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 15/16 2021/4/28 Multi-dimensional Hashing Produced: 15 Mar 2021 https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/nd-hashing/slides.html 16/16