CS计算机代考程序代写 chain 2021/4/28 Sorted Files

2021/4/28 Sorted Files
Sorted Files
Sorted Files
Selection in Sorted Files Insertion into Sorted Files Deletion from Sorted Files
>>
COMP9315 21T1 ♢ Sorted Files ♢ [0/12]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html
1/14

2021/4/28 Sorted Files
∧ >>
❖ Sorted Files
Records stored in le in order of some eld k (the sort
key).
Makes searching more efcient; makes insertion less efcient
E.g. assume c = 4
COMP9315 21T1 ♢ Sorted Files ♢ [1/12]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html
2/14

2021/4/28 Sorted Files
<< ∧ >>
❖ Sorted Files (cont)
In order to mitigate insertion costs, use overow
pages.
Total number of overow pages = bov. Average overow chain length = Ov = bov / b. Bucket = data page + its overow page(s)
COMP9315 21T1 ♢ Sorted Files ♢ [2/12]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html
3/14

2021/4/28 Sorted Files
<< ∧ >>
❖ Selection in Sorted Files
For one queries on sort key, use binary search.
// select * from R where k = val (sorted on R.k)
lo = 0; hi = nPages(rel)-1
while (lo <= hi) { mid = (lo+hi) / 2; // int division with truncation (tup,loVal,hiVal) = searchBucket(rel,mid,x,val); if (tup != NULL) return tup; else if (val < loVal) hi = mid - 1; else if (val > hiVal) lo = mid + 1;
else return NOT_FOUND;
}
return NOT_FOUND;
where rel is relation handle, mid,lo,hi are page indexes,
k is a eld/attr, val,loVal,hiVal are values for k
COMP9315 21T1 ♢ Sorted Files ♢ [3/12]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html
4/14

2021/4/28 Sorted Files
<< ∧ >> ❖ Selection in Sorted Files (cont)
Search a page and its overow chain for a key value
searchBucket(rel,p,k,val)
{
get_page(rel,p,buf);
(tup,min,max) = searchPage(buf,k,val,+INF,-INF)
if (tup != NULL) return(tup,min,max);
ovf = openOvFile(f);
ovp = ovflow(buf);
while (tup == NULL && ovp != NO_PAGE) {
get_page(ovf,ovp,buf);
(tup,min,max) = searchPage(buf,k,val,min,max)
ovp = ovflow(buf);
}
return (tup,min,max);
}
Assumes each page contains index of next page in Ov chain
COMP9315 21T1 ♢ Sorted Files ♢ [4/12]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html
5/14

2021/4/28 Sorted Files
<< ∧ >> ❖ Selection in Sorted Files (cont)
Search within a page for key; also nd min/max key values
searchPage(buf,k,val,min,max)
{
res = NULL;
for (i = 0; i < nTuples(buf); i++) { tup = get_tuple(buf, i); if (tup.k == val) res = tup; if (tup.k < min) min = tup.k; if (tup.k > max) max = tup.k;
}
return (res,min,max);
}
COMP9315 21T1 ♢ Sorted Files ♢ [5/12]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html
6/14

2021/4/28 Sorted Files
<< ∧ >> ❖ Selection in Sorted Files (cont)
The above method treats each bucket like a single large page.
Cases:
best: nd tuple in rst data page we read worst: full binary search, and not found
examine log2b data pages
plus examine all of their overow pages
average: examine some data pages + their overow pages
Costone: Best=1 Worst=log2b+bov
Average case cost analysis needs assumptions (e.g. data distribution)
COMP9315 21T1 ♢ Sorted Files ♢ [6/12]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html
7/14

2021/4/28 Sorted Files
<< ∧ >> ❖ Selection in Sorted Files (cont)
For pmr query, on non-unique attribute k, where le is sorted on k
tuples containing k may span several pages E.g.select * from R where k = 2
Beginbylocatingapagepcontainingk=val (asforone query).
Scan backwards and forwards from p to nd matches. Thus, Costpmr = Costone + (bq-1).(1+Ov)
COMP9315 21T1 ♢ Sorted Files ♢ [7/12]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html
8/14

2021/4/28 Sorted Files
<< ∧ >> ❖ Selection in Sorted Files (cont)
For range queries on unique sort key (e.g. primary key):
use binary search to nd lower bound read sequentially until reach upper bound
E.g.select * from R where k >= 5 and k <= 13 Costrange = Costone + (bq-1).(1+Ov) COMP9315 21T1 ♢ Sorted Files ♢ [8/12] https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html 9/14 2021/4/28 Sorted Files << ∧ >> ❖ Selection in Sorted Files (cont)
For range queries on non-unique sort key, similar method to pmr.
binary search to nd lower bound then go backwards to start of run
then go forwards to last occurence of upper- bound
E.g.select * from R where k >= 2 and k <= 6 Costrange = Costone + (bq-1).(1+Ov) COMP9315 21T1 ♢ Sorted Files ♢ [9/12] https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html 10/14 2021/4/28 Sorted Files << ∧ >> ❖ Selection in Sorted Files (cont)
So far, have assumed query condition involves sort key k.
Butwhatabout select * from R where j = 100.0 ?
If condition contains attribute j, not the sort key
le is unlikely to be sorted by j as well sortedness gives no searching benets
Costone, Costrange, Costpmr asforheaples
COMP9315 21T1 ♢ Sorted Files ♢ [10/12]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html
11/14

2021/4/28 Sorted Files
❖ Insertion into Sorted Files Insertion approach:
nd appropriate page for tuple (via binary search) if page not full, insert into page
otherwise, insert into next overow page with space
Thus, Costinsert = Costone + δw (where δw = 1 or 2) Consider insertions of k=33, k=25, k=99 into:
<< ∧ >>
COMP9315 21T1 ♢ Sorted Files ♢ [11/12]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html
12/14

2021/4/28 Sorted Files
❖ Deletion from Sorted Files
E.g. delete from R where k = 2 Deletion strategy:
nd matching tuple(s) mark them as deleted
Cost depends on selectivity of selection condition Recall:selectivitydeterminesbq (#pageswithmatches)
Thus, Costdelete = Costselect + bqw
COMP9315 21T1 ♢ Sorted Files ♢ [12/12]
<< ∧ https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html 13/14 2021/4/28 Sorted Files Produced: 7 Mar 2021 https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/file-sorted/slides.html 14/14