2021/4/28 Sorting
Sorting
The Sort Operation Two-way Merge Sort Comparison for Sorting Cost of Two-way Merge Sort n-Way Merge Sort
Cost of n-Way Merge Sort Sorting in PostgreSQL
>>
COMP9315 21T1 ♢ Sorting ♢ [0/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
1/16
2021/4/28 Sorting
❖ The Sort Operation
Sortingisexplicitinqueriesonlyintheorder byclause select * from Students order by name;
Sorting is used internally in other operations:
eliminating duplicate tuples for projection ordering les to enhance select ef ciency implementing various styles of join formingtuplegroupsingroup by
Sort methods such as quicksort are designed for in-memory data.
For large data on disks, need external sorts such as merge sort. COMP9315 21T1 ♢ Sorting ♢ [1/14]
∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
2/16
2021/4/28 Sorting
❖ Two-way Merge Sort Example:
<< ∧ >>
COMP9315 21T1 ♢ Sorting ♢ [2/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
3/16
2021/4/28 Sorting
❖ Two-way Merge Sort (cont) Requires at least three in-memory buffers:
<< ∧ >>
Assumption: cost of Merge operation on two in-memory buffers ≅ 0.
COMP9315 21T1 ♢ Sorting ♢ [3/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
4/16
2021/4/28 Sorting
❖ Comparison for Sorting
Above assumes that we have a function to compare tuples. Needs to understand ordering on different data types. Need a function tupCompare(r1,r2,f) (cf. C’s strcmp)
int tupCompare(r1,r2,f)
{
if (r1.f < r2.f) return -1;
if (r1.f > r2.f) return 1;
return 0;
}
Assume =, <, > are available for all attribute types.
COMP9315 21T1 ♢ Sorting ♢ [4/14]
<< ∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
5/16
2021/4/28 Sorting
<< ∧ >>
❖ Comparison for Sorting (cont)
In reality, need to sort on multiple attributes and ASC/DESC,
e.g.
— example multi-attribute sort
select * from Students
order by age desc, year_enrolled
Sketch of multi-attribute sorting function
int tupCompare(r1,r2,criteria)
{
foreach (f,ord) in criteria {
if (ord == ASC) {
if (r1.f < r2.f) return -1;
if (r1.f > r2.f) return 1;
}
else {
if (r1.f > r2.f) return -1;
if (r1.f < r2.f) return 1;
} }
return 0; }
COMP9315 21T1 ♢ Sorting ♢ [5/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
6/16
2021/4/28 Sorting
❖ Cost of Two-way Merge Sort For a le containing b data pages:
require ceil(log2b) passes to sort,
each pass requires b page reads, b page writes
Givestotalcost: 2.b.ceil(log2b)
Example:Relationwithr=105andc=50 ⇒ b=2000pages. Number of passes for sort: ceil(log22000) = 11 Reads/writes entire le 11 times! Can we do better?
COMP9315 21T1 ♢ Sorting ♢ [6/14]
<< ∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
7/16
2021/4/28 Sorting
❖ n-Way Merge Sort Initial pass uses: B total buffers
<< ∧ >>
Reads B pages at a time, sorts in memory, writes out in order
COMP9315 21T1 ♢ Sorting ♢ [7/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
8/16
2021/4/28 Sorting
<< ∧ >>
❖ n-Way Merge Sort (cont)
Mergepassesuse: n=B-1inputbuffers, 1outputbuffer
COMP9315 21T1 ♢ Sorting ♢ [8/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
9/16
2021/4/28 Sorting
❖ n-Way Merge Sort (cont) Method:
// Produce B-page-long runs
for each group of B pages in Rel {
read B pages into memory buffers
sort group in memory
write B pages out to Temp
}
// Merge runs until everything sorted
numberOfRuns = ceil(b/B)
while (numberOfRuns > 1) {
// n-way merge, where n=B-1
for each group of n runs in Temp {
merge into a single run via input buffers
write run to newTemp via output buffer
}
numberOfRuns = ceil(numberOfRuns/n)
Temp = newTemp // swap input/output files
}
COMP9315 21T1 ♢ Sorting ♢ [9/14]
<< ∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
10/16
2021/4/28 Sorting
<< ∧ >>
❖ Cost of n-Way Merge Sort
Consider le where b = 4096, B = 16 total buffers:
pass 0 produces 256 × 16-page sorted runs pass 1
performs 15-way merge of groups of 16-page sorted runs
produces18×240-pagesortedruns (17fullruns,1short run)
pass 2
performs 15-way merge of groups of 240-page sorted runs
produces2×3600-pagesortedruns (1fullrun,1short run)
pass 3
performs 15-way merge of groups of 3600-page sorted runs
produces 1 × 4096-page sorted runs
(cf. two-way merge sort which needs 11 passes)
COMP9315 21T1 ♢ Sorting ♢ [10/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
11/16
2021/4/28 Sorting
❖ Cost of n-Way Merge Sort (cont)
Generalising from previous example … For b data pages and B buffers
rst pass: read/writes b pages, gives b0 = ceil(b/B) runs then need ceil(lognb0) passes until sorted, where n = B-1 eachpassreadsandwritesbpages (i.e.2.bpageaccesses)
Cost=2.b.(1+ceil(lognb0)), whereb0andnarede nedabove
COMP9315 21T1 ♢ Sorting ♢ [11/14]
<< ∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
12/16
2021/4/28 Sorting
<< ∧ >>
❖ Sorting in PostgreSQL
Sort uses a merge-sort (from Knuth) similar to above:
backend/utils/sort/tuplesort.c include/utils/sortsupport.h
Tuples are mapped to SortTuple structs for sorting: containing pointer to tuple and sort key
no need to reference actual Tuples during sort unless multiple attributes used in sort
If all data ts into memory, sort using qsort().
If memory lls while reading, form “runs” and do disk-based
sort.
COMP9315 21T1 ♢ Sorting ♢ [12/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
13/16
2021/4/28 Sorting
❖ Sorting in PostgreSQL (cont) Disk-based sort has phases:
divide input into sorted runs using HeapSort merge using N buffers, one output buffer
N = as many buffers as workMem allows
Described in terms of “tapes” (“tape” ≅ sorted run) Implementation of “tapes”: backend/utils/sort/logtape.c
<< ∧ >>
COMP9315 21T1 ♢ Sorting ♢ [13/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
14/16
2021/4/28 Sorting
<< ∧
❖ Sorting in PostgreSQL (cont)
Sorting comparison operators are obtained via catalog (in
Type.o):
// gets pointer to function via pg_operator
struct Tuplesortstate { ... SortTupleComparator ... };
// returns negative, zero, positive
ApplySortComparator(Datum datum1, bool isnull1,
Datum datum2, bool isnull2,
SortSupport sort_helper); Flags in SortSupport indicate: ascending/descending, nulls-
rst/last.
ApplySortComparator() is PostgreSQL's version of tupCompare()
COMP9315 21T1 ♢ Sorting ♢ [14/14]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
15/16
2021/4/28 Sorting
Produced: 1 Mar 2021
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/sorting/slides.html
16/16