CS计算机代考程序代写 Sorting

Sorting

>>
Sorting

The Sort Operation

Two-way Merge Sort

Comparison for Sorting

Cost of Two-way Merge Sort

n-Way Merge Sort

Cost of n-Way Merge Sort

Sorting in PostgreSQL

COMP9315 21T1 ♢ Sorting ♢ [0/14]

∧ >>
❖ The Sort Operation

Sorting is explicit in queries only in the order by clause

select * from Students order by name;

Sorting is used internally in other operations:

eliminating duplicate tuples for projection

ordering files to enhance select efficiency

implementing various styles of join

forming tuple groups in group by

Sort methods such as quicksort are designed for in-memory data.

For large data on disks, need external sorts such as merge sort.

COMP9315 21T1 ♢ Sorting ♢ [1/14]

<< ∧ >>
❖ Two-way Merge Sort

Example:

COMP9315 21T1 ♢ Sorting ♢ [2/14]

<< ∧ >>
❖ Two-way Merge Sort (cont)

Requires at least three in-memory buffers:

Assumption: cost of Merge operation on two in-memory buffers ≅ 0.

COMP9315 21T1 ♢ Sorting ♢ [3/14]

<< ∧ >>
❖ Comparison for Sorting

Above assumes that we have a function to compare tuples.

Needs to understand ordering on different data types.

Need a function tupCompare(r1,r2,f) (cf. C’s strcmp)

int tupCompare(r1,r2,f)
{
if (r1.f < r2.f) return -1; if (r1.f > r2.f) return 1;
return 0;
}

Assume =, <, > are available for all attribute types.

COMP9315 21T1 ♢ Sorting ♢ [4/14]

<< ∧ >>
❖ Comparison for Sorting (cont)

In reality, need to sort on multiple attributes and ASC/DESC, e.g.

— example multi-attribute sort
select * from Students
order by age desc, year_enrolled

Sketch of multi-attribute sorting function

int tupCompare(r1,r2,criteria)
{
foreach (f,ord) in criteria {
if (ord == ASC) {
if (r1.f < r2.f) return -1; if (r1.f > r2.f) return 1;
}
else {
if (r1.f > r2.f) return -1;
if (r1.f < r2.f) return 1; } } return 0; } COMP9315 21T1 ♢ Sorting ♢ [5/14] << ∧ >>
❖ Cost of Two-way Merge Sort

For a file containing b data pages:

require ceil(log2b) passes to sort,

each pass requires b page reads, b page writes

Gives total cost:   2.b.ceil(log2b)

Example: Relation with r=105 and c=50
  ⇒   b=2000 pages.

Number of passes for sort:   ceil(log22000)  =  11

Reads/writes entire file 11 times!    Can we do better?

COMP9315 21T1 ♢ Sorting ♢ [6/14]

<< ∧ >>
❖ n-Way Merge Sort

Initial pass uses: B total buffers

Reads B pages at a time, sorts in memory, writes out in order

COMP9315 21T1 ♢ Sorting ♢ [7/14]

<< ∧ >>
❖ n-Way Merge Sort (cont)

Merge passes use:
  n = B-1 input buffers,
  1 output buffer

COMP9315 21T1 ♢ Sorting ♢ [8/14]

<< ∧ >>
❖ n-Way Merge Sort (cont)

Method:

// Produce B-page-long runs
for each group of B pages in Rel {
read B pages into memory buffers
sort group in memory
write B pages out to Temp
}
// Merge runs until everything sorted
numberOfRuns = ceil(b/B)
while (numberOfRuns > 1) {
// n-way merge, where n=B-1
for each group of n runs in Temp {
merge into a single run via input buffers
write run to newTemp via output buffer
}
numberOfRuns = ceil(numberOfRuns/n)
Temp = newTemp // swap input/output files
}

COMP9315 21T1 ♢ Sorting ♢ [9/14]

<< ∧ >>
❖ Cost of n-Way Merge Sort

Consider file where b = 4096, B = 16 total buffers:

pass 0 produces 256 × 16-page sorted runs

pass 1
performs 15-way merge of groups of 16-page sorted runs

produces 18 × 240-page sorted runs   (17 full runs, 1 short run)

pass 2
performs 15-way merge of groups of 240-page sorted runs

produces 2 × 3600-page sorted runs   (1 full run, 1 short run)

pass 3
performs 15-way merge of groups of 3600-page sorted runs

produces 1 × 4096-page sorted runs

(cf. two-way merge sort which needs 11 passes)

COMP9315 21T1 ♢ Sorting ♢ [10/14]

<< ∧ >>
❖ Cost of n-Way Merge Sort (cont)

Generalising from previous example …

For b data pages and B buffers

first pass: read/writes b pages, gives b0 = ceil(b/B)  runs

then need ceil(lognb0) passes until sorted, where n = B-1

each pass reads and writes b pages   (i.e. 2.b page accesses)

Cost = 2.b.(1 + ceil(lognb0)),   where b0 and n are defined above

COMP9315 21T1 ♢ Sorting ♢ [11/14]

<< ∧ >>
❖ Sorting in PostgreSQL

Sort uses a merge-sort (from Knuth) similar to above:

backend/utils/sort/tuplesort.c

include/utils/sortsupport.h

Tuples are mapped to SortTuple structs for sorting:
containing pointer to tuple and sort key

no need to reference actual Tuples during sort

unless multiple attributes used in sort

If all data fits into memory, sort using qsort().

If memory fills while reading, form “runs” and do disk-based sort.

COMP9315 21T1 ♢ Sorting ♢ [12/14]

<< ∧ >>
❖ Sorting in PostgreSQL (cont)

Disk-based sort has phases:

divide input into sorted runs using HeapSort

merge using N buffers, one output buffer

N = as many buffers as workMem allows

Described in terms of “tapes” (“tape” ≅ sorted run)

Implementation of “tapes”: backend/utils/sort/logtape.c

COMP9315 21T1 ♢ Sorting ♢ [13/14]

<< ∧ ❖ Sorting in PostgreSQL (cont) Sorting comparison operators are obtained via catalog (in Type.o): // gets pointer to function via pg_operator struct Tuplesortstate { ... SortTupleComparator ... }; // returns negative, zero, positive ApplySortComparator(Datum datum1, bool isnull1, Datum datum2, bool isnull2, SortSupport sort_helper); Flags in SortSupport indicate: ascending/descending, nulls-first/last. ApplySortComparator() is PostgreSQL's version of tupCompare() COMP9315 21T1 ♢ Sorting ♢ [14/14] Produced: 1 Mar 2021