程序代写代做代考 algorithm Microsoft PowerPoint – Chapter 9 – Sorting

Microsoft PowerPoint – Chapter 9 – Sorting

Introduction to
Parallel Computing

George Karypis
Sorting

Outline
Background
Sorting Networks
Quicksort
Bucket-Sort & Sample-Sort

Background
Input Specification

Each processor has n/p elements
A ordering of the processors

Output Specification
Each processor will get n/p consecutive elements of
the final sorted array.
The “chunk” is determined by the processor ordering.

Variations
Unequal number of elements on output.

In general, this is not a good idea and it may require a shift to
obtain the equal size distribution.

Basic Operation:
Compare-Split Operation

Single element per processor

Multiple elements per processor

Sorting Networks
Sorting is one of the fundamental problems in
Computer Science
For a long time researchers have focused on the
problem of “how fast can we sort n elements”?

Serial
nlog(n) lower-bound for comparison-based sorting

Parallel
O(1), O(log(n)), O(???)

Sorting networks
Custom-made hardware for sorting!

Hardware & algorithm
Mostly of theoretical interest but fun to study!

Elements of Sorting Networks
Key Idea:

Perform many comparisons in
parallel.

Key Elements:
Comparators:

Consist of two-input, two-output
wires
Take two elements on the input
wires and outputs them in sorted
order in the output wires.

Network architecture:
The arrangement of the
comparators into interconnected
comparator columns

similar to multi-stage networks

Many sorting networks have been
developed.

Bitonic sorting network
Θ(log2(n)) columns of
comparators.

Bitonic Sequence

Bitonic sequences are
graphically represented
by lines as follows:

1
2

7

0

4
6

Why Bitonic Sequences?
A bitonic sequence can be “easily” sorted in
increasing/decreasing order.

s s1 s2

• Every element of s1 will be less than or equal to every element of s2
• Both s1 and s2 are bitonic sequences.
• So how can a bitonic sequence be sorted?

Bitonic
Split

An example

Bitonic Merging Network

A comparator network that
takes as input a bitonic
sequence and performs a
sequence of bitonic splits
to sort it.

+BM[n]
A bitonic merging
network for sorting in
increasing order an n-
element bitonic
sequence.

-BM[n]
Similar sort in decreasing
order.

Are we done?
Given a set of elements, how do we re-arrange them into
a bitonic sequence?
Key Idea:

Use successively larger bitonic networks to transform the set into
a bitonic sequence.

An example

Complexity
How many columns of
comparators are required
to sort n=2l elements?

i.e., depth d(n) of the
network?

Bitonic Sort on a Hypercube
One-element-per-processor case

How do we map the algorithm onto a hypercube?
What is the comparator?
How do the wires get mapped?

What can you say about the
pairs of wires that are inputs
to the various comparators?

Illustration

Communication Pattern

Algorithm

Complexity?

Bitonic Sort on a Mesh
One-element-per-processor case

How do the wires get mapped?

Which one is better?
Why?

Row-Major Shuffled Mapping

Complexity?

Can we do better?
What is the lowest bound of sorting on a mesh?

communication performed by each process

More than one element per
processor

Hypercube

Mesh

Bitonic Sort Summary

Quicksort

Parallel Formulation
How about recursive decomposition?

Is it a good idea?
We need to do the partitioning of the array around
a pivot element in parallel.

What is the lower bound of parallel
quicksort?

What will it take to achieve this lower bound?

Optimal for CRCW PRAM
One element per processor
Arbitrary resolution of the concurrent writes.
Views the sorting as a two-step process:

(i) Constructing a binary tree of pivot elements
(ii) Obtaining the sorted sequence by performing an inorder
traversal of this binary tree.

Building the Binary Tree

Complexity?

Practical Quicksort
Shared-memory

Data resides on a shared array.
During a partitioning each
processor is responsible for a
certain portion.

Array Partitioning:
Select & Broadcast pivot.
Local re-arrangement.

Is this required?
Global re-arrangement.

Efficient Global Rearrangement

Practical Quicksort
Complexity

Complexity for message-passing is similar assuming that the all-to-all
personalized communication is not cross-bisection bandwidth limited.

A word on Pivot Selection
Selecting pivots that lead to balanced
partitions is importance

height of the tree
effective utilization of processors

Sample Sort
Generalization of bucket sort with data-driven sampling

n/p elements per-processor.
Each processor sorts is local elements.
Each processor selects p-1 equally spaced elements from its
own list.
The combined p(p-1) set of elements are sorted and p-1 equally
spaced elements are selected from that list.
Each processor splits its own list according to these splitters into
p buckets.
Each processor sends its ith bucket to the ith processor.
Each processor merges the elements that it receives.
Done.

Sample Sort Illustration

Sample Sort Complexity

Assumes
a serial sort