Practical In-Place Merging. Programming assignment.
CS 483 – Algorithms DUE: June 18, 11:59 PM
You will implement and evaluate the algorithm in the attached paper, which
merges two lists using only constant additional space, but still runs in
linear time. (Comm of the ACM, vol 31, 1988, pp. 348-352.)
The algorithm has two parts: the main approach when the two input lists are
large and alternate approach when one of the subarrays is small. (The latter
is discussed in the first paragraph of section 3.)
We will make two simplifying assumptions:
(1) the input size N is a perfect square, where N = n*n, and
(2) the length of each of the two lists is a multiple of n.
This allows to use a somewhat simpler approach for creating the initial “buffer” as
described below, rather than that described in section 3.
Two algorithm topics mentioned in the paper need to be explained.
First, suppose an array X is composed of subarray Y followed by subarray Z.
Our goal is to put Z in front of Y, with constant additional space. This can
be done as follows: reverse Y, reverse Z, and then reverse all of X. This runs
in linear time
Second, “straight selection sort”, which runs in quadratic time but makes only
linear data movements is, for array L is the same as SelectionSort in the book.
I would recommend that you have a procedure BufferMerge(a, b, c, d, e) which,
using the buffer L[a..b-1], merges sorted subarrays L[b..c] and L[d..e] into
the sorted subarray L[a..c], leaving the buffer elements in L[d..e].
To create the initial “buffer” begin by locating the blocks A and B as described
in section 3. Now let C be the remainder of the block containing A, and D is the
remainder of the block containing B. Reverse B and D. Merge C and D, using
B as a buffer. (C and D now form a sorted block of length n at the end of the
array; this block will be ignored until we are finished with everything else;
at that point we merge it with the rest of the array, using the procedure for
short arrays at the beginning of section 3.) Now A and B are together, forming
the whole buffer, and the buffer is interchanged with the first block. Now
you can begin by sorting the blocks….
In the above discussion, either C or D could be empty, which is a simpler
case that should be handled correctly.
Note that the sorting of blocks must be done more carefully than the paper
indicates. When two blocks have the same final values, resolve the ties by
examining the first values.
WHAT DO YOU DO?
You will code the algorithm as detailed above. You will construct various
random inputs and run the algorithm on these inputs, and collect statistics on
the behavior of the algorithms. In particular you will report the number of
key comparisons (but not any other comparisons) and the number of record moves,
where a swap of two records requires 3 moves. Report these statistics for inputs
of size N=n*n, for n equal to 5, 10, 50, 100. A random input will be formed by:
1) generating an array of N keys, each key uniformly selected from [1,n],
2) selecting a number x, uniformly from [1,n-1],
3) let the first xn keys be sublist 1, and the last (n-x)n keys be sublist 2
4) sort the two sublists, using insertion sort. [this step is free of any counting]
For each n the statistics reported should be the average of 5 random inputs.
Turn in your code and the runs.
You will then write an analysis of the statistics (about 1-page) describing how
they agree (or do not agree?) with your expectations, after reading the paper.
Your code should be sufficiently readable that someone grading it can readily
determine how you handled the various unspecified algorithmic details.
NO GROUP WORK PERMITTED.
You may ask any questions of me (during office hours or better during class).
Questions to others must NOT relate to coding or algorithmic details.