Problem Description
I introduce here a problem of a paritioning type, which I will call Grouping Monotone Sequence. You are given as input a sequence of n positive inte- gers a1, a2, . . . , an and a positive integer k ≤ n. We assume that the input sequence is sorted in a non-decresing order, that is, a1 ≤ a2 ≤ ··· ≤ an. Given a group of consecutive numbers ai, ai+1, . . . , aj for a starting index i and end index j, with i ≤ j, we call
j
cij =(al−aveij)2,
l=i
the cost of group ai,ai+1,…,aj, where
1 j
aveij = j − i + 1 · (
is the average of group ai,ai+1,…,aj; note that j−i+1 is the size of group ai,ai+1,…,aj. Your task is to group these numbers a1,a2,…,an into k groups of consecutive numbers, that is, as they appear in the input order a1, a2, . . . , an, so that total sum of costs of all k groups is minimised.
More formally, you have to decide the k − 1 cutting positions 1 ≤ i1 < i2 <···
You should write a procedure that for any given input sequence of n sorted positive integers (multiple indentical numbers allowed) and any given k ≤ n, finds a grouping with minimum value of the objective function value. Your procedure should only output the value of the objective of this optimal solution (grouping) that only is the integral part of this value. That is, it should compute the grouping with minimum possible value of the objective function among all possible groupings, and then output the objective value (its integral part) of this optimal grouping.
Additionally, you should include a brief idea of your solution in the commented text in your code and you should also include a short analysis and justification of the running time of your procedure. These descriptions are part of the assessment of your solution.
Hints
You are supposed to solve the Grouping Monotone Sequence problem by dynamic programming. Observe that in this problem we have two kinds of objects to maintain: the numbers a1, . . . , an and groups G1, . . . , Gk, which suggests that the dynamic programming solution should involve two kinds of indices. Let C(j,l) denote the minimum objective function value of the grouping of a1, a2, . . . , aj into l groups. Try to express C(j, l) by using values C(j′,l′) for some values j′ < j,l′ < l. These hints should lead to an appropriate recurrence solution to the problem which then you should translate into a sequential solution that fills in this dynamic programming table in an appropriate way.
Programming Language
You will be using Java as the programming language.
Program framework description
I provide a template program called Main.java that you will use (without altering anything but the place to put your code) to include the code of your procedure to solve the Grouping Monotone Sequence problem.
To use this template, after you write your code inside of method called GroupingMonotoneSequence, you must include in the current directory the input text files dataOne.txt and dataTwo.txt. Note, however, that if you need any additional procedures, you may include them outside of the text of the procedure GroupingMonotoneSequence.
To compile your program in command line (under Linux/Unix) use some- thing like this (this may differ within your system):
javac Main.java
Then, you can run your program from command line like this
java Main -opt1
which will run the program with dataOne.txt as input file and will out- put answers (that is the optimal objective function values (their integral part)) to all instances in order they appear in dataOne.txt. You may use your own dataOne.txt in the format (see below) to experiment with your program. Input file dataOne.txt may contain any number of correct input sequences.
Now, if you run the program with
java Main -opt2
this will run the program with dataTwo.txt as input file. In this case, the output will be the indices of inputs from dataTwo.txt that were solved incorrectly by your program together with percentage of correcly solved in- stances. If all instancea are solved correctly, the output should be 100%. File dataTwo.txt contains the same instances as dataOne.txt, but, in addition dataTwo.txt also contains the correct, that is optimal objective function values (their integral part), answers to these instances. You may use dataTwo.txt to test the correctness of your program.
Description of the input data format:
Input data text file dataOne.txt has the following format (this example
has 2 inputs, each input ends with A):
k a1
a2 ... ... an A k a1 a2 ... ... an A
In general file dataOne.txt can have an arbitrary number of distinct inputs of arbitrary, varying lengths. File dataOne.txt contains 31 different instances of the problem. Observe that n is not explicitly given as part of the instance.
Input data text file dataTwo.txt has the following format (this example has again 2 inputs, each input ends with A):
k ans1 a1 a2 ... ... an A k ans2 a1 a2 ... ... an A
There ans1 (ans2, respectively) is the correct value of the optimal (mini- mum) value (its integral part) of the objective function in the optimal solution to the first (second, respectively) instance. File dataTwo.txt con- tains the same 31 instances of the problem as in file dataOne.txt but in addition has the answers. This data can be used to test the correctness of your procedure.