CSE 3500 — Algorithms and Complexity — Yufeng Wu — Spring 2019
Programming Assignment ——- Due: 03/29/19 by the end of the day.
In this assignment, you are going to implement two algorithms for determining whether the h-index from a given list of citation numbers. This is exactly the problem five in exam one. In case you forget about it, here is the problem description.
There are n articles that receives C1, C2, . . . , Cn citations from fellow researchers. Now he wants to compute the so-called “h-index” of his work. According to the definition of h-index on Wikipedia: “A scientist has index h if h of his/her n papers have at least h citations each, and the other n − h papers have no more than h citations each.” Here is an example. Suppose the number of citations C = [3,0,6,1,5] for n = 5. Then h-index is 3. This is because [3,0,6,1,5] means the researcher has 5 papers and each of them had received 3, 0, 6, 1, 5 citations respectively; since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, the h-index is 3. Note: If there are several possible values for h, the maximum one is taken as the h-index.
For this problem, the students in the exam used various types of algorithms: some algorithms are more efficient and others are not. We consider two algorithms here:
A We start from the lowest possible h-index: 0 and work our way up by checking each possible h-index. For each h-index candidate value h, scan the entire list to see if there are h articles receiving at least h citations. Stop at the first time when the above test fails. The h-index is the largest h value that passes the test.
B First sort the citation numbers and then find the h-index by starting from the highest citation number. For details, check out the posted solution.
Now implement both algorithms using your favorite language (perhaps Java). Then run your program on the provided data files to compare how the two different algorithms perform on small to large data files. Here are more detailed instructions:
File format The data file format is simple. Each line has a single number (the citation number). That is, if there are 100 articles, there are 100 lines in the file.
Implementation You may use existing functions to perform sorting that is provided by the language. If you write sorting yourself, be careful to make sure they are efficient (e.g. sorting takes O(nlogn) time).
What to submit?
Submit a short report containing the following.
Settings Results
Conclusion Code
Write down the language you use, the machine (its CPU frequency and memory size) you use for testing your program.
Provide a table comparing the two algorithms by listing the average running time for each data size. Then plot the running time in a diagram (with the x-axis being the data size and y-axis being the running time); choose the proper scale to best demonstrate the results.
Draw conclusion on why choice of algorithm matters.
Attach the source code of your implementation. If it is short enough, you may simply include your code as part of your report.
1