Your task for this practical assignment consists of two parts:
1. Develop a Perl script using CGI.pm for a web-based system that provides the functionality stated in the Requirements section below.
2. Make the system that you have created accessible and usable via the URL
http://cgi.csc.liv.ac.uk/cgi-bin/cgiwrap/
taking care that the access rights for the file stat.pl are neither too restrictive nor too permissive.
Requirements
The DBLP Computer Science Bibliography is an on-line collection of bibliographic information for over 3.6 million publications by 1.8 million authors. Your system is intended to produce statistics for small subsets of that data. More precisely, it is intended to determine how many publications authors have written on a particular topic.
1. The script should display a web page that contains a form with two text fields and a ‘Submit’ button. The first text field should allow a user to enter keywords that describe an author or topic, called query in the following. The second text field should allow a user to enter a number that specifies the maximal number of publications that should be included in the statistics that the system will produce, called maxHits in the following.
2. If a user presses the ‘Submit’ button, the system should first check the input. The system should check that query is a non-empty string and should take measures against code injection via this input. The system also needs to check that maxHits is a natural number greater than or equal to zero. If one or more of these checks fail, then the system should generate an HTML page containing an error message for each check that has failed and prompt the user to start again.
3. If the user’s input passes these checks, then the system should retrieve information in XML format from the DBLP for up to maxHits publications matching query. This will be the query result. It can be obtained by retrieving the URL
http://www.dblp.org/search/api/?q=query&h=maxHits&c=4&f=0&format=xml;
Note that the query result uses UTF-8 encoding. You must make sure that your script and the output it produces correctly handle UTF-8 encoded Unicode characters.
4. For each publication the query result will include a list of authors. The system should count for each author in the query result how many publications in the query result he/she is an author of. The system should also determine how many publications are contained in the query result.
5. Once the system has completed the count, it should produce an HTML page that includes
· a statement of the query entered by the user, the XML data that was retrieved, and a statement of the number of publications that were retrieved.
· if and only if a non-zero number of publications was retrieved, two HTML tables, the first showing the ten authors with the most publications and the number of their publications (listed in order of the number of publications) and the second showing the ten authors with the fewest publications and the number of their publications (listed in reverse order of number of publications).
Each table should have two columns, one for the names of the authors and one for the number of publications, and one row for each author. The columns should have appropriate headings and the tables should have appropriate titles. You are permitted to use Perl’s built-in sort function to produce those tables.
6. This HTML page should be displayed to the user as response to the URL the user has entered.
7. As this is an assignment on Perl, no other scripting languages should be used. In particular, JavaScript should not be used for input validation.
8. Your code should be properly commented. This includes pointing out which parts of your code have been developed with the help of on-line sources or textbooks by including references for these sources at the appropriate points.
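Requirements 3 to 5 can be illustrated with a minimal sketch, given here under several assumptions rather than as the expected solution. The helper names build_dblp_url, count_authors and ranked_authors are hypothetical. The sketch assumes that the XML has already been parsed into a list of publications, each represented as a list of author names, and it reads "reverse order" for the second table as ascending order. Ties are broken alphabetically only to make the output reproducible; the specification leaves the order of ties open. Only core Perl modules are used.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build the request URL from already validated input.
# The query is UTF-8 encoded and percent-escaped so that it cannot alter
# the structure of the URL.
sub build_dblp_url {
    my ($query, $max_hits) = @_;
    my $escaped = encode('UTF-8', $query);
    $escaped =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return "http://www.dblp.org/search/api/?q=$escaped&h=$max_hits&c=4&f=0&format=xml";
}

# Hypothetical helper: count publications per author.
# $publications is a reference to a list of publications, each given as a
# reference to a list of author names (XML parsing is assumed to be done).
sub count_authors {
    my ($publications) = @_;
    my %count;
    for my $authors (@$publications) {
        $count{$_}++ for @$authors;
    }
    return \%count;
}

# Hypothetical helper: return up to $n author names, ordered by number of
# publications (descending by default, ascending if $ascending is true).
# Ties are broken by name, which is an assumption of this sketch.
sub ranked_authors {
    my ($count, $n, $ascending) = @_;
    my @names = $ascending
        ? (sort { $count->{$a} <=> $count->{$b} or $a cmp $b } keys %$count)
        : (sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count);
    $n = @names if $n > @names;    # fewer than $n authors in total
    return @names[0 .. $n - 1];
}
```

With fewer than ten authors in total, both calls return every author, so the two tables overlap completely. The retrieval itself is left out; one option, assuming the module is installed, is the get function of LWP::Simple.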
Test data
Test data, together with the expected results, can be found at http://cgi.csc.liv.ac.uk/~ullrich/COMP284/tests-2016-17N/.
Note that the content of the DBLP changes over time. The following tests were performed on 24 February 2017. The results may differ if the tests are performed at a later date. If you believe that the content of the DBLP has indeed changed once and that the results shown below are no longer valid
1. Query: couffignal; Maximal Number of Hits: 20
The script is expected to show the XML data retrieved from the DBLP, an indication that this XML data contains 3 publications, and the following two tables:
Author | No of publications
Sophie Couffignal | 1
Marie-Lise Lair | 1
Pierre E. Mounier-Kuhn | 1
Valery Bocquet | 1
Girolamo Ramunni | 1
Gwenaelle Vidal-Trecan | 1
Claudine Blum-Boisgard | 1
Laurence M. Renard | 1
10 Authors with most publications
Author | No of publications
Laurence M. Renard | 1
Claudine Blum-Boisgard | 1
Gwenaelle Vidal-Trecan | 1
Girolamo Ramunni | 1
Valery Bocquet | 1
Pierre E. Mounier-Kuhn | 1
Marie-Lise Lair | 1
Sophie Couffignal | 1
10 Authors with least publications
Note: The order in which authors with the same number of publications are presented is arbitrary. The two tables overlap as there are fewer than 20 authors in total. Each of the tables contains fewer than 10 entries as there were in fact fewer than 10 authors in total.
2. Query: hustadt; Maximal Number of Hits: 100
The script is expected to show the XML data retrieved from the DBLP, an indication that this XML data contains 60 publications, and the following two tables:
Author | No of publications
Ullrich Hustadt | 60
Renate A. Schmidt | 29
Clare Dixon | 16
Michael Fisher | 7
Boris Motik | 7
Ulrike Sattler | 6
Boris Konev | 6
Lan Zhang | 6
Cláudia Nalon | 5
Lilia Georgieva | 4
10 Authors with most publications
Author | No of publications
Ewa Orlowska | 1
Frank Wolter | 1
Alexandre Riazanov | 1
Alexander Leitsch | 1
M. Carmen Fernández Gago | 1
Dimiter Vakarelov | 1
Tanel Tammet | 1
Christoph Weidenbach | 1
Andrei Voronkov | 1
Christoph Meyer | 1
10 Authors with least publications
Note: The names contain UTF-8 encoded Unicode characters. Make sure that these are displayed correctly, as shown above.
3. Query: Dunne; Maximal Number of Hits: 1000
The script is expected to show the XML data retrieved from the DBLP, an indication that this XML data contains 392 publications, and the following two tables:
Author | No of publications
Paul E. Dunne | 108
Lucy E. Dunne | 29
Trevor J. M. Bench-Capon | 28
Steve Dunne | 27
Michael Wooldridge | 18
Cody Dunne | 18
Sebastian Dünnebeil | 14
Helmut Krcmar | 13
Ali Sunyaev | 12
Ben Shneiderman | 11
10 Authors with most publications
Author | No of publications
David O’Sullivan | 1
Juan Pedro Fernández-Palacios | 1
Jason Laks | 1
Michael Dunne | 1
Adam Zachary Wyner | 1
J. Barrie Thompson | 1
Debra Nestel | 1
Liam Murphy | 1
François Soulat | 1
Daniel O’Hare | 1
10 Authors with least publications
4. Query: Peter Jones; Maximal Number of Hits: 1000
The script is expected to show the XML data retrieved from the DBLP, an indication that this XML data contains 378 publications, and the following two tables:
Author | No of publications
Gareth J. F. Jones | 22
Peter Jones | 22
Alan F. Smeaton | 11
Peter Wilkins | 11
Peter C. Jones | 11
Gerard Guiraudon | 10
Charles Safran | 10
Terry M. Peters | 10
Douglas L. Jones | 10
Peter Willett 0002 | 9
10 Authors with most publications
Author | No of publications
Diana D. Day | 1
Daiya Takai | 1
Stuart Lithwick | 1
Cliff Click | 1
Jinqi Li | 1
Safa A. Najim | 1
Jon Burgess | 1
Tim D. Fryer | 1
Dustin G. Mark | 1
Peter Z. Revesz | 1
10 Authors with least publications
5. Query: nobody; Maximal Number of Hits: 100
The script is expected to show the XML data retrieved from the DBLP, an indication that this XML data contains 30 publications, and the following two tables:
Author | No of publications
Rukun Mao | 2
Husheng Li | 2
Christoph Bussler | 2
Justin Cappos | 1
Holger Giese | 1
Wolfgang Thomas | 1
Zhixing Zhang | 1
Philip J. McParlane | 1
Peter M. Maurer | 1
Sai Teja Peddinti | 1
10 Authors with most publications
Author | No of publications
Brigit van Loggem | 1
Esben Fisker | 1
Yashar Moshfeghi | 1
Rafail Ostrovsky | 1
Dinei A. F. Florêncio | 1
Eric Horvitz | 1
Michael Cook | 1
Oliver Marschollek | 1
Hideo Okuma | 1
Daniel A. Epstein | 1
10 Authors with least publications
6. Query: xyzz; Maximal Number of Hits: 10
The script is expected to show the XML data retrieved from the DBLP and an indication that this XML data contains 0 publications. As the number of publications is 0, no HTML tables should be shown.
7. Query: ; Maximal Number of Hits: 100
An empty query should produce an error message.
8. Query: Grant; Maximal Number of Hits: zero
A maximal number of hits that is not a natural number greater than or equal to 0 should produce an error message.
9. Query: Davide; Maximal Number of Hits: -1
A maximal number of hits that is not a natural number greater than or equal to 0 should produce an error message.
10. Query: ; Maximal Number of Hits:
An empty query and an empty maximal number of hits should produce two error messages, one for each error. The same applies to an empty query combined with a maximal number of hits that is not a natural number greater than or equal to 0.
11. Query: McCabe; Maximal Number of Hits: 0
This is valid input to the system. The script is expected to execute this query, show the XML data retrieved from the DBLP, and indicate that this XML data contains 0 publications. As the number of publications is 0, no HTML tables should be shown.
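The input checks exercised by tests 7 to 11 can be sketched as follows. This is one possible reading of the requirements, not a reference solution: the helper name validate_input and the wording of the messages are hypothetical, and treating a query that consists only of whitespace as empty is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one error message per failed check,
# or an empty list if both inputs are acceptable.
sub validate_input {
    my ($query, $max_hits) = @_;
    my @errors;

    # query must contain at least one non-whitespace character
    # (treating whitespace-only input as empty is an assumption)
    push @errors, 'The query must not be empty.'
        unless defined $query && $query =~ /\S/;

    # maxHits must consist of ASCII digits only, so "0" is accepted
    # while "", "zero" and "-1" are rejected
    push @errors, 'The maximal number of hits must be a natural number greater than or equal to 0.'
        unless defined $max_hits && $max_hits =~ /\A[0-9]+\z/;

    return @errors;
}

# Example: test 10 (both fields empty) yields two messages.
print "$_\n" for validate_input('', '');
```

These checks do not by themselves protect against code injection. Any user input echoed back into the HTML page would still need escaping, for example with the escapeHTML function of CGI.pm.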