COMP2123 Programming technologies and tools Assignment 1 – Linux commands and shell script
Due: February 22nd (Fri) 17:30
1. [15%] Get Maximum
Write a shell script, getMax.sh, which reads a list of comma separated values from a file and outputs the maximum of a specific column. The script takes two command line arguments. The first argument is the name of a file containing a list of comma separated values. The second argument is the number of the column whose maximum value is to be printed. Column numbers start from 1.
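The following is a minimal sketch of one possible approach to the task described above, not a reference solution. It assumes the values are plain numbers and relies on the standard cut, sort and tail utilities. The function name get_max is hypothetical, and the non-zero return status on too few arguments is an assumption, since only the message is specified.

```shell
#!/bin/bash
# Sketch only: get_max is a hypothetical helper name, not part of the assignment.
get_max() {
    # Fewer than two arguments: print the required message and stop.
    # (Returning status 1 here is an assumption.)
    if [ "$#" -lt 2 ]; then
        echo "Not enough arguments"
        return 1
    fi
    # cut extracts the requested column ($2) from the file ($1),
    # sort -n orders the values numerically in ascending order,
    # and tail -n 1 keeps the last, i.e. the largest, value.
    cut -d ',' -f "$2" "$1" | sort -n | tail -n 1
}

# In an actual getMax.sh, the command line arguments would be passed through:
# get_max "$@"
```

The pipeline was chosen because each stage does one job and can be checked on its own from the command line; an awk loop that tracks a running maximum would work equally well.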
Print “Not enough arguments” if fewer than two command line arguments are specified.
Sample input and output
Assume we have test1.txt in the current directory, with the following content:
1,5,7
3,4,3
2,7,10
7,1,0
9,2,8
Some sample executions of the script and the corresponding output are shown below. The input is the command that follows the $ prompt.
$ ./getMax.sh test1.txt 3
10
$ ./getMax.sh test1.txt
Not enough arguments
• You can assume that the file specified always contains a list of comma separated values.
• You can assume that the column number specified is within the number of columns in the file specified.
Submission
Please evaluate your script in the corresponding VPL activity on Moodle. Your script must be named getMax.sh.
You can create your own txt file in VPL to test your script by using the “Run” button. However, please note that the “Evaluate” button will test your script using our version of the txt files.
Make sure that your script can pass all test cases. A copy of the txt file used in evaluation can be found on Moodle. Note that additional test cases will be used when grading.
2. [35%] Statistics by timestamp
The Metro company in HK has installed smart card scanners in the gate machines. All scanners are connected to a central backend system that stores the access records of passengers. The system generates a log file every day. Each line in the log file represents a transaction made through the gate machine and has the following format:
[Timestamp],[SmartcardID],[Station name],[Enter/Exit]
• Timestamp has the format YYYY-MM-DD_HH:MM:SS.Millisecond.
• SmartcardID is an 8-digit value identifying a smartcard user.
• Station name is the name of a metro station; it may contain spaces.
• Enter/Exit represents whether the user is entering the station or exiting the station.
• Comma “,” is used as field separator in the log file.
• The log files of the Metro company have “Metro_” as prefix in their filenames, followed by the date in YYYY-MM-DD, followed by “.log”.
• You can assume that the transactions are sorted in ascending order of the Timestamps.
Write a shell script, stat.sh, which reads all log files (file names ending in .log) in the current folder and reports the largest number of distinct smartcard users matching a specific timestamp.
The script takes one command line argument, which is a timestamp prefix of the records to be extracted.
The script should display the names of the stations entered by the largest number of distinct smartcard users, each preceded by the corresponding distinct smartcard user count, in descending lexicographical order of station name. It should print “No records found” if there is no station to output.
You can assume that an argument will always be specified when your script is executed.
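The following is a minimal sketch of one way to compute this statistic, under stated assumptions rather than a reference solution. It assumes that only “Enter” records count towards a station, that the prefix is matched literally against the start of each line, and that all stations tied at the largest count are printed. The function name stat_by_prefix is hypothetical.

```shell
#!/bin/bash
# Sketch only: stat_by_prefix is a hypothetical helper name.
stat_by_prefix() {
    local counts max
    # 1. Keep "Enter" records whose line starts with the given prefix.
    #    index($0, p) == 1 is a literal prefix test, so "." is not a wildcard.
    # 2. sort -u leaves one line per distinct (SmartcardID, Station) pair.
    # 3. cut keeps the station name; sort | uniq -c counts users per station.
    # 4. sed removes the padding that uniq -c puts before each count.
    counts=$(cat ./*.log 2>/dev/null |
        awk -F ',' -v p="$1" 'index($0, p) == 1 && $4 == "Enter" { print $2 "," $3 }' |
        sort -u | cut -d ',' -f 2 | sort | uniq -c | sed 's/^ *//')

    if [ -z "$counts" ]; then
        echo "No records found"
        return
    fi

    # The largest count is the first field of the numerically largest line.
    max=$(echo "$counts" | sort -n -r | head -n 1 | cut -d ' ' -f 1)
    # Keep only stations with that count, in descending order of name.
    echo "$counts" | grep "^$max " | sort -r
}

# In an actual stat.sh, the command line argument would be passed through:
# stat_by_prefix "$1"
```

Deduplicating on the (SmartcardID, Station) pair before counting is what makes the count one of distinct users rather than of transactions.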
Sample input and output
Suppose there are two log files, Metro_2019-01-01.log, and Metro_2019-01-02.log.
Metro_2019-01-01.log:
2019-01-01_12:18:44.974599,10712571,Kowloon Tong,Enter
2019-01-01_12:18:44.981546,10712571,Kowloon,Exit
2019-01-01_12:18:46.753441,42421231,HKU Station,Enter
2019-01-01_12:18:48.421412,85489422,HKU Station,Enter
2019-01-01_12:18:59.974599,10712571,Kowloon Tong,Enter
2019-01-01_13:09:29.373117,94144216,Sha Tin,Enter
2019-01-01_13:02:44.152494,98761664,Tung Chung,Enter
2019-01-01_13:11:32.290396,91743551,Sunny Bay,Enter
2019-01-01_13:12:10.546858,79143597,Kennedy Town,Enter
Metro_2019-01-02.log:
2019-01-02_12:00:01.836803,91743551,Prince Edward,Enter
2019-01-02_12:20:32.882912,45573708,Sunny Bay,Enter
2019-01-02_12:35:24.976605,78954490,Prince Edward,Enter
2019-01-02_13:11:42.316110,91743551,Prince Edward,Exit
2019-01-02_13:13:31.148177,91743551,Prince Edward,Enter
2019-01-02_13:22:01.514125,91743551,Sunny Bay,Exit
2019-01-02_14:42:11.241242,91743551,Sunny Bay,Enter
2019-01-02_14:51:42.512432,91743551,Prince Edward,Exit
Some sample executions of the script and the corresponding output are shown below. The input is the command that follows the $ prompt.
$ ./stat.sh 2019-01-01_12:18:44.974599
1 Kowloon Tong
$ ./stat.sh 2019-01
2 Sunny Bay
2 Prince Edward
2 HKU Station
$ ./stat.sh 2018
No records found
Submission
Please evaluate your script in the corresponding VPL activity on Moodle. Your script must be named stat.sh.
You can create your own set of log files in VPL to test your script by using the “Run” button. However, please note that the “Evaluate” button will test your script using our original set of log files.
Make sure that your script can pass all test cases. A copy of the set of log files used in evaluation can be found on Moodle. Note that additional test cases will be used when grading.
3. [50%] Log analysis
The St. John Hospital has installed a patient treatment management system to keep track of the medicine treatment records of patients in the hospital. The system generates a log file every day. Each line in the log file represents a medicine given to a patient and has the following format:
[Date],[Timestamp],[PatientID],[Medicine Name]
• Date has the format YYYY-MM-DD.
• Timestamp has the format HH:MM:SS.Millisecond.
• PatientID is an 8-digit value identifying a patient.
• Medicine Name is the name of a medicine treatment the patient received.
• Comma “,” is used as field separator in the log file.
• The log files of the hospital have “St.John_Hospital_” as the prefix of their filenames, followed by the date in format YYYY-MM-DD, followed by “.log”.
Consider an example St.John_Hospital_2019-01-01.log file that stores the medicine treatment records on 2019-01-01:
…
2019-01-01,05:54:16.086146,89754518,Benadryl Allergy
2019-01-01,06:26:15.064900,94344223,Vanacof G
2019-01-01,06:32:50.644160,33161119,Topco Allergy
2019-01-01,06:54:40.763127,91469497,Ceron
2019-01-01,06:58:05.840255,62891461,Topco Allergy
2019-01-01,07:22:54.949143,78457243,Triaminic
…
2019-01-01,07:44:18.572933,95661866,Bromax
2019-01-01,07:52:14.112270,63337975,Dytan
2019-01-01,07:58:20.876797,84562613,Dicopanol
2019-01-01,08:53:03.493598,22398534,Lohist
2019-01-01,09:42:52.141223,33161119,Topco Allergy
…
• You can assume that the records are sorted in ascending order of the Timestamps.
• From the log file we can trace a patient’s treatment records in a day. For example, we can see from the records above that the patient 33161119 has received the Topco Allergy treatment twice on 2019-01-01, at 06:32:50 and 09:42:52, respectively.
Part a: analyze.sh
Create a shell script analyze.sh that does the following:
• For each “St.John_Hospital_” log file, find the top three medicines in descending order of the treatment counts (the number of treatment records of that medicine in the log file), and output the counts and the medicine names.
• As an example, Topco Allergy will have a count of 3 in the above running example because there are three treatment records of the medicine Topco Allergy (a patient receiving the same medicine twice contributes 2 to the count, not 1).
• If we run analyze.sh on the directory that consists of the log files from 2019-01-01 to 2019-01-05 we provided, the following will be output (spacing doesn’t matter):
St.John_Hospital_2019-01-01.log:
11 Dytan
10 Topco Allergy
7 Vanacof G
St.John_Hospital_2019-01-02.log:
9 Dytan
7 Tussi-12S
7 Triaminic
St.John_Hospital_2019-01-03.log:
12 Genahist
7 Percogesic
7 Lohist
St.John_Hospital_2019-01-04.log:
9 Dicopanol
9 Allermax
7 Sinucon
St.John_Hospital_2019-01-05.log:
9 Banophen
8 Teldrin
8 Suttar SF
• You can assume that all the “St.John_Hospital_” log files are stored in the same directory as analyze.sh.
• You can assume that all log files have the file extension “.log”.
• If medicines have the same count, we output them in decreasing lexicographical order of the medicine name. Therefore “9 Dicopanol” is put ahead of “9 Allermax” under “St.John_Hospital_2019-01-04.log” in the example above.
After you have finished the shell script, you are strongly encouraged to create some other input data to test if your script is correctly implemented. E.g., update St.John_Hospital_2019-01-01.log to make 10 records for Vanacof G, and see if your script outputs the correct result.
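The counting and tie-breaking rules of Part a can be sketched as follows. This is an illustration under stated assumptions, not a reference solution: it assumes the script is run from the directory that holds the log files, and the function name top_medicines is hypothetical.

```shell
#!/bin/bash
# Sketch only: top_medicines is a hypothetical helper name.
top_medicines() {
    local f
    for f in St.John_Hospital_*.log; do
        # Skip the unexpanded pattern when no log file matches.
        [ -e "$f" ] || continue
        echo "$f:"
        # cut keeps the medicine name (4th field); sort | uniq -c counts
        # the records per medicine; sed drops the padding before the count.
        # The final sort orders by count (numeric, descending) and breaks
        # ties by medicine name in descending order; head keeps three lines.
        cut -d ',' -f 4 "$f" | sort | uniq -c | sed 's/^ *//' |
            sort -t ' ' -k 1,1nr -k 2r | head -n 3
    done
}

# In an actual analyze.sh, the function would simply be called:
# top_medicines
```

Giving each sort key its own flags (nr on the count, r on the name) keeps the two orderings independent, which is what the tie-breaking rule above requires.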
Submission
Please evaluate your script in the corresponding VPL activity on Moodle. Your script must be named analyze.sh.
You can create your own set of log files in VPL to test your script by using the “Run” button. However, please note that the “Evaluate” button will test your script using our original set of log files.
Make sure that your script can pass all test cases. A copy of the set of log files used in evaluation can be found on Moodle. Note that additional test cases will be used when grading.
Part b: trace.sh
Suppose that the Hong Kong Hospital Authority has developed a centralized patient treatment management system for all hospitals in Hong Kong. Besides the “St.John_Hospital_” log files, the centralized system also stores the log files of other hospitals.
For instance, consider the HKU_Clinic_2019-01-01.log file that consists of all the treatment records made in the HKU University Healthcare Service Clinics on 2019-01-01:
2019-01-01,23:22:12.124658,32789788,Percogesic
2019-01-01,23:27:50.144289,86226341,Genahist
2019-01-01,23:44:59.380586,47324275,Percogesic
2019-01-01,23:47:14.226021,33161119,Suttar SF
2019-01-01,23:51:05.755840,98983647,Topco Allergy
2019-01-01,23:55:36.320216,15898537,Triaminic
2019-01-01,23:59:44.823150,33161119,Dytan
These log files follow the same format as the “St.John_Hospital_” log files, and the same set of assumptions applies. Apart from that, you can observe the following.
• You can also assume that the patient ID is unique for each patient across the log files of all hospitals.
• From the log files we can trace a patient’s treatment record across various hospitals. E.g., from the four records of patient 33161119 (two in St.John_Hospital_2019-01-01.log shown previously, and two in HKU_Clinic_2019-01-01.log shown above), we find that the patient:
  • Had two treatments of Topco Allergy in St. John Hospital on 2019-01-01, at time 06:32:50 and at 09:42:52.
  • Had another treatment of Suttar SF in HKU Clinic on 2019-01-01 at time 23:47:14.
  • Had another treatment of Dytan in HKU Clinic on 2019-01-01 at time 23:59:44.
Create a shell script trace.sh that does the following:
• trace.sh takes one command line input argument. The input argument represents the patient ID of the patient that we would like to trace.
• You can assume that the “St.John_Hospital_” and “HKU_Clinic_” log files are all located in the same directory as trace.sh.
If we run trace.sh as follows:
$ ./trace.sh 33161119
trace.sh will then generate a file 33161119.log that contains a list of the treatment records of the patient with patient ID 33161119 in all log files, sorted in ascending order of the date and timestamp.
As an example, the first seven records in 33161119.log are shown below.
2019-01-01,06:32:50.644160,33161119,Topco Allergy
2019-01-01,09:42:52.141223,33161119,Topco Allergy
2019-01-01,23:47:14.226021,33161119,Suttar SF
2019-01-01,23:59:44.823150,33161119,Dytan
2019-01-02,13:03:45.008479,33161119,Polyhist NC
2019-01-03,03:22:25.323878,33161119,Genahist
2019-01-05,22:01:02.033978,33161119,Bromax
…
Note that trace.sh will only create the trace log and outputs nothing, except in the following cases:
• If the number of input arguments is not 1, then trace.sh prints the message: “Usage: ./trace.sh (patientID)”
• If there are no treatment records found, then 33161119.log should not be generated. The script should print “No records found for 33161119”.
After you have finished the shell script, you are strongly encouraged to create some other input data to test if your script is correctly implemented.
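The behaviour of Part b can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: it assumes every hospital log file name ends in _YYYY-MM-DD.log, so that a previously generated trace file such as 33161119.log is not read back in, and it assumes a non-zero return status for the two message cases. The function name trace_patient is hypothetical.

```shell
#!/bin/bash
# Sketch only: trace_patient is a hypothetical helper name.
trace_patient() {
    local records
    if [ "$#" -ne 1 ]; then
        echo "Usage: ./trace.sh (patientID)"
        return 1
    fi
    # Read only files whose names end in _YYYY-MM-DD.log (an assumption),
    # keep the lines whose third field equals the patient ID, and sort them.
    # Appending "" to id forces a string comparison in awk. Each line starts
    # with the date and then the timestamp, so a plain sort gives ascending
    # chronological order.
    records=$(cat ./*_????-??-??.log 2>/dev/null |
        awk -F ',' -v id="$1" '$3 == (id "")' | sort)

    if [ -z "$records" ]; then
        # No match: report it and do not create the trace file.
        echo "No records found for $1"
        return 1
    fi
    echo "$records" > "$1.log"
}

# In an actual trace.sh, the command line arguments would be passed through:
# trace_patient "$@"
```

Collecting the matches in a variable before writing anything is what allows the script to avoid creating an empty trace file when there is nothing to report.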
Submission
Please evaluate your script in the corresponding VPL activity on Moodle. Your script must be named trace.sh.
You can create your own set of log files in VPL to test your script by using the “Run” button. However, please note that the “Evaluate” button will test your script using our original set of log files.
Make sure that your script can pass all test cases. A copy of the set of log files used in evaluation can be found on Moodle. Note that additional test cases will be used when grading.
— END —