1. Reconstructing ls
You are working at a company which, sadly, only has access to a very old version of the ls command. Recent versions can sort files by their size (in bytes) with the -S switch, but the version your team uses does not have this. You also may not use the --sort=size option, which has exactly the same effect.
Without using the new -S switch, develop a script called ls_by_size.sh to list files, sorted by their size.
[10 marks]
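One possible approach, sketched here under some assumptions, is to print each file's size next to its name, sort numerically in descending order, and then cut the size column away. The sketch assumes GNU coreutils `stat` (`-c '%s'` prints a size in bytes); the function name `ls_by_size` simply mirrors the script name. It does not handle file names containing tabs or newlines, and it does not reproduce the `dir:` headings that ls prints when given several directories.

```shell
#!/usr/bin/env bash
# ls_by_size: list files largest first without `ls -S` or `--sort=size`.
# ASSUMPTION: GNU coreutils stat, where `stat -c '%s'` prints a size in bytes
# (BSD/macOS stat would need `stat -f '%z'` instead).
# Limitation: names containing a tab or a newline are not handled.
ls_by_size() {
    if [ "$#" -eq 0 ]; then
        set -- .                               # like ls, default to the current directory
    fi
    local arg entry
    for arg in "$@"; do
        if [ -d "$arg" ]; then
            # A directory: list its non-hidden entries by base name, as ls does.
            for entry in "$arg"/*; do
                # Skip the unexpanded pattern left behind by an empty directory.
                [ -e "$entry" ] || [ -L "$entry" ] || continue
                printf '%s\t%s\n' "$(stat -c '%s' -- "$entry")" "${entry##*/}"
            done
        else
            printf '%s\t%s\n' "$(stat -c '%s' -- "$arg")" "$arg"
        fi
    done |
        sort -t $'\t' -k1,1nr -k2,2 |          # biggest first; equal sizes ordered by name
        cut -f2-                               # drop the size column
}

# In ls_by_size.sh, the last line would call the function:  ls_by_size "$@"
```

Sorting on a separate size column, rather than parsing the output of `ls -l`, keeps the sort key unambiguous even when file names contain spaces.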
2. Words
Given a file of simple text as a sample, develop a script called word_counter.sh that lists all the words found in the file, listing each unique word exactly once. We’ll define a “word” to be any sequence of one or more alphabetic characters. Words are case-sensitive, so “Hello” and “hello” are distinct words, and all non-alphabetic characters should be ignored. Your script should read from standard input and write to standard output.
[10 marks]
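A minimal sketch, assuming that “alphabetic” means the ASCII letters A to Z and a to z: `tr` turns every run of non-letters into a single newline, and a byte-order `sort -u` removes duplicates while keeping “Hello” and “hello” apart. The output comes out sorted, which the task does not require but which follows from removing duplicates this way.

```shell
#!/usr/bin/env bash
# word_counter: read text on standard input and print each distinct word once.
# ASSUMPTION: "alphabetic" means the ASCII letters A-Z and a-z.
word_counter() {
    LC_ALL=C tr -cs 'A-Za-z' '\n' |   # each run of non-letters becomes one newline
        sed '/^$/d' |                 # drop the empty line left by leading non-letters
        LC_ALL=C sort -u              # byte-order sort: "Hello" and "hello" stay distinct
}

# In word_counter.sh, the last line would call the function:  word_counter
```

Setting `LC_ALL=C` matters here: in some other locales, letter ranges and collation order can behave differently from plain byte comparison.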
3. Weather
Your colleagues have been complaining that on the weekend, the weather is worse than during the working week (when they are unable to enjoy it as much).
Look at the Bureau of Meteorology page containing Perth daily weather observations: http://www.bom.gov.au/climate/dwo/IDCJDW6111.latest.shtml. The page contains a link to a “plain text version” of the page data, in .csv (comma-separated value) format.
Write a script which downloads the .csv file from the BOM site, and analyses the results of the current month to determine how “enjoyable” each day’s weather is. (You may use your own definition of “enjoyable”, but it should involve at least 3 columns from the data, and you should add a comment in your script justifying the definition.) Call your script weekends.sh. It should output one line per day, containing two columns, separated by commas: the date (in YYYY-MM-DD format – note that this is different to the format used in the BOM data file), and a column saying either “enjoyable” or “unenjoyable”. [5 marks]
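A sketch of the classification step, under stated assumptions. The column numbers are passed in as arguments because the CSV layout is not confirmed here; read them off the header rows of the downloaded file. The thresholds (a maximum of 18 to 30 °C, no rain, gusts under 40 km/h) are a hypothetical definition of “enjoyable” that uses three columns, as the task requires; substitute and justify your own. The date field is assumed to be year-month-day separated by hyphens, possibly without zero padding, and is rewritten as YYYY-MM-DD with `sprintf`.

```shell
#!/usr/bin/env bash
# classify_days: a sketch of the core of weekends.sh.
# Reads BOM daily-observation CSV rows on stdin and prints
# "YYYY-MM-DD,enjoyable" or "YYYY-MM-DD,unenjoyable" for each data row.
#
# ASSUMPTIONS (verify against the downloaded file before relying on them):
#   * column numbers are arguments, since the layout is not confirmed here:
#     $1 = date, $2 = maximum temperature, $3 = rainfall (mm),
#     $4 = speed of maximum wind gust (km/h);
#   * data rows hold a date as year-month-day separated by hyphens, possibly
#     without zero padding; any other row (headers, notes) is skipped;
#   * no field contains a quoted comma.
#
# HYPOTHETICAL definition of "enjoyable": a maximum between 18 and 30 degrees
# (comfortable outdoors), no recorded rain, and gusts below 40 km/h.
classify_days() {
    awk -F, -v dcol="$1" -v tcol="$2" -v rcol="$3" -v wcol="$4" '
        { sub(/\r$/, "") }                      # tolerate CRLF line endings
        $dcol ~ /^[0-9]+-[0-9]+-[0-9]+$/ {
            split($dcol, d, "-")
            date = sprintf("%04d-%02d-%02d", d[1], d[2], d[3])
            ok = ($tcol + 0 >= 18 && $tcol + 0 <= 30 && $rcol + 0 == 0 && $wcol + 0 < 40)
            print date "," (ok ? "enjoyable" : "unenjoyable")
        }'
}

# weekends.sh might then run something like:
#   curl -fsS "$url" | classify_days DATE_COL MAX_COL RAIN_COL GUST_COL
```

Empty fields are treated as zero by awk, so a day with a missing maximum temperature is classed as unenjoyable in this sketch; that is a choice worth stating in your own comments.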
Then write a script called weather_analyser.sh, which can be used to analyse the output of weekends.sh and report whether the weather on weekends is, indeed, less enjoyable than that of weekdays. You may use whatever analysis you like, but should add comments in your script justifying it. Your script should simply output the word “supported” or “unsupported”, depending on whether the data it is given supports, or does not support, the claim that weekend weather is worse. [5 marks]
4. Plague
It is London, 1854, and plague stalks the land. You have accepted a data science position with physician John Snow, who is investigating possible causes of a cholera outbreak in London. His data is available in tab-separated format at http://www.randomservices.org/random/data/Snow.html (see the links at the bottom of the page).
He has two files, containing data on the locations of water pumps in London, and the location of people who have died of cholera.
He asks you to develop two different visualizations of the data to display to London public officials. Write two scripts, vis1.sh and vis2.sh, which will analyse the data and create a visualization, producing HTML output. The scripts should take two command-line arguments, for a deaths file and a pumps file, and send their output to standard output.
You may choose the representations. However, only one representation may be a ‘simple’ one, such as a histogram. The other should present some more insightful information (trends, geographical imagery, etc.).
[5 marks per script]
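As one example of a less ‘simple’ representation, this sketch draws a rough map: an SVG scatter plot with one red dot per death and one blue square per pump, wrapped in a minimal HTML page. The file layout it expects (tab-separated, one header line, x and y in the first two columns) is an assumption, not a description of Snow's actual files; check the files and adjust the columns. The function name `deaths_map` is hypothetical.

```shell
#!/usr/bin/env bash
# deaths_map: print an HTML page containing an SVG "map" of deaths and pumps.
# Arguments: a deaths file, then a pumps file.
# ASSUMPTION (hypothetical layout, check the real files): both files are
# tab-separated, start with one header line, and hold x and y in columns 1 and 2.
deaths_map() {
    awk -F'\t' '
        FNR == 1 || NF < 2 { next }              # skip headers and short lines
        {
            n++
            x[n] = $1 + 0; y[n] = $2 + 0
            pump[n] = (FILENAME == ARGV[2])      # second argument is the pumps file
            if (n == 1 || x[n] < minx) minx = x[n]
            if (n == 1 || x[n] > maxx) maxx = x[n]
            if (n == 1 || y[n] < miny) miny = y[n]
            if (n == 1 || y[n] > maxy) maxy = y[n]
        }
        END {
            size = 500; pad = 10
            span = (maxx - minx > maxy - miny) ? maxx - minx : maxy - miny
            scale = (span > 0) ? (size - 2 * pad) / span : 1   # keep the aspect ratio
            print "<!DOCTYPE html>"
            print "<html><head><meta charset=\"utf-8\"><title>Cholera deaths and water pumps</title></head><body>"
            printf "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"%d\" height=\"%d\">\n", size, size
            for (i = 1; i <= n; i++) {
                cx = pad + (x[i] - minx) * scale
                cy = size - pad - (y[i] - miny) * scale         # SVG y grows downwards
                if (pump[i]) {
                    printf "<rect x=\"%.1f\" y=\"%.1f\" width=\"8\" height=\"8\" fill=\"blue\"/>\n", cx - 4, cy - 4
                } else {
                    printf "<circle cx=\"%.1f\" cy=\"%.1f\" r=\"2\" fill=\"red\"/>\n", cx, cy
                }
            }
            print "</svg></body></html>"
        }' "$1" "$2"
}

# In vis1.sh (or vis2.sh), the last line would call:  deaths_map "$1" "$2"
```

Using one scale factor for both axes keeps distances on the page proportional to distances in the data, which matters if readers are meant to judge which pump the deaths cluster around.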

Frequently Asked Questions
Question 1
• Can we use the --sort=size option?
Answer:
No, that has exactly the same effect as the -S option, and is prohibited.
• Do we have to handle all the other possible flags of the ls command? e.g. Do we have to consider the case where a user types ls_by_size.sh -r, which would be equivalent to ls -r -S?
Answer:
That’s not necessary, though if you do it correctly, that will be considered a bonus. You can obtain full marks if your script handles invocations of the form ls_by_size.sh my_file (where my_file is some file or directory).
• Will our script take a list of files on standard input? Or will the user give them as command-line arguments to the script?
Answer:
Ideally, your script will use command-line arguments as input – i.e., you can imagine your script being invoked as something like ls_by_size.sh file1 file2 …. (Hint: you will need to work out how to do this, but it’s very similar to accessing arguments to a function.)
However, if you are having difficulties with that, then you may instead assume a list of files is supplied to standard input, and this will be considered only a minor flaw in your solution. If you do this, then please add a README text file to your submission, stating that’s what you’ve chosen to do.
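A sketch of both input styles: file names taken from command-line arguments, with a fallback to one name per line on standard input. As a hint toward the optional bonus, it also parses a `-r` flag with the bash builtin `getopts`. The function name `collect_names` is hypothetical, and it only reports the parsed flag rather than acting on it.

```shell
#!/usr/bin/env bash
# collect_names: print the file names a script was given, one per line.
# Names come from the arguments if there are any, otherwise from stdin.
collect_names() {
    local opt reverse=0 OPTIND=1
    while getopts 'r' opt; do
        case "$opt" in
            r) reverse=1 ;;
            *) return 2 ;;                 # unknown option (getopts prints the error)
        esac
    done
    shift $((OPTIND - 1))                  # "$@" now holds only the file names

    echo "reverse=$reverse"                # shown only so the parsed flag is visible
    if [ "$#" -gt 0 ]; then
        # In a script, "$1", "$2", ... and "$@" are the script's own arguments,
        # just as they are a function's arguments inside a function.
        printf '%s\n' "$@"
    else
        # Fallback: one file name per line on standard input.
        local name
        while IFS= read -r name; do
            printf '%s\n' "$name"
        done
    fi
}
```

Quoting `"$@"` keeps a name such as `b c` as a single argument instead of splitting it into two.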
• Should we output the filenames and sizes? Or just the filenames?
Answer:
Ideally, your script will output just the filenames. E.g., if a user types ls_by_size.sh ., and there are three files called small, medium and large in the current directory, of 1 MB, 5 MB and 10 MB respectively, then your script should output:

large
medium
small
Correction: as per the man page, the -S flag causes ls to sort in descending order.
However, if your output contains size information as well, that will be considered only a minor flaw.
• Should we output the result in multiple columns, like ls does by default? Or can we output files one per line?
Answer:
One per line.
• Is there anything we can use to test script 1?
Answer:
There is – here is a script, test-q1.sh, which runs some basic tests on your ls_by_size.sh script. Put it in the same directory as your ls_by_size.sh script, and run it, and it will tell you if your script passes or fails those basic tests.
Question 2
• How many words are in, say, the string http://www.example.org/ch01s02?
Answer:
Six words: “http”, “www”, “example”, “org”, “ch” and “s”.
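The splitting step on its own can be used to check the word definition on small inputs like this one. A minimal sketch, assuming “alphabetic” means the ASCII letters; `split_words` is a hypothetical name.

```shell
#!/usr/bin/env bash
# split_words: print every word occurrence on stdin on its own line, in order.
# ASSUMPTION: "alphabetic" means the ASCII letters A-Z and a-z.
split_words() {
    LC_ALL=C tr -cs 'A-Za-z' '\n' | sed '/^$/d'
}
```

For example, `printf '%s\n' 'http://www.example.org/ch01s02' | split_words` prints http, www, example, org, ch and s, one per line; appending `| wc -l` counts them.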
• Will we be supplied the name of a file as a command-line argument? Or can we assume it will always be called unix-1969-1971.txt?
Answer:
Your script should read a file – that is, the contents of a file – from standard input.
• Is there a test script for question 2?
Answer:
There is: test-q2.sh. This performs some very basic tests of your script – can it be executed, and does it give the correct answer for some small inputs of up to five words’ length.
If you are unsure whether your script is getting the right answer for longer sample texts, you might like to try pasting them into the form at this page. It won’t tell you the exact word count, but it will let you know if you are close.
Question 3
• What is the CSV file called that we need to use?
Answer:
The current one is called IDCJDW6111.201905.csv, and can be obtained from the link that says “plain text version (4 kb)”.
• Should we allow for the fact that the location of the .csv file will change? For instance, May’s data is available from the URL http://www.bom.gov.au/climate/dwo/201905/text/IDCJDW6111.201905.csv, but April’s was found at http://www.bom.gov.au/climate/dwo/201904/text/IDCJDW6111.201904.csv, and presumably June’s will be at http://www.bom.gov.au/climate/dwo/201906/text/IDCJDW6111.201906.csv.
Answer:
If you like, you may hard-code the URL of the current month's file, as at the date the project is submitted.
However, better practice would be to (for instance) check the date when the script is run, and work out the appropriate URL from it.
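A sketch of working out the URL at run time, following the pattern of the example URLs in the question. It assumes the relevant month is the one current in Perth, hence `TZ=Australia/Perth`; the function name `bom_csv_url` is hypothetical.

```shell
#!/usr/bin/env bash
# bom_csv_url: print the URL of the Perth daily-observations CSV for a month.
# Argument (optional): the month as YYYYMM; defaults to the current month in Perth.
bom_csv_url() {
    local ym="${1:-$(TZ=Australia/Perth date +%Y%m)}"
    printf 'http://www.bom.gov.au/climate/dwo/%s/text/IDCJDW6111.%s.csv\n' "$ym" "$ym"
}
```

With an explicit argument, `bom_csv_url 201905` prints the May 2019 URL given in the question, which also makes the function easy to test.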
• Can our script make use of programs like wget or curl to download the data?
Answer:
Yes, absolutely – Bash has no way of downloading data from URLs itself¹, so your script should certainly use wget or curl to do so.
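A minimal download helper, assuming only that at least one of curl or wget is installed; the name `fetch` is hypothetical. The flags are standard ones: curl's `-f` makes an HTTP error fail the command instead of printing the server's error page, and wget's `-O -` writes the document to standard output.

```shell
#!/usr/bin/env bash
# fetch: write the body of a URL to standard output, preferring curl and
# falling back to wget. Returns non-zero if the download fails.
fetch() {
    if command -v curl >/dev/null 2>&1; then
        # -f: fail on HTTP errors, -s: no progress bar, -S: still report errors,
        # -L: follow redirects.
        curl -fsSL "$1"
    elif command -v wget >/dev/null 2>&1; then
        # -q: quiet, -O -: write the document to standard output.
        wget -q -O - "$1"
    else
        echo "fetch: neither curl nor wget is installed" >&2
        return 127
    fi
}
```

Writing to standard output lets the download feed straight into a pipeline, e.g. `fetch "$url" | ...`, with no temporary file.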
• What is YYY-MM-DD format? Do you mean YYYY-MM-DD?
Answer:
The latter is correct, and I’ve updated the specification. It means a format like 2019-04-28, representing the 28th of April, 2019.
• So we need to write two scripts?
Answer:
You do.
• Where does our second script, weather_analyser.sh, get its input from?
Answer:
It should expect to receive lines on standard input, where each line contains two columns, separated by commas; the first column should be a date (in YYYY-MM-DD format), and the second either the word “unenjoyable” or “enjoyable”.
For example:
2019-03-01,enjoyable
2019-03-02,unenjoyable
2019-03-03,enjoyable
This means your scripts could be put in a pipeline, weekends.sh | weather_analyser.sh, to get a single line of output, either “supported” or “unsupported”.
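Before piping into weather_analyser.sh, it may help to check that weekends.sh really produces this format. A sketch using `grep -E`: it prints any malformed line and fails if one exists. It checks only the shape of each line, not whether the date is a real calendar date; `check_format` is a hypothetical name.

```shell
#!/usr/bin/env bash
# check_format: print any stdin line that is not "YYYY-MM-DD,enjoyable" or
# "YYYY-MM-DD,unenjoyable"; return 1 if such a line exists, 0 otherwise.
check_format() {
    # grep -v succeeds when it prints a non-matching line, so negate its status.
    ! grep -Ev '^[0-9]{4}-[0-9]{2}-[0-9]{2},(enjoyable|unenjoyable)$'
}
```

Usage might look like `weekends.sh | check_format && echo "format OK"`.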
• Should our second script, weather_analyser.sh, consider the month’s data as a whole? Or should it analyse each week separately?
Answer:
It’s up to you, but make sure you add some comments explaining why you chose the analysis you did. For instance, you might consider the percentage of weekdays that were enjoyable, versus the percentage of weekends that were enjoyable, and output the word “supported” when the former is higher. (Is that a good method of analysis? I leave it to you to decide.)
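A sketch of the percentage comparison described above, not a recommendation of it: with only eight to ten weekend days in a month, a small difference in percentages may well be chance. The day of the week is computed inside awk with Sakamoto's algorithm, so the sketch does not rely on GNU `date -d`. `weather_analyser` is a hypothetical function name, and treating an empty group as “unsupported” is a design choice.

```shell
#!/usr/bin/env bash
# weather_analyser: read "YYYY-MM-DD,enjoyable|unenjoyable" lines on stdin.
# Prints "supported" if the share of enjoyable days is lower on weekends than
# on weekdays, and "unsupported" otherwise (including when a group is empty).
weather_analyser() {
    awk -F, '
        # Day of week for a Gregorian date (Sakamoto): 0 = Sunday ... 6 = Saturday.
        function dow(y, m, d,    t) {
            split("0 3 2 5 0 3 5 1 4 6 2 4", t, " ")
            if (m < 3) y -= 1
            return (y + int(y / 4) - int(y / 100) + int(y / 400) + t[m] + d) % 7
        }
        NF >= 2 {
            split($1, p, "-")
            w = dow(p[1] + 0, p[2] + 0, p[3] + 0)
            g = (w == 0 || w == 6) ? "weekend" : "weekday"
            n[g]++
            if ($2 == "enjoyable") e[g]++
        }
        END {
            if (n["weekend"] == 0 || n["weekday"] == 0) { print "unsupported"; exit }
            print ((e["weekend"] / n["weekend"] < e["weekday"] / n["weekday"]) ? "supported" : "unsupported")
        }'
}

# In weather_analyser.sh, the last line would call:  weather_analyser
```

On the three example lines above (a Friday, a Saturday and a Sunday), the weekend share is 1/2 against 1/1 on weekdays, so the sketch prints “supported”.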
• Is there a test script for question 3?
Answer:
There is: test-q3.sh. This performs some basic tests of your two scripts for this question – can they be executed, and do they give output in the correct format.

1. Well, it does in some versions – see the StackExchange answer at https://unix.stackexchange.com/posts/83927/revisions. But the better approach is still to use wget or curl.