Programming Assignment 3

Due: 11:59pm, Tuesday, October 20

Worth 8% of final grade

Overview

The goals of this assignment are to:

  1. Use file I/O
  2. Use arrays and collections
  3. Think about runtime

Setup

Open a new Linux terminal window. In your home directory, create a new directory called HW3 and change into it:

$ mkdir ~/HW3
$ cd ~/HW3

There are no starter files for this project, but there are test input files and validation output files.

Part I: Reading (10 pts)

Read the following articles and answer the questions below in a file named PART1 (not part1.doc or part1.txt; the bundle script accepts only a file named exactly PART1):

1. Hashing
2. https://en.wikipedia.org/wiki/Big_O_notation
3. https://en.wikipedia.org/wiki/Sorting_algorithm

Q1: What are the pros/cons of using arrays?
Q2: Briefly explain what an ArrayList is and list its pros/cons.
Q3: Briefly explain what a HashSet is and list its pros/cons.
Q4: What is the big-O runtime of insertion sort? Of QuickSort?
Q5: Why must any sorting algorithm take at least linear time (Ω(n))?

Part II: Word cloud (30 pts)

Overview: You will create a program, WordCloud.java, that reads in a text file and outputs the top N most frequently used words, in descending order of frequency. Words are case-insensitive (River, river, and rivER all count as the same word). The program takes two arguments: the text file to read and the number of words to print.

Based on Rick Ord’s problem set 2.
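The counting step can be sketched as follows, assuming that a word is any maximal run of the letters A to Z (the handout does not say how digits, apostrophes, or hyphens should be treated, so compare your results against the provided .out files). Lowercasing each word before it goes into a HashMap makes River, river, and rivER share one entry. The class WordCounter and the method countWords are hypothetical names, not part of any provided code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper that counts words case-insensitively. */
class WordCounter {

  /** Assumption: any run of non-letters separates two words. */
  private static final String NON_LETTERS = "[^A-Za-z]+";

  /**
   * Counts how often each word appears in the given text.
   *
   * @param text the text to scan
   * @return a map from each lowercased word to its number of occurrences
   */
  static Map<String, Integer> countWords(String text) {
    Map<String, Integer> counts = new HashMap<>();
    for (String token : text.split(NON_LETTERS)) {
      if (token.isEmpty()) {
        continue; // split yields "" when text begins with a separator
      }
      String word = token.toLowerCase(Locale.ROOT);
      counts.merge(word, 1, Integer::sum); // expected O(1) per word
    }
    return counts;
  }

  /** Prints 3: River, river, and rivER count as one word. */
  public static void main(String[] args) {
    System.out.println(countWords("River, river... rivER!").get("river"));
  }
}
```

To count a whole file, you could pass countWords the text returned by Files.readString (Java 11 or later); for a large input such as allbooks.txt, reading line by line with a BufferedReader and merging each line's counts avoids holding the entire file in memory.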

Your program will also need to read in the text file:

/home/linux/ieng6/cs8b/public/HW3/common.txt

which contains words that should not appear in the output, e.g., the, a, and an. The common words are also matched case-insensitively.
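One way to apply common.txt, sketched under the assumption that the file holds whitespace-separated words (one or more per line): load the words into a HashSet, lowercased, then remove those keys from the counts map. HashSet lookups take expected constant time, so this step stays cheap even for a long word list. The class CommonWords and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for loading and applying the common-word list. */
class CommonWords {

  /** Assumption: common words are separated by whitespace. */
  private static final String WHITESPACE = "\\s+";

  /**
   * Collects every whitespace-separated word in the lines, lowercased.
   *
   * @param lines lines of text containing common words
   * @return a set of lowercased common words
   */
  static Set<String> toLowerSet(Collection<String> lines) {
    Set<String> common = new HashSet<>();
    for (String line : lines) {
      for (String token : line.trim().split(WHITESPACE)) {
        if (!token.isEmpty()) {
          common.add(token.toLowerCase(Locale.ROOT));
        }
      }
    }
    return common;
  }

  /**
   * Reads a common-word file such as the public common.txt.
   *
   * @param fileName path to the common-word file
   * @return a set of lowercased common words
   * @throws IOException if the file cannot be read
   */
  static Set<String> load(String fileName) throws IOException {
    return toLowerSet(Files.readAllLines(Paths.get(fileName)));
  }

  /**
   * Removes every common word from the counts map in place.
   *
   * @param counts word counts keyed by lowercased word
   * @param common lowercased common words
   */
  static void removeCommon(Map<String, Integer> counts, Set<String> common) {
    counts.keySet().removeAll(common);
  }
}
```

Alternatively, you could skip common words while counting (check common.contains(word) before updating the map), so they are never stored at all.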

Example usage:

$ java WordCloud input.txt 3
states – 110
president – 102
united – 85
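A straightforward way to produce output like the example above, sketched with hypothetical names: copy the map's entries into an ArrayList, sort them by count in descending order, and print the first N. Clamping with Math.min also covers the case where fewer than N unique words exist. The separator string is an assumption: the handout shows a dash between word and count, so copy the exact character from the provided .out files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that selects and prints the most frequent words. */
class TopWords {

  /** Assumed separator; match it to the provided .out files. */
  private static final String SEPARATOR = " - ";

  /**
   * Returns up to n entries, most frequent first, in O(m log m) time for
   * m unique words. Ties may appear in any order.
   *
   * @param counts word counts keyed by lowercased word
   * @param n the number of words requested
   * @return at most n entries sorted by descending count
   */
  static List<Map.Entry<String, Integer>> topN(
      Map<String, Integer> counts, int n) {
    List<Map.Entry<String, Integer>> entries =
        new ArrayList<>(counts.entrySet());
    entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
    // Clamp n: never ask for more entries than exist, nor fewer than 0.
    int limit = Math.max(0, Math.min(n, entries.size()));
    return entries.subList(0, limit);
  }

  /**
   * Prints each entry on its own line as word, separator, count.
   *
   * @param top the entries to print
   */
  static void print(List<Map.Entry<String, Integer>> top) {
    for (Map.Entry<String, Integer> entry : top) {
      System.out.println(entry.getKey() + SEPARATOR + entry.getValue());
    }
  }
}
```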

Sample outputs for the two input files are provided in the public HW3 directory. To compare, run:

$ java WordCloud ~/../public/HW3/poem.txt 6
$ java WordCloud ~/../public/HW3/allbooks.txt 10

Then compare your output with the provided poem_6.out and allbooks_10.out.

Additional Details

  •   If the user requests the top 100 words, but there are only 50 unique words in the input file, then the top 50 words should be printed.
  •   If multiple words have the same frequency, you can print them in any order, e.g.
$ java WordCloud input.txt 2
states – 50
presidents – 50

or

$ java WordCloud input.txt 2
presidents – 50
states – 50

Both outputs are valid.

  •   If the user does not provide the file name or the number of words to report, the program should print the following usage message:

$ java WordCloud
Usage: java WordCloud <file name> <#words>

  •   You may use multiple classes to solve this problem. The turnin instructions have changed for this assignment, so read the guide at the end of this document to make sure you submit all the required files.
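A sketch of the argument check, using a hypothetical helper named parseWordCount. The handout specifies only the missing-argument case; also rejecting a non-numeric or negative word count with the same usage message is an assumption you may want to revisit.

```java
/** Hypothetical sketch of WordCloud's command-line argument handling. */
class ArgCheck {

  /** Number of command-line arguments the program expects. */
  private static final int EXPECTED_ARGS = 2;

  /** Index of the word-count argument. */
  private static final int COUNT_INDEX = 1;

  /** Usage message from the handout. */
  static final String USAGE = "Usage: java WordCloud <file name> <#words>";

  /**
   * Parses the requested number of words.
   *
   * @param args the command-line arguments
   * @return the word count, or -1 if the arguments are missing or invalid
   */
  static int parseWordCount(String[] args) {
    if (args.length < EXPECTED_ARGS) {
      return -1;
    }
    try {
      int numWords = Integer.parseInt(args[COUNT_INDEX]);
      return numWords >= 0 ? numWords : -1;
    } catch (NumberFormatException e) {
      return -1; // e.g. "ten" instead of 10
    }
  }

  /** Prints the usage message when the arguments are unusable. */
  public static void main(String[] args) {
    int numWords = parseWordCount(args);
    if (numWords < 0) {
      System.out.println(USAGE);
      return;
    }
    // Read args[0], count and filter words, then print numWords of them.
  }
}
```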

This assignment will be graded on the following criteria:

  •   Correctness: the program outputs the correct top words.
  •   Effort: the program is optimized by using an ArrayList, Set, Map, or some other collection.
  •   Clarity: program flow is logical, clear, and easy to understand (comments and appropriate variable names will help).

Fastest time to solution

Tutors have the option of selecting the fastest program to represent their group. We will collect the fastest programs from each section and time their execution on my desktop. The author of the fastest program will receive 100% on the final, and the winning team’s tutor will receive a gift card. If the fastest programs have nearly the same runtime, the award will go to the student who turned in their assignment first.

Note that your program must work correctly to be entered into the contest. I’ll be testing with a variety of inputs, e.g., allbooks.txt with 10, 100, 1000, 10000, and 100000 words requested. You can measure your program’s execution time with:

$ time java WordCloud <file> <#words>
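Because the contest is timed, one possible optimization (sketched with hypothetical names, not a required approach) is to avoid sorting every unique word when only the top N are wanted. A min-heap capped at N entries keeps the N largest counts seen so far, costing O(m log N) for m unique words instead of O(m log m). When N is close to m (e.g., 100000) the gain disappears, and reading the file often dominates the runtime anyway, so measure both versions with time before committing to one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical top-N selection using a bounded min-heap. */
class HeapTopWords {

  /**
   * Returns up to n entries, most frequent first, in O(m log n) time.
   *
   * @param counts word counts keyed by lowercased word
   * @param n the number of words requested
   * @return at most n entries sorted by descending count
   */
  static List<Map.Entry<String, Integer>> topN(
      Map<String, Integer> counts, int n) {
    List<Map.Entry<String, Integer>> result = new ArrayList<>();
    if (n <= 0) {
      return result;
    }
    // Min-heap: the least frequent current candidate sits on top.
    PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(
        (a, b) -> Integer.compare(a.getValue(), b.getValue()));
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      heap.offer(entry);
      if (heap.size() > n) {
        heap.poll(); // evict the smallest so at most n entries remain
      }
    }
    while (!heap.isEmpty()) {
      result.add(heap.poll()); // ascending by count
    }
    Collections.reverse(result); // most frequent first
    return result;
  }
}
```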
    

Style Requirements (10 pts)

You will be graded on the style of your programming on this assignment. A few style requirements are given below and at https://google.github.io/styleguide/javaguide.html. You must follow these style guidelines for all remaining assignments, so read them carefully. In the template code provided below for this assignment, all of these requirements are met (replace comments appropriately).

● Use reasonable comments to make your code clear and readable.
● Use Javadoc style comments for all classes and methods.
● The comments should describe the purpose of your program and methods.
● Use meaningful variable names.
● Use static final constants to make your code as general as possible. Do not hardcode constant values inline.
● Judicious use of blank lines around logical chunks of code makes your code much easier to read and debug.
● Keep all lines under 80 characters, and make sure each level of indentation lines up evenly.
● Every time you open a new block of code (with ‘{’), indent one level farther. Return to the previous indentation level when you close the block (with ‘}’).
● Always recompile and run your program right before turning it in, in case you commented out some code by mistake.
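The rules above can be illustrated with a small class. This is a hypothetical sketch (the class name WordFrequency and its separator constant are assumptions) showing Javadoc on the class and every method, a named constant instead of an inline literal, meaningful names, consistent indentation, and lines under 80 characters. Implementing Comparable so that higher counts sort first is also one way to use a separate class, as the assignment allows.

```java
/**
 * Stores one word and how many times it appears in the input.
 * Hypothetical example class illustrating the style requirements.
 */
class WordFrequency implements Comparable<WordFrequency> {

  /** Separator printed between a word and its count (assumed). */
  private static final String SEPARATOR = " - ";

  /** The lowercased word. */
  private final String word;

  /** Number of times the word appears. */
  private final int count;

  /**
   * Creates a word/count pair.
   *
   * @param word the word, already lowercased
   * @param count how many times the word appears
   */
  WordFrequency(String word, int count) {
    this.word = word;
    this.count = count;
  }

  /**
   * Orders higher counts first, so sorting yields the top words.
   *
   * @param other the pair to compare against
   * @return a negative number if this pair should come first
   */
  @Override
  public int compareTo(WordFrequency other) {
    return Integer.compare(other.count, this.count);
  }

  /**
   * Formats this pair as one line of program output.
   *
   * @return the word, the separator, and the count
   */
  @Override
  public String toString() {
    return word + SEPARATOR + count;
  }
}
```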

Turnin Instructions

Remember that the deadline to turn in your assignment is 11:59pm, Tuesday, October 20.

Make sure the program works correctly on the ieng6 Linux servers. Because you have flexibility in the file names you use for your program, you’ll need to create a README file that explains how to compile and run your program, e.g.:

$ cat README
To compile: javac WordCloud.java
To Run:     java WordCloud <file> <#words>

When you are ready to turn in your program, type the following commands and answer the prompted questions:

$ cd ~/HW3
$ tar cvf HW3.tar *.java PART1 README
$ bundleP3
Good; all required files are present:

HW3.tar

Do you want to go ahead and turnin these files? [y/n]y
OK.  Proceeding.
Performing turnin of approx. 6416 bytes (+/- 10%)
Copying to /home/linux/ieng6/cs8b/turnin.dest/cs8bezz.P3
...
Done.
Total bytes written: 6656
Please check to be sure that's reasonable.
Turnin successful.

You can turn in your program multiple times. The turnin program will ask whether you want to overwrite a previously turned-in homework. ONLY THE LAST TURNIN IS USED!

Note: The command

tar cvf HW3.tar *.java PART1 README

packages all .java files, PART1, and README into the file HW3.tar. You can check that all files are present by extracting them with the command:

tar xvf HW3.tar

Read more about the tar command by running ‘man tar’.
