Assignment 3
AVL Trees, Hashes, URLs, and IP Addresses
This assignment is due on May 9th at midnight (end of the day).
For assignment three, you will construct an AVL balanced binary search tree, and compare the efficiency of searching the tree with the hash that you wrote in assignment 2.
AVL trees are balanced trees that ensure the heights of all the subtrees are approximately the same. AVL trees require that the maximum difference in height between any two subtrees is no more than 1, and once the difference in height is greater than 1, the tree must be rebalanced by rotation.
Recall that to correct violations to these rules you will need to use a leftRotation, a rightRotation, a leftRightRotation, or a rightLeftRotation depending on the particular imbalance that you are dealing with.
We will make a generic AVL tree that can handle any data types. Our tree will hold both K keys and V values in its nodes. Your tree will adhere to the AVLTreeI interface that is in your git repositories on edoras.sdsu.edu. You should check those out using a similar approach that you have used in the previous assignments. The original instructions are still available online. Note that you must not implement this using the array approach that we discussed for heaps. You must implement this using pointers and nodes.
The Node Class
Your node class should be an inner class in the AVL tree and should store both the K key and V value.
Your node class also has pointers for the left child (e.g. Node
Methods
Your tree will adhere to the AVLTreeI interface that is in your git repositories on edoras.sdsu.edu.
You will need to create a file called AVLTree.java that contains the code. Instructions are provided on how to create this file and add it to the Git repository.
Your AVL tree will include an add method public void add(K key, V value) to add generic data to the tree, the data should end up in the right place, and the tree should be balanced (using the rules above) after every addition to the tree. In order to figure out whether things are greater or less than other things, you will need to use the compareTo method! You may ignore duplicate items and assume that I will not use duplicate items in any grading.
We are only going to add things to the tree, and not remove things from the tree. Therefore, your tree does not need to include a remove public E remove(K key) method.
Your tree will include a find method public boolean contains(K key) that returns true if the object is found in your tree and a method public V get(K key) that returns the value associated with a key.
Your tree will include a height method public int height() that returns the height of the tree for the longest path in the tree. To facilitate this, and to identify whether there is an imbalance in your tree, you should encode a private height method that accepts a node as a parameter private int height(Node
Your tree will include an iterator for all the keys in the tree, and the iterator should return the keys in the order they would appear if they were visited via an in-order (i.e. left-root-right) traversal.
Finally, your tree will include a print method public void print() that prints out the entire tree in a recursive manner. The tree should be printed out in in-order traversal order (i.e. left-root-right). You can print out one node per line, and use dots or hyphens to indicate levels on the tree. For example, for the tree shown as:
错误!未指定文件名。
Your code can print out:
..1
.3
..5
6 (root)
..7
.8
..9
…10
Of course, this tree just has one data point per node. Your tree will have keys and values, and you should think about how to represent those in the printout.
Additional Methods
You will need to write some additional methods that are not described in the interface to enable you to program the AVL tree. For example, these methods may help you:
• public int size() – this method returns the number of elements in the tree.
• public boolean isEmpty() – this method returns true if the tree is empty.
• private void rotate(Node
• Rotation methods:
• private void leftRotate(Node
• private void rightRotate(Node
• private void leftRightRotate(Node
• private void rightLeftRotate(Node
Default constructor
If you write code with a constructor that includes arguments, you should include default constructor in your code!
Comparing the tree to the hash
We will use IP Address and URL classes from Assignment 2 to compare the tree to the hash! In that assignment, I provided you with a file of ~2,000 URLs and IPAddresses. In this assignment, we will use a larger data set. There are two data sets you can use, either top-250k.ip or top-1m.ip. The first file has 250.000 IP addresses and URLs. The second file has 1,198,307 IP addresses and URLs. You can download those files from the link. Do not add those files to your git repository, because your account on edoras does not have enough space to store the file!
In these files, the IP address is unique, but the URL may or may not be unique. We will use the IP address as the key and the URL as the value, and we will construct both trees and hashes that hold those data.
To do this:
• copy the HashI.java, Hash.java, HashListI.java, and LinkedList.java code from Assignment2/data_structures to Assignment3/data_structures
• copy the URL.java and IPAddress.java from Assignment2/dns_resolver to Assignment3/dns_resolver (you can leave them in that package, but copy them to Assignment3).
Timing the data structures.
We are going to time the efficiency of your data structures by adding the IP addresses and URLs from the top-250k.ip or top-1m.ip files to the data structures. We will time (a) how long it takes to populate the data structure (i.e. how long it takes to read the file and store it in your tree or hash), and (b) how long it takes to search for everything in the data structure.
In addition, you will test how long it takes using two of the standard libraries in the Java API: java.util.hashMap, a chained-hash that is similar to your hash from Assignment2 and java.util.treeMap, a tree that is similar to your tree from this assignment, although it has been implemented as a Red/Black tree, not an AVL tree.
To time the data structures, write a new class called TimeHashAVLTree.java in the package time_data_structures. The goal here is to load our data structure with the IP addresses and URLs, and then search for every entry in each data structure. This will assess how long it takes to load the data, and more importantly how long it takes to search for the data.
We need something to just store the IP addresses in so we can search for them all. We could use a standard Java library for this, but instead, we are going to use a Linked List that you have written since you have them! Use your data_structures/LinkedList.java (which is the same linked list that the hash uses). All we need is the add method and the iterator to get that data back out later. Read the IP address file once, and store the first part of the line, as IPAddress objects, in a linked list. You’ll only need to do this once, and we will keep this LinkedList of data and use it several times.
Second, create a hash, and read the file again, this time making IPAddress and URL objects and storing them in the hash. Time how long it takes to add everything to the hash, so you know how long it takes to store a lot of entries into a hash.
Now, take all the IPAddress objects that are in the linked list and find them in the hash. We are searching for every linked list in the hash, and so each time we need to calculate the hashCode, make it positive, mod it on the table size and then search for the entry. Time how long it takes to do this so you know how long it takes to search for a lot of entries in a hash.
Set your hash to null to clear the memory.
Create an AVL tree using your AVL tree code. Read the file again, and load all the IPAddress and URL entries into your AVL tree. Time how long it takes to do this, so you know how long it takes to store a lot of entries into a tree.
Now, take all the IPAddress objects that are in the linked list and find them in the AVL tree. Time how long it takes to do this so you know how long it takes to search for a lot of entries in a tree.
Set your AVL tree to null to clear the memory.
Using the Java API, import java.util.hashMap and create a new java.util.hashMap object. Load all the IPAddress and URL elements from the file into the hashMap, timing how long it takes to do so. Again, search for all the elements in the linked list and time how long that takes.
Finally, import java.util.treeMap and repeat the exercise, loading the data and searching for all the keys.
You should end up with eight numbers:
A. The time it takes to load your data into:
1. Your hash from Assignment 2
2. Your AVL tree from Assignment 3
3. The Java API hashMap
4. The Java API treeMap
B. The time it takes to search your data in:
5. Your hash from Assignment 2
6. Your AVL tree from Assignment 3
7. The Java API hashMap
8. The Java API treeMap
Write a report (no more than 5 pages) discussing how long it took to load the data into each data structure versus how long it took to search for data in each data structure. For example:
• Does your timing tests make sense?
• Is your hash faster than your AVL tree or vice-versa?
• Is your code as efficient at the Java API code?
• If there is a difference in run time between your code and the Java API, what do you think that difference is caused by?
Finally, remember that we can specify the initial table size for our hash. Does it make a difference to your load time or run time if you start with a small hash (e.g. 16 elements) or a large hash (e.g. 50,000 elements or more).
Timing the code
To time the code, we will use the Java current time in milliseconds function call. Put this line where you want to start timing:
long start = System.currentTimeMillis();
and this line where you want to stop timing:
long stop = System.currentTimeMillis();
The difference between stop and start is the time taken to run that piece of your code.
Java API
You are not allowed to include any of the java.util classes in this work except those explicitly allowed above. You must not import java.util.*. In fact, you should never import java.util.*.
Project Submission:
You must submit your code to the git repository. You can do that using eclipse.
You must submit your write up using the turnitin link on blackboard.
Note: You must submit the code for your AVL tree, your linked list and hash, your IPAddress and URL classes, and your timing test code. The grading will be largely for the timing test code and AVL tree since the other code was graded in assignment 2.
Grading policy
Grades will be assigned as discussed in class and include:
• 10% for git commits. You should commit and push your code to the git repository at least twice per week. I will check the history to see how often you submit code.
• There will be a 10% penalty if you do not make at least one commit of code before class on 4/30/19 in addition to the 10% grade that you get for commits
• 10% for Javadoc. Your code should include appropriate Javadoc comments for all methods — include those that implement interface methods whether or not the interface has comments.
• 60% for code that compiles without error, and runs the program as described.
• 40% of the total grade will be for the hash and the AVL tree.
• 20% of the total grade will be for the timing tests
• 15% of the total will be for your write up. This fulfills the University’s policies on written work for upper division classes.
• 5% for coding style. Pay attention to the Good Programming Practices of Riggin’s reader.
File output
Your files should not print out extraneous information (for example, debugging or other information). None of the classes described here except the print() method in the AVL Tree need to print out anything. You will lose marks if your code prints out things it does not need to.
Submitting Your Work
The last timestamp that you submit your work to the git repository will be the time used for the submission. You should submit your work to git before the deadline to avoid a late penalty.
Early Work
You can receive 5% per day extra credit for submissions completed early. The deadline for this assignment is the end of the day on 05/09/19. Therefore, submissions received before the end of the day (11:59:59 pm) 05/06/19 will receive +15% extra credit, those received between midnight at the beginning of 05/07/19 and the end of the day on 05/07/19 will receive +10% extra credit, those received between the beginning of the day 05/08/19 and the end of the day 05/08/19 will receive +5% extra credit.
Late Work
Pay very careful attention to the deadline. Late work is heavily penalized, and you are usually better off submitting partially complete work than waiting until the very final deadline. If your code is two days late, you will not be able to get an A. If your code is four days late you will not be able to get a B. The late penalty (5% per day) is subtracted from your total based on the above criteria. Work will not be accepted nor graded more than 4 days after the deadline for the assignment.
Honesty policy
The work that you commit and turn in should be your own work. You may discuss ideas and concepts with other students, but you should never, under any circumstances, copy the code from another student. I will use some automatic software to detect cheating (see the course description on blackboard) and will report people who cheat in this class to the Center for Student Rights and Responsibilities. This page may be updated during the period that the assignment is active in response to questions or comments.
错误!未指定文件名。