Threads and Synchronization
Goal: The baseline Nachos implementation has an incomplete thread system. In this assignment, your job is to complete it, and then use it to solve synchronization problems. You will:
- Gain experience with simple thread programming (execute multiple Nachos threads (and possibly Java threads) with synchronization).
- Implement Nachos locks and condition variables.
- Turn a thread-unsafe data structure into a thread-safe data structure.
- Write test code that spawns threads and accesses a hash table concurrently.
Background: Properly synchronized code should work no matter what order the scheduler chooses to run the threads on the ready list. In other words, we should be able to put a call to KThread.yield (causing the scheduler to choose another thread to run) anywhere in your code where interrupts are enabled, and your code should still be correct. You will be asked to write properly synchronized code as part of the later assignments, so understanding how to do this is crucial to being able to do the project.
You will be modifying source code files in the threads subdirectory, and compiling in the proj1 subdirectory. You are encouraged to add new classes to your project as you see fit; the code we provide you is not a complete skeleton for the project. However, be careful to add any additional classes to the packages (directories) permitted. For all of the projects in this course:
- Do not modify Makefile, except in the test directory for projects 2 and 3.
- Only modify nachos.conf according to the project specifications.
- Do not modify any classes in the nachos.machine package, the nachos.ag package, or the nachos.security package. Also do not add any classes to these packages, as they would not be used during grading.
- Do not add any new packages to your project. All the classes you submit must reside in the packages we provide.
- Do not modify the API for methods that the grader uses. This constraint is enforced every time you run Nachos by Machine.checkUserClasses. If an assertion fails there, you will know you have modified an interface that needs to stay the way it was given to you.
- Do not directly use Java native threads (the java.lang.Thread class). The Nachos security manager will not permit it. All the threads you use should be managed by TCB objects (see the documentation for nachos.machine.TCB).
- Do not use the synchronized keyword in any of your code. We will grep for it and reject any submission that contains it.
- Do not directly use Java File objects (in the java.io package). In later projects, when we start dealing with files, you will use a Nachos file system layer.
- Sprinkled throughout the Nachos classes there are some calls to the Nachos autograder. Leave them as they are.
There should be no busy-waiting in any of your solutions to this assignment. (The initial implementation of Alarm.waitUntil is an example of busy-waiting.)
Tasks
Part A. Nachos thread system (50 basic points, 20 extra points)
(a) Implement KThread.join, which synchronizes the calling thread with the completion of the called thread. As an example, if thread B executes the following:
KThread A = new KThread (…);
…
A.join ();
we say that thread B (the "parent") joins with thread A (the "child"). When B calls join on A, there are two possibilities. If A has already finished, then B returns immediately from join without blocking. If A has not finished, then B waits inside of join until A finishes; when A finishes, it resumes B.
Note that join does not have to be called on a thread. A thread should be able to finish successfully even if no other thread calls join on it.
A thread cannot join itself. (The initial implementation already checks for this case and invokes Lib.assert when it happens. Keep this Lib.assert call in your code.)
Join can be called on a thread at most once. If thread B calls join on A, then it is an error for B or any other thread C to call join on A again. Assert on this error.
Testing: Implement tests that verify: if a parent calls join on a child and the child is still executing, the parent waits; if a parent calls join on a child and the child has finished executing, the parent does not block; if a thread calls join on itself, Nachos asserts; if join is called more than once on a thread, Nachos asserts; one parent thread can join with multiple child threads in succession; independent pairs of parent/child threads can join with each other without interference.
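A minimal sketch of the first two checks might look like the following; the method name joinTestSketch is made up for illustration, and it assumes it is invoked from something like ThreadedKernel.selfTest or KThread.selfTest.

// Sketch only; assumes the surrounding file imports nachos.threads.* .
private static void joinTestSketch() {
    // Case 1: the parent joins while the child is still running.
    KThread child = new KThread(new Runnable() {
        public void run() {
            for (int i = 0; i < 5; i++)
                KThread.yield();        // pretend to do some work
            System.out.println("child finished");
        }
    });
    child.setName("join-child");
    child.fork();
    child.join();                       // the parent must block here until the child finishes
    System.out.println("parent resumed after child");

    // Case 2: the child has already finished; join must return immediately.
    KThread done = new KThread(new Runnable() {
        public void run() { /* finishes right away */ }
    });
    done.fork();
    for (int i = 0; i < 10; i++)
        KThread.yield();                // let the child run to completion
    done.join();                        // should return without blocking
}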
(b) Implement condition variables directly, by using interrupt enable and disable to provide atomicity. We have provided a sample implementation that uses semaphores; your job is to provide an equivalent implementation without directly using semaphores (you may of course still use locks). Once you are done, you will have two alternative implementations that provide the exact same functionality. Your second implementation of condition variables must reside in class nachos.threads.Condition2.
A thread must have acquired the lock associated with the condition variable when it invokes methods on the CV. The underlying implementation of the Lock class already has code to assert in these cases, but we recommend writing a test program that causes such an error so that you can see what happens.
Testing: Implement tests that verify that sleep blocks the calling thread; wake wakes up at most one thread, even if multiple threads are waiting; wakeAll wakes up all waiting threads; if a thread calls any of the synchronization methods without holding the lock, Nachos asserts; wake and wakeAll with no waiting threads have no effect, yet future threads that sleep will still block (i.e., the wake/wakeAll is "lost", which is in contrast to the semantics of semaphores).
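For example, a sketch along the following lines exercises the lock-holding requirement and the "lost wake" behavior; it assumes your Condition2 keeps the same constructor and methods as the provided Condition class, and that KThread.join already works.

// Sketch only; assumes the surrounding file imports nachos.threads.* .
private static void condition2TestSketch() {
    final Lock lock = new Lock();
    final Condition2 cv = new Condition2(lock);

    KThread sleeper = new KThread(new Runnable() {
        public void run() {
            lock.acquire();             // sleep/wake/wakeAll require the lock to be held
            cv.sleep();                 // must release the lock while blocked, reacquire on wake
            System.out.println("sleeper woken");
            lock.release();
        }
    });
    sleeper.fork();

    // A wake issued before anyone sleeps is "lost": it must not satisfy a later sleep.
    lock.acquire();
    cv.wake();
    lock.release();

    for (int i = 0; i < 5; i++)
        KThread.yield();                // let the sleeper block on the condition variable

    lock.acquire();
    cv.wake();                          // this wake should release the sleeper
    lock.release();

    sleeper.join();
}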
(c) Complete the implementation of the Alarm class (extra points). A thread calls waitUntil(long x) to suspend its execution until wall-clock time has advanced to at least now + x. This method is useful for threads that operate in real time, such as blinking the cursor once per second. There is no requirement that threads start running immediately after waking up; just put them on the ready queue in the timer interrupt handler after they have waited for at least the right amount of time. Do not fork any additional threads to implement waitUntil; you need only modify waitUntil and the timer interrupt handler. waitUntil itself, though, is not limited to being called by one thread; any number of threads may call it and be suspended at any one time. If the wait parameter x is 0 or negative, return without waiting (do not assert).
Note that only one instance of Alarm may exist at a time (due to a limitation of Nachos), and Nachos already creates one global alarm that is referenced via ThreadedKernel.alarm.
Testing: Implement tests that verify that a thread waits (approximately) for its requested duration; if the wait parameter is 0 or negative, the thread does not wait; multiple threads waiting on the alarm are woken up at the proper times, and in the proper order.
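A sketch of the first two checks, using the global alarm and the hardware timer, could look like this (alarmTestSketch is an illustrative name, not part of the required interface):

// Sketch only; assumes imports of nachos.machine.* and nachos.threads.* .
private static void alarmTestSketch() {
    long wait = 10000;                  // an arbitrary number of ticks for this sketch
    long before = Machine.timer().getTime();
    ThreadedKernel.alarm.waitUntil(wait);
    long after = Machine.timer().getTime();
    // The thread may wake a little late (it is only moved to the ready queue at the
    // first timer interrupt after its deadline), but it must never wake early.
    Lib.assertTrue(after - before >= wait);
    System.out.println("asked for " + wait + " ticks, waited " + (after - before));

    // A zero or negative argument must return without waiting.
    ThreadedKernel.alarm.waitUntil(0);
    ThreadedKernel.alarm.waitUntil(-100);
}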
To help you get a jump start on testing, a couple of example test programs for the various problems are given in the attachment.
Part B. Concurrent HashTable in Nachos (70 basic points, 30 extra points)
You have studied several hash table implementations. However, all of them shared one thing in common – they could only be accessed sequentially by a single thread, which we call thread-unsafe. In this assignment you will implement a thread-safe hash table – a concurrent hash table that can be accessed and modified by several threads simultaneously.
Quick review A hash table is a data structure designed to support three operations in O(1) time: lookup, insertion and deletion. The hash table itself is an array of m slots (also called buckets), and uses a hash function f() to map elements into those slots. When an element e is inserted into the hash table it is placed in the slot f(e). To look up whether an element e is in the table, we only need to compute f(e) and check whether the element is in that slot. The problem with this very basic scheme is collisions. A collision happens when two elements, e1 and e2, are hashed into the same slot. There are several ways to resolve this problem, and in this exercise we will use "chaining". In this technique every slot in the hash table contains a linked list of elements. For example, consider a hash table with 10 slots that contains integers, and the hash function f(e) = e % 10. Suppose the user inserts 5 elements with the values 1, 15, 31, 55 and 57. The resulting hash table will look like Figure 1:
Figure 1: A simple hash table example using mod as hash function
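With this hash function, keys 1 and 31 end up chained in slot 1, keys 15 and 55 in slot 5, and key 57 in slot 7. The following thread-unsafe sketch shows the same layout in code; the class and method names here are purely illustrative and are not the interface required below.

class SequentialChainedTable {
    private static class Node {
        int key, val;
        Node next;
        Node(int key, int val, Node next) { this.key = key; this.val = val; this.next = next; }
    }

    private final Node[] buckets;       // one chain per slot

    SequentialChainedTable(int nBuckets) {
        buckets = new Node[nBuckets];
    }

    private int slot(int key) {
        return Math.floorMod(key, buckets.length);   // f(e) = e mod m
    }

    void insert(int key, int val) {
        int i = slot(key);
        buckets[i] = new Node(key, val, buckets[i]); // prepend to the chain for slot i
    }

    Integer get(int key) {
        for (Node n = buckets[slot(key)]; n != null; n = n.next)
            if (n.key == key)
                return n.val;
        return null;                                 // not found
    }
}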
Why Concurrent Hash Tables Matter A hash table is a venerable and well-understood data structure often used in high-performance applications because of its excellent average lookup time. Many operating systems, such as Linux, rely on hash tables to manage pages, buffers, inodes, and other kernel-level data objects.
Most operating systems support multi-threaded and multi-core programming. The performance of the operating system depends on the efficiency and scalability of these data structures, so support for concurrent data structures is on the critical path. If a hash table in the kernel can only be accessed by one thread at a time, it becomes a performance bottleneck no matter how many threads are available.
In this assignment, you will implement a fine-granularity concurrent hash table in the Nachos OS module that can be used by many threads simultaneously, as shown in Figure 2.
Figure 2: Concurrent hash table in Nachos
Fine-granularity vs Coarse-granularity Synchronization
(a) The simplest way to synchronize concurrent accesses to a hash table is to implement coarse-granularity locking on the entire hash table, as shown in Figure 3. However, this approach blocks every other thread, even threads that do not access the same key as the thread holding the lock, and thus loses the advantages of concurrent execution. Coarse granularity is neither recommended nor required.
(b) You are required to implement the concurrent hash table in a fine-granularity manner. You have two options:
i. Lock the index/chain, as shown in Figure 4
ii. Lock the key on the chain, as shown in Figure 5
Feel free to choose either one to implement your concurrent hash table; a minimal sketch of option i is given after Figure 5 below.
(c) You are required to implement the concurrent hash table in two versions: using Nachos semaphores and using Nachos condition variables. However, as an incremental development strategy, you can start with the Java thread library while the Nachos thread system is still under development.
(d) You are required to develop comprehensive and thorough tests on your concurrent hash table.
Figure 3: Locking entire hash table
Figure 4: Locking the index or the chain to implement fine-granularity synchronization
Figure 5: Locking the key on the chain to implement fine-granularity synchronization
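As a rough illustration of option i, the sketch below guards each chain with its own nachos.threads.Lock. It shows only the locking pattern (duplicate-key checks, remove, get, and the required semaphore and condition-variable versions are omitted), and the names are illustrative rather than required.

// Sketch only; assumes the surrounding file imports nachos.threads.* .
class PerBucketLockedTable {
    private static class Node {
        int key, val;
        Node next;
        Node(int key, int val, Node next) { this.key = key; this.val = val; this.next = next; }
    }

    private final Node[] buckets;
    private final Lock[] bucketLocks;    // one Nachos lock guarding each chain

    PerBucketLockedTable(int nBuckets) {
        buckets = new Node[nBuckets];
        bucketLocks = new Lock[nBuckets];
        for (int i = 0; i < nBuckets; i++)
            bucketLocks[i] = new Lock();
    }

    void insert(int k, int val) {
        int i = Math.floorMod(k, buckets.length);
        bucketLocks[i].acquire();        // threads working on other buckets are not blocked
        // duplicate-key check and exception omitted in this sketch
        buckets[i] = new Node(k, val, buckets[i]);
        bucketLocks[i].release();
    }
}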
Interface Your hash table must support the following operations:
- (a) The hash table constructor: HashTable(int n_buckets)
i. n_buckets – the initial number of buckets in the hash table.
- (b) Single element insertion: void insert(int k, int val). This function inserts a single element with key k and value val into the hash table. Note that if an element with the same key already exists in the hash table, it should not be added again, and an exception should be thrown.
- (c) Single element removal: void remove(int k). This function removes a single element with the key k from the hash table. If nothing was removed because the element was not in the table, an exception should be thrown.
- (d) Single element query: int get(int k). This function checks whether an element with the key k is in the hash table, and returns the value associated with the key. If the element being searched for is not in the table, an exception should be thrown.
- (e) Bucket size query: int getbucketsize(int bucket). This function returns the number of elements currently in bucket number bucket.
- (f) Batch operations: void batch(int n_ops, ThreadOperation[] ops). Your hash table must support requests to perform several different operations concurrently. The additional parameters passed to the batch function are:
- n_ops – the number of operations in the batch
- ops – an array of ThreadOperation instances, where each instance represents a single element operation. The definition of this class could be like below:
enum OperationType {
    INSERT,
    REMOVE,
    QUERY
}

class ThreadOperation {
    int k;
    OperationType op;
    int result;
}
All operations in the batch must run concurrently. This means every operation must be executed by a different thread. All of those threads must be created as early as possible; you should not wait for some operations in a batch to complete before starting others. The return value of each operation should be written to the result field in the corresponding ThreadOperation instance. The batch function returns only after all operations in the batch complete.
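One way to structure batch, sketched below, is to fork one KThread per operation and only then join all of them. This assumes batch is a method of your hash table class, that the single-element operations above exist and throw unchecked exceptions (otherwise each call needs a try/catch inside run), and, since the ThreadOperation shown above carries no separate value field, it reuses result as the value to insert; your own definition may differ.

// Sketch only; assumes the surrounding file imports nachos.threads.* .
void batch(int n_ops, final ThreadOperation[] ops) {
    KThread[] workers = new KThread[n_ops];
    for (int i = 0; i < n_ops; i++) {
        final ThreadOperation op = ops[i];
        workers[i] = new KThread(new Runnable() {
            public void run() {
                switch (op.op) {
                case INSERT: insert(op.k, op.result); break;  // result reused as the input value here
                case REMOVE: remove(op.k); break;
                case QUERY:  op.result = get(op.k); break;
                }
            }
        });
        workers[i].fork();               // start every worker before waiting on any of them
    }
    for (int i = 0; i < n_ops; i++)
        workers[i].join();               // return only after every operation completes
}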
Dynamic splitting (10 extra points) In practice, the worst-case performance of a hash table operation depends on the maximum number of elements in any one bucket. If some buckets get too large, we would like to increase the number of buckets, thus getting a better distribution of the elements and decreasing the number of elements in most buckets.
In this homework we encourage students to use a very simple scheme to handle this issue. Students who implement dynamic splitting will receive 10 extra points. Your hash table will have a predefined maximum number of elements per bucket, max. Whenever an insert operation causes any bucket to have more than max elements, a split operation is triggered: the number of buckets is doubled, and elements are redistributed into the new buckets.
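A rough sketch of the split step itself is shown below, assuming the buckets array from the earlier sketches is made a non-final field. Synchronization is deliberately left out here: in a fine-granularity table the resize must be protected (for example by holding a table-wide lock, or all bucket locks, for the duration of the split), and the lock array must grow along with the buckets.

// Sketch only; assumes Node and buckets as in the earlier sketches.
private void splitIfNeeded(int touchedBucket, int max) {
    if (chainLength(buckets[touchedBucket]) <= max)
        return;                          // the chain is still small enough
    Node[] old = buckets;
    buckets = new Node[old.length * 2];  // double the number of buckets
    for (Node chain : old) {
        for (Node n = chain; n != null; ) {
            Node next = n.next;
            int i = Math.floorMod(n.key, buckets.length);
            n.next = buckets[i];         // re-link the node into its new chain
            buckets[i] = n;
            n = next;
        }
    }
}

private int chainLength(Node n) {
    int len = 0;
    for (; n != null; n = n.next)
        len++;
    return len;
}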
Implement replace using hand-over-hand locking (20 extra points) Many concurrent hash tables also allow a thread to update the value of a pre-existing key in one single operation; this is called replace. This extra-point task requires you to implement replace efficiently using hand-over-hand locking while the search runs through the chain. You must also update the batch operation, as well as your test code, to include replace as a single element operation.
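A hedged sketch of the traversal pattern follows. It assumes option ii, i.e. a hypothetical node type that carries its own Lock, plus hypothetical lockedBuckets and bucketLocks fields, and it returns false rather than throwing when the key is absent; your error handling may differ.

// Sketch only; assumes the surrounding file imports nachos.threads.* .
private static class LockedNode {
    int key, val;
    LockedNode next;
    final Lock lock = new Lock();        // one Nachos lock per node (option ii)
}

boolean replace(int k, int newVal) {
    int i = Math.floorMod(k, lockedBuckets.length);
    bucketLocks[i].acquire();            // protects the head pointer of the chain
    LockedNode cur = lockedBuckets[i];
    if (cur == null) {
        bucketLocks[i].release();
        return false;                    // key not present
    }
    cur.lock.acquire();
    bucketLocks[i].release();            // the head is safe once the first node is locked
    while (true) {
        if (cur.key == k) {
            cur.val = newVal;            // update in place as a single operation
            cur.lock.release();
            return true;
        }
        LockedNode next = cur.next;
        if (next == null) {
            cur.lock.release();
            return false;
        }
        next.lock.acquire();             // take the next node's lock before...
        cur.lock.release();              // ...releasing the current one (hand-over-hand)
        cur = next;
    }
}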