Tries (or Prefix Trees)
✤
✤
✤
✤
We have focused on binary trees, where each node has 0, 1, or 2 children.
In principle, you can operate with trees of various branching factors.
As an example, we consider tries, also called prefix trees, a data structure often used to maintain dictionary entries of fixed length.
Trees are another data structure that can implement the SET datatype. (It can handle insert and ismember?.)
Tries: examples, an informal definition
✤
✤
✤
✤
✤
In a trie over the alphabet A, each node may have up to |A| children.
Words are “stored” at the end of the path they index. Thus, each leaf corresponds to a stored element.
A trie over the English alphabet containing A, to, tea, ted, ten, and inn.
Idea: Suppose set items are words over an alphabet.
English words are over the alphabet {a, b, c, …, z, A, B, …, Z}
Positive integers are words over the alphabet {0, 1, 2, …,9}
The cost of trie operations
✤
✤
✤
Roughly, the time taken to search a trie (answer a ismember? query) or insert grows linearly with the length of the key.
This can be much worse than a binary search tree!
However, for large sets of data of roughly the same length (e.g., an English dictionary), this can be a good solution. Note that no “balancing” is required (cf. our discussion of search trees).
Implementing the Trie
Imagine a trie for storing large numbers. One natural way to implement a node:
(trie0 trie1 … trie9)
✤
✤
✤
Each node must store a (sub)trie for each numeral {0, 1, …, 9}. Note that tries are often very sparse: a given node may often have only a few nonempty subtries. For this reason, we may choose the following implementation:
((0 . trie0) (1 . trie1) … (9 . trie9))
where the subtries are left out if they are empty.
An example with this implementation
Then, the top node:
((3 . t3) (4 . t4) (7 . t7))
✤
347
347
Its leftmost child:
((6 . t6) (9 . t9))
69
1
36
✤
A leaf:
()
36 39 41
7 1
3671
✤
367 413
416
Abstract data types
Computer Scientists frequently organize their thinking about data types with the idea of an Abstract Data Type.
An abstract data type is a high-level description of the functionality provided by a data type, without any reference to exactly how it is implemented.
This “abstraction layer” hides implementation details: any implementation of an ADT can be “plugged in” to a computing infrastructure that requires it.
You’ve seen an example of this in our discussion of the SET datatype.
✤
✤
✤
✤
The SET ADT
Recall our discussion of the SET ADT.
A SET datatype must offer two functions: insert(x, S) and ismember(x,
S). (It also provides a distinguished object called emptyset).
To fully describe the ADT, we describe the functions it provides and, more importantly, the semantics that these functions offer.
For SET, this is easy: ismember(x,S) returns TRUE if insert(x, S) has ever been called, and FALSE otherwise.
✤
✤
✤
✤
✤
Note that we have defined the behavior of these operations without specifying the implementation.
Implementations of SET
We’ve seen three implementations of SET:
Lists
Binary search trees (though this required that the set have a linear order)
Tries (though this required that the set be sets of words over a fixed alphabet)
✤
✤
How do these implementations differ? In terms of the data structure they maintain and, additionally, their computational requirements.
✤
✤
✤
A richer SET ADT
We’ve been discussing a very basic notion of the SET ADT. You could ask for much more:
Union(S1, S2) Intersection(S1, S2) Delete(x, S1)
✤
✤
✤
✤
✤
✤
Producing an efficient implementation of such a rich set ADT is a deep mathematical issue: it was a longstanding research problem with fairly sweeping successes in the late 1990s.
The STACK ADT
The STACK is an ADT that intuitively represents a “stack” of objects, from which you can push on new objects, and pop off recent objects.
The STACK ADT requires the following family of operations:
push(x,S): “pushes” an element x onto the top of a stack S, returning the new stack
top(S): returns the top element of a stack.
pop(S): “pops” off the top element of a stack, returning the
resulting stack.
empty?(S): returns TRUE if a stack is empty, FALSE elsewhere.
✤
✤
✤
✤
✤
✤
The STACK intuition
These functions implement the natural behavior of a “stack” of objects.
✤
Implementing a STACK in SCHEME
✤
Stacks can (of course) be naturally implemented in terms of lists:
(define (push x S) (cons x S))
(define (top S) (car S))
(define (pop S) (cdr S))
(define (empty? S) (if (null? S) #t #f))
This implementation seems hard to beat: any of these operations requires a fixed number of procedure calls.
The QUEUE ADT
The QUEUE ADT captured the natural behavior of a queue:
Elements can be added to the end of the queue (enqueue). Elements can be removed from the front of the queue (dequeue).
✤
✤
✤
The QUEUE ADT functionality
The QUEUE ADT must provide the following functionality:
enqueue(x, Q): place x at the end of the queue Q. dequeue(Q): remove the front element from the queue Q.
front(Q): return the front element of the queue Q (without removing it).
empty?(Q): returns TRUE if the queue Q is empty, FALSE otherwise.
✤
✤
✤
✤
✤
Implementing the QUEUE ADT in SCHEME
✤
We can naturally implement the queue ADT as a list in scheme. There’s a problem: placing an element at the back is expensive
Can you think of a better implementation?
(define (enqueue x Q)
(if (null? Q)
(list x)
(cons (car Q)
(enqueue x (cdr Q)))))
(define (front Q) (car Q))
(define (empty? Q) (null? Q))
(define (dequeue Q) (cdr Q))
✤
Efficiency of this implementation of the QUEUE ADT
✤
✤
✤
✤
Now dequeue(Q) is cheap, so is front(Q); each uses only a fixed number of calls.
enqueue is terribly expensive: we need to step through the entire list to place the element at the end.
Can you do better? Could both enqueue and dequeue be done with just a fixed number of calls, independent of the length of the queue?
With this implementation:
(first second … last)
The PRIORITY QUEUE ADT
✤
✤
✤
✤
A priority queue is an ADT that maintains a set of numbers, but only supports extraction of the minimum element as a means of deletion:
Empty? Determines whether the priority queue is empty. Insert(x) Inserts the number x into the priority queue.
Extract_Min Removes the smallest element from the priority queue and returns it.
Implementation of PRIORITY QUEUEs
As an unsorted list.
Extract_min is slow! Have to traverse entire list. Insertion is fast: Constant time.
✤
✤
✤
✤
✤
✤
As a sorted list.
Extract_min is fast! Constant time. Insertion is slow: at least N procedure calls, if there are N elements in the priority queue.
As a heap.
Extract_min takes ~log(N) time to traverse the heap. Insertion takes ~log(N) time to traverse the heap. Heaps are really good at this.
Designing more efficient implementations of such ADTs…
…motivates our next topic:
Destructive
assignment
✤