
Tries (or Prefix Trees)




We have focused on binary trees, where each node has 0, 1, or 2 children.
In principle, one can work with trees of other branching factors.
As an example, we consider tries, also called prefix trees, a data structure often used to maintain dictionary entries of fixed length.
Tries are another data structure that can implement the SET datatype (they can handle insert and ismember?).

Tries: examples, an informal definition





Idea: Suppose set items are words over an alphabet.
English words are words over the alphabet {a, b, c, …, z, A, B, …, Z}.
Positive integers are words over the alphabet {0, 1, 2, …, 9}.
In a trie over the alphabet A, each node may have up to |A| children.
Words are “stored” at the end of the path they index; thus, each leaf corresponds to a stored element.
Example: a trie over the English alphabet containing A, to, tea, ted, ten, and inn.

The cost of trie operations



Roughly, the time taken to search a trie (answer an ismember? query) or to insert an element grows linearly with the length of the key.
This can be much worse than a binary search tree!
However, for large sets of data of roughly the same length (e.g., an English dictionary), this can be a good solution. Note that no “balancing” is required (cf. our discussion of search trees).

Implementing the Trie
Imagine a trie for storing large numbers. One natural way to implement a node:
(trie0 trie1 … trie9)



Each node must store a (sub)trie for each numeral in {0, 1, …, 9}. Note that tries are often very sparse: a given node may often have only a few nonempty subtries. For this reason, we may choose the following implementation:



 ((0 . trie0) (1 . trie1) … (9 . trie9))
where the subtries are left out if they are empty.
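
For concreteness, here is a minimal sketch of lookup and insert for this sparse representation. It assumes (as above) that all keys have the same length, so a key is present exactly when its whole digit path exists in the trie, and it treats a key as a list of digits such as (3 4 7); the procedure names are illustrative:

(define empty-trie '())                        ; a node is an association list of (digit . subtrie) pairs

(define (trie-member? digits T)
  (if (null? digits)
      #t                                       ; followed the whole path: key present
      (let ((sub (assv (car digits) T)))       ; look up this digit's entry, if any
        (if sub
            (trie-member? (cdr digits) (cdr sub))
            #f))))

(define (trie-insert digits T)
  (if (null? digits)
      T
      (let ((sub (assv (car digits) T)))
        (if sub
            ;; a subtrie for this digit exists: rebuild the node with it updated
            (map (lambda (entry)
                   (if (eqv? (car entry) (car digits))
                       (cons (car digits) (trie-insert (cdr digits) (cdr entry)))
                       entry))
                 T)
            ;; no subtrie for this digit yet: add a fresh one
            (cons (cons (car digits) (trie-insert (cdr digits) empty-trie))
                  T)))))

For example, (trie-member? '(3 4 7) (trie-insert '(3 4 7) empty-trie)) returns #t, while (trie-member? '(3 4 9) (trie-insert '(3 4 7) empty-trie)) returns #f. Both lookup and insert make one call per digit, matching the cost claim above.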


An example with this implementation
[Figure: an example trie built from numeric keys such as 347, 36, 39, and 41.]
Then, the top node:
((3 . t3) (4 . t4) (7 . t7))
Its leftmost child:
((6 . t6) (9 . t9))
A leaf:
()

Abstract data types
Computer scientists frequently organize their thinking about data types with the idea of an Abstract Data Type (ADT).
An abstract data type is a high-level description of the functionality provided by a data type, without any reference to exactly how it is implemented.
This “abstraction layer” hides implementation details: any implementation of an ADT can be “plugged in” to a computing infrastructure that requires it.
You’ve seen an example of this in our discussion of the SET datatype.



The SET ADT
Recall our discussion of the SET ADT.
A SET datatype must offer two functions: insert(x, S) and ismember(x, S). (It also provides a distinguished object called emptyset.)
To fully describe the ADT, we describe the functions it provides and, more importantly, the semantics that these functions offer.
For SET, this is easy: ismember(x, S) returns TRUE if insert(x, S) has previously been called (that is, if x has been inserted into S), and FALSE otherwise.





Note that we have defined the behavior of these operations without specifying the implementation.

Implementations of SET
We’ve seen three implementations of SET:
Lists
Binary search trees (though this required that the set elements have a linear order)
Tries (though this required that the set elements be words over a fixed alphabet)
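
For concreteness, here is a minimal sketch of the list implementation in Scheme (using the names insert, ismember?, and emptyset from the SET ADT; note that ismember? has to walk the whole list, so it may take as many calls as there have been insertions):

(define emptyset '())

(define (insert x S) (cons x S))               ; just put x at the front

(define (ismember? x S)
  (cond ((null? S) #f)                         ; ran out of elements: not present
        ((equal? x (car S)) #t)
        (else (ismember? x (cdr S)))))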


How do these implementations differ? In terms of the data structure they maintain and, additionally, their computational requirements.


A richer SET ADT
We’ve been discussing a very basic notion of the SET ADT. You could ask for much more:
Union(S1, S2)
Intersection(S1, S2)
Delete(x, S1)






Producing an efficient implementation of such a rich set ADT is a deep mathematical issue: it was a longstanding research problem with fairly sweeping successes in the late 1990s.
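
By contrast, naive versions of these richer operations are easy to write on top of the simple list implementation sketched above, but each takes time roughly proportional to the sizes of its input sets (or worse); the names below are illustrative:

(define (set-delete x S)                       ; drop every copy of x from S
  (cond ((null? S) '())
        ((equal? x (car S)) (set-delete x (cdr S)))
        (else (cons (car S) (set-delete x (cdr S))))))

(define (set-union S1 S2)                      ; insert each element of S1 into S2
  (if (null? S1)
      S2
      (set-union (cdr S1) (insert (car S1) S2))))

(define (set-intersection S1 S2)               ; keep the elements of S1 that are also in S2
  (cond ((null? S1) '())
        ((ismember? (car S1) S2)
         (cons (car S1) (set-intersection (cdr S1) S2)))
        (else (set-intersection (cdr S1) S2))))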

The STACK ADT
The STACK is an ADT that intuitively represents a “stack” of objects, onto which you can push new objects and from which you can pop off the most recently pushed objects.
The STACK ADT requires the following family of operations:
push(x, S): “pushes” an element x onto the top of a stack S, returning the new stack.
top(S): returns the top element of a stack.
pop(S): “pops” off the top element of a stack, returning the resulting stack.
empty?(S): returns TRUE if a stack is empty, FALSE otherwise.





The STACK intuition
These functions implement the natural behavior of a “stack” of objects.

Implementing a STACK in SCHEME

Stacks can (of course) be naturally implemented in terms of lists:
(define (push x S) (cons x S))      ; the new element goes on the front of the list
(define (top S) (car S))            ; the top of the stack is the front of the list
(define (pop S) (cdr S))            ; drop the front element
(define (empty? S) (null? S))       ; a stack is empty exactly when the list is
This implementation seems hard to beat: each of these operations requires only a fixed number of procedure calls.
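A quick usage example with these definitions:

(define S (push 3 (push 2 (push 1 '()))))      ; build a stack with 3 on top
(top S)                                        ; => 3
(top (pop S))                                  ; => 2
(empty? (pop (pop (pop S))))                   ; => #t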

The QUEUE ADT
The QUEUE ADT captures the natural behavior of a queue:
Elements can be added to the end of the queue (enqueue).
Elements can be removed from the front of the queue (dequeue).


The QUEUE ADT functionality
The QUEUE ADT must provide the following functionality:
enqueue(x, Q): place x at the end of the queue Q.
dequeue(Q): remove the front element from the queue Q.
front(Q): return the front element of the queue Q (without removing it).
empty?(Q): returns TRUE if the queue Q is empty, FALSE otherwise.




Implementing the QUEUE ADT in SCHEME

We can naturally implement the QUEUE ADT as a list in Scheme. There’s a problem, though: placing an element at the back is expensive.
Can you think of a better implementation?
(define (enqueue x Q)               ; walk to the end of the list and attach x there
  (if (null? Q)
      (list x)
      (cons (car Q)
            (enqueue x (cdr Q)))))
(define (front Q) (car Q))          ; the front of the queue is the front of the list
(define (empty? Q) (null? Q))
(define (dequeue Q) (cdr Q))        ; drop the front element

Efficiency of this implementation of the QUEUE ADT




With this implementation, a queue is represented as the list
(first second … last)
Now dequeue(Q) is cheap, and so is front(Q): each uses only a fixed number of procedure calls.
enqueue, however, is terribly expensive: we need to step through the entire list to place the new element at the end.
Can you do better? Could both enqueue and dequeue be done with just a fixed number of calls, independent of the length of the queue?
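
One well-known purely functional trick gets close: represent the queue as a pair of lists, a front list and a back list kept in reverse order. enqueue conses onto the back list; dequeue takes from the front list, reversing the back list only when the front list runs out. Each element is reversed at most once, so both operations take a fixed number of calls on average, though a single dequeue can still walk the whole back list. A minimal sketch (this representation and these names are just one possible choice):

(define empty-queue (cons '() '()))            ; a queue is (FRONT . BACK), with BACK reversed

(define (empty? Q)
  (and (null? (car Q)) (null? (cdr Q))))

(define (enqueue x Q)
  (cons (car Q) (cons x (cdr Q))))             ; constant work: push onto BACK

(define (normalize Q)                          ; ensure FRONT is nonempty if the queue isn't
  (if (null? (car Q))
      (cons (reverse (cdr Q)) '())
      Q))

(define (front Q) (car (car (normalize Q))))

(define (dequeue Q)
  (let ((Q2 (normalize Q)))
    (cons (cdr (car Q2)) (cdr Q2))))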

The PRIORITY QUEUE ADT




A priority queue is an ADT that maintains a set of numbers, but only supports extraction of the minimum element as a means of deletion:
Empty? determines whether the priority queue is empty.
Insert(x) inserts the number x into the priority queue.
Extract_Min removes the smallest element from the priority queue and returns it.

Implementation of PRIORITY QUEUEs
As an unsorted list:
Extract_min is slow! We have to traverse the entire list.
Insertion is fast: constant time.






As a sorted list:
Extract_min is fast! Constant time.
Insertion is slow: up to N procedure calls, if there are N elements in the priority queue.
As a heap:
Extract_min takes ~log(N) time to traverse the heap.
Insertion takes ~log(N) time to traverse the heap.
Heaps are really good at this.
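
For concreteness, here is a minimal sketch of the sorted-list version (the names are illustrative); the list is kept in increasing order, so the minimum is always at the front:

(define empty-pq '())

(define (pq-empty? P) (null? P))

(define (pq-insert x P)                        ; walk to x's position: up to ~N calls
  (cond ((null? P) (list x))
        ((<= x (car P)) (cons x P))
        (else (cons (car P) (pq-insert x (cdr P))))))

(define (pq-extract-min P)                     ; constant time: the minimum is the car
  (cons (car P) (cdr P)))                      ; returns (minimum . remaining queue)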

Designing more efficient implementations of such ADTs…
…motivates our next topic:
Destructive assignment