COMP2521
Data Structures & Algorithms
Week 9.3
Tries
1
In this lecture
Why?
Storing a large set of strings naively can be costly, we need
a more efficient way
What?
Tries
Tries insert
Tries lookup
2
Tries
Tries are a data structure for representing strings that support
O(L) lookup and insertion (where L is the length of a string)
3
Tries Concenptualised
Depth of trie = length of longest word
4 . 1
Tries Concenptualised
N
o
te: N
o
t every w
o
rd
h
ere is in
clu
d
ed
in
th
e trie
4 . 2
Tries Structure
Each node in a trie:
Contains one part of a key (typically one character)
May have up to 26 children
May be tagged as a “finishing” node
But even “finishing” nodes may have children
May contain other data for application (e.g. word frequency)
A “finishing” node marks the end of one key
#define ALPHABET_SIZE 26
typedef struct Node *Trie;
typedef struct Node {
char onechar; // current char in key
Trie child[ALPHABET_SIZE];
bool finish; // last char in key?
Item data; // no Item if !finish
} Node;
typedef char *Key; // just lower-case letters
1
2
3
4
5
6
7
8
9
10
11
12
5
Tries Search
find(trie,key):
| Input trie, key
| Output pointer to element in trie if key found
| NULL otherwise
|
| node=trie
| for each char c in key do
| | if node.child[c] exists then
| | node=node.child[c] // move down one lev
| | else
| | return NULL
| | end if
| end for
| if node.finish then // “finishing” node reach
| return node
| else
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
6
Tries Insertion
Trie insert(trie, item, key):
if trie is empty then:
t = new trie node
if m = 0 then // end of key
t.finish = true
t.data = item
else:
first = key[0]
rest = key[1..m-1]
t.child[first] = insert(t.child[first], item, rest)
return t
1
2
3
4
5
6
7
8
9
10
11
7
Tries Complexity
Space complexity: O(n)
n = Sum of lengths of all strings
Insertion complexity: O(m)
m = length of the key string
Search complexity: O(m)
8
BST-like Tries
Above representation is space inefficient
Each node has 26 possible children
Even with very many keys, most child links are unused
And if we allowed all ascii chars in alphabet, 128 children
We could reduce branching factor by reducing “alphabet”
Break each 8-bit char into two 4-bit “nybbles”
Branching factor is 16, even for full ascii char set
But each branch is twice as long
9 . 1
BST-like Tries
9 . 2
Compressed Tries
Have internal nodes of degree ≥ 2; each node contains ≥ 1 char
Obtained by compressing non-branching chains of nodes
Compact representation of compressed trie to encode array S of strings:
Requires O(s) space (s = #strings in array S)
10 . 1
Compressed Tries
10 . 2
Feedback
11