CS计算机代考程序代写 data structure chain algorithm COMP2521

COMP2521
Data Structures & Algorithms

Week 9.3
Tries

 

1

In this lecture

Why?

Storing a large set of strings naively can be costly, we need
a more efficient way

What?

Tries
Tries insert
Tries lookup

2

Tries

Tries are a data structure for representing strings that support
O(L) lookup and insertion (where L is the length of a string)

3

Tries Concenptualised

Depth of trie = length of longest word
4 . 1

Tries Concenptualised

N
o

te: N
o

t every w
o

rd
h

ere is in
clu

d
ed

in
th

e trie

4 . 2

Tries Structure

Each node in a trie:

Contains one part of a key (typically one character)
May have up to 26 children
May be tagged as a “finishing” node
But even “finishing” nodes may have children
May contain other data for application  (e.g. word frequency)

A “finishing” node marks the end of one key

#define ALPHABET_SIZE 26

typedef struct Node *Trie;

typedef struct Node {
char onechar; // current char in key
Trie child[ALPHABET_SIZE];
bool finish; // last char in key?
Item data; // no Item if !finish
} Node;

typedef char *Key; // just lower-case letters

1
2
3
4
5
6
7
8
9
10
11
12

5

Tries Search

find(trie,key):
| Input trie, key
| Output pointer to element in trie if key found
| NULL otherwise
|
| node=trie
| for each char c in key do
| | if node.child[c] exists then
| | node=node.child[c] // move down one lev
| | else
| | return NULL
| | end if
| end for
| if node.finish then // “finishing” node reach
| return node
| else

1
2
3
4
5
6
7
8
9

10
11
12
13
14
15
16

6

Tries Insertion

Trie insert(trie, item, key):
if trie is empty then:
t = new trie node
if m = 0 then // end of key
t.finish = true
t.data = item
else:
first = key[0]
rest = key[1..m-1]
t.child[first] = insert(t.child[first], item, rest)
return t

1
2
3
4
5
6
7
8
9
10
11

7

Tries Complexity

Space complexity: O(n)
n = Sum of lengths of all strings

Insertion complexity: O(m)
m = length of the key string

Search complexity: O(m)

8

BST-like Tries

Above representation is space inefficient

Each node has 26 possible children
Even with very many keys, most child links are unused

And if we allowed all ascii chars in alphabet, 128 children
 

We could reduce branching factor by reducing “alphabet”

Break each 8-bit char into two 4-bit “nybbles”
Branching factor is 16, even for full ascii char set
But each branch is twice as long

9 . 1

BST-like Tries

9 . 2

Compressed Tries
   

Have internal nodes of degree ≥ 2;   each node contains ≥ 1 char
Obtained by compressing non-branching chains of nodes
Compact representation of compressed trie to encode array S  of strings:
Requires O(s) space  (s = #strings in array S)

10 . 1

Compressed Tries

10 . 2

Feedback

11