Trees
Algorithms and Data Structure
Expression Trees
Dr. Fadi Mohsen
Postdoctoral Fellow
Information Systems, Bernoulli Institute
E: f.f.m.mohsen@rug.nl
‹#›
|
‹#›
|
Outline
Tries – review
Expression trees.
‹#›
|
Short Questionnaire (Anonymus)
Using your mobile phone or computer:
Navigate to socrative.com
Click the Login button (Top-right).
Choose Student Login
Enter “MOHSEN5509” as the room name then click Join.
Answer questions.
Keep your screen open.
‹#›
|
Types of Tries
For optimization we will now consider different types of tries:
Standard trie
Compressed trie
Compact trie
Suffix trie
‹#›
|
Standard Tries
An efficient data structure for looking up words in a dictionary.
A standard trie T for the collection W of words is a tree:
The root of T is empty, and every other node contains a letter.
The children of a node of T contain different letters and are in alphabetical order.
The branches in T from the root correspond exactly with the words in W.
How much memory is required for a standard trie?
O(n), n is the total length of all words in W.
‹#›
|
Let n be the sum of the lengths of the words in W.
There are at most n + 1 nodes in T, upper limit, no overlap.
There are n edges.
Every node contains one letter, fixed amount of memory.
5
Search In Trie
algorithm SearchInTrie(T,w)
input: standard trie T, word w
output: Yes if w occurs in T, otherwise No
k ← root of T
While w not empty do
x ← first letter of w
w ← w minus x
if k has no child containing x then
return No
k ← child of k that contains x
if k is a leaf then
return Yes
else
return No
During the search, we successively go from a node to the next node to match the next letter in w.
When there is no such node, the search stops with a negative result.
When we reach the end of w, we check whether we are in a leaf of T. If so, we have found that w occurs in T. Otherwise the search ends negatively
‹#›
|
Search In Trie
Problem: What if a word in is an initial segment of another word?
W = {we,well}
Solutions:
Add special character (e.g. $) for word ends.
Mark each node that ends a word.
$
e
$
l
w
l
‹#›
|
Compressed Trie
Is obtained from a standard trie by compressing the non-branching parts of a branch in a single node.
The root is empty, and every other node contains a nonempty string.
The children of a node of T contain different letters and are ordered in alphabetically on the initial letter of the string;
There are no nodes with branching degree 1 (if W contains at least two words).
The branches from the root correspond exactly with the words in W.
‹#›
|
8
b
i
e
a
l
e
u
s
t
l
o
l
c
l
l
r
d
y
p
k
l
b
s
e
id
u
ell
to
ll
ll
y
ck
p
ar
Standard
Compressed
Reducing the number of nodes from O(n) to O(m); however, memory use remain the same ( m is the number of words, and n is total length of all words).
W = {bear, bell, bid, bull, buy, sell, stock, stop}
‹#›
|
Compact Trie
Obtained from the compressed trie by replacing the strings in the nodes by their coordinates.
Every node except the root contains two numbers referring to a string;
The children of a node are ordered alphabetically on initial letter.
There are no nodes with branching degree 1.
The branches from the root corresponds exactly with the words in W.
Reduced the size of every node to a fixed value.
Total memory use of a compact trie is O(m) with m the number of words.
‹#›
|
We are trying to fix the memory consumption thing.
10
b
s
e
id
u
ell
to
ll
ll
y
ck
ar
A = {b,e,a,r,b,e,l,l,b,i,d,b,u,l,l,b,u,y,s,e,l,l,s,t,o,c,k,s,t,o,p}
p
0
b
1
e
2
a
3
r
4
b
5
e
6
l
7
l
8
b
9
i
10
d
11
b
12
u
13
l
14
l
15
b
16
u
17
y
18
s
19
e
20
l
21
l
22
s
23
t
24
o
25
c
26
k
27
s
28
t
29
o
30
p
0 – 0
18 – 18
1 – 1
9 – 10
12 – 12
19 – 21
23 – 24
17 – 17
25 – 26
30 – 30
13 – 14
6 – 7
2 – 3
0
a
1
b
2
c
3
d
4
e
5
f
6
g
7
h
8
i
9
j
10
k
11
l
12
m
13
n
14
o
15
p
16
q
17
r
18
s
19
t
20
u
21
v
22
w
23
x
24
y
25
z
Would this work?
‹#›
|
https://www.geeksforgeeks.org/pattern-searching-using-suffix-tree/
11
Searching For A Pattern in A Text.
A trie enables fast search for a word in a set, e.g. a dictionary.
Does this help us when searching for a pattern p in a long text T?
First idea: make a trie for the set of all substrings of T
Example: for “hello”, use {h, e, l, l, o, he, el, ll, lo, hel, ell, llo, hell, ello, hello}
Problem: this set has n + (n − 1) + · · · + 2 + 1 = (n2 + n)/2 elements.
Second idea: every substring is the prefix of a suffix!
Example: for “hello”, use {o, lo, llo, ello, hello}
‹#›
|
Suffix Tries
A Suffix Tree for a given text is a compressed trie for all suffixes of the given text.
After building suffix tree of text (preprocessing), searching any pattern of size m should only take O(m) time.
Length of pattern is generally much smaller than text.
Works best for fixed or less frequently changings text (e.g. dictionary).
It is really costly to build the suffix tree.
Every substring of a string is the prefix of a suffix.
‹#›
|
With compact tries, searching for a pattern p of size k in W with size O(m) is possible in O(k) time.
13
Suffix Tries
abracadabras
bracadabras
racadabras
acadabras
cadabras
adabras
dabras
abras
bras
ras
as
s
a
bra
cadabras
dabras
s
cadabras
s
s
bra
cadabras
cadabras
dabras
s
ra
s
cadabras
Generate all suffixes of given text.
Consider all suffixes as individual words and build a compressed trie.
Sorted:{abracadabras, abras, acadabras, adabras, as, bracadabras, bras, cadabras, dabras,racadabras,ras, s}
‹#›
|
14
Application: Expression Trees
Binary trees: make structure of (arithmetical) expressions explicit:
Non-leaf nodes contain operators.
Leaf nodes contain number or variables.
Structure of the tree indicates the order – no parenthesis needed.
+
÷
6
*
3
̶
z
2
y
An expression tree of the infix expression (((y) – (6 x 3)) / z) + 2
Prefix version: + / – y * 6 3 z 2
‹#›
|
Prefix Expressions
Prefix: operator first, then arguments. Also called Łukasiewicz or Polish notation.
Example: the prefix expression “+ 2 2” means “2 + 2” in infix notation.
And / + * 1 2 * 3 4 + * 5 6 * 7 8 means (1*2 + 3*4) / (5*6 + 7*8)
Observe:
prefix notation never needs parentheses!
prefix notation corresponds to preOrder traversal.
‹#›
|
Łukasiewicz
16
Prefix Expressions
A prefix expression is a number, or an identifier, or an operator followed by two prefix expressions.
‹prefixExp› ::=
‹number›
| ‹identifier›
| ‘+’ ‹prefixExp› ‹prefixExp›
| ‘-’ ‹prefixExp› ‹prefixExp›
| ‘*’ ‹prefixExp› ‹prefixExp›
| ‘/’ ‹prefixExp› ‹prefixExp›
‹#›
|
Prefix Expressions
In prefix expressions, placing the operator before its operands:
Makes it easier to interpret; thus parenthesis are not needed.
Infix: 3 + (t*7)
Prefix: +3 *t7
Prefix: +- z 33 z
Infix: (z-33)+3
‹#›
|
Prefix Expressions
What is the value of / + * 4 9 / 6 2 + 7 * 2 – 8 5 ?
Using your mobile phone or computer:
Navigate to socrative.com
Click the Login button (Top-right).
Choose Student Login
Enter “MOHSEN5509” as the room name then click Join.
Answer the question.
Keep your screen open.
((4*9)+(6/2))/(7+((2*(8-5))
‹#›
|
Source Code
Scan the input and transform it to a token list.
Build an expression tree from the token list.
Input / output sample.
The code is available on Themis, run it!
‹#›
|
https://runestone.academy/runestone/books/published/pythonds/BasicDS/InfixPrefixandPostfixExpressions.html
20
Source Code
The program that we are going to cover next does the following:
Scan a string and produce a token list.
Recognize a prefix expression and build an expression tree.
Evaluate a numerical expression tree.
Example:
“+ * 3 4 5”
Scan
+
*
3
4
5
Recognize
Eval
17
+
4
*
5
3
?
(Assignment 4: do the same for infix formulas)
‹#›
|
Recall: token lists
‹#›
|
Expressions Trees: Type Definition
Similar to the type TokenList, but with 2 pointers to nodes instead of 1.
‹#›
|
Expression Trees: Creating a NEW Node
‹#›
|
Expression Trees: Auxiliary Functions (1)
Reuse valueNumber from evalExp.c.
Now we define valueIdentifier and valueOperator.
‹#›
|
Expression Trees: Auxiliary Functions (2)
‹#›
|
Expression Trees: Freeing Memory
Compare with freeTokenList: strings in the identifier nodes are not freed. Why?
Because newExpTreeNode did not allocate memory to copy identifier strings.
Instead, the line new->t = t only points to already existing strings.
Hence, t.identifier will be freed when the token list is freed.
‹#›
|
Expression Trees: Recall the Grammar
‹#›
|
Building an Expression Tree (1)
‹#›
|
Building an Expression Tree (2)
‹#›
|
Building an Expression Tree (2) – Preventing Memory Leak.
‹#›
|
Expression Trees: Printing
‹#›
|
Expression Trees: No Identifiers Present?
‹#›
|
Expression Trees Evaluation
‹#›
|
Dialog with the User
‹#›
|
Example of a Dialog
‹#›
|
Summary
In today’s class, we covered:
Reviewed Tries.
Expression Trees.
Reading: 2.5 (Algorithms and Data Structures in C, Gerard R. Renardel de Lavalette).
‹#›
|
Coming Up
Assignment 3 (both parts and report) until tomorrow.
Tutorials.
Assignment 4
Next week: graphs
‹#›
|
Short Questionnaire (Anonymus)
Using your mobile phone or computer:
Navigate to socrative.com
Click the Login button (Top-right).
Choose Student Login
Enter “MOHSEN5509” as the room name then click Join.
Answer only one question.
‹#›
|