Syntax
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Regular Expression
Definition:
• A character
• Empty string (𝜀)
• Concatenation of two RE (e.g., (ab))
• Alternation of two RE, separated by “|” (e.g., (a|b)) • Closure (Kleene star) (e.g.(a*))
2
Precedence of RE
The order is (high to low) • Closure (*), then
• Concatenation, then
• Alternation
Analogy in arithmetic: • Exponentiation
• Multiplication
• Addition
ab|cd (ab)|(cd) a|bc*d (a)|(b(c*)d)
3
UNIX Extensions to RE
Extension [abcd]
. (wild cast) a?
a+
[a-z]
\[ \] \.
Core RE
a|b|c|d
a|b|c|…(all char. except new line) 𝜀 |a
a(a*)
a|b|c|…|z
[ ] .
4
Grammar
𝐴→𝜔
nonterminal Lang. definition in symbol meta language
How to write down a RE grammar?
5
Language of tokens (C)
Identifier: letters, digits and underscore ‘_’ only. The first character must be an underscore or a letter
numbers: digits, decimal point, suffix such as “l”, “u” operators: + – * / …
keywords : if, while, for, int, … punctuation:{ } [ ] ; …
Grammar
𝑖𝑑𝑒𝑛𝑡𝑖𝑓𝑖𝑒𝑟 → [ _a-zA-Z ][ _a-zA-Z0-9]∗ 𝑛𝑢𝑚𝑏𝑒𝑟 → (0|[1-9][0-9]∗)(\. [0-9]+)? (l?|u?) 𝑜𝑝𝑒𝑟𝑎𝑡𝑜𝑟 →+|-|∗|/|…
𝑘𝑒𝑦𝑤𝑜𝑟𝑑 →if|while|for|int|… 𝑝𝑢𝑛𝑐𝑡𝑢𝑎𝑡𝑖𝑜𝑛 →{|}|\[|\]|;|…
6
Is a RE grammar correct?
• Does it cover all legal strings?
• Does it allow any illegal string?
7
Examples
bit
4 bits
(non-empty) bits (non-empty) Even # of bits frag: Num bet. 0 and 255
0|1 (0|1)(0|1)(0|1)(0|1) (0|1)+ ((0|1)(0|1))+
[0-9] | [1-9][0-9] |1 [0-9][0-9] |2 [0-4][0-9]
| 25[0-5]
Use the RE above ?
IP Address
bits with equal 0’s and 1’s
8