Context-Free Grammar (CFG)
Main idea: a context-free grammar is given by a bunch of rewriting rules such as these:

E→E+E   E→M   M→M×M   M→A   A→0   A→1   A→(E)
- E, M, A are non-terminal symbols aka variables. When you see one, you apply a rule to expand it.
  One of them is designated as the start symbol; you always start from it. I will designate E as the start symbol.
- +, ×, 0, 1, (, ) are terminal symbols. They are the characters you want in your language.
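As a side note, one possible way to hold such a grammar as data is sketched below in Haskell; the names Symbol, Rule, grammar, startSymbol are made up for illustration.

```haskell
-- A minimal sketch of the example grammar as data (made-up names).
data Symbol = T Char | N String   -- terminal character or non-terminal name
  deriving Show

type Rule = (String, [Symbol])    -- LHS non-terminal paired with its RHS symbols

grammar :: [Rule]
grammar =
  [ ("E", [N "E", T '+', N "E"]), ("E", [N "M"])
  , ("M", [N "M", T '*', N "M"]), ("M", [N "A"])   -- '*' stands for the × above
  , ("A", [T '0']), ("A", [T '1'])
  , ("A", [T '(', N "E", T ')'])
  ]

startSymbol :: String
startSymbol = "E"
```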

Derivation (aka Generation)
A derivation is a finite sequence of rule applications, starting from the start symbol, until all non-terminal symbols are gone. Often you aim for a specific final string.
E → M
→ M × M
→ A × M
→ 1 × M
→ 1 × A
→ 1 × (E)
→ 1 × (E + E)
→ 1 × (M + E)
→ 1 × (A + E)
→ 1 × (0 + E)
→ 1 × (0 + M)
→ 1 × (0 + M × M)
→ 1 × (0 + A × M)
→ 1 × (0 + 1 × M)
→ 1 × (0 + 1 × A)
→ 1 × (0 + 1 × 1)
Context-free grammars can support: matching parentheses, unlimited nesting.

Backus-Naur Form (BNF)
Backus-Naur Form is a computerized, practical notation for CFGs.
- Surround non-terminal symbols by <>; allow multi-letter names.
- Merge rules with the same LHS.
- (Some versions.) Surround terminal strings by single or double quotes.
- Use ::= for →.
Our example grammar in BNF:
<E> ::= <E> "+" <E> | <M>
<M> ::= <M> "*" <M> | <A>
<A> ::= "0" | "1" | "(" <E> ")"

Extended Backus-Naur Form (EBNF)
- { … } for 0 or more occurrences.
- [ … ] for 0 or 1 occurrences.
- Some versions: no <> needed around non-terminal symbols.
Example: Lisp/Scheme S-expression¹ grammar (basic):
In BNF:
<sexpr>  ::= <atom> | "(" <sexprs> ")"
<sexprs> ::= | <sexpr> <sexprs>
(The first alternative of <sexprs> is empty, i.e., zero S-expressions.)
In EBNF:
<sexpr> ::= <atom>
          | "(" { <sexpr> } ")"
So you need fewer artificial non-terminals and rules that merely mean “at least 0 of this”, “at least 1 of that”, etc.
¹ "symbolic expression"
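To see why { … } saves work, here is a rough Haskell sketch of how "0 or more" becomes repetition in code. The names Parser, many0, SExpr, sexpr are made up, and atoms are shrunk to single letters to keep the sketch short.

```haskell
-- A rough sketch: EBNF's { ... } becomes repetition in code (made-up names).
type Parser a = String -> Maybe (a, String)   -- result plus remaining input

many0 :: Parser a -> Parser [a]               -- zero or more, i.e. { ... }
many0 p s = case p s of
  Nothing      -> Just ([], s)                -- zero occurrences: succeed, consume nothing
  Just (x, s') -> case many0 p s' of
                    Just (xs, s'') -> Just (x : xs, s'')
                    Nothing        -> Just ([x], s')

data SExpr = Atom Char | List [SExpr] deriving Show

-- <sexpr> ::= <atom> | "(" { <sexpr> } ")"
sexpr :: Parser SExpr
sexpr ('(':s) = case many0 sexpr s of
                  Just (xs, ')':s') -> Just (List xs, s')
                  _                 -> Nothing
sexpr (c:s) | c `elem` ['a'..'z'] = Just (Atom c, s)
sexpr _     = Nothing
```

For example, sexpr "(ab(c))" gives Just (List [Atom 'a', Atom 'b', List [Atom 'c']], "").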

Parse Tree aka Derivation Tree
A parse tree aka derivation tree presents a derivation with more structure (tree), less repetition.
[parse tree diagram for 0 + 0 + 0: the root E expands via E → E + E, and each remaining E derives 0 through M and A]
This example generates 0 + 0 + 0.

Parse Trees: General Points
- Internal nodes are non-terminal symbols.
- Both operators and operands are terminal symbols, at the leaves.
- The whole string is recorded, just scattered across the leaves.
- Purpose: present a derivation; help visualize the derivation and the grammar.
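For concreteness, a parse tree can be pictured as a value of a tree type; here is a small Haskell sketch (ParseTree, yield and the example value are made-up names for illustration).

```haskell
-- A small sketch of a parse tree as data: non-terminals at internal nodes,
-- terminal characters at the leaves (made-up names).
data ParseTree = Leaf Char                 -- terminal symbol
               | Node String [ParseTree]   -- non-terminal and its expansion
  deriving Show

-- One parse tree for 0 + 0 + 0 under the ambiguous grammar above (left-leaning):
example :: ParseTree
example = Node "E" [ Node "E" [zeroE, Leaf '+', zeroE], Leaf '+', zeroE ]
  where zeroE = Node "E" [Node "M" [Node "A" [Leaf '0']]]

-- Reading the leaves left to right recovers the whole (scattered) string:
yield :: ParseTree -> String
yield (Leaf c)    = [c]
yield (Node _ ts) = concatMap yield ts     -- yield example == "0+0+0"
```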

Ambiguous Grammar
Two different parse trees generate the same 0 + 0 + 0:

[two parse tree diagrams for 0 + 0 + 0: in one, the left E of E + E expands further; in the other, the right E does]
If this happens, the grammar is ambiguous. We try to design unambiguous grammars. (Bad news: CFG ambiguity is undecidable.)

Unambiguous Grammar Example
An unambiguous grammar that generates the same language as our ambiguous grammar example:
<E> ::= <E> "+" <M> | <M>
<M> ::= <M> "*" <A> | <A>
<A> ::= "0" | "1" | "(" <E> ")"
Exercise: Find the parse trees for 0 + 0 + 0 and 0 × 0 × 0. Observe that you are forced into only one answer, and it's left-leaning.
(Bad news: Equivalence of two CFGs is also undecidable.)

Left Recursive vs Right Recursive
<E> ::= <E> "+" <M>

That is a left recursive rule: the recursion is at the beginning (left).

<E> ::= <M> "+" <E>

That is a right recursive rule: the recursion is at the end (right).

Sometimes they convey intentions of left association or right association, but not always.
They affect some parsing algorithms.
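As a tiny illustration of why association matters, here are the two readings of 5 − 2 − 1 as trees, sketched in Haskell with subtraction (the names Expr, eval, leftTree, rightTree are made up; the slides' grammar itself uses + and ×, where the two readings happen to have equal value).

```haskell
-- A tiny sketch: the two readings of 5 - 2 - 1 as trees, and their values.
data Expr = Num Integer | Sub Expr Expr deriving Show

eval :: Expr -> Integer
eval (Num n)   = n
eval (Sub a b) = eval a - eval b

leftTree, rightTree :: Expr
leftTree  = Sub (Sub (Num 5) (Num 2)) (Num 1)   -- (5 - 2) - 1, evaluates to 2
rightTree = Sub (Num 5) (Sub (Num 2) (Num 1))   -- 5 - (2 - 1), evaluates to 4
```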

Recursive Descent Parsing
Recursive descent parsing is a simple strategy for writing a parser: for each non-terminal, write a procedure based on the RHSs of its rules:
- Non-terminal: procedure call, possibly mutual recursion. (Thus "recursive descent", also "top-down".)
  Left recursion needs special treatment to avoid infinite loops.
- Terminal: Consume input and check.
- Alternatives: lookahead to choose, or try and backtrack.
Some options for handling left recursion:
- Re-design the grammar to not have left recursion.
- Many left recursive rules just express left-associating operators; this can be done without left recursive code (see the sketch below).
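A rough Haskell sketch of the second option: the left recursive rule <E> ::= <E> "+" <M> | <M> is parsed as "one <M>, then zero or more of "+" <M>", folding the results to the left. The names Parser, Expr, parseM, parseE are made up, and <M> is shrunk to just "0" or "1" to keep the sketch short.

```haskell
-- Handle  <E> ::= <E> "+" <M> | <M>  without left recursive code:
-- parse one <M>, then loop over "+" <M>, folding to the left (made-up names).
type Parser a = String -> Maybe (a, String)

data Expr = Num Integer | Add Expr Expr deriving Show

parseM :: Parser Expr                  -- stand-in for <M>: just "0" or "1" here
parseM ('0':s) = Just (Num 0, s)
parseM ('1':s) = Just (Num 1, s)
parseM _       = Nothing

parseE :: Parser Expr                  -- <E> as  <M> { "+" <M> }
parseE s = case parseM s of
  Nothing      -> Nothing
  Just (e, s') -> go e s'
  where
    go acc ('+':rest) = case parseM rest of
                          Just (e, rest') -> go (Add acc e) rest'   -- left association
                          Nothing         -> Nothing
    go acc rest       = Just (acc, rest)
```

For example, parseE "0+1+0" gives Just (Add (Add (Num 0) (Num 1)) (Num 0), ""), the left-leaning tree.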

Recursive Descent Parser Example
Example grammar suitable for recursive descent parsing:
<sub>  ::= <atom> "-" <sub> | <atom>
<atom> ::= "0" | "1" | "(" <sub> ")"
Pseudo-code of recursive descent parser:
sub:
    try ( atom;
          read; if not "-" then fail;
          sub; )
    if that failed: atom;

atom:
    read;
    if "0" or "1": success;
    if "(":  sub;
             read; if not ")" then fail;
    else: fail;
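Here is a runnable Haskell sketch corresponding to the pseudo-code above, written as a recognizer only: it returns the remaining input on success and Nothing on failure. The names sub and atom come from the grammar; the rest (Parser, long) is assumed for illustration.

```haskell
-- A sketch of the parser above as a recognizer: Just rest = success with
-- remaining input, Nothing = this alternative failed (caller may backtrack).
type Parser = String -> Maybe String

sub :: Parser                      -- <sub> ::= <atom> "-" <sub> | <atom>
sub s = case long s of
          Just s' -> Just s'
          Nothing -> atom s        -- "if that failed: atom"
  where
    long s0 = do
      s1 <- atom s0                -- atom
      case s1 of
        '-':s2 -> sub s2           -- read; must be "-"; then sub
        _      -> Nothing

atom :: Parser                     -- <atom> ::= "0" | "1" | "(" <sub> ")"
atom ('0':s) = Just s
atom ('1':s) = Just s
atom ('(':s) = case sub s of
                 Just (')':s') -> Just s'
                 _             -> Nothing
atom _       = Nothing
```

For example, sub "1-(0-1)" returns Just "", i.e., the whole input is consumed.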

Abstract Syntax Tree (AST) (vs Parse Tree)
Parse tree:

[parse tree diagram for 0 + 0 + 0, with non-terminals E, M, A at internal nodes and the characters 0 and + at the leaves]

Abstract syntax tree:

[AST diagram for 0 + 0 + 0: two + nodes, with the number 0 at the three leaves; the non-terminals are gone]

Abstract Syntax Tree: General Points
- Internal nodes are operators/constructs. Example construct: if-then-else.
- Non-terminal symbols are gone or replaced by constructs.
- Many terminal symbols are gone too if they play no role other than nice syntax (e.g., spaces, parentheses, punctuation).
  Those bearing content are replaced by appropriate representations; they do not stay as characters.
  E.g., the character '+' is replaced by a data constructor, the character '0' by the number 0.
- Purpose: present the essential structure and content, ready for interpreting, compiling, analyses.
- Parsers usually output abstract syntax trees when successful.
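For the running expression grammar, the AST could be a data type like this Haskell sketch (the names Expr, Num, Add, Mul are made up for illustration):

```haskell
-- A sketch of an AST type for the expression grammar (made-up names).
data Expr
  = Num Integer      -- '0', '1' become numbers, not characters
  | Add Expr Expr    -- '+' becomes a data constructor
  | Mul Expr Expr    -- '×' becomes a data constructor
  deriving Show

-- 0 + 0 + 0 (left-associated) as an AST; parentheses and spaces leave no trace:
example :: Expr
example = Add (Add (Num 0) (Num 0)) (Num 0)
```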

Lexical Analysis aka Tokenization
In principle, the grammar and parser could work on characters directly, but that is usually messy.
In practice, there are two stages:
1. Chop the character stream into chunks, classify them into lexemes aka tokens, and discard spaces, e.g.,

   " (xa * xb)**25 "
   ↦
   [Open, Var "xa", Op Mul, Var "xb", Close, Op Exp, NumLiteral 25]

   This stage is lexical analysis aka tokenization. Lexers/tokenizers need only regular expressions. (A small token type and lexer sketch follows after this list.)
2. Parsing based on CFG, but terminal symbols are tokens, not characters.
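Here is a small Haskell sketch of a token type and a hand-written lexer for the example above; the names Oper, Token, lexer are made up, and real lexers are often generated from regular expressions instead.

```haskell
-- A sketch of tokens and a hand-written lexer for the example above (made-up names).
import Data.Char (isSpace, isAlpha, isDigit)

data Oper  = Mul | Exp deriving Show
data Token = Open | Close | Op Oper | Var String | NumLiteral Integer deriving Show

lexer :: String -> [Token]
lexer [] = []
lexer (c:cs)
  | isSpace c = lexer cs                              -- discard spaces
  | c == '('  = Open  : lexer cs
  | c == ')'  = Close : lexer cs
  | c == '*'  = case cs of
                  '*':cs' -> Op Exp : lexer cs'       -- "**" is one token
                  _       -> Op Mul : lexer cs
  | isAlpha c = let (name, rest) = span isAlpha (c:cs)
                in Var name : lexer rest
  | isDigit c = let (ds, rest) = span isDigit (c:cs)
                in NumLiteral (read ds) : lexer rest
  | otherwise = error ("unexpected character: " ++ [c])
```

With this sketch, lexer " (xa * xb)**25 " produces exactly the token list shown above.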