CS计算机代考程序代写 Java flex Context Free Languages Slide 1

Slide 1

Context-Free Grammars
Chapter 11

Languages and Machines

Context-free Grammars, Languages, and PDAs
Context-free Language
Context-free Grammar
PDA

L
Accepts

Context-Free Grammars

A context-free grammar G is a quadruple,
(V, , R, S), where:

V is the rule alphabet
, a subset of V, the set of terminals
V – , the set of nonterminals
R, a finite subset of (V – ) V*, the set of rules
S, an element of V – , the start symbol

Example:

({S, a, b}, {a, b}, {S  a S b, S  }, S)

Derivations

x G y iff x =  A 

and A    R

y =   

w0 G w1 G w2 G . . . G wn is a derivation in G.

Let G* be the reflexive, transitive closure of G.
Then the language generated by G, denoted L(G), is:
L(G) = {w  * : S G* w}.

An Example Derivation

Example:

Let G = ({S, a, b}, {a, b}, {S  a S b, S  }, S)

S  a S b  aa S bb  aaa S bbb  aaabbb

S * aaabbb

Definition of a Context-Free Grammar

A language L is context-free iff it is generated by some
context-free grammar G.

AnBn
S  
S  aSb

Balanced Parentheses
S  
S  SS
S  (S)

Recursive Grammar Rules
A rule is recursive iff it is X  w1Yw2, where:

Y * w3Xw4 for some w1, w2, w3, and w4 in V*.

A grammar is recursive iff it contains at least one recursive rule.

Examples: S  (S) S  (T)

T  (S)

Self-Embedding Grammar Rules
A rule in a grammar G is self-embedding iff it is :

X  w1Yw2, where Y * w3Xw4 and
both w1w3 and w4w2 are in +.

A grammar is self-embedding iff it contains at least one self-embedding rule.

Example: S  aSa is self-embedding

S  aS is recursive but not self-
embedding
S  aT
T  Sa is self-embedding

Where Context-Free Grammars Get Their Power
If a grammar G is not self-embedding then L(G) is regular.

If a language L has the property that every grammar that defines it is self-embedding, then L is not regular.

PalEven = {wwR : w  {a, b}*}
G = {{S, a, b}, {a, b}, R, S}, where:

R = { S  aSa
S  bSb
S   }.

Equal Numbers of a’s and b’s
Let L = {w  {a, b}*: #a(w) = #b(w)}.
G = {{S, a, b}, {a, b}, R, S}, where:

R = { S  aSb
S  bSa
S  SS
S   }.

Arithmetic Expressions
G = (V, , R, E), where
V = {+, *, (, ), id, E},
 = {+, *, (, ), id},
R = {
E  E + E
E  E  E
E  (E)
E  id }

Backus-Naur Form (BNF)
The symbol | should be read as “or”.

Example: S  aSb | bSa | SS | 

‘ ‘ replaced by “: : -” (read as “can be”)

Allow a nonterminal symbol to be any sequence of characters surrounded by angle brackets.

Examples of nonterminals:


A notation for writing practical context-free grammars

BNF for a Java Fragment
::= {} | {}
::= |
::= | while () |
if () |
do while (); |
; |
return | return |
;

HTML

  • Item 1, which will include a sublist
    • First item in sublist
    • Second item in sublist
  • Item 2

A grammar:

HTMLtext  Element HTMLtext | 

Element  UL | LI | … (and other kinds of elements that
are allowed in the body of an HTML document)

UL 

    HTMLtext

LI 

  • HTMLtext
  • English
    S  NP VP
    NP  the Nominal | a Nominal | Nominal |
    ProperNoun | NP PP
    Nominal  N | Adjs N
    N  cat | dogs | bear | girl | chocolate | rifle
    ProperNoun  Chris | Fluffy
    Adjs  Adj Adjs | Adj
    Adj  young | older | smart
    VP  V | V NP | VP PP
    V  like | likes | thinks | shots | smells
    PP  Prep NP
    Prep  with

    Designing Context-Free Grammars
    ● Generate related regions together.

    AnBn

    ● Generate concatenated regions:

    A  BC

    ● Generate outside in:

    A  aAb

    Concatenating Independent Languages
    Let L = {anbncm : n, m  0}.

    The cm portion of any string in L is completely independent of the anbn portion, so we should generate the two portions separately and concatenate them together.
    G = ({S, N, C, a, b, c}, {a, b, c}, R, S} where:
    R = { S  NC
    N  aNb
    N  
    C  cC
    C   }.

    L = { : k  0 and i (ni  0)}

    Examples of strings in L: , abab, aabbaaabbbabab

    Note that L = {anbn : n  0}*.
    G = ({S, M, a, b}, {a, b}, R, S} where:

    R = { S  MS
    S  
    M  aMb
    M  }.

    Unequal a’s and b’s
    L = {anbm : n  m}

    G = (V, , R, S), where
    V = {a, b, S, A, B},
     = {a, b},
    R =
    S  A /* more a’s than b’s
    S  B /* more b’s than a’s
    A  a /* at least one extra a generated
    A  aA
    A  aAb
    B  b /* at least one extra b generated
    B  Bb
    B  aBb

    Context free languages:

    We care about structure.

    E

    E + E

    id E * E

    3 id id

    5 7
    Structure

    To capture structure, we must capture the path we took through the grammar. Derivations do that.

    Example:

    S  
    S  SS
    S  (S)

    1 2 3 4 5 6
    S  SS  (S)S  ((S))S  (())S  (())(S)  (())()
    S  SS  (S)S  ((S))S  ((S))(S)  (())(S)  (())()
    1 2 3 5 4 6

    But the order of rule application doesn’t matter.
    Derivations

    Parse trees capture essential structure:

    1 2 3 4 5 6
    S  SS  (S)S  ((S))S  (())S  (())(S)  (())()
    S  SS  (S)S  ((S))S  ((S))(S)  (())(S)  (())()
    1 2 3 5 4 6

    S

    S S

    ( S ) ( S )

    ( S ) 


    Derivations

    Parse Trees
    A parse tree, derived by a grammar G = (V, , R, S), is a rooted, ordered tree in which:

    ● Every leaf node is labeled with an element of   {},

    ● The root node is labeled S,

    ● Every other node is labeled with some element of:
    V – , and

    ● If m is a nonleaf node labeled X and the children of m
    are labeled x1, x2, …, xn, then R contains the rule
    X  x1x2…xn.

    Ambiguity
    A grammar is ambiguous iff there is at least one string in L(G) for which G produces more than one parse tree.

    For most applications of context-free grammars, this is a problem.

    An Arithmetic Expression Grammar
    E  E + E id + id * id
    E  E  E
    E  (E)
    E  id

    Even a Very Simple Grammar Can be Highly Ambiguous
    S  (())()
    S  SS
    S  (S)

    Inherent Ambiguity
    Some languages have the property that every grammar for them is ambiguous. We call such languages inherently ambiguous.

    Example:

    L = {anbncm: n, m  0}  {anbmcm: n, m  0}.

    It can be proved that L is inherently ambiguous.

    We can generate anbncm and anbmcm unambiguously but anbncn will be generated in two ways.

    Inherent Ambiguity
    Both of the following problems are undecidable:

    Given a context-free grammar G, is G ambiguous?

    Given a context-free language L, is L inherently

    ambiguous?

    But We Can Often Reduce Ambiguity
    We can get rid of:

    ●  rules like S  ,

    ● rules with symmetric right-hand sides, e.g.,

    S  SS
    E  E + E

    ● rule sets that lead to ambiguous attachment of
    optional postfixes.

    A Highly Ambiguous Grammar
    S 
    S  SS
    S  (S)

    Resolving the Ambiguity with a Different Grammar
    The biggest problem is the  rule.

    A different grammar for the language of balanced parentheses:

    S*  
    S*  S
    S  SS
    S  (S)
    S  ()

    Nullable Variables
    A variable X is nullable iff either:
    (1) there is a rule X  , or
    (2) there is a rule X  PQR… and P, Q, R, …
    are all nullable.

    So compute N, the set of nullable variables, as follows:

    1. Set N to the set of variables that satisfy (1).
    2. Repeat until no change
    Add variables satisfying (2)

    A General Technique for Getting Rid of -Rules
    Definition: a rule is modifiable iff it is of the form:

    P  R, for some nullable R, P    

    removeEps(G: cfg) =
    1. Let G = G.
    2. Find the set N of nullable variables in G.
    3. For each modifiable rule P  R of G do
    Add the rule P  .
    4. Delete from G all rules of the form X  .
    5. Return G.

    L(G) = L(G) – {}

    An Example
    G = {{S, T, A, B, C, a, b, c}, {a, b, c}, R, S), R =
    { S  aTa
    T  ABC
    A  aA | C
    B  Bb | C
    C  c |  }
    Nullable variables = {A, B, C, T}

    G: add : remove:
    S  aa T  B C  
    T  BC T  C
    T  AC T  A
    T  AB A  a
    B  b

    What If   L?
    atmostoneEps(G: cfg) =
    1. G = removeEps(G).
    2. If SG is nullable then /* i. e.,   L(G)
    2.1 Create in G a new start symbol S*.
    2.2 Add to RG the two rules:
    S*  
    S*  SG.
    3. Return G.

    But There is Still Ambiguity
    S*   What about ()()() ?
    S*  S
    S  SS
    S  (S)
    S  ()

    Eliminating Symmetric Recursive Rules
    S*  
    S*  S
    S  SS
    S  (S)
    S  ()

    Replace S  SS with one of:

    S  SS1 /* force branching to the left
    S  S1S /* force branching to the right

    So we get:
    S*   S  SS1
    S*  S S  S1
    S1  (S)
    S1  ()

    Eliminating Symmetric Recursive Rules
    So we get:
    S*  
    S*  S
    S  SS1
    S  S1
    S1  (S)
    S1  ()
    S*

    S

    S S1

    S S1

    S1

    ( ) ( ) ( )

    Arithmetic Expressions
    E  E + E
    E  E  E
    E  (E)
    E  id
    E E

    E E E E

    E E E E

    id  id  id id  id  id
    Problem 1: Associativity

    Arithmetic Expressions
    E  E + E
    E  E  E
    E  (E)
    E  id
    E E

    E E E E

    E E E E

    id  id + id id  id + id
    Problem 2: Precedence

    Arithmetic Expressions – A Better Way
    E  E + T
    E  T
    T  T * F
    T  F
    F  (E)
    F  id

    Example:

    id + id * id

    *

    Dangling “else”
    The dangling else problem:

    ::= if then
    ::= if then else

    Consider:

    if cond1 then if cond2 then stmt1 else stmt2

    cond2

    if then
    if then else
    cond1
    stmt1
    stmt2
    cond2

    if then else
    if then
    cond1
    stmt2
    stmt1
    Dangling “else” ambiguity

    Dangling “else” solution
    ::=
    |
    ::= if then else
    |
    ::= if then
    | if then else

    cond2

    if then
    if then else
    cond1
    stmt1
    stmt2


    other_stmt
    other_stmt
    Dangling “else” solution

    k
    k
    n
    n
    n
    n
    n
    n
    b
    a
    b
    a
    b
    a

    2
    2
    1
    1