Syntax and Grammars
1 / 21
Outline
What is a language?
2 / 21
What is a language?
Abstract syntax and grammars
Abstract syntax vs. concrete syntax Encoding grammars as Haskell data types
What is a language?
Language: a system of communication using “words” in a structured way
Natural language
• used for arbitrary communication • complex, nuanced, and imprecise
Programming language
• used to describe aspects of computation
i.e. systematic transformation of representation • programs have a precise structure and meaning
We use a broad interpretation of “programming language”
English, Chinese, Hindi, Arabic, Spanish, . . .
Haskell, Java, C, Python, SQL, XML, HTML, CSS, . . .
What is a language?
3 / 21
Object vs. metalanguage
What is a language?
4 / 21
Important to distinguish two kinds of languages:
• Object language: the language we’re defining
• Metalanguage: the language we’re using to define the structure and meaning of the object language!
A single language can fill both roles at different times! (e.g. Haskell)
Syntax vs. semantics
What is a language?
5 / 21
Two main aspects of a language:
• syntax: the structure of its programs
• semantics: the meaning of its programs
Metalanguages for defining syntax: grammars, Haskell, . . .
Metalanguages for defining semantics: mathematics, inference rules, Haskell, . . .
Outline
What is a language?
Abstract syntax and grammars
Abstract syntax vs. concrete syntax Encoding grammars as Haskell data types
Abstract syntax and grammars
6 / 21
Programs are trees!
Abstract syntax tree (AST): captures the essential structure of a program • everything needed to determine its semantics
+ * if 2* ++ true+5
345678 23
2 + 3 * 4 (5 + 6) * (7 + 8) if true then (2+3) else 5
Abstract syntax and grammars
7 / 21
Grammars
Grammars are a metalanguage for describing syntax
The language we’re defining is called the object language
syntactic category nonterminal symbol
s∈Sentence ::= nvn | sands
n ∈ Noun ::= cats | v ∈ Verb ::= chase
terminal symbol
dogs | cuddle
ducks
production rules
|
Abstract syntax and grammars
8 / 21
Generating programs from grammars How to generate a program from a grammar
1. start with a nonterminal s
2. find production rules with s on the LHS
3. replace s by one possible case on the RHS
A program is in the language if and only if it can be generated by the grammar!
Animal behavior language
s∈Sentence ::= nvn | sands
s ⇒nvn
⇒ cats v n
⇒ cats v ducks
⇒ cats cuddle ducks
Abstract syntax and grammars
9 / 21
n ∈ Noun v ∈ Verb
::= cats | ::= chase
dogs | cuddle
ducks
|
Exercise
Animal behavior language
s∈Sentence ::= nvn | sands
n ∈ Noun ::= cats | dogs | ducks
v ∈ Verb ::= chase | cuddle
Is each “program” in the animal behavior language?
• cats chase dogs
• cats and dogs chase ducks
• dogs cuddle cats and ducks chase dogs
• dogs chase cats and cats chase ducks and ducks chase dogs
Abstract syntax and grammars
10 / 21
Abstract syntax trees
Grammar (BNF notation) Example ASTs
t ∈ Term ::=
| false
not
not false
true | not t
true
if false
| if t t t
Language generated by grammar: set of all ASTs
true
true
Term = {true,false} ∪ {
not
t
| t ∈ Term} ∪ {
if
t1 t2 t3
| t1,t2,t3 ∈ Term}
Abstract syntax and grammars
11 / 21
Exercise
Arithmetic expression language
i∈Int ::= 1|2|… e∈Expr ::= add e e | mulee
| nege |i
1. Draw two different ASTs for the expression: 2+3+4
2. Draw an AST for the expression: -5*(6+7)
3. What are the integer results of evaluating the following ASTs:
neg add
add neg 535
3
Abstract syntax and grammars
12 / 21
Outline
What is a language?
Abstract syntax and grammars
Abstract syntax vs. concrete syntax Encoding grammars as Haskell data types
Abstract syntax vs. concrete syntax
13 / 21
Abstract syntax vs. concrete syntax
Abstract syntax: captures the essential structure of programs • typically tree-structured
• what we use when defining the semantics
Concrete syntax: describes how programs are written down
• typically linear (e.g. as text in a file)
• what we use when we’re writing programs in the language
Abstract syntax vs. concrete syntax
14 / 21
Parsing
Parsing: transforms concrete syntax into abstract syntax
source code abstract
(concrete syntax) syntax tree
Typically several steps:
• lexical analysis: chunk character stream into tokens
• generate parse tree: parse token stream into intermediate “concrete syntax tree” • convert to AST: convert parse tree into AST
Not covered in this class . . . (CS 480)
Abstract syntax vs. concrete syntax
15 / 21
Parser
Pretty printing
Pretty printing: transforms abstract syntax into concrete syntax Inverse of parsing!
Abstract syntax vs. concrete syntax
16 / 21
abstract syntax tree
source code
(concrete syntax)
Pretty Printer
Abstract grammar vs. concrete grammar
Abstract grammar Concrete grammar
t∈Term ::= true
| false
t∈Term ::= |
true
false
not t iftthentelset (t)
|nott | |ifttt | |
Our focus is on abstract syntax
• we’re always writing trees, even if it looks like text
• use parentheses to disambiguate textual representation of ASTs but they are not part of the syntax
Abstract syntax vs. concrete syntax
17 / 21
Outline
What is a language?
Abstract syntax and grammars
Abstract syntax vs. concrete syntax Encoding grammars as Haskell data types
Encoding grammars as Haskell data types
18 / 21
Encoding abstract syntax in Haskell
Abstract grammar
b ∈ Bool ::= true | false t∈Term ::= not t
| ifttt |b
defines set
Abstract syntax trees
true true
if false
not
not false
linear encoding
true
Haskell data type definition Haskell values
data Term = Not Term
| If Term Term Term
• Lit True
• If (Lit True)
| Lit Bool
(Lit False)
Encoding grammars as Haskell data types
19 / 21
defines set
(Lit True)
• Not (Not (Lit False))
Translating grammars into Haskell data types
Strategy: grammar → Haskell
1. For each basic nonterminal, choose a built-in type, e.g. Int, Bool
2. For each other nonterminal, define a data type
3. For each production, define a data constructor
4. The nonterminals in the production determine the arguments to the constructor
Special rule for lists:
• ingrammars, s ::= t∗ isshorthandfor: s ::= ε | ts or s ::= ε | t,s • can translate any of these to a Haskell list:
data Term = …
type Sentence = [Term]
Encoding grammars as Haskell data types
20 / 21
Example: Annotated arithmetic expression language
Abstract syntax Haskell encoding
n ∈ Nat ::= (natural number) c ∈ Comm ::= (comment string)
type Comment = String
e ∈ Expr
::=
| | | |
neg e e @ c e + e e * e n
negation comment addition multiplication literal
Encoding grammars as Haskell data types
21 / 21
data Expr
= Neg Expr
| Annot Expr Comment
| Add Expr Expr
| Mul Expr Expr
| Lit Int