程序代写代做代考 computer architecture compiler algorithm Compilers and computer architecture From strings to ASTs (2): context free grammars

Compilers and computer architecture From strings to ASTs (2): context free grammars
Martin Berger
October 2015

Recall the function of compilers

Recall we are discussing parsing
Source program
Lexical analysis
Intermediate code generation
Optimisation
Syntax analysis
Semantic analysis, e.g. type checking
Code generation
Translated program

Introduction

Introduction
Remember, we want to take a program given as a string and:
􏹩 Checkifit’ssyntacticallycorrect,e.g.iseveryopened bracket later closed?
􏹩 ProduceanASTforthecodegenerator.

Introduction
Remember, we want to take a program given as a string and:
􏹩 Checkifit’ssyntacticallycorrect,e.g.iseveryopened bracket later closed?
􏹩 ProduceanASTforthecodegenerator.
We split that task into two phases, lexing and parsing. Lexing throws away some information (e.g. how many white-spaces) and prepares a token-list, which is used by the parser. The token-list simplifies the parser, because some detail is not important for syntactic correctness:
ifx <2+3thenP elseQ is syntactically correct exactly when ify <111+222thenP elseQ Introduction The token-list simplifies the parser, because some detail is not important for syntactic correctness: ifx <2+3thenP elseQ is syntactically correct exactly when ify <111+222thenP elseQ So from the point of view of the next stage (parsing), all we need to know is that the input is T_if T_var T_less T_int T_plus T_int T_then ... Of course we cannot throw away the names of variables etc completely, as the later stages (type-checking and code generation) need them. They are just irrelevant for syntax checking. We keep them and our token-lists are like this T_if T_var ( "x" ) T_less T_int ( 2 ) T_plus ... Two tasks of syntax analysis Two tasks of syntax analysis As with the lexical phase, we have to deal with two distinct tasks. Two tasks of syntax analysis As with the lexical phase, we have to deal with two distinct tasks. 􏹩 Specifyingthatthesyntacticallycorrectprograms(token lists) are. Two tasks of syntax analysis As with the lexical phase, we have to deal with two distinct tasks. 􏹩 Specifyingthatthesyntacticallycorrectprograms(token lists) are. 􏹩 Checkingifaninputprogram(tokenlist)issyntactically correct according to the specification, and output a corresponding AST. Two tasks of syntax analysis As with the lexical phase, we have to deal with two distinct tasks. 􏹩 Specifyingthatthesyntacticallycorrectprograms(token lists) are. 􏹩 Checkingifaninputprogram(tokenlist)issyntactically correct according to the specification, and output a corresponding AST. Let’s deal with specification first. What are our options? How about using regular expressions for this purpose? Two tasks of syntax analysis As with the lexical phase, we have to deal with two distinct tasks. 􏹩 Specifyingthatthesyntacticallycorrectprograms(token lists) are. 􏹩 Checkingifaninputprogram(tokenlist)issyntactically correct according to the specification, and output a corresponding AST. Let’s deal with specification first. What are our options? How about using regular expressions for this purpose? Alas not every language can be expressed in these formalisms. Example: Two tasks of syntax analysis As with the lexical phase, we have to deal with two distinct tasks. 􏹩 Specifyingthatthesyntacticallycorrectprograms(token lists) are. 􏹩 Checkingifaninputprogram(tokenlist)issyntactically correct according to the specification, and output a corresponding AST. Let’s deal with specification first. What are our options? How about using regular expressions for this purpose? Alas not every language can be expressed in these formalisms. Example: Alphabet = {′(′,′ )′}. Two tasks of syntax analysis As with the lexical phase, we have to deal with two distinct tasks. 􏹩 Specifyingthatthesyntacticallycorrectprograms(token lists) are. 􏹩 Checkingifaninputprogram(tokenlist)issyntactically correct according to the specification, and output a corresponding AST. Let’s deal with specification first. What are our options? How about using regular expressions for this purpose? Alas not every language can be expressed in these formalisms. Example: Alphabet = {′(′,′ )′}. Language = all balanced parentheses (), ()(), (()), ((()(()())()(()))), ... FSAs/REs can’t count FSAs/REs can’t count Let’s analyse the situation a bit more. Why can we not describe the language of all balanced parentheses using REs or FSAs. FSAs/REs can’t count Let’s analyse the situation a bit more. Why can we not describe the language of all balanced parentheses using REs or FSAs. Each FSA has only a fixed number (say n) of states. But what if we have more than n open brackets before we hit a closing bracket? FSAs/REs can’t count Let’s analyse the situation a bit more. Why can we not describe the language of all balanced parentheses using REs or FSAs. Each FSA has only a fixed number (say n) of states. But what if we have more than n open brackets before we hit a closing bracket? Since there are only n states, when we reach the n + 1 open bracket, we must have gone back to a state that we already visited earlier, say when we processed the i-th bracket with i

Related Posts