Compiler Theory
Day 2, January 24, 2019
Example of using Visual Studio with
a Simple Lexical Analyzer

This exercise will guide you through using Visual Studio 2017 to set up and
run a C program, using a lexical analyzer for decimal fractions as an example.

1. Launch Visual Studio 2017.

2. In the File menu, choose New, and then Project.

3. Under “Templates” at the left, “Win32” should be highlighted.
Highlight “Win32 Console Application”, since the lexical analyzer will
run in a terminal window (also called a console window). It is okay for
the type to be “Visual C++”, since most C programs are also legal C++ programs.

4. Down at the bottom in the “Name” box, enter a name for the project. This
will be the name used for the executable file. For this example, use the
name “Lexer”.

5. In the “Location” box, put a folder name where you want the project files.
You can use the Browse button at the right.

6. If you want the project to be in its own subfolder of the folder in step 5,
check the “Create directory for solution” box and fill in the “Solution name”,
which is usually the same as the “Name” entered earlier.

7. Click OK.

8. You get a Win32 Application Wizard. Click “Application Settings” at the left.
Under “Additional Options” check “Empty project” and uncheck “Precompiled
Header” and “Security Development Lifecycle checks”. Click OK.

9. You should now be in your project. In the “Solution Explorer” window at
left, right-click on “Source Files” and choose “Add”, then “New item”. Then
highlight “C++ File (.cpp)”. In the “Name” box below, enter “lexer.c”. Using
the “c” extension instead of “cpp” tells Visual Studio to compile the file as
C rather than C++. Click the “Add” button.

10. You get an edit window for lexer.c. Enter the program listed at the
end of this document.

11. In the “Build” menu, click “Build Solution”. If you get error messages,
fix them and “Build Solution” again.

12. In the “Debug” menu, click “Start Debugging”. A console window will appear,
blank, and you can type in text for the program to analyze. Note that your
program does not get any of the text until you hit ENTER at the end of a line.

13. You can end the program by closing its window, or typing CTRL-Z ENTER.
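
The CTRL-Z in step 13 is how the Windows console signals end-of-file, which the program sees as the special value EOF returned by getc. The fragment below is a minimal sketch of that read-until-EOF pattern; the function name count_chars is hypothetical and is not part of the sample lexer. It also shows why the variable that receives getc's result should be an int rather than a char.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of lexer.c:
   count the characters read from `in` until end-of-file. */
long count_chars(FILE *in)
{
    long count = 0;
    int c;   /* int, not char: EOF is a negative value that must stay
                distinguishable from every real character */

    assert(in != NULL);
    while ((c = getc(in)) != EOF)   /* getc returns EOF at end of input */
        count++;
    return count;
}
```

Calling count_chars(stdin) from main would count everything typed, including the newline at the end of each line, until CTRL-Z ENTER is entered.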

ASSIGNMENT:

Modify the sample lexical analyzer to recognize the tokens of ATTO-C. You should
not record the lexeme of a comment, since a comment can be arbitrarily long. You
do have to record the lexeme of a string, since your compiler needs to remember
what the string was, but since a string cannot have newlines in it, you can make
the lexeme long enough to hold any reasonable string. Decimal fractions are NOT
tokens in ATTO-C, so the decimal fraction stuff should be removed.

/* lexer.c

A simple lexical analyzer for integers and decimal fractions.

Author: Ken Brakke
Date: Jan. 26, 2017

A Finite State machine is used to recognize strings consisting of digits
followed by a decimal point followed by digits. There must be at least
one digit before the decimal point. The decimal point and digits after
the decimal point are optional. Characters not part of legal tokens
will cause a REJECT message, and the lexer will move on to the next
token.

Note: The FINAL state here does not refer to any particular node of the
FSM. It is used after each true "final" state to signal the lexical
analyzer to start a new token.

Usage: Launch. A console window will appear. You may type in strings
followed by ENTER, and they will either be accepted by the Finite State
Machine and print ACCEPT, or not and print REJECT. The token type and
lexeme (the characters of the token) will also be printed. Multiple
numbers may be entered on one line. To exit the program, hit CTRL-Z and
ENTER.

*/

#include <stdio.h>   // standard input-output declarations: printf, stdin
#include <stdlib.h>  // standard library declarations: exit
#include <ctype.h>   // character type test declarations: isdigit, isalpha, isalnum

// Finite State Machine states
#define START 1
#define INTEGER 2
#define DEC_FRAC 3
#define FINAL 4

// Token types
#define INTEGER_TOK 101
#define DECIMAL_FRACTION_TOK 102

// Size of the lexeme buffer
#define LEX_SIZE 100

// Special look-ahead character value to indicate none
#define NO_CHAR 0

int main(void)
{ int state;              // The current state of the FSM.
  int next_char;          // The next character of input.
  char lexeme[LEX_SIZE];  // The characters of the token.
  int lex_spot;           // Current spot in lexeme.
  int token_type;         // The type of token found.

  // Infinite loop, doing one token at a time.
  next_char = NO_CHAR;    // no lookahead character to start with
  while ( 1 )
  { // Initialize the Finite State Machine.
    state = START;
    lex_spot = 0;
    // Loop over characters of the token.
    while ( state != FINAL )
    { if ( next_char == NO_CHAR )
        next_char = getc(stdin);  // get one character from standard input
      if ( next_char == EOF )     // EOF is the special value for End-Of-File
        exit(0);    // exit the program with exit code 0, which is "success".
      switch ( state )
      { case START:
          if ( next_char == '\n' )  // just eat the newline and stay in START
            next_char = NO_CHAR;
          else if ( isdigit(next_char) )
          { state = INTEGER;
            lexeme[lex_spot++] = next_char;  // Add the character to the lexeme
            next_char = NO_CHAR;             // eat the character
          }
          else
          { printf("REJECT %c\n",next_char); // This is not a legal final state
            state = FINAL;                   // but we want to end the token anyway
            next_char = NO_CHAR;             // eat the offending character
          }
          break;  // Need "break" at the end of a case, else you will continue
                  // to the next case.
        case INTEGER:
          if ( isdigit(next_char) )
          { state = INTEGER;
            if ( lex_spot < LEX_SIZE-1 )     // drop extra characters rather
              lexeme[lex_spot++] = next_char; // than overflow the buffer
            next_char = NO_CHAR;
          }
          else if ( next_char == '.' )
          { state = DEC_FRAC;
            if ( lex_spot < LEX_SIZE-1 )
              lexeme[lex_spot++] = next_char;
            next_char = NO_CHAR;
          }
          else
          { lexeme[lex_spot] = 0;  // null for end of string
            token_type = INTEGER_TOK;
            printf("ACCEPT INTEGER %s\n",lexeme);  // This is a final state
            state = FINAL;         // leave next_char alone, for next token
          }
          break;
        case DEC_FRAC:
          if ( isdigit(next_char) )
          { state = DEC_FRAC;
            if ( lex_spot < LEX_SIZE-1 )
              lexeme[lex_spot++] = next_char;
            next_char = NO_CHAR;
          }
          else
          { lexeme[lex_spot] = 0;  // null for end of string
            token_type = DECIMAL_FRACTION_TOK;
            printf("ACCEPT DECIMAL_FRACTION %s\n",lexeme);  // This is a final state
            state = FINAL;         // leave next_char alone, for next token
          }
          break;
        default:
          printf("INTERNAL ERROR: Illegal state %d\n",state);
          state = FINAL;
          break;

      } // end of switch

    } // end of while state

  } // end of infinite loop

  return 0; // successful exit code
} // end of main

Reference:
Beginner’s C syntax:
https://www.tutorialspoint.com/cprogramming/c_program_structure.htm
https://www.tutorialspoint.com/cprogramming/c_basic_syntax.htm
etc.