Instructions
2 Regular expressions
You are going to write a module that implements regular expression filtering. Regular expressions are a family of patterns widely used to search in text, among other things. One of the reasons is that it is possible to do this research very efficiently by compiling these patterns to what are called deterministic finite state machines (DFA). This compilation is often done by passing through an intermediate phase which transforms the regular expression into a non-deterministic finite state machine (NFA). In this lab, you will implement this first phase, as well as an interpreter for these NFA machines.
Copyright By PowCoder代写 加微信 powcoder
Regular expressions usually have the following form:
C constant pattern that accepts the character C
. pattern accepting any character
C1…Cn accepts any character from C1…Cn
ε accept the empty string
Re1 Re2 concatenation: accept Re1 followed by Re2
Re1 | Re2 pattern disjunction
Re? optional pattern
Re∗ pattern repeat
Re+ non-zero repetition
(?Name: Re) named pattern
To search in text, we often add all kinds of extensions such as patterns that recognize the beginning or the end of a line. On a more theoretical side, the language of regular expressions also includes conjunction and negation. However within the framework of this work, we will limit ourselves to the primitives listed above.
3 The NFAs
A non-deterministic finite state machine is a directed graph where each node represents a state and each edge indicates a possible transition from one state to another. The arcs can either be linked to an event, in which case this transition is only possible if the event takes place, or otherwise they can be of type ε, i.e. the transition can take place at any what moment.
For our application, the events will simply be characters, so the machine will step through a string of characters using the current character to decide which transitions are possible. There will be a starting state and some states will be considered as final, which means that when the machine arrives at such a state it has finished its filtering work and the string it has traversed is the string accepted by the machine ( and therefore accepted by the regular expression it represents).
For example, the following machine (with s1 as the starting state) can be used to match the pattern “a*a”, i.e. to match a character string that contains an arbitrary repetition of a followed by a letter has additional:
s1 : ε→s2,a→s1 s2 : a→s3
s3 : success
A more important detail is that from state s1 one can always jump to state s2, even if the current character is a and there are therefore often several possible transitions (hence the “non-deterministic”) . So the execution of such a machine must in general try all possible transitions in order to find which allow the machine to reach a final state.
4 The code
The provided code consists of the following parts:
— A set of predicates that define what a valid NFA is. These
predicates are not directly useful, except as documentation of the format to use (and occasionally for debugging): your code must use machines that adhere to these rules. These predicates have names that end in _wf (for well-formed).
— Same thing for regular expressions.
— A direct implementation of the search for regular expressions, without
go through an NFA machine. Consider this code as a kind of specification that will allow you to compare the execution of your code to a reference implementation. The main entry point is search.
— A few other relationships that complete the code you need to write.
What to do
You need to implement a regular expression search algorithm that uses an NFA internally. This is done in two parts:
1. You need to implement the code for nfa_match, which executes a given NFA on a given string.
2. You need to implement the code for re_comp, which takes a regular expression and translates (compiles) it into an NFA.
You need to implement these missing pieces of code so that the re_search rule, which compiles a regular expression into an NFA and then passes it to nfa_search, behaves as closely as possible to search. You can of course add as many auxiliary rules as you want.
Be careful to implement re_comp and nfa_match independently enough so that it is possible to test your re_comp with our nfa_match and vice versa. This is also what the nfa_wf and state_wf tests are for.
· The code should never exceed 80 columns.
· The code must be well indented.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com