EECS 376: Foundations of Computer Science
University of Michigan, Winter 2022 Discussion Notes 4 1 Formal Languages
1.1 Decision Problems
The simplest types of problems in an abstract sense are those that have yes or no answers. These are called decision problems. Here are some examples of decision problems:
Copyright By PowCoder代写 加微信 powcoder
• Are there tickets available for the basketball game tomorrow?
• Can you get from Detroit to Los Angeles in under 600 miles of travel? • Can we fit 6 apples into a paper bag?
These are all decision problems because they have a yes or no answer. Note that a question such as ”What is the length of the shortest possible path between the wikipedia page on elephants and cats” is not a decision problem since the answer is not a yes or no: it is a number.
Also notice that we can generalize decision problems to be in terms of specific parameters. Going back to our second example ”Can you get from Detroit to Los Angeles in under 600 miles of travel”, we can generalize that problem to instead be ”Can we get from point A to point B in under z miles of travel”. Now we have a set of parameters A, B, and z that form a tuple which we can call w (w = (A, B, z)). Each individual w has a yes or no answer associated with that w.
1.2 Languages
A language is simply the set of all inputs that are ”yes” answers to some decision problem. For our example from above the set of all (A, B, z) tuples such that you can travel from A to B in under z miles makes up a language. We can formally define it as follows:
LcanT ravel = {(A, B, z) : There is a path to travel from A to B in less than z miles}
A given w = (A, B, z) is in the language LcanTravel (w ∈ LcanTravel) if and only if the path from
A to B is less than z miles.
Given that a language is nothing more than a set, we can do all of our normal set operations with 1 or more languages. The complement of this language would be:
LcanT ravel = {(A, B, z) : There is no way to travel from A to B in less than z miles}
Notice that every possible (A, B, z) tuple must be in either LcanTravel or LcanTravel. And no
(A, B, z) tuple can be in both LcanTravel or LcanTravel. 1.3 Alphabets and Sentences/Inputs
An alphabet is simply a set of symbols from which you can construct sentences/inputs. For example we can say our alphabet is the set of all English characters or ASCII from which we can then construct infinitely many finite length strings that all can be thought of as inputs.
EECS 376: Foundations of Computer Science
University of Michigan, Winter 2022 Discussion Notes 4 1.4 Special Languages
Certain languages are so common and useful that we’ve developed special symbols for them. The ones that will be discussed are Σ∗ and ∅. Σ∗ is the set of all finite length strings. And ∅ is the empty set (consists of nothing). Both of these languages are also called trivial languages
2 Deterministic Finite Automaton (DFA)
Note that for the discussion notes, we are using the term “DFA” as opposed to “FSA” used in some other contexts, but they refer to the same thing.
A deterministic finite automaton represents a very simple type of computational device or algorithm which has a finite amount of memory (in the form of states) and can only view its input sequentially once (i.e. it makes one pass over the input from beginning to end).
We can picture a DFA as a state diagram or directed graph where each node represents a state of the DFA and each (directed) edge represents the behavior of the transition function, δ. We can then simulate the execution of the DFA on an input w = w1w2w3 . . . wn as follows:
current state = q0 i = 1
while i ≤ n do
current state = δ(wi, current state)
▷ We have finished parsing the input
if current state is an accept state then
accept w else
▷ We take q0 to be the starting state by convention ▷ i is effectively a pointer to our position in w
One natural question which arises is, given a DFA, what strings does that DFA accept?
Definition 2.1 (D decides language A). For a language A, we say DFA D decides A if and only ifforallw∈A,Dacceptswandrejectsallw̸∈A. WedefineL(D)=Aasshorthandforthe language D decides.
Definition 2.2 (Regular language). A language A is a regular language if and only if there exists a DFA that decides A.
Example 2.3. What language does the following DFA decide?
EECS 376: Foundations of Computer Science
University of Michigan, Winter 2022
Discussion Notes 4
Solution: The DFA decides {ab∗,ba∗}. There are two accept states; we examine q1 and q2 is handled similarly. To get to state q1, the input must start with a. Once in state q1, the DFA can stay in state q1 by reading any number of b. If the DFA instead reads a, the DFA will transition to state q3 at which it cannot reach any accept state.
Example 2.4. Show that {a, b}∗ is a regular language by building a DFA that decides it.
Solution: To show this, we would need to build a DFA which decides {a,b}∗. Consider the following single state DFA with Σ = {a, b}:
Terminology Summary
Term Alphabet
Word / Sentence / Input Language
Definition
set of symbols
string drawn from alphabet set of words / sentences
Example ASCII EECS376 {Kamil, Lee, Pettie}
(Optional) Regular Expressions (regex)
During your time studying computer science, you may have come across regular expressions, or regex for short. Regular expressions are a sequence of characters that, among other uses, allow computer programs to easily search for certain patterns within a string or many strings. Remember regular languages from earlier? A regular expression is just a convenient way of describing a regular
EECS 376: Foundations of Computer Science
University of Michigan, Winter 2022 Discussion Notes 4
language!1 For example, consider the regular language {a, b}∗ from Example 1.4. This language contains all strings of only a’s or b’s and the empty string. In regular expression form, it could be written as:
Note that the Kleene star is used here in the same way it was used for defining languages. In practice, however, this would not be a very useful regular expression – it matches empty strings! If we replaced the “*” with a “+”, then it would match any string with 1 or more a’s or b’s, which is a bit more useful. Here’s an example of a more complex regular expression:
The “A-Z” contained within brackets matches a single character with code in between that of A and Z, i.e. every capital letter. Next, the “\w” matches any “word” character, which refers to all alphanumeric characters and underscores. Finally, the “+” on the end signifies to match one or more of the previous token, which in this case is “\w”. In plain English, this regular expression matches all sequences of characters that begin with a capital letter and are followed by at least one alphanumeric character (or underscore). In the next few examples, we will look at using regex for real applications in a programming language, specifically Python.
Example 3.1. You are writing a paper, and as your professor is reviewing it, they notice that you use articles very often: ”a”, ”an”, and ”the”. You want a way to easily find every article in your paper, but using the built-in Find tool (Ctrl-F) only lets you search one word at a time.
Write a Python function that uses regex to find every article within a string called txt.
Solution: import r e
def find articles(txt):
return re.findall(’\\b[Aa]n?\\b|\\b[Tt]he\\b’, txt)
Let’s unpack what this regex is doing. The “|” in the middle defines a Boolean OR operator, so a pattern on either side is matched.
On the left side, we have “\\b[Aa]n?\\b”. The “\\b” matches a boundary of a word, so generally either a space or a period. Note we have two backslashes because Python treats only one backslash as an escape character. The “[Aa]” matches an upper or lower-case “a”, and the “n?” matches either 0 or 1 n’s. In plain English, this matches instances of the word “A”, “a”, “An”, or “an”.
On the right side, we have “\\[Tt]he\\b”. Again, the word boundary anchors are used. The “[Tt]” matches an upper or lower-case “t”, and the “he” matches “he” exactly. In plain English, this matches instances of the word “The” or “the”.
Thus, combined with the “|” operator, we are able to match every instance of an article in txt.
Example 3.2. As you start examining the use of articles in your paper, you notice that you oc- casionally mix up the use of “a” and “an”, e.g. you incorrectly write “a apple” or “an banana”.
1In some implementations, regular expressions can even describe some languages that are not regular.
EECS 376: Foundations of Computer Science
University of Michigan, Winter 2022 Discussion Notes 4 Ctrl-F maybe would have sufficed for the previous example, but you really don’t want to try ev-
ery possible combination of “a” with vowels and “an” with consonants to find incorrect article usage!
Write a Python program that uses regex to find every instance of incorrect article usage in a string called txt. For this example, assume that words that start with “a”, “e”, “i”, “o”, or “u” always start with a vowel and those that start with any other alphabet character start with a consonant. E.g. “a herb” and “an unicorn” would not be marked as incorrect, although they are.
Solution: import r e
def find incorrect article usage(txt):
return re.findall((’\\b[Aa]\\s\\b[aeiouAEIOU][a−zA−Z]∗’
’\\b[Aa]n\\s\\b[ˆaeiouAEIOU\\s][a−zA−Z]∗’), txt)
This might look like a lot – and it is – but there’s only a few elements in this regex that haven’t been seen in previous examples. Like before, the “|” is used to match two cases: (1) “a” followed by a vowel, and (2) “an” followed by a consonant. Let’s examine the parts that we haven’t seen before. “\\s” matches any whitespace character such as a space, tab, or newline. Followed by a “\\b”, this ensures that “a” or “an” is being followed by a word and not just non-alphabet characters. As you may be able to tell now, “[aeiouAEIOU]” matches any vowel. Next, “[a-zA-Z]*” matches any 0 or more of any alphabet character that follows the vowel that we just detected – it extends the match from the single vowel until the end of the word. Now let’s look at the second half of the regex. Everything is review until “[^aeiouAEIOU\s]”, which is a completely new concept. Unlike regular square brackets “[abc]” which include all characters within them, square brackets with a caret “[^abc]” exclude all characters in them. In this case, we are excluding any vowel or whitespace character. Because we are preceded by a “\\b”, we can guarantee that the character we get is still an alphabet character, so it must be a consonant. Then, just like before, “[a-zA-Z]*” matches until the end of the word.
If you are interested in playing around with regular expressions yourself in an environment that’s more user-friendly than writing a program, try this website which lets you easily build and test regular expressions: https://regexr.com/
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com