程序代写 CSE 3321…

What is a language?
Computer Science and Engineering  The Ohio State University

Regular Expressions

Copyright By PowCoder代写 加微信 powcoder

Computer Science and Engineering  College of Engineering  The Ohio State University

Computer Science and Engineering  The Ohio State University

Programming Languages
Computer Science and Engineering  The Ohio State University
Q: Are C, Java, Ruby, Python, … languages in this formal sense?

Programming Languages
Computer Science and Engineering  The Ohio State University

Regular Expression (RE)
Computer Science and Engineering  The Ohio State University
A formal mechanism for defining a
In math, a clear distinction between:
In computer applications, there isn’t
Precise, unambiguous, well-defined
Characters in strings (the “alphabet”) Meta characters used in writing a RE
Is ‘*’ a Kleene star or an asterisk?
(a|b)*a(a|b)(a|b)(a|b)

A literal represents a character from
the alphabet
f,i,s,h, …
Whitespace is hard (invisible!)
Some are easy:
\t is a tab (ascii 0x09)
\n is a newline (ascii 0x0A)
\r is a carriage return (ascii 0x0D)
So the character ‘\’ needs to be escaped! \\ is a \ (ascii 0x5c)
Computer Science and Engineering  The Ohio State University

Basic Operators
These operators are meta-characters too To represent the literal: \( \) \|
Activity: For each RE above, write out the
corresponding language explicitly (ie, as a
set of strings)
\(61(3|4)\)
Computer Science and Engineering  The Ohio State University
( ) for grouping, | for choice
cat|dog|fish
R(uby|ails)
(G|g)r(a|e)y

Character Class
Set of possible characters (0|1|2|3|4|5|6|7|8|9) is annoying!
[^A-Z] a character that is not a capital letter
Syntax: [ ]
Explicit list as [0123456789]
Activity: Write the language defined by
Range as [0-9]
Negate with ^ at the beginning
0[xX][0-9a-fA-F]
Computer Science and Engineering  The Ohio State University

Character Class Shorthands
\s for whitespace, ie [ \t\r\n]
\w for word character, ie [0-9a-zA-Z_]
And negations too
\D, \S, \W (ie [^\d], [^\s], [^\w])
Warning: [^\d\s] ≠ [\D\S]
POSIX standard (& Ruby) includes [[:alpha:]] alphabetic character
[[:lower:]] lowercase alphabetic character [[:digit:]] decimal digit (unicode! Eg ١) [[:xdigit:]] hexadecimal digit [[:space:]] whitespace including newlines
Computer Science and Engineering  The Ohio State University
\d for digit, ie [0-9]

Includes space, tab, punctuation, etc! But does not include newline
So add . to list of meta-characters Use \. for a literal period
buckeye\.\d
Problem: What is RE for OSU email
address for everyone named Smith?
Answer is not:
Computer Science and Engineering  The Ohio State University
A . matches any character (almost)

Repetition
Computer Science and Engineering  The Ohio State University
Applies to preceding character or ( )
\? \* \+ \{ \}
? means 0 or 1 time
* means 0 or more times (unbounded)
+ means 1 or more times (unbounded) {k} means exactly k times {a,b}means k times, for a ≤ k ≤ b
More meta-characters to escape!

Computer Science and Engineering  The Ohio State University

0[xX](0|[1-9a-fA-F][0-9a-fA-F]*)

Computer Science and Engineering  The Ohio State University
(Language consisting of) strings that:
Contain only letters, numbers, and _ Start with a letter
Do not contain 2 consecutive _’s
Do not end with _
Exemplars and counter-exemplars:
EOF, 4Temp, Test_Case3, _class, a4_Sa
p_X, S__T_2
Write the corresponding RE

Computer Science and Engineering  The Ohio State University
(Language consisting of) strings that:
Contain only letters, numbers, and _ Start with a letter
Do not contain 2 consecutive _’s
Do not end with _
Exemplars and counter-exemplars:
EOF, 4Temp, Test_Case3, _class, a4_Sa
p_X, S__T_2
Write the corresponding RE

Computer Science and Engineering  The Ohio State University
(Language consisting of) strings that:
Contain only letters, numbers, and _ Start with a letter
Do not contain 2 consecutive _’s
Do not end with _
Exemplars and counter-exemplars:
EOF, 4Temp, Test_Case3, _class, a4_Sa
p_X, S__T_2
Write the corresponding RE
[a-zA-Z](_[a-zA-Z0-9]|[a-zA-Z0-9])*

Finite State Automota (FSA)
Computer Science and Engineering  The Ohio State University
An FSA is an “accepting rule”
Finite set of states
Transition function (relation) between
states based on next character in string
DFA vs NFA
 Start state (s0)
Set of accepting states
An FSA “accepts” a string if you can start in s0 and end up in an accepting
state, consuming 1 character per step

What language is defined by this FSA?
Computer Science and Engineering  The Ohio State University

Computer Science and Engineering  The Ohio State University
What language is defined by this FSA?
A. Binary strings (0’s and 1’s) with an
even number of 0’s

Computer Science and Engineering  The Ohio State University
(Language consisting of) strings that:
Contain only letters, numbers, and _ Start with a letter
Do not contain 2 consecutive _’s
Do not end with _
Exemplars and counter-exemplars:
EOF, 4Temp, Test_Case3, _class, a4_Sa
p_X, S__T_2
Write the corresponding FSA

Computer Science and Engineering  The Ohio State University

Fundamental Results
Expressive power of RE is the same as
Expressive power of RE is limited
Write a RE for “strings of balanced
()(()()), ()(), (((()))), … (((, ())((), …
Can not be done! (impossibility result) Take CSE 3321…
Computer Science and Engineering  The Ohio State University

REs in Practice
Possible uses:
Report matching substrings and locations
Replace match with something else
Practical aspects of using REs this way Anchors
Greedy vs lazy matching
Computer Science and Engineering  The Ohio State University
REs often used to find a “match”
A substring s within a longer string such that s is in the language defined by the RE
(CSE|cse) ?3901

Used to specify where matching string
should be with respect to a line of text
Newlines are natural breaking points
^Hello World$
^[^\d].\.jpe?g
^ anchors to the beginning of a line
$ anchors to the end of a line
Ruby: \A \z for beginning/end of string
Computer Science and Engineering  The Ohio State University

Greedy vs Lazy
Repetition (+ and *) allows multiple matches to begin at same place
The match selected depends on whether the repetition matching is
Default is typically greedy
For lazy matching, use *? or +?
Example: <.*>

Title

Title

greedy, ie matches as much as possible lazy, ie matches as little as possible
Computer Science and Engineering  The Ohio State University

Regular Expressions in Science and Engineering  The Ohio State University
 Instance of a class (Regexp) pattern = Regexp.new(‘^Rub.’)
 But literal notation is common: /pattern/ /[aeiou]*/
%r{hello+} #no need to escape /
 Match operator =~ (negated as !~)
‘hello world’ =~ /o/ #=> 4 /or/ =~ ‘hello’ #=> nil
 Case equality, Regexp === String, -> Boolean
 Options post-pended: /pattern/options
Operands: String and Regexp (in either order)
Returns index of first match (or nil if not present)
i ignore case
x ignore whitespace & comments (“free spacing”)

Strings and Regular Expressions
Computer Science and Engineering  The Ohio State University
Find all matches as an array
a.scan /[[:alpha:]]/
Delimeter for spliting string into array
a.split /[aeiou]/
Substitution: sub and gsub (+/- !)
Replace first match vs all (“globally”)
a = “the quick brown fox”
a.sub /[aeiou]/,
#=> “th@ quick brown fox”
a.gsub /[aeiou]/,

Your Turn (Regular Expressions)
Computer Science and Engineering  The Ohio State University
Check if phone number in valid format
phone = “614-292-2900” #not ok phone = “(614) 292-2900” #ok
format = ?
if phone ? format #well-formatted

Your Turn (Regular Expressions)
Computer Science and Engineering  The Ohio State University
Check if phone number in valid format
phone = “614-292-2900” #not ok phone = “(614) 292-2900” #ok
format = /\A\(\d{3}\) \d{3}-\d{4}\z/ if phone =~ format #well-formatted

Expressive power same as RE
Language: A set of strings
RE: Defines a language
Repetition
Recipe for making elements of language Literals
Distinguish characters and metacharacters Character classes
Represent 1 character in RE
Computer Science and Engineering  The Ohio State University

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com