CS计算机代考程序代写 Microsoft PowerPoint – 07_regular-expression-converted.pptx

Microsoft PowerPoint – 07_regular-expression-converted.pptx

Regular Expressions

Max Magguilli (2021)

Regular Expressions

◆A regular expression is a special text string
defining a search pattern for matching text.

◆Regular expressions are used in many
Unix utilities.

– like grep, sed, vi, emacs, awk, …

◆ The form of a regular expression:

– It can be plain text …

> grep unix file (matches all the appearances of
unix in file)

– It can also be special text …

> grep ‘[uU]nix’ file (matches unix and Unix)

Regular Expressions and Filename Expansion

◆Regular expressions are different from file
name expansion.

– Regular expressions are interpreted and
matched by special utilities (such as grep).

– File name expansions are interpreted and
matched by shells.

– They have different wildcarding systems.

– Filename expansion takes place first!

compute[1] > grep ‘[uU]nix’ file

compute[2] > grep [uU]nix file (try yourself)

Regular Expression Wildcards

◆ A dot . matches any single character
a.b matches axb, a$b, abb, a.b

but does not match ab, axxb, a$bccb

◆* matches zero or more occurrences of

the previous single character pattern
a*b matches b, ab, aab, aaab, aaaab, …

but doesn’t match axb

◆ What does the following match?

.*

Regular Expression Wildcards

◆+ matches one or more occurrences of the
previous single character pattern

a\+b matches ab, aab, aaab, aaaab, …

◆ ? matches zero or one occurrence of the
previous single character pattern

a\?b matches b and ab

◆+ and ? have to be escaped with \ to have
the special meaning

◆Use \(r\) with *, \+, and \? if r is not just a
single character

Character Sets and Ranges

◆Matching a set or range of characters is
done with […]
– [wxyz] – match any of w, x, y, z

[u-z] – match a character in range u – z

◆ Combine this with * to match repeated sets
– Example: [aeiou]* – match any number of

vowels

◆ Wildcards lose their specialness inside […]
– If the first character inside the […] is ], it loses its

specialness as well

– Example: ‘[])}]’ matches any of those closing
brackets

7

END OF PART 1

REGULAR EPRESSIONS

Match Parts of a Line

◆ Match beginning of line with ^ (caret)

^TITLE

– matches any line containing TITLE at the beginning

– ^ is only special if it is at the beginning of a regular
expression

◆ Match the end of a line with a $ (dollar sign)

FINI$

– matches any line ending in the phrase FINI

– $ is only special at the end of a regular expression

– Don’t use $ and double quotes (problems with shell)

◆ What does the following match? ^WHOLE$

Matching Parts of Words

◆ Regular expressions have a concept of a “word”
which is a little different than an English word.

– A word is a pattern containing only letters,
digits, and underscores (_)

◆ Match beginning of a word with \< – \

– ox\> matches ox if it appears at the end of a
word

◆ Whole words can be matched too: \

More Regular Expressions

◆ Matching the complement of a set by using the ^

– [^aeiou] – matches any non-vowel

– ^[^a-z]*$ – matches any line containing no lower
case letters

◆ Regular expression escapes
– Use the \ (backslash) to “escape” the special

meaning of wildcards

❖CA\*Net

❖This is a full sentence\.

❖array\[3]

❖C:\\DOS

❖\[.*\]

Regular Expressions Recall

◆ A way to refer to the most recent match

◆ To remember portions of regular expressions
– Surround them with \(…\)

– Recall the remembered portion with \n where n
is 1-9

❖Example: ‘^\([a-z]\)\1’

– matches lines beginning with a pair of
duplicate (identical) letters

❖Example: ‘^.*\([a-z][a-z]*\).*\1.*\1’

– matches lines containing at least three
copies of something which consists of
lower case letters

Matching Specific Numbers of Repeats

◆ X\{m,n\} matches m — n repeats of the one
character regular expression X
– E.g. [a-z]\{2,10\} matches all sequences of 2 to 10

lower case letters

◆ X\{m\} matches exactly m repeats of the one
character regular expression X

– E.g. #\{23\} matches 23 #s

◆ X\{m,\} matches at least m repeats of the one
character regular expression X
– E.g. ^[aeiou]\{2,\} matches at least 2 vowels in a

row at the beginning of a line

◆ .\{1,\} matches more than 0 characters

13

END OF PART 2

REGULAR EPRESSIONS

Regular Expression Examples (1)

◆ How many words in /usr/share/dict/words end in ing?

– grep -c ‘ing$’ /usr/share/dict/words

The -c option
says to count the
number of matches

◆ How many words in /usr/share/dict/words start with un
and end with g?

– grep -c ‘^un.*g$’ /usr/share/dict/words

◆ How many words in /usr/share/dict/words begin with a
vowel?

– grep -ic ‘^[aeiou]’ /usr/share/dict/words

The -i option
says to ignore
case distinction

Regular Expression Examples (2)

◆ How many words in /usr/share/dict/words have
triple letters in them?

– grep -ic ‘\(.\)\1\1’ /usr/share/dict/words

◆ How many words in /usr/share/dict/words start
and end with the same 3 letters?

– grep -c ‘^\(…\).*\1$’ /usr/share/dict/words

◆ How many words in /usr/share/dict/words
contain runs of 4 consonants?

– grep -ic ‘[^aeiou]\{4\}’ /usr/share/dict/words

Regular Expression Examples (3)

◆ What are the 5 letter palindromes present
in /usr/share/dict/words?

– grep -ic ‘^\(.\)\(.\).\2\1$’ /usr/share/dict/words

◆ How many words in /usr/share/dict/words have y
as their only vowel

– grep ‘^[^aAeEiIoOuU]*$’ /usr/share/dict/words
| grep -ci ‘y’

◆ How many words in /usr/share/dict/words do not
start and end with the same 3 letters?

– grep -ivc ‘^\(…\).*\1$’ /usr/share/dict/words

The -v option says to
select non-matching lines

17

END OF PART 3

REGULAR EPRESSIONS

Extended Regular Expressions (1)

◆ Used by some utilities like egrep or grep –E
support an extended set of matching mechanisms.
– Called extended or full regular expressions.

– Less use of escape character \, but no recall

◆ + matches one or more occurrences of the
previous single character pattern.

– a+b matches ab, aab, … but not b (unlike *)

◆ ? matches zero or one occurrence(s) of the
previous single character pattern.

– a?b matches b, ab

◆ + and ? do not need \ to have the special meaning
in extended regular expression.

Extended Regular Expressions (2)

◆ r1|r2 matches regular expression r1 or r2
(| acts like a logical “or” operator).

– red|blue will match either red or blue

– Unix|UNIX will match either Unix or UNIX

◆ (r) allows the *, +, or ? matches to apply to
the entire regular expression r, and not just
a single character.

– (ab)+ requires at least one repetition of ab

Extended Regular Expressions (3)

◆ ‘\(r1\).*\1’ or ‘(r1).*\1’ (recall) is not working
for extended regular expression.

– ‘[^aeiou]\{4\}’ is replaced by ‘[^aeiou]{4}’

– ‘(r1){4}’ can be used

◆ character classes are predefined.

– [:lower:]

– [:upper:]

– [:alpha:]

– [:digit:]

for a-z

for A-Z

for A-Za-z

for 0-9

– [:alnum:] for A-Za-z0-9

21

END OF PART 4

REGULAR EPRESSIONS