Topic 5
REGULAR EXPRESSIONS
3/27/2021
Regular Expression
A pattern of special characters used to match strings in a search.
Typically made up of special characters called metacharacters.
Regular expressions are used throughout UNIX:
◦ Editors: ed, ex, vi
◦ Utilities: grep, egrep, sed, and awk
Metacharacters
RE Metacharacter   Matches…
.                  Any one character, except newline
[a-z]              Any one of the enclosed characters (e.g. a-z)
*                  Zero or more of the preceding character
? or \?            Zero or one of the preceding character
+ or \+            One or more of the preceding character
Any non-metacharacter matches itself.
The grep Utility
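The “grep” command searches for text in file(s).
Examples:
% grep root mail.log
% grep r..t mail.log
% grep ro*t mail.log
% grep 'ro*t' mail.log
% grep 'r[a-z]*t' mail.log
Quote the pattern so that the shell does not expand characters such as * before grep sees them.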
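The patterns above can be tried without a real mail.log. The following is a minimal sketch, assuming a POSIX shell and grep; the word list and the variable names are made up for illustration and are not part of the original slides.

```sh
# Hypothetical sample input (one word per line); it stands in for a log file.
LC_ALL=C; export LC_ALL   # assumption: make [a-z] mean ASCII lowercase only

words='root
rat
rt
roooot
r12t
robot'

# . matches any one character: r, then two characters, then t
dot_matches=$(printf '%s\n' "$words" | grep 'r..t')

# * repeats the preceding character zero or more times: rt, rot, root, ...
star_matches=$(printf '%s\n' "$words" | grep 'ro*t')

# [a-z]* allows any run of lowercase letters between r and t
class_matches=$(printf '%s\n' "$words" | grep 'r[a-z]*t')

printf 'r..t matches:\n%s\n' "$dot_matches"
printf 'ro*t matches:\n%s\n' "$star_matches"
printf 'r[a-z]*t matches:\n%s\n' "$class_matches"
```

Note that r12t is matched by r..t but not by r[a-z]*t, because the digits are not in the class.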
More Metacharacters
RE Metacharacter   Matches…
^                  Beginning of line
$                  End of line
\char              Escapes the meaning of the char following it
[^]                One character not in the set
\<                 Beginning-of-word anchor
\>                 End-of-word anchor
( ) or \( \)       Tags matched characters to be used later (max = 9)
| or \|            Or grouping
x\{m\}             Repetition of character x, m times (m = integer)
x\{m,\}            Repetition of character x, at least m times
x\{m,n\}           Repetition of character x, between m and n times
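The three repetition forms in the table can be compared side by side. This is a small sketch, assuming a POSIX shell and grep; the input lines and variable names are hypothetical. The ^ and $ anchors are used so that the whole line must consist of the repeated character.

```sh
# Hypothetical input: runs of the letter a, one per line.
runs='a
aa
aaa
aaaa'

# a\{2\}   : exactly two a's
exactly_two=$(printf '%s\n' "$runs" | grep '^a\{2\}$')

# a\{3,\}  : at least three a's
at_least_three=$(printf '%s\n' "$runs" | grep '^a\{3,\}$')

# a\{2,3\} : between two and three a's
two_to_three=$(printf '%s\n' "$runs" | grep '^a\{2,3\}$')

printf 'exactly two:\n%s\n' "$exactly_two"
printf 'at least three:\n%s\n' "$at_least_three"
printf 'two to three:\n%s\n' "$two_to_three"
```

Without the anchors, a\{2\} would also match the lines with three or four a's, since each of them contains two consecutive a's.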
Regular Expression
An atom specifies what text is to be matched and where it is to be found.
An operator combines regular expression atoms.
Atoms
An atom specifies what text is to be matched and where it is to be found.
Single-Character Atom
A single character matches itself
Dot Atom
Matches any single character except the newline character (\n).
Class Atom
Matches a single character, which can be any one of the characters defined in a set.
Example: [ABC] matches either A, B, or C.
Notes:
1) A range of characters is indicated by a dash, e.g. [A-Q].
2) Characters can be excluded from the set, e.g. [^0-9] matches any character other than a digit.
Example: Classes
short-hand classes
[:alnum:] [:alpha:] [:upper:] [:lower:] [:digit:] [:space:]
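Class atoms and the short-hand classes can be tried as follows. This is a sketch under stated assumptions: a POSIX shell and grep, with hypothetical sample lines and variable names. One detail worth noting is that a short-hand class is written inside a bracket expression, so the digit class appears as [[:digit:]].

```sh
# Hypothetical sample lines.
LC_ALL=C; export LC_ALL   # assumption: plain ASCII character classes

sample='A1
b2
C
9
_x'

# [ABC] : line starts with A, B, or C
starts_abc=$(printf '%s\n' "$sample" | grep '^[ABC]')

# [^0-9] : line starts with anything other than a digit
starts_non_digit=$(printf '%s\n' "$sample" | grep '^[^0-9]')

# [[:digit:]] : short-hand class, placed inside a bracket expression
starts_digit=$(printf '%s\n' "$sample" | grep '^[[:digit:]]')

# One uppercase letter followed by one digit, and nothing else
upper_digit=$(printf '%s\n' "$sample" | grep '^[[:upper:]][[:digit:]]$')

printf '%s\n' "$starts_abc" "$starts_non_digit" "$starts_digit" "$upper_digit"
```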
Anchors
Anchors tell where the next character in the pattern must be located in the text data.
Back References: \n
Used to retrieve saved text from one of nine buffers.
The text in a saved buffer is referred to with a back reference, e.g. \1 \2 \3 … \9
More details on this later.
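As a preview, a back reference can be sketched with grep's basic syntax, where \( \) saves the matched text and \1 refers back to it. The example assumes a POSIX shell and grep; the word list and variable names are hypothetical.

```sh
# Hypothetical sample words.
words='hello
world
book
abcabc'

# \(.\)\1 : any character immediately followed by the same character
doubled=$(printf '%s\n' "$words" | grep '\(.\)\1')

# ^\(abc\)\1$ : the saved text abc, repeated once more, and nothing else
repeated=$(printf '%s\n' "$words" | grep '^\(abc\)\1$')

printf 'doubled letter:\n%s\n' "$doubled"
printf 'repeated group:\n%s\n' "$repeated"
```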
Operators
Sequence Operator
The sequence operator is implicit: when a series of atoms appears in a regular expression with no operator between them, the atoms are matched one after another.
Alternation Operator: | or \|
The alternation operator (| or \|) is used to define one or more alternatives.
Note: which form is accepted depends on the version of “grep”.
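A minimal alternation sketch follows, assuming a POSIX shell and a grep that supports the -E (extended RE) option; the animal names and variable names are hypothetical. The extended form is used here because \| in basic grep is an extension that not every version provides.

```sh
# Hypothetical sample lines.
animals='cat
dog
bird
catalog'

# cat|dog : lines containing either alternative anywhere
either=$(printf '%s\n' "$animals" | grep -E 'cat|dog')

# ^(cat|dog)$ : the whole line must be exactly one of the alternatives
exact=$(printf '%s\n' "$animals" | grep -E '^(cat|dog)$')

printf 'contains cat or dog:\n%s\n' "$either"
printf 'is exactly cat or dog:\n%s\n' "$exact"
```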
Repetition Operator: \{…\}
The repetition operator specifies that the atom or expression immediately before the repetition may be repeated.
Basic Repetition Forms
Short Form Repetition Operators: * + ?
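The three short forms can be compared on one small input. This sketch assumes a POSIX shell and grep -E, so that + and ? need no backslash; the spellings and variable names are made up for illustration.

```sh
# Hypothetical sample spellings.
spellings='colr
color
colour
colouur'

# u? : zero or one u
zero_or_one=$(printf '%s\n' "$spellings" | grep -E 'colou?r')

# u+ : one or more u
one_or_more=$(printf '%s\n' "$spellings" | grep -E 'colou+r')

# u* : zero or more u
zero_or_more=$(printf '%s\n' "$spellings" | grep -E 'colou*r')

printf '?:\n%s\n' "$zero_or_one"
printf '+:\n%s\n' "$one_or_more"
printf '*:\n%s\n' "$zero_or_more"
```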
Group Operator
In the group operator, when a group of characters is enclosed in parentheses, the next operator applies to the whole group, not only to the previous character.
Note: depends on the version of “grep”; some versions require \( and \) instead.
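The effect of grouping is easiest to see by applying the same operator with and without parentheses. A small sketch, assuming a POSIX shell and grep -E; the input lines and variable names are hypothetical.

```sh
# Hypothetical sample lines.
items='ab
abab
abb
abbb'

# (ab)+ : the + applies to the whole group ab
grouped=$(printf '%s\n' "$items" | grep -E '^(ab)+$')

# ab+ : the + applies only to the preceding character b
ungrouped=$(printf '%s\n' "$items" | grep -E '^ab+$')

printf 'grouped:\n%s\n' "$grouped"
printf 'ungrouped:\n%s\n' "$ungrouped"
```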
Grep detail and examples
grep is a family of commands:
◦ grep: common version
◦ egrep: understands extended REs (| + ? ( ) don’t need backslash)
◦ fgrep: understands only fixed strings, i.e. is faster
◦ rgrep: will traverse sub-directories recursively
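On current systems the same behaviors are usually reached through options of grep itself: grep -E for extended REs and grep -F for fixed strings (many versions also offer grep -r for recursive search, which is not exercised here). The sketch below assumes a POSIX shell and grep; the sample lines and variable names are hypothetical.

```sh
# Hypothetical sample lines containing regular-expression metacharacters.
lines='a.c
abc
a|c'

# Basic RE: the dot matches any character, so all three lines match
basic=$(printf '%s\n' "$lines" | grep 'a.c')

# Fixed string (like fgrep): only the literal text a.c matches
fixed=$(printf '%s\n' "$lines" | grep -F 'a.c')

# Extended RE (like egrep): | is alternation, \. is a literal period
extended=$(printf '%s\n' "$lines" | grep -E 'abc|a\.c')

printf 'basic:\n%s\n' "$basic"
printf 'fixed:\n%s\n' "$fixed"
printf 'extended:\n%s\n' "$extended"
```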
Commonly used “grep” options:
-c Print only a count of matched lines.
-i Ignore uppercase and lowercase distinctions.
-l List the names of all files that contain the specified pattern.
-n Print matched lines and line numbers.
-s Work silently; display nothing except error messages. Useful for checking the exit status. (This varies by version: in POSIX grep, -q is the silent option and -s suppresses error messages about missing or unreadable files.)
-v Print lines that do not match the pattern.
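Several of these options can be tried on a few lines of text. This is a sketch, assuming a POSIX shell and grep; the log text and variable names are invented for illustration. The exit-status check redirects output to /dev/null rather than relying on -s or -q, since the silent option differs between versions.

```sh
# Hypothetical log text.
log='Error: disk full
warning: low memory
error: timeout
ok'

# -c : count of matching lines (case-sensitive)
count=$(printf '%s\n' "$log" | grep -c 'error')

# -c with -i : count again, ignoring case
count_nocase=$(printf '%s\n' "$log" | grep -c -i 'error')

# -n : matching lines prefixed with their line numbers
numbered=$(printf '%s\n' "$log" | grep -n 'error')

# -v : lines that do NOT match (here combined with -i)
inverted=$(printf '%s\n' "$log" | grep -v -i 'error')

# Exit status: 0 if at least one line matched, 1 if none did
if printf '%s\n' "$log" | grep 'timeout' >/dev/null; then
    found=yes
else
    found=no
fi

printf '%s\n' "$count" "$count_nocase" "$numbered" "$inverted" "$found"
```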
Example: grep with pipe
Pipe the output of the “ls -l” command to grep and list/select only directory entries.
% ls -l | grep '^d'
drwxr-xr-x 2 krush csci 512 Feb  8 22:12 assignments
drwxr-xr-x 2 krush csci 512 Feb  5 07:43 feb3
drwxr-xr-x 2 krush csci 512 Feb  5 14:48 feb5
drwxr-xr-x 2 krush csci 512 Dec 18 14:29 grades
drwxr-xr-x 2 krush csci 512 Jan 18 13:41 jan13
drwxr-xr-x 2 krush csci 512 Jan 18 13:17 jan15
drwxr-xr-x 2 krush csci 512 Jan 18 13:43 jan20
drwxr-xr-x 2 krush csci 512 Jan 24 19:37 jan22
drwxr-xr-x 4 krush csci 512 Jan 30 17:00 jan27
drwxr-xr-x 2 krush csci 512 Jan 29 15:03 jan29
Display the number of lines where the pattern was found. This does not mean the number of occurrences of the pattern.
% ls -l | grep -c '^d'
10
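The difference between counting lines and counting occurrences can be sketched as follows. The example assumes a POSIX shell, and additionally assumes a grep that supports the -o option (print each match on its own line), which is common but not part of POSIX grep; the text and variable names are hypothetical.

```sh
# Hypothetical text: the first line contains the letter d three times.
text='d d d
d
x'

# -c counts matching LINES: two lines contain a d
line_count=$(printf '%s\n' "$text" | grep -c 'd')

# Assumption: -o is available. Each match is printed on its own line,
# so counting those lines gives the number of OCCURRENCES.
occurrence_count=$(printf '%s\n' "$text" | grep -o 'd' | wc -l | tr -d ' ')

printf 'lines: %s, occurrences: %s\n' "$line_count" "$occurrence_count"
```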
Example: grep with \< \>
% cat grep-datafile
northwest   NW   Charles Main        300000.00
western     WE   Sharon Gray         53000.89
southwest   SW   Lewis Dalsass       290000.73
southern    SO   Suan Chin           54500.10
southeast   SE   Patricia Hemenway   400000.00
eastern     EA   TB Savage           440500.45
northeast   NE   AM Main Jr.         57800.10
north       NO   Ann Stephens        455000.50
central     CT   KRush               575500.70
Extra [A-Z]****[0-9]..$5.00
Print the line if it contains the word “north”.
% grep '\<north\>' grep-datafile
north       NO   Ann Stephens        455000.50

Example: grep with a\|b
Print the lines that contain either the expression “NW” or the expression “EA”.
% grep 'NW\|EA' grep-datafile
northwest   NW   Charles Main        300000.00
eastern     EA   TB Savage           440500.45
Note: egrep works with |
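The word anchors can be compared with a plain substring match. This sketch assumes a POSIX shell and a grep that supports \< and \> and the -w option (both are widely available extensions rather than POSIX features); the sample lines and variable names are hypothetical.

```sh
# Hypothetical sample lines.
regions='northwest
north
northeast
go north now'

# Plain pattern: matches north anywhere, even inside a longer word
substring=$(printf '%s\n' "$regions" | grep 'north')

# \<north\> : north must be a whole word
whole_word=$(printf '%s\n' "$regions" | grep '\<north\>')

# -w : assumed shortcut for whole-word matching
with_w=$(printf '%s\n' "$regions" | grep -w 'north')

printf 'substring:\n%s\n' "$substring"
printf 'whole word:\n%s\n' "$whole_word"
```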
Example: egrep with +
Print all lines containing one or more 3’s.
% egrep '3+' grep-datafile
northwest   NW   Charles Main        300000.00
western     WE   Sharon Gray         53000.89
southwest   SW   Lewis Dalsass       290000.73
Note: grep works with \+

Example: egrep with RE: ?
Print all lines containing a 2, followed by zero or one period, followed by a digit.
% egrep '2\.?[0-9]' grep-datafile
southwest   SW   Lewis Dalsass       290000.73
Note: grep works with \?
Example: egrep with ( )
Print all lines containing one or more consecutive occurrences of the pattern “no”.
% egrep '(no)+' grep-datafile
northwest   NW   Charles Main        300000.00
northeast   NE   AM Main Jr.         57800.10
north       NO   Ann Stephens        455000.50
Note: grep works with \( \) \+

Example: egrep with (a|b)
Print all lines containing the uppercase letter “S”, followed by either “h” or “u”.
% egrep 'S(h|u)' grep-datafile
western     WE   Sharon Gray         53000.89
southern    SO   Suan Chin           54500.10
Note: grep works with \( \) \|
Example: fgrep
Find all lines in the file containing the literal string “[A-Z]****[0-9]..$5.00”. All characters are treated as themselves. There are no special characters.
% fgrep '[A-Z]****[0-9]..$5.00' grep-datafile
Extra [A-Z]****[0-9]..$5.00

Example: grep with ^
Print all lines beginning with the letter n.
% grep '^n' grep-datafile
northwest   NW   Charles Main        300000.00
northeast   NE   AM Main Jr.         57800.10
north       NO   Ann Stephens        455000.50
Example: grep with $
Print all lines ending with a period followed by two zeros.
% grep '\.00$' grep-datafile
northwest   NW   Charles Main        300000.00
southeast   SE   Patricia Hemenway   400000.00
Extra [A-Z]****[0-9]..$5.00

Example: grep with \char
Print all lines containing the number 5, followed by a literal period and any single character.
% grep '5\..' grep-datafile
Extra [A-Z]****[0-9]..$5.00
Example: grep with [ ]
Print all lines beginning with either a “w” or an “e”.
% grep '^[we]' grep-datafile
western     WE   Sharon Gray         53000.89
eastern     EA   TB Savage           440500.45

Example: grep with [^]
Print all lines ending with a period followed by two characters that are not 0.
% grep '\.[^0][^0]$' grep-datafile
western     WE   Sharon Gray         53000.89
southwest   SW   Lewis Dalsass       290000.73
eastern     EA   TB Savage           440500.45
Example: grep with x\{m\}
Print all lines where there are at least six consecutive digits followed by a period.
% grep '[0-9]\{6\}\.' grep-datafile
northwest   NW   Charles Main        300000.00
southwest   SW   Lewis Dalsass       290000.73
southeast   SE   Patricia Hemenway   400000.00
eastern     EA   TB Savage           440500.45
north       NO   Ann Stephens        455000.50
central     CT   KRush               575500.70

Example: grep with \<
Print all lines containing a word starting with “north”.
% grep '\<north' grep-datafile