4/9/2021
Topic 7
AWK
What is awk?
created by: Aho, Weinberger, and Kernighan
scripting language used for manipulating data and generating reports
versions of awk
◦ awk, nawk, mawk, pgawk, …
◦ GNU awk: gawk
TOPIC 7 – AWK
What can you do with awk?
awk operation:
◦ scans a file line by line
◦ splits each input line into fields
◦ compares input line/fields to pattern
◦ performs action(s) on matched lines
Useful for:
◦ transforming data files
◦ producing formatted reports
Programming constructs:
◦ format output lines
◦ arithmetic and string operations
◦ conditionals and loops
The Command: awk
Basic awk Syntax
awk [options] 'script' file(s)
awk [options] -f scriptfile file(s)
Options:
-F to change input field separator
-f to name script file
Basic awk Program
consists of patterns & actions:
pattern {action}
◦ if pattern is missing, action is applied to all lines
◦ if action is missing, the matched line is printed
◦ must have either pattern or action
Example:
awk '/for/' testfile
◦ prints all lines containing string “for” in testfile
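A minimal sketch of the three rule forms (pattern only, action only, pattern plus action), assuming a POSIX `awk` on the PATH. The three input lines and the variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical three-line input
input='for loop here
while loop here
another for'

# Pattern only: every matching line is printed unchanged
pattern_only=$(printf '%s\n' "$input" | awk '/for/')

# Action only: the action runs on every line
action_only=$(printf '%s\n' "$input" | awk '{ print "line:", $1 }')

# Pattern and action: the action runs only on matching lines
both=$(printf '%s\n' "$input" | awk '/for/ { print $1 }')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```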
Basic Terminology: input file
A field is a unit of data in a line
Each field is separated from the other fields by the field separator
◦ default field separator is whitespace
A record is the collection of fields in a line
A data file is made up of records
Example Input File
Buffers
awk supports two types of buffers:
record and field
field buffer:
◦ one for each field in the current record
◦ names: $1, $2, …
record buffer:
◦ $0 holds the entire record
Some System Variables
FS Field separator (default=whitespace)
RS Record separator (default=\n)
NF Number of fields in current record
NR Number of the current record
OFS Output field separator (default=space)
ORS Output record separator (default=\n)
FILENAME Current filename
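A short sketch showing several of these variables at work, assuming a POSIX `awk` and `mktemp`. The temporary file and its contents are hypothetical.

```shell
#!/bin/sh
tmp=$(mktemp)                       # throwaway input file
printf 'a b c\nd e\n' > "$tmp"

# NR = current record number, NF = number of fields; OFS joins printed items
counts=$(awk 'BEGIN { OFS = "|" } { print NR, NF, $1 }' "$tmp")

# FILENAME holds the name of the file currently being read
fname=$(awk 'NR == 1 { print FILENAME }' "$tmp")

# FS set in BEGIN applies to every record that follows
colon=$(echo 'x:y:z' | awk 'BEGIN { FS = ":" } { print NF, $2 }')

printf '%s\n' "$counts" "$fname" "$colon"
rm -f "$tmp"
```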
Example: Records and Fields
% cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
% awk '{print NR, $0}' emps
1 Tom Jones 4424 5/12/66 543354
2 Mary Adams 5346 11/4/63 28765
3 Sally Chang 1654 7/22/54 650000
4 Billy Black 1683 9/23/44 336500
Example: Space as Field Separator
% cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
% awk '{print NR, $1, $2, $5}' emps
1 Tom Jones 543354
2 Mary Adams 28765
3 Sally Chang 650000
4 Billy Black 336500
Example: Colon as Field Separator
% cat em2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
% awk -F: '/Jones/{print $1, $2}' em2
Tom Jones 4424
awk Scripts
awk scripts are divided into three major parts: BEGIN, body, and END
comment lines start with #
awk Scripts
BEGIN: pre-processing
◦ performs processing that must be completed before file processing starts (i.e., before awk starts reading records from the input file)
◦ useful for initialization tasks such as initializing variables and creating report headings
awk Scripts
BODY: Processing
◦ contains main processing logic to be applied to input records
◦ like a loop that processes input data one record at a time:
◦ if a file contains 100 records, the body will be executed 100 times, one for each record
awk Scripts
END: post-processing
◦ contains logic to be executed after all input data have been processed
◦ logic such as printing a report's grand total belongs in this part of the script
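A minimal sketch of a script with all three parts, assuming a POSIX `awk`. The item/quantity input is made up for illustration.

```shell
#!/bin/sh
# Hypothetical input: item name and quantity
report=$(printf '%s\n' 'apples 3' 'pears 5' 'plums 2' | awk '
BEGIN { print "Item Qty" }          # pre-processing: runs once, before any input
{ total += $2; print $1, $2 }       # body: runs once per record
END { print "Total:", total }       # post-processing: runs once, after the last record
')
printf '%s\n' "$report"
```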
Pattern / Action Syntax
Categories of Patterns
Expression Pattern types
match
◦ entire input record: regular expression enclosed in slashes, /re/
◦ explicit pattern-matching expressions: ~ (match), !~ (not match)
expression operators
◦ arithmetic
◦ relational
◦ logical
Example: match input record
% cat employees2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
% awk -F: '/00$/' employees2
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
Example: explicit match
% cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
% awk '$5 ~ /\.[7-9]+/' datafile
southwest SW Lewis Dalsass 2.7 .8 2 18
central CT Ann Stephens 5.7 .94 5 13
Examples: matching with REs
% awk '$2 !~ /E/{print $1, $2}' datafile
northwest NW
southwest SW
southern SO
north NO
central CT
% awk '/^[ns]/{print $1}' datafile
northwest
southwest
southern
southeast
northeast
north
Arithmetic Operators
Operator Meaning Example
+ Add x + y
- Subtract x - y
* Multiply x * y
/ Divide x / y
% Modulus x % y
^ Exponentiation x ^ y
Example:
% awk '$3 * $4 > 500 {print $0}' file
Relational Operators
Operator Meaning Example
< Less than x < y
<= Less than or equal to x <= y
== Equal to x == y
!= Not equal to x != y
> Greater than x > y
>= Greater than or equal to x >= y
~ Matched by reg exp x ~ /y/
!~ Not matched by reg exp x !~ /y/
Logical Operators
Operator Meaning Example
&& Logical AND a && b
|| Logical OR a || b
! NOT ! a
Examples:
% awk '($2 > 5) && ($2 <= 15) {print $0}' file
% awk '$3 == 100 || $4 > 50' file
Range Patterns
Matches ranges of consecutive input lines
Syntax:
pattern1 , pattern2 {action}
pattern can be any simple pattern
pattern1 turns the action on
pattern2 turns the action off (the line matching pattern2 is still processed)
Range Pattern Example
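The original example on this slide did not survive extraction, so here is a separate minimal sketch of range patterns, assuming a POSIX `awk`; the input words are hypothetical. Note that a range left open at end of input simply runs to the last line.

```shell
#!/bin/sh
# Regex range: from a line matching /START/ through the next line matching /END/
ranges=$(printf '%s\n' one START two three END four START five |
  awk '/START/, /END/')

# Expression range: records 2 through 4
middle=$(printf '%s\n' a b c d e | awk 'NR == 2, NR == 4')

printf '%s\n---\n%s\n' "$ranges" "$middle"
```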
awk Actions
awk expressions
An expression is evaluated and returns a value
It can consist of any combination of numeric and string constants, variables, operators, functions, and regular expressions
Variables can appear:
◦ as part of expression evaluation
◦ as the target of an assignment
awk variables
A user can define any number of variables within an awk script
The variables can be numbers, strings, or arrays
Variable names consist of letters, digits, and underscores and must not start with a digit
Variables come into existence the first time they are referenced; therefore, they do not need to be declared before use
An uninitialized variable has the null string "" as its value, which is treated as 0 in numeric contexts
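A small sketch of how never-assigned variables behave, assuming a POSIX `awk` (tested idea, not taken from the slides). The variable names `x`, `y`, and `n` are arbitrary.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
  print "[" x "]", x + 0, length(x)   # unset x: empty string, 0 as a number, length 0
  a = (y == ""); b = (y == 0)         # an unset variable equals both "" and 0
  print a, b
  n++                                 # no declaration needed: n starts at 0
  print n
}')
printf '%s\n' "$out"
```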
awk Variables
Format:
variable = expression
Examples:
% awk '$1 ~ /Tom/ {wage = $3 * $4; print wage}' filename
% awk '$4 == "CA" {$4 = "California"; print $0}' filename
awk assignment operators
= assign result of right-hand-side expression to left-hand-side variable
++ Add 1 to variable
-- Subtract 1 from variable
+= Assign result of addition
-= Assign result of subtraction
*= Assign result of multiplication
/= Assign result of division
%= Assign result of modulo
^= Assign result of exponentiation
Awk example
File: grades
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
awk script: average
# average five grades
{ total = $2 + $3 + $4 + $5 + $6
avg = total / 5
print $1, avg }
Run as:
awk -f average grades
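A sketch that recreates the grades file and the average script in a temporary directory and runs them, assuming a POSIX `awk` and `mktemp`. The averages are printed with awk's default number formatting.

```shell
#!/bin/sh
dir=$(mktemp -d)

cat > "$dir/grades" <<'EOF'
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
EOF

cat > "$dir/average" <<'EOF'
# average five grades
{ total = $2 + $3 + $4 + $5 + $6
  avg = total / 5
  print $1, avg }
EOF

out=$(awk -f "$dir/average" "$dir/grades")
printf '%s\n' "$out"
rm -r "$dir"
```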
Output Statements
print
simple, unformatted output
printf
formatted output (similar to C printf)
sprintf
returns a formatted string (similar to C sprintf)
Function: print
Writes to standard output
Output is terminated by ORS
◦ default ORS is newline
If called with no arguments, it prints $0
Printed arguments are separated by OFS
◦ default OFS is a single space
Escape sequences for control characters are allowed in strings:
◦ \n \f \a \t \\ …
print example
% awk '{print}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86
% awk '{print $0}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86
% awk '{print($0)}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86
print Example
% awk '{print $1, $2}' grades
john 85
andrea 89
% awk '{print $1 "," $2}' grades
john,85
andrea,89
print Example
% awk '{OFS="-"; print $1, $2}' grades
john-85
andrea-89
% awk '{OFS="-"; print $1 "," $2}' grades
john,85
andrea,89
Redirecting print output
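Print output goes to standard output
unless redirected via:
> "file" (the file is truncated when it is first opened)
>> "file" (output is appended to the file)
| "command" (output is piped to the command)
Each file or command is opened only once, on first use
Subsequent redirections to it write to the already open stream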
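A sketch of the "opened only once" rule, assuming a POSIX `awk` and `mktemp`; the file name and input lines are hypothetical. Because `>` truncates only on first open, all three lines survive, and a single `sort` process receives every piped line.

```shell
#!/bin/sh
dir=$(mktemp -d)

# ">" truncates only when awk first opens the file; later prints in the same
# run go to the already open stream
printf '%s\n' a b c | awk -v out="$dir/file" '{ print $1 > out } END { close(out) }'
content=$(cat "$dir/file")

# A pipe is also opened once: one sort process sees all lines
sorted=$(printf '%s\n' c a b | awk '{ print | "sort" }')

printf '%s\n---\n%s\n' "$content" "$sorted"
rm -r "$dir"
```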
print Example
% awk '{print $1, $2 > "file"}' grades
% cat file
john 85
andrea 89
jasper 84
print Example
% awk '{print $1,$2 | "sort"}' grades
andrea 89
jasper 84
john 85
% awk '{print $1,$2 | "sort -k 2"}' grades
jasper 84
john 85
andrea 89
print Example
% date
Wed Nov 19 14:40:07 CST 2008
% date |
awk '{print "Month: " $2 "\nYear: " $6}'
Month: Nov
Year: 2008
printf: Formatting output
Syntax:
printf(format-string, var1, var2, …)
◦ works like C printf
◦ each format specifier in the format string requires an argument of matching type
Format specifiers
%d, %i decimal integer
%c single character
%s string of characters
%f floating point number
%o octal number
%x hexadecimal number
%e scientific floating point notation
%% a literal percent sign (%)
Format specifier examples
Given: x = "A", y = 15, z = 2.3, and $1 = "Bob Smith"
Format Specifier What It Does
%c printf("The character is %c \n", x)
output: The character is A
%d printf("The boy is %d years old \n", y)
output: The boy is 15 years old
%s printf("My name is %s \n", $1)
output: My name is Bob Smith
%f printf("z is %5.3f \n", z)
output: z is 2.300
Format specifier modifiers
placed between "%" and the conversion letter:
%10s
%7d
%10.4f
%-20s
meaning:
◦ width: minimum width of the field; by default the value is right-justified
◦ precision: number of digits after the decimal point
◦ "-": left-justifies the value within the field
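A small sketch of width, precision, and the "-" flag, assuming a POSIX `awk`. The brackets are only there to make the padding visible.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
  printf("[%10s]\n", "right")    # width 10, right-justified (default)
  printf("[%-10s]\n", "left")    # "-" flag: left-justified in width 10
  printf("[%7d]\n", 42)          # integer in a 7-character field
  printf("[%10.4f]\n", 3.14159)  # width 10, 4 digits after the decimal point
}')
printf '%s\n' "$out"
```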
sprintf: Formatting text
Syntax:
sprintf(format-string, var1, var2, …)
◦ works like printf, but does not produce output
◦ instead, it returns the formatted string
Example:
{
text = sprintf("1: %d - 2: %d", $1, $2)
print text
}
awk builtin functions
tolower(string)
returns a copy of string, with each upper-case character converted to lower-case. Nonalphabetic
characters are left unchanged.
Example: tolower("MiXeD cAsE 123")
returns "mixed case 123"
toupper(string)
returns a copy of string, with each lower-case character converted to upper-case.
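A sketch of both functions, plus one common use (case-insensitive matching), assuming a POSIX `awk`. The log-style input lines are made up.

```shell
#!/bin/sh
cases=$(awk 'BEGIN {
  print tolower("MiXeD cAsE 123")
  print toupper("MiXeD cAsE 123")
}')

# Case-insensitive match: lower-case the record before testing it
hits=$(printf '%s\n' 'Error: disk' 'ok' 'ERROR: net' | awk 'tolower($0) ~ /error/')

printf '%s\n---\n%s\n' "$cases" "$hits"
```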
awk Example: list of products
103:sway bar:49.99
101:propeller:104.99
104:fishing line:0.99
113:premium fish bait:1.00
106:cup holder:2.49
107:cooler:14.89
112:boat cover:120.00
109:transom:199.00
110:pulley:9.88
105:mirror:4.99
108:wheel:49.99
111:lock:31.00
102:trailer hitch:97.95
awk Example: output
Marine Parts R Us
Main catalog
Part-id name price
======================================
101 propeller 104.99
102 trailer hitch 97.95
103 sway bar 49.99
104 fishing line 0.99
105 mirror 4.99
106 cup holder 2.49
107 cooler 14.89
108 wheel 49.99
109 transom 199.00
110 pulley 9.88
111 lock 31.00
112 boat cover 120.00
113 premium fish bait 1.00
======================================
Catalog has 13 parts
awk Example: complete
BEGIN {
FS = ":"
print "Marine Parts R Us"
print "Main catalog"
print "Part-id\tname\t\t\t price"
print "======================================"
}
{
printf("%3d\t%-20s\t%6.2f\n", $1, $2, $3)
count++
}
END {
print "======================================"
print "Catalog has " count " parts"
}
Is the output sorted?
awk Array
awk allows one-dimensional arrays
to store strings or numbers
index can be number or string
no declaration is needed for
◦ the array's size
◦ its elements
array elements are created when first used
◦ initialized to 0 or “”
Arrays in awk
Syntax:
arrayName[index] = value
Examples:
list[1] = "one"
list[2] = "three"
list["other"] = "oh my !"
Illustration: Associative Arrays
awk arrays can use a string as an index
Awk builtin split function
split(string, array, fieldsep)
◦ divides string into pieces separated by fieldsep, and stores the pieces in array
◦ if the fieldsep is omitted, the value of FS is used.
Example:
split("auto-da-fe", a, "-")
sets the contents of the array a as follows:
a[1] = "auto"
a[2] = "da"
a[3] = "fe"
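A sketch of `split`, assuming a POSIX `awk`. It also uses the fact that `split` returns the number of pieces, which makes a counting loop over the array straightforward; the second call omits the separator so FS is used.

```shell
#!/bin/sh
pieces=$(awk 'BEGIN {
  n = split("auto-da-fe", a, "-")     # split returns the number of pieces
  for (i = 1; i <= n; i++)
    print i, a[i]
}')

# With the separator omitted, split uses the current FS
second=$(echo 'x:y:z' | awk 'BEGIN { FS = ":" } { n = split($0, p); print n, p[2] }')

printf '%s\n---\n%s\n' "$pieces" "$second"
```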
Example: process sales data
input file:
output:
◦ summary of category sales
Illustration: process each input line
Summary: awk program
Example: complete program
% cat sales.awk
{
deptSales[$2] += $3
}
END {
for (x in deptSales)
print x, deptSales[x]
}
% awk -f sales.awk sales
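The slide's sales input file did not survive extraction, so this sketch assumes a hypothetical layout that fits the script: month, department, amount. It assumes a POSIX `awk`, `sort`, and `mktemp`. Because `for (x in array)` visits keys in an unspecified order, the output is piped through `sort` to make it predictable.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Hypothetical sales file: month, department, amount
cat > "$dir/sales" <<'EOF'
jan supplies 100
jan clothing 250
feb supplies 50
feb toys 75
mar clothing 30
EOF

cat > "$dir/sales.awk" <<'EOF'
{
  deptSales[$2] += $3          # accumulate the amount per department
}
END {
  for (x in deptSales)         # visiting order is unspecified
    print x, deptSales[x]
}
EOF

summary=$(awk -f "$dir/sales.awk" "$dir/sales" | sort)
printf '%s\n' "$summary"
rm -r "$dir"
```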
Delete Array Entry
The delete statement removes an element from an array.
Format:
delete array_name[index]
Example:
delete deptSales["supplies"]
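A short sketch of `delete`, assuming a POSIX `awk`; the three input records are made up. The `in` operator checks membership without creating the element, so it confirms the key is gone.

```shell
#!/bin/sh
left=$(printf '%s\n' 'a supplies 10' 'b toys 5' 'c supplies 7' | awk '
{ deptSales[$2] += $3 }
END {
  delete deptSales["supplies"]       # remove one element by its index
  for (x in deptSales)
    print x, deptSales[x]
  gone = !("supplies" in deptSales)  # "in" tests membership without creating it
  print "supplies deleted:", gone
}')
printf '%s\n' "$left"
```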
Awk control structures
Conditional
◦ if-else
Repetition
◦ for
◦ with counter
◦ with array index
◦ while
◦ do-while
◦ also: break, continue
if Statement
Syntax:
if (conditional expression)
statement-1
else
statement-2
Example:
if ( NR < 3 )
print $2
else
print $3
for Loop
Syntax:
for (initialization; limit-test; update)
statement
Example:
for (i = 1; i <= NF; i++)
{
total += $i
count++
}
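A sketch of a counting `for` loop that sums the fields of each record, using NF (the number of fields) as the limit. It assumes a POSIX `awk`; the numeric input is made up.

```shell
#!/bin/sh
sums=$(printf '%s\n' '1 2 3' '10 20' | awk '{
  total = 0                      # reset for each record
  for (i = 1; i <= NF; i++)      # visit fields $1 through $NF
    total += $i
  print NR ": sum =", total
}')
printf '%s\n' "$sums"
```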
for Loop for arrays
Syntax:
for (var in array)
statement
Example:
for (x in deptSales)
{
print x, deptSales[x]
}
while Loop
Syntax:
while (logical expression)
statement
Example:
i = 1
while (i <= NF)
{
print i, $i
i++
}
do-while Loop
Syntax:
do
statement
while (condition)
statement is executed at least once, even if condition is false at the beginning
Example:
i = 1
do {
print $0
i++
} while (i <= 10)
loop control statements
break
exits loop
continue
skips rest of current iteration, continues with next iteration
loop control example
for (x = 0; x < 20; x++) {
if ( array[x] > 100) continue
printf "%d ", x
if ( array[x] < 0 ) break
}
Example: sensor data
1 Temperature
2 Rainfall
3 Snowfall
4 Windspeed
5 Winddirection
also: sensor readings
Plan: print average readings in descending order
Example: sensor readings
2008-10-01/1/68
2008-10-02/2/6
2007-10-03/3/4
2008-10-04/4/25
2008-10-05/5/120
2008-10-01/1/89
2007-10-01/4/35
2008-11-01/5/360
2008-10-01/1/45
2007-12-01/1/61
2008-10-10/1/32
Example: print sensor data
BEGIN {
printf("id\tSensor\n")
printf("----------------------\n")
}
{
printf("%d\t%s\n", $1, $2)
}
Example: print sensor readings
BEGIN {
FS="/"
printf(" Date\t\tValue\n")
printf("---------------------\n")
}
{
printf("%s %7.2f\n", $1, $3)
}
Example: print sensor summary
BEGIN {
FS="/"
}
{
sum[$2] += $3;
count[$2]++;
}
END {
for (i in sum) {
printf("%d %7.2f\n",i,sum[i]/count[i])
}
}
Example: Remaining tasks
awk -f sense.awk sensors readings
Sensor Average
-----------------------
Winddirection 240.00
Temperature 59.00
Windspeed 30.00
Rainfall 6.00
Snowfall 4.00
Notes: output is sorted; shows sensor names; reads 2 input files
Example: print sensor averages
Remaining tasks:
◦ recognize nature of input data
use: number of fields in record
◦ substitute sensor id with sensor name
use: associative array
◦ sort readings
use: sort -gr -k 2
Example: sense.awk
NF > 1 {
name[$1] = $2
}
NF < 2 {
split($0,fields,"/")
sum[fields[2]] += fields[3];
count[fields[2]]++;
}
END {
for (i in sum) {
printf("%15s %7.2f\n", name[i], sum[i]/count[i]) | "sort -gr -k 2"
}
}
Example: print sensor averages
Remaining tasks:
◦ sort
use: sort -gr
◦ substitute sensor id with sensor name
1. use: join -j 1 sensor-data sensor-averages
2. within awk
Example: solution 1 (1/3)
#! /bin/bash
trap '/bin/rm /tmp/report-*-$$; exit' 1 2 3
cat << HERE > /tmp/report-awk-1-$$
BEGIN {FS="/"}
{
sum[\$2] += \$3;
count[\$2]++;
}
END {
for (i in sum) {
printf("%d %7.2f\n", i, sum[i]/count[i])
}
}
HERE
Example: solution 1 (2/3)
cat << HERE > /tmp/report-awk-2-$$
BEGIN {
printf(" Sensor Average\n")
printf("-----------------------\n")
}
{
printf("%15s %7.2f\n", \$2, \$3)
}
HERE
Example: solution 1 (3/3)
awk -f /tmp/report-awk-1-$$ sensor-readings |
sort > /tmp/report-r-$$
join -j 1 sensor-data /tmp/report-r-$$ > /tmp/report-t-$$
sort -gr -k 3 /tmp/report-t-$$ |
awk -f /tmp/report-awk-2-$$
/bin/rm /tmp/report-*-$$
Example: output
Sensor Average
-----------------------
Winddirection 240.00
Temperature 59.00
Windspeed 30.00
Rainfall 6.00
Snowfall 4.00
Example: solution 2 (1/2)
#! /bin/bash
trap '/bin/rm /tmp/report-*$$; exit' 1 2 3
cat << HERE > /tmp/report-awk-3-$$
NF > 1 {
name[\$1] = \$2
}
NF < 2 {
split(\$0,fields,"/")
sum[fields[2]] += fields[3];
count[fields[2]]++;
}
Example: solution 2 (2/2)
END {
for (i in sum) {
printf("%15s %7.2f\n", name[i], sum[i]/count[i])
}
}
HERE
echo " Sensor Average"
echo "-----------------------"
awk -f /tmp/report-awk-3-$$ sensor-data sensor-readings | sort -gr -k 2
/bin/rm /tmp/report-*$$
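A self-contained sketch that rebuilds the two sensor files from the slides and runs the single-awk approach (names looked up in an associative array, records told apart by NF). It assumes a POSIX `awk`, `sort`, and `mktemp`, and uses a quoted heredoc so no `\$` escapes are needed. It sorts with `sort -nr -k 2` rather than `-gr`, since `-g` is not part of POSIX sort and plain numeric sorting suffices for these decimal averages.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Sensor names and readings copied from the slides
printf '%s\n' '1 Temperature' '2 Rainfall' '3 Snowfall' '4 Windspeed' \
  '5 Winddirection' > "$dir/sensor-data"
printf '%s\n' 2008-10-01/1/68 2008-10-02/2/6 2007-10-03/3/4 2008-10-04/4/25 \
  2008-10-05/5/120 2008-10-01/1/89 2007-10-01/4/35 2008-11-01/5/360 \
  2008-10-01/1/45 2007-12-01/1/61 2008-10-10/1/32 > "$dir/sensor-readings"

# Quoted EOF: the shell leaves the awk field references untouched
cat > "$dir/sense.awk" <<'EOF'
NF > 1 { name[$1] = $2 }            # sensor-data lines: id and name
NF < 2 {                            # sensor-readings lines: date/id/value
  split($0, fields, "/")
  sum[fields[2]] += fields[3]
  count[fields[2]]++
}
END {
  for (i in sum)
    printf("%15s %7.2f\n", name[i], sum[i] / count[i])
}
EOF

report=$(awk -f "$dir/sense.awk" "$dir/sensor-data" "$dir/sensor-readings" |
  sort -nr -k 2)
printf ' Sensor Average\n-----------------------\n%s\n' "$report"
rm -r "$dir"
```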