Topic7
AWK
4/9/2021

What is awk?
created by: Aho, Weinberger, and Kernighan
scripting language used for manipulating data and generating reports
versions of awk
◦ awk, nawk, mawk, pgawk, …
◦ GNU awk: gawk
What can you do with awk?
awk operation:
◦ scans a file line by line
◦ splits each input line into fields
◦ compares input line/fields to pattern
◦ performs action(s) on matched lines
Useful for:
◦ transforming data files
◦ producing formatted reports
Programming constructs:
◦ format output lines
◦ arithmetic and string operations
◦ conditionals and loops

The Command: awk
Basic awk Syntax
awk [options] 'script' file(s)
awk [options] -f scriptfile file(s)
Options:
-F to change input field separator
-f to name script file

Basic awk Program
consists of patterns & actions:
pattern {action}
◦ if pattern is missing, action is applied to all lines
◦ if action is missing, the matched line is printed
◦ must have either pattern or action
Example:
awk '/for/' testfile
◦ prints all lines containing string "for" in testfile
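The three forms of a pattern/action rule can be compared on a small input. This is only a sketch: the three sample lines are made up for illustration and are fed on standard input instead of coming from testfile.

```shell
# Hypothetical three-line input, used in place of testfile.
input='wait for me
go now
for loops repeat'

# Pattern only: matched lines are printed unchanged.
pattern_only=$(printf '%s\n' "$input" | awk '/for/')

# Action only: the action runs on every line.
action_only=$(printf '%s\n' "$input" | awk '{print $1}')

# Pattern and action: the action runs only on matched lines.
both=$(printf '%s\n' "$input" | awk '/for/ {print $1}')

printf '%s\n' "$pattern_only" "---" "$action_only" "---" "$both"
```

The first command prints the two lines containing "for", the second prints the first word of every line, and the third prints the first word of the matched lines only.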
Basic Terminology: input file
A field is a unit of data in a line
Each field is separated from the other fields by the field separator
◦ default field separator is whitespace
A record is the collection of fields in a line
A data file is made up of records

Example Input File
Buffers
awk supports two types of buffers: record and field
field buffer:
◦ one for each field in the current record
◦ names: $1, $2, …
record buffer:
◦ $0 holds the entire record

Some System Variables
FS        Field separator (default=whitespace)
RS        Record separator (default=\n)
NF        Number of fields in current record
NR        Number of the current record
OFS       Output field separator (default=space)
ORS       Output record separator (default=\n)
FILENAME  Current filename
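A short sketch of how some of these variables behave together. The two colon-separated records are hypothetical sample data; FS and OFS are set in a BEGIN block so they take effect before the first record is read.

```shell
# Hypothetical colon-separated records.
data='Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765'

# FS splits the input on ":"; OFS is placed between the print arguments.
result=$(printf '%s\n' "$data" |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1 }')

printf '%s\n' "$result"
```

Each output line shows the record number, the field count (4 for both records), and the first field, joined by the chosen OFS.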
Example: Records and Fields
% cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
% awk '{print NR, $0}' emps
1 Tom Jones 4424 5/12/66 543354
2 Mary Adams 5346 11/4/63 28765
3 Sally Chang 1654 7/22/54 650000
4 Billy Black 1683 9/23/44 336500

Example: Space as Field Separator
% cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
% awk '{print NR, $1, $2, $5}' emps
1 Tom Jones 543354
2 Mary Adams 28765
3 Sally Chang 650000
4 Billy Black 336500
Example: Colon as Field Separator
% cat em2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
% awk -F: '/Jones/{print $1, $2}' em2
Tom Jones 4424

awk Scripts
awk scripts are divided into three major parts: BEGIN (pre-processing), BODY (processing), and END (post-processing)
comment lines start with #
awk Scripts
BEGIN: pre-processing
◦ performs processing that must be completed before the file processing starts (i.e., before awk starts reading records from the input file)
◦ useful for initialization tasks such as to initialize variables and to create report headings

awk Scripts
BODY: Processing
◦ contains main processing logic to be applied to input records
◦ like a loop that processes input data one record at a time: if a file contains 100 records, the body will be executed 100 times, once for each record
awk Scripts
END: post-processing
◦ contains logic to be executed after all input data have been processed
◦ logic such as printing report grand total should be performed in this part of the script
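The three parts can be seen side by side in a minimal sketch. The item names and amounts below are invented sample data, and the script is passed inline rather than through a script file.

```shell
# Hypothetical input: one item name and one amount per line.
sales='pens 12
paper 30
ink 8'

report=$(printf '%s\n' "$sales" | awk '
BEGIN { print "Item Amount" }           # runs once, before any input is read
      { total += $2; print $1, $2 }     # runs once per input record
END   { print "Total", total }          # runs once, after the last record
')

printf '%s\n' "$report"
```

The heading comes from BEGIN, the three detail lines from the body, and the grand total (50) from END.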

Pattern / Action Syntax
Categories of Patterns

Expression Pattern types
match
◦ entire input record: regular expression enclosed by '/'s
◦ explicit pattern-matching expressions: ~ (match), !~ (not match)
expression operators
◦ arithmetic
◦ relational
◦ logical
Example: match input record
% cat employees2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
% awk -F: '/00$/' employees2
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500

Example: explicit match
% cat datafile
northwest NW Charles Main      3.0 .98 3 34
western   WE Sharon Gray       5.3 .97 5 23
southwest SW Lewis Dalsass     2.7 .8  2 18
southern  SO Suan Chin         5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7  4 17
eastern   EA TB Savage         4.4 .84 5 20
northeast NE AM Main           5.1 .94 3 13
north     NO Margot Weber      4.5 .89 5 9
central   CT Ann Stephens      5.7 .94 5 13
% awk '$5 ~ /\.[7-9]+/' datafile
southwest SW Lewis Dalsass     2.7 .8  2 18
central   CT Ann Stephens      5.7 .94 5 13
Examples: matching with REs
% awk '$2 !~ /E/{print $1, $2}' datafile
northwest NW
southwest SW
southern SO
north NO
central CT
% awk '/^[ns]/{print $1}' datafile
northwest
southwest
southern
southeast
northeast
north
Arithmetic Operators
Operator  Meaning      Example
+         Add          x + y
-         Subtract     x - y
*         Multiply     x * y
/         Divide       x / y
%         Modulus      x % y
^         Exponential  x ^ y
Example:
% awk '$3 * $4 > 500 {print $0}' file
Relational Operators
Operator  Meaning                   Example
<         Less than                 x < y
<=        Less than or equal        x <= y
==        Equal to                  x == y
!=        Not equal to              x != y
>         Greater than              x > y
>=        Greater than or equal to  x >= y
~         Matched by reg exp        x ~ /y/
!~        Not matched by reg exp    x !~ /y/

Logical Operators
Operator  Meaning      Example
&&        Logical AND  a && b
||        Logical OR   a || b
!         NOT          ! a
Examples:
% awk '($2 > 5) && ($2 <= 15) {print $0}' file
% awk '$3 == 100 || $4 > 50' file
Range Patterns
Matches ranges of consecutive input lines
Syntax:
pattern1 , pattern2 {action}
pattern can be any simple pattern
pattern1 turns action on
pattern2 turns action off

Range Pattern Example
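A minimal sketch of a range pattern in use. The START and STOP marker lines and the surrounding text are hypothetical; any two simple patterns would work the same way.

```shell
# Hypothetical input with marker lines.
lines='header
START
alpha
beta
STOP
footer'

# The action is turned on by the line matching /START/ and
# turned off after the line matching /STOP/ has been processed.
inside=$(printf '%s\n' "$lines" | awk '/START/, /STOP/ {print NR, $0}')

printf '%s\n' "$inside"
```

Lines 2 through 5 are printed: both boundary lines are part of the range, while "header" and "footer" are skipped.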
awk Actions

awk expressions
Expression is evaluated and returns value
◦ consists of any combination of numeric and string constants, variables, operators, functions, and regular expressions
Can involve variables
◦ As part of expression evaluation
◦ As target of assignment
awk variables
A user can define any number of variables within an awk script
The variables can be numbers, strings, or arrays
Variable names start with a letter, followed by letters, digits, and underscore
All variables are initially created as strings and initialized to a null string ""
Variables come into existence the first time they are referenced; therefore, they do not need to be declared before use
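A small sketch of this behavior. The variable names n, s, and count are arbitrary; none of them is assigned before its first use.

```shell
result=$(awk 'BEGIN {
    # n and s are never assigned: they act as 0 in arithmetic and "" in strings.
    print n + 1
    print "[" s "]"
    count++            # created on first use, then incremented to 1
    print count
}')

printf '%s\n' "$result"
```

The program prints 1, then [] (an empty string between the brackets), then 1.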

awk Variables
Format:
variable = expression
Examples:
% awk '$1 ~ /Tom/ {wage = $3 * $4; print wage}' filename
% awk '$4 == "CA" {$4 = "California"; print $0}' filename
awk assignment operators
= assign result of right-hand-side expression to left-hand-side variable
++ Add 1 to variable
-- Subtract 1 from variable
+= Assign result of addition
-= Assign result of subtraction
*= Assign result of multiplication
/= Assign result of division
%= Assign result of modulo
^= Assign result of exponentiation

Awk example
File: grades
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
awk script: average
# average five grades
{ total = $2 + $3 + $4 + $5 + $6
avg = total / 5
print $1, avg }
Run as:
awk -f average grades
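For reference, a sketch of what this script produces on the grades data. To keep it self-contained, the script is passed inline and the data is piped in, which is an assumption made here in place of the separate average and grades files.

```shell
# Same three records as the grades file.
grades='john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84'

averages=$(printf '%s\n' "$grades" | awk '
# average five grades
{ total = $2 + $3 + $4 + $5 + $6
  avg = total / 5
  print $1, avg }')

printf '%s\n' "$averages"
```

The averages come out as 87.4 for john, 86 for andrea, and 85.6 for jasper.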
Output Statements
print
print easy and simple output
printf
print formatted (similar to C printf)
sprintf
format string (similar to C sprintf)

Function: print
Writes to standard output
Output is terminated by ORS
◦ default ORS is newline
If called with no parameter, it will print $0
Printed parameters are separated by OFS
◦ default OFS is blank
Print control characters are allowed:
◦ \n \f \a \t \\ …
print example
% awk '{print}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86
% awk '{print $0}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86
% awk '{print($0)}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86

print Example
% awk '{print $1, $2}' grades
john 85
andrea 89
% awk '{print $1 "," $2}' grades
john,85
andrea,89
print Example
% awk '{OFS="-";print $1 , $2}' grades
john-85
andrea-89
% awk '{OFS="-";print $1 "," $2}' grades
john,85
andrea,89

Redirecting print output
Print output goes to standard output
unless redirected via:
> "file"
>> "file"
| "command"
will open file or command only once
subsequent redirections append to already open stream
print Example
% awk '{print $1 , $2 > "file"}' grades
% cat file
john 85
andrea 89
jasper 84

print Example
% awk '{print $1,$2 | "sort"}' grades
andrea 89
jasper 84
john 85
% awk '{print $1,$2 | "sort -k 2"}' grades
jasper 84
john 85
andrea 89
print Example
% date
Wed Nov 19 14:40:07 CST 2008
% date |
awk '{print "Month: " $2 "\nYear: ", $6}'
Month: Nov
Year: 2008

printf: Formatting output
Syntax:
printf(format-string, var1, var2, …)
◦ works like C printf
◦ each format specifier in "format-string" requires argument of matching type
Format specifiers
%d, %i  decimal integer
%c      single character
%s      string of characters
%f      floating point number
%o      octal number
%x      hexadecimal number
%e      scientific floating point notation
%%      the letter "%"

Format specifier examples
Given: x = "A", y = 15, z = 2.3, and $1 = Bob Smith
Printf Format Specifier  What it Does
%c  printf("The character is %c \n", x)
    output: The character is A
%d  printf("The boy is %d years old \n", y)
    output: The boy is 15 years old
%s  printf("My name is %s \n", $1)
    output: My name is Bob Smith
%f  printf("z is %5.3f \n", z)
    output: z is 2.300
Format specifier modifiers
between "%" and letter:
%10s
%7d
%10.4f
%-20s
meaning:
◦ width of field, field is printed right justified
◦ precision: number of digits after decimal point
◦ "-" will left justify
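The effect of these modifiers is easiest to see with visible delimiters. In this sketch the square brackets and the sample values are chosen purely for illustration.

```shell
formatted=$(awk 'BEGIN {
    printf("[%10s]\n", "awk")        # width 10, right justified
    printf("[%-10s]\n", "awk")       # width 10, left justified
    printf("[%7d]\n", 42)            # width 7, right justified
    printf("[%10.4f]\n", 3.14159)    # width 10, 4 digits after the decimal point
}')

printf '%s\n' "$formatted"
```

Each value is padded with spaces up to the field width; the last line also rounds the number to 3.1416.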

sprintf: Formatting text
Syntax:
sprintf(format-string, var1, var2, …)
◦ Works like printf, but does not produce output
◦ Instead it returns formatted string
Example:
{
text = sprintf("1: %d - 2: %d", $1, $2)
print text
}
awk builtin functions
tolower(string)
returns a copy of string, with each upper-case character converted to lower-case. Nonalphabetic characters are left unchanged.
Example: tolower("MiXeD cAsE 123") returns "mixed case 123"
toupper(string)
returns a copy of string, with each lower-case character converted to upper-case.

awk Example: list of products
103:sway bar:49.99
101:propeller:104.99
104:fishing line:0.99
113:premium fish bait:1.00
106:cup holder:2.49
107:cooler:14.89
112:boat cover:120.00
109:transom:199.00
110:pulley:9.88
105:mirror:4.99
108:wheel:49.99
111:lock:31.00
102:trailer hitch:97.95
awk Example: output
Marine Parts R Us
Main catalog
Part-id name                    price
======================================
101     propeller              104.99
102     trailer hitch           97.95
103     sway bar                49.99
104     fishing line             0.99
105     mirror                   4.99
106     cup holder               2.49
107     cooler                  14.89
108     wheel                   49.99
109     transom                199.00
110     pulley                   9.88
111     lock                    31.00
112     boat cover             120.00
113     premium fish bait        1.00
======================================
Catalog has 13 parts

awk Example: complete
BEGIN {
    FS = ":"
    print "Marine Parts R Us"
    print "Main catalog"
    print "Part-id\tname\t\t\t price"
    print "======================================"
}
{
    printf("%3d\t%-20s\t%6.2f\n", $1, $2, $3)
    count++
}
END {
    print "======================================"
    print "Catalog has " count " parts"
}
is output sorted?
awk Array
awk allows one-dimensional arrays to store strings or numbers
index can be number or string
array need not be declared
◦ its size
◦ its elements
array elements are created when first used
◦ initialized to 0 or ""

Arrays in awk
Syntax:
arrayName[index] = value
Examples:
list[1] = "one"
list[2] = "three"
list["other"] = "oh my !"
Illustration: Associative Arrays
awk arrays can use string as index
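A minimal sketch of a string-indexed array used as a counter. The user names are invented sample data, and the output is piped through sort because the order in which "for (name in count)" visits the indices is not specified.

```shell
# Hypothetical input: one user name per line.
visits='alice
bob
alice
carol
alice'

# The name itself is the array index; each occurrence increments its entry.
counts=$(printf '%s\n' "$visits" |
  awk '{ count[$1]++ } END { for (name in count) print name, count[name] }' |
  sort)

printf '%s\n' "$counts"
```

The result lists each distinct name once with its number of occurrences (alice 3, bob 1, carol 1).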

Awk builtin split function
split(string, array, fieldsep)
◦ divides string into pieces separated by fieldsep, and stores the pieces in array
◦ if the fieldsep is omitted, the value of FS is used.
Example:
split("auto-da-fe", a, "-")
sets the contents of the array a as follows:
a[1] = “auto”
a[2] = “da”
a[3] = “fe”
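split also returns the number of pieces it stored, which is convenient as a loop bound. The following sketch applies it to a slash-separated string; the string and the array name f are illustrative choices.

```shell
parts=$(awk 'BEGIN {
    n = split("2008-10-01/1/68", f, "/")   # n is the number of pieces stored in f
    print n
    for (i = 1; i <= n; i++) print i, f[i]
}')

printf '%s\n' "$parts"
```

It prints the count 3, followed by each index and its piece.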
Example: process sales data
input file:
output:
◦ summary of category sales

Illustration: process each input line

Summary: awk program
Example: complete program
% cat sales.awk
{
    deptSales[$2] += $3
}
END {
    for (x in deptSales)
        print x, deptSales[x]
}
% awk -f sales.awk sales

Delete Array Entry
The delete statement can be used to delete an element from an array.
Format:
delete array_name[index]
Example:
delete deptSales["supplies"]
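A short sketch of delete in context. The two array entries are made up, and the "in" operator is used to check for an index because it tests membership without creating the element.

```shell
remaining=$(awk 'BEGIN {
    deptSales["supplies"] = 10
    deptSales["tools"] = 20
    delete deptSales["supplies"]
    # The "in" operator tests for an index without creating it.
    if ("supplies" in deptSales) {
        print "supplies still present"
    } else {
        print "supplies deleted"
    }
    if ("tools" in deptSales) {
        print "tools", deptSales["tools"]
    }
}')

printf '%s\n' "$remaining"
```

Only the "supplies" entry is removed; the "tools" entry keeps its value.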
Awk control structures
Conditional
◦ if-else
Repetition
◦ for
◦ with counter
◦ with array index
◦ while
◦ do-while
◦ also: break, continue
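Several of these constructs can be combined in one small program. This is only a sketch on invented numeric data: it sums the fields of each record with a counter-controlled for loop, stops at the first negative field with break, and classifies the sum with if-else. The threshold of 10 is arbitrary.

```shell
# Hypothetical input: a few numbers per line.
rows='3 9 4
10 -2 7'

summary=$(printf '%s\n' "$rows" | awk '{
    total = 0
    for (i = 1; i <= NF; i++) {      # NF is the number of fields in this record
        if ($i < 0) break            # stop at the first negative field
        total += $i
    }
    if (total > 10) {
        print NR, total, "high"
    } else {
        print NR, total, "low"
    }
}')

printf '%s\n' "$summary"
```

The first record sums to 16 and is labeled high; the second stops at -2 with a total of 10 and is labeled low.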

if Statement
Syntax:
if (conditional expression)
statement-1
else
statement-2
Example:
if ( NR < 3 )
    print $2
else
    print $3

for Loop
Syntax:
for (initialization; limit-test; update)
    statement
Example:
for (i = 1; i <= NF; i++) {
    total += $i
    count++
}

for Loop for arrays
Syntax:
for (var in array)
    statement
Example:
for (x in deptSales) {
    print x, deptSales[x]
}

while Loop
Syntax:
while (logical expression)
    statement
Example:
i = 1
while (i <= NF) {
    print i, $i
    i++
}

do-while Loop
Syntax:
do
    statement
while (condition)
statement is executed at least once, even if condition is false at the beginning
Example:
i = 1
do {
    print $0
    i++
} while (i <= 10)

loop control statements
break
exits loop
continue
skips rest of current iteration, continues with next iteration

loop control example
for (x = 0; x < 20; x++) {
    if ( array[x] > 100) continue
    printf "%d ", x
    if ( array[x] < 0 ) break
}

Example: sensor data
1 Temperature
2 Rainfall
3 Snowfall
4 Windspeed
5 Winddirection
also: sensor readings
Plan: print average readings in descending order

Example: sensor readings
2008-10-01/1/68
2008-10-02/2/6
2007-10-03/3/4
2008-10-04/4/25
2008-10-05/5/120
2008-10-01/1/89
2007-10-01/4/35
2008-11-01/5/360
2008-10-01/1/45
2007-12-01/1/61
2008-10-10/1/32

Example: print sensor data
BEGIN {
    printf("id\tSensor\n")
    printf("----------------------\n")
}
{
    printf("%d\t%s\n", $1, $2)
}

Example: print sensor readings
BEGIN {
    FS="/"
    printf(" Date\t\tValue\n")
    printf("---------------------\n")
}
{
    printf("%s %7.2f\n", $1, $3)
}

Example: print sensor summary
BEGIN {
    FS="/"
}
{
    sum[$2] += $3;
    count[$2]++;
}
END {
    for (i in sum) {
        printf("%d %7.2f\n",i,sum[i]/count[i])
    }
}

Example: Remaining tasks
awk -f sense.awk sensors readings
 Sensor Average
-----------------------
Winddirection 240.00
Temperature 59.00
Windspeed 30.00
Rainfall 6.00
Snowfall 4.00
◦ 2 input files
◦ sorted
◦ sensor names

Example: print sensor averages
Remaining tasks:
◦ recognize nature of input data
  use: number of fields in record
◦ substitute sensor id with sensor name
  use: associative array
◦ sort readings
  use: sort -gr -k 2

Example: sense.awk
NF > 1 {
    name[$1] = $2
}
NF < 2 {
    split($0,fields,"/")
    sum[fields[2]] += fields[3];
    count[fields[2]]++;
}
END {
    for (i in sum) {
        printf("%15s %7.2f\n", name[i], sum[i]/count[i]) | "sort -gr -k 2"
    }
}

Example: print sensor averages
Remaining tasks:
◦ Sort
  use: sort -gr
◦ Substitute sensor id with sensor name
  1. use: join -j 1 sensor-data sensor-averages
  2. within awk

Example: solution 1 (1/3)
#! /bin/bash
trap '/bin/rm /tmp/report-*-$$; exit' 1 2 3
cat << HERE > /tmp/report-awk-1-$$
BEGIN {FS="/"}
{
sum[\$2] += \$3;
count[\$2]++;
}
END {
for (i in sum) {
printf("%d %7.2f\n", i, sum[i]/count[i])
}
}
HERE
Example: solution 1 (2/3)
cat << HERE > /tmp/report-awk-2-$$
BEGIN {
printf(" Sensor Average\n")
printf("-----------------------\n")
}
{
printf("%15s %7.2f\n", \$2, \$3)
}
HERE

Example: solution 1 (3/3)
awk -f /tmp/report-awk-1-$$ sensor-readings |
sort > /tmp/report-r-$$
join -j 1 sensor-data /tmp/report-r-$$ > /tmp/report-t-$$
sort -gr -k 3 /tmp/report-t-$$ |
awk -f /tmp/report-awk-2-$$
/bin/rm /tmp/report-*-$$
Example: output
Sensor Average
-----------------------
Winddirection 240.00
Temperature 59.00
Windspeed 30.00
Rainfall 6.00
Snowfall 4.00

Example: solution 2 (1/2)
#! /bin/bash
trap '/bin/rm /tmp/report-*$$; exit' 1 2 3
cat << HERE > /tmp/report-awk-3-$$
NF > 1 {
name[\$1] = \$2
}
NF < 2 {
split(\$0,fields,"/")
sum[fields[2]] += fields[3];
count[fields[2]]++;
}

Example: solution 2 (2/2)
END {
for (i in sum) {
printf("%15s %7.2f\n", name[i], sum[i]/count[i])
}
}
HERE
echo " Sensor Average"
echo "-----------------------"
awk -f /tmp/report-awk-3-$$ sensor-data sensor-readings |
sort -gr -k 2
/bin/rm /tmp/report-*$$