PowerPoint Presentation

4/9/2021


Topic 7
AWK


What is awk?
created by: Aho, Weinberger, and Kernighan

scripting language used for manipulating data and generating reports

versions of awk
◦ awk, nawk, mawk, pgawk, …

◦ GNU awk: gawk

TOPIC 7 – AWK


What can you do with awk?
awk operation:

◦ scans a file line by line

◦ splits each input line into fields

◦ compares input line/fields to pattern

◦ performs action(s) on matched lines

Useful for:
◦ transforming data files

◦ producing formatted reports

Programming constructs:
◦ format output lines

◦ arithmetic and string operations

◦ conditionals and loops


The Command: awk


Basic awk Syntax
awk [options] 'script' file(s)

awk [options] -f scriptfile file(s)

Options:

-F to change input field separator

-f to name script file


Basic awk Program
consists of patterns & actions:

pattern {action}

◦ if pattern is missing, action is applied to all lines

◦ if action is missing, the matched line is printed

◦ must have either pattern or action

Example:
awk '/for/' testfile

◦ prints all lines containing string “for” in testfile
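The two defaults above (missing action, missing pattern) can be checked with a small sketch. The three input lines are made up for illustration and are not the contents of testfile.

```shell
# Hypothetical three-line input, generated inline so the sketch is self-contained.
data=$(printf 'wait for me\nhello world\nformat the disk\n')

# Pattern only: the default action prints each matching line.
pattern_only=$(printf '%s\n' "$data" | awk '/for/')

# Action only: the action runs on every input line.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

echo "$pattern_only"
echo "$action_only"
```

Note that /for/ also matches "format", because the pattern is a regular expression matched anywhere in the line, not a whole-word test.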


Basic Terminology: input file
A field is a unit of data in a line

Each field is separated from the other fields by the field separator
◦ default field separator is whitespace

A record is the collection of fields in a line

A data file is made up of records


Example Input File

Buffers

awk supports two types of buffers:

record and field

field buffer:
◦ one for each field in the current record

◦ names: $1, $2, …

record buffer :
◦ $0 holds the entire record


Some System Variables
FS Field separator (default=whitespace)

RS Record separator (default=\n)

NF Number of fields in current record

NR Number of the current record

OFS Output field separator (default=space)

ORS Output record separator (default=\n)

FILENAME Current filename
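A minimal sketch of how some of these variables behave, assuming two made-up colon-separated records. The output separator " | " is an arbitrary choice for the illustration.

```shell
# Two hypothetical records with ":" between fields.
out=$(printf 'Tom Jones:4424\nMary Adams:5346\n' |
  awk 'BEGIN { FS = ":"; OFS = " | " }
       # NR is the record number, NF the field count of the current record.
       # The commas in print are replaced by OFS on output.
       { print NR, NF, $1, $2 }')
echo "$out"
```

Setting FS in BEGIN has the same effect here as passing -F: on the command line.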


Example: Records and Fields
% cat emps

Tom Jones 4424 5/12/66 543354

Mary Adams 5346 11/4/63 28765

Sally Chang 1654 7/22/54 650000

Billy Black 1683 9/23/44 336500

% awk '{print NR, $0}' emps

1 Tom Jones 4424 5/12/66 543354

2 Mary Adams 5346 11/4/63 28765

3 Sally Chang 1654 7/22/54 650000

4 Billy Black 1683 9/23/44 336500


Example: Space as Field Separator
% cat emps

Tom Jones 4424 5/12/66 543354

Mary Adams 5346 11/4/63 28765

Sally Chang 1654 7/22/54 650000

Billy Black 1683 9/23/44 336500

% awk '{print NR, $1, $2, $5}' emps

1 Tom Jones 543354

2 Mary Adams 28765

3 Sally Chang 650000

4 Billy Black 336500


Example: Colon as Field Separator
% cat em2

Tom Jones:4424:5/12/66:543354

Mary Adams:5346:11/4/63:28765

Sally Chang:1654:7/22/54:650000

Billy Black:1683:9/23/44:336500

% awk -F: '/Jones/{print $1, $2}' em2

Tom Jones 4424


awk Scripts
awk scripts are divided into three major parts: BEGIN (pre-processing), body (processing), and END (post-processing)

comment lines start with #


awk Scripts
BEGIN: pre-processing

◦ performs processing that must be completed before the file processing starts (i.e., before awk starts
reading records from the input file)

◦ useful for initialization tasks such as to initialize variables and to create report headings


awk Scripts
BODY: Processing

◦ contains main processing logic to be applied to input records

◦ like a loop that processes input data one record at a time:
◦ if a file contains 100 records, the body will be executed 100 times, once for each record


awk Scripts
END: post-processing

◦ contains logic to be executed after all input data have been processed

◦ logic such as printing report grand total should be performed in this part of the script
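The three parts can be seen together in a short sketch. The item names and quantities are invented for the illustration.

```shell
out=$(printf 'pens 3\nink 4\npaper 5\n' |
  awk 'BEGIN { print "item qty" }
       # Body: runs once per input record and accumulates a running total.
       { total += $2; print $1, $2 }
       # END: runs once, after the last record has been processed.
       END { print "total", total }')
echo "$out"
```

The heading is printed before any input is read, and the grand total is only available in END because the body has to see every record first.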


Pattern / Action Syntax

Categories of Patterns


Expression Pattern types
match

◦ entire input record: regular expression enclosed by '/'s

◦ explicit pattern-matching expressions: ~ (match), !~ (not match)

expression operators
◦ arithmetic

◦ relational

◦ logical


Example: match input record
% cat employees2

Tom Jones:4424:5/12/66:543354

Mary Adams:5346:11/4/63:28765

Sally Chang:1654:7/22/54:650000

Billy Black:1683:9/23/44:336500

% awk -F: '/00$/' employees2

Sally Chang:1654:7/22/54:650000

Billy Black:1683:9/23/44:336500


Example: explicit match
% cat datafile

northwest NW Charles Main 3.0 .98 3 34

western WE Sharon Gray 5.3 .97 5 23

southwest SW Lewis Dalsass 2.7 .8 2 18

southern SO Suan Chin 5.1 .95 4 15

southeast SE Patricia Hemenway 4.0 .7 4 17

eastern EA TB Savage 4.4 .84 5 20

northeast NE AM Main 5.1 .94 3 13

north NO Margot Weber 4.5 .89 5 9

central CT Ann Stephens 5.7 .94 5 13

% awk '$5 ~ /\.[7-9]+/' datafile

southwest SW Lewis Dalsass 2.7 .8 2 18

central CT Ann Stephens 5.7 .94 5 13


Examples: matching with REs
% awk '$2 !~ /E/{print $1, $2}' datafile

northwest NW

southwest SW

southern SO

north NO

central CT

% awk '/^[ns]/{print $1}' datafile

northwest

southwest

southern

southeast

northeast

north


Arithmetic Operators
Operator Meaning Example

+ Add x + y

- Subtract x - y

* Multiply x * y

/ Divide x / y

% Modulus x % y

^ Exponentiation x ^ y

Example:

% awk '$3 * $4 > 500 {print $0}' file


Relational Operators
Operator Meaning Example

< Less than x < y

<= Less than or equal to x <= y

== Equal to x == y

!= Not equal to x != y

> Greater than x > y

>= Greater than or equal to x >= y

~ Matched by reg exp x ~ /y/

!~ Not matched by reg exp x !~ /y/


Logical Operators
Operator Meaning Example

&& Logical AND a && b

|| Logical OR a || b

! NOT ! a

Examples:

% awk '($2 > 5) && ($2 <= 15) {print $0}' file

% awk '$3 == 100 || $4 > 50' file


Range Patterns
Matches ranges of consecutive input lines

Syntax:

pattern1 , pattern2 {action}

pattern can be any simple pattern

pattern1 turns action on

pattern2 turns action off
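A minimal sketch of a range pattern. The START and END marker lines are hypothetical and were chosen only for this illustration.

```shell
# Six input lines; only the block from START through END should be printed.
out=$(printf 'a\nSTART\nb\nc\nEND\nd\n' |
  awk '/START/,/END/ { print NR ": " $0 }')
echo "$out"
```

Both boundary lines are part of the range: the line that matches pattern1 and the line that matches pattern2 are processed by the action as well.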


Range Pattern Example

awk Actions


awk expressions
Expression is evaluated and returns value

consists of any combination of numeric and string constants, variables, operators, functions, and regular expressions

Can involve variables
◦ As part of expression evaluation

◦ As target of assignment


awk variables
A user can define any number of variables within an awk script

The variables can be numbers, strings, or arrays

Variable names start with a letter or underscore, followed by letters, digits, and underscores

Variables come into existence the first time they are referenced; therefore, they do not need to
be declared before use

Uninitialized variables have the null string "" as their string value and 0 as their numeric value
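The behavior of variables that were never assigned can be checked with a short sketch. The names n and s are arbitrary.

```shell
out=$(awk 'BEGIN {
  # n and s are never assigned; they come into existence on first reference.
  print n + 1          # used as a number, n acts as 0
  print "[" s "]"      # used as a string, s acts as the empty string
  print length(s)
}')
echo "$out"
```

This is why counters such as count++ and accumulators such as total += $2 work without any explicit initialization.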


awk Variables
Format:

variable = expression

Examples:

% awk '$1 ~ /Tom/ {wage = $3 * $4; print wage}' filename

% awk '$4 == "CA" {$4 = "California"; print $0}' filename


awk assignment operators
= assign result of right-hand-side expression to left-hand-side variable

++ Add 1 to variable

-- Subtract 1 from variable

+= Assign result of addition

-= Assign result of subtraction

*= Assign result of multiplication

/= Assign result of division

%= Assign result of modulo

^= Assign result of exponentiation


Awk example
File: grades

john 85 92 78 94 88

andrea 89 90 75 90 86

jasper 84 88 80 92 84

awk script: average

# average five grades

{ total = $2 + $3 + $4 + $5 + $6

avg = total / 5

print $1, avg }

Run as:

awk -f average grades


Output Statements
print

print easy and simple output

printf

print formatted (similar to C printf)

sprintf

format string (similar to C sprintf)


Function: print
Writes to standard output

Output is terminated by ORS
◦ default ORS is newline

If called with no parameter, it will print $0

Printed parameters are separated by OFS,
◦ default OFS is blank

Print control characters are allowed:
◦ \n \f \a \t \\ …


print example
% awk '{print}' grades

john 85 92 78 94 88

andrea 89 90 75 90 86

jasper 84 88 80 92 84

% awk '{print $0}' grades

john 85 92 78 94 88

andrea 89 90 75 90 86

jasper 84 88 80 92 84

% awk '{print($0)}' grades

john 85 92 78 94 88

andrea 89 90 75 90 86

jasper 84 88 80 92 84


print Example
% awk '{print $1, $2}' grades

john 85

andrea 89

jasper 84

% awk '{print $1 "," $2}' grades

john,85

andrea,89

jasper,84


print Example
% awk '{OFS="-";print $1 , $2}' grades

john-85

andrea-89

jasper-84

% awk '{OFS="-";print $1 "," $2}' grades

john,85

andrea,89

jasper,84


Redirecting print output
Print output goes to standard output

unless redirected via:
> "file"

>> "file"

| "command"

will open file or command only once

subsequent redirections append to already open stream


print Example
% awk '{print $1 , $2 > "file"}' grades

% cat file

john 85

andrea 89

jasper 84


print Example
% awk '{print $1,$2 | "sort"}' grades

andrea 89

jasper 84

john 85

% awk '{print $1,$2 | "sort -k 2"}' grades

jasper 84

john 85

andrea 89


print Example
% date

Wed Nov 19 14:40:07 CST 2008

% date | awk '{print "Month: " $2 "\nYear: " $6}'

Month: Nov

Year: 2008


printf: Formatting output
Syntax:

printf(format-string, var1, var2, …)

◦ works like C printf

◦ each format specifier in “format-string” requires argument of matching type


Format specifiers
%d, %i decimal integer

%c single character

%s string of characters

%f floating point number

%o octal number

%x hexadecimal number

%e scientific floating point notation

%% a literal percent sign "%"


Format specifier examples


Given: x = "A", y = 15, z = 2.3, and $1 = "Bob Smith" (for example, with FS = ":")

Printf Format Specifier What it Does

%c printf("The character is %c \n", x)

output: The character is A

%d printf("The boy is %d years old \n", y)

output: The boy is 15 years old

%s printf("My name is %s \n", $1)

output: My name is Bob Smith

%f printf("z is %5.3f \n", z)

output: z is 2.300

Format specifier modifiers
between “%” and letter

%10s

%7d

%10.4f

%-20s

meaning:
◦ width of field, field is printed right justified

◦ precision: number of digits after decimal point

◦ “-” will left justify
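The four modifiers listed above can be tried in a short sketch. The square brackets are added only to make the padding visible, and the values are arbitrary.

```shell
out=$(awk 'BEGIN {
  printf("[%10s]\n", "awk")        # width 10, right-justified
  printf("[%-10s]\n", "awk")       # width 10, left-justified
  printf("[%7d]\n", 42)            # width 7, right-justified
  printf("[%10.4f]\n", 3.14159)    # width 10, 4 digits after the decimal point
}')
echo "$out"
```

The width counts the whole field, including the decimal point and the digits after it.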


sprintf: Formatting text
Syntax:
sprintf(format-string, var1, var2, …)

◦ Works like printf, but does not produce output

◦ Instead it returns formatted string

Example:
{

text = sprintf("1: %d - 2: %d", $1, $2)

print text

}


awk builtin functions
tolower(string)

returns a copy of string, with each upper-case character converted to lower-case. Nonalphabetic
characters are left unchanged.

Example: tolower("MiXeD cAsE 123")

returns "mixed case 123"

toupper(string)

returns a copy of string, with each lower-case character converted to upper-case.
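Both functions can be applied to fields as well as to string constants. A small sketch, with two invented input lines:

```shell
# Upper-case the first field and lower-case the second field of each line.
out=$(printf 'Tom Jones\nmary ADAMS\n' |
  awk '{ print toupper($1), tolower($2) }')
echo "$out"
```

Because both functions return a copy, the fields themselves are left unchanged unless the result is assigned back to them.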


awk Example: list of products
103:sway bar:49.99

101:propeller:104.99

104:fishing line:0.99

113:premium fish bait:1.00

106:cup holder:2.49

107:cooler:14.89

112:boat cover:120.00

109:transom:199.00

110:pulley:9.88

105:mirror:4.99

108:wheel:49.99

111:lock:31.00

102:trailer hitch:97.95


awk Example: output
Marine Parts R Us

Main catalog

Part-id name price

======================================

101 propeller 104.99

102 trailer hitch 97.95

103 sway bar 49.99

104 fishing line 0.99

105 mirror 4.99

106 cup holder 2.49

107 cooler 14.89

108 wheel 49.99

109 transom 199.00

110 pulley 9.88

111 lock 31.00

112 boat cover 120.00

113 premium fish bait 1.00

======================================

Catalog has 13 parts


awk Example: complete
BEGIN {
    FS = ":"
    print "Marine Parts R Us"
    print "Main catalog"
    print "Part-id\tname\t\t\t price"
    print "======================================"
}

{
    printf("%3d\t%-20s\t%6.2f\n", $1, $2, $3)
    count++
}

END {
    print "======================================"
    print "Catalog has " count " parts"
}


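Is the output sorted? The script prints records in input order, so the product list (103, 101, 104, …) has to be sorted by part id first, for example with sort, to produce the catalog shown.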
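One way to get the sorted catalog is to sort the input before awk sees it. The sketch below is an assumption about how this could be done, not necessarily how the original output was produced. It uses three of the product lines and omits the report headings.

```shell
# sort -t: -k1,1n sorts numerically on the first colon-separated field.
out=$(printf '103:sway bar:49.99\n101:propeller:104.99\n102:trailer hitch:97.95\n' |
  sort -t: -k1,1n |
  awk -F: '{ printf("%3d\t%-20s\t%6.2f\n", $1, $2, $3); count++ }
           END { print "Catalog has " count " parts" }')
echo "$out"
```

Sorting the input rather than the output keeps the heading and the summary line out of the sort.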

awk Array
awk allows one-dimensional arrays

to store strings or numbers

index can be number or string

array need not be declared
◦ its size

◦ its elements

array elements are created when first used
◦ initialized to 0 or “”


Arrays in awk
Syntax:

arrayName[index] = value

Examples:

list[1] = "one"

list[2] = "three"

list["other"] = "oh my !"


Illustration: Associative Arrays
awk arrays can use string as index


Awk builtin split function
split(string, array, fieldsep)

◦ divides string into pieces separated by fieldsep, and stores the pieces in array

◦ if the fieldsep is omitted, the value of FS is used.

Example:

split("auto-da-fe", a, "-")

sets the contents of the array a as follows:

a[1] = "auto"

a[2] = "da"

a[3] = "fe"
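The same call as a runnable sketch. It also uses the return value of split, which is the number of pieces, to drive a counting loop.

```shell
out=$(awk 'BEGIN {
  n = split("auto-da-fe", a, "-")   # n is the number of pieces stored in a
  for (i = 1; i <= n; i++)
    print i, a[i]
}')
echo "$out"
```

A counting loop from 1 to n is used here instead of for (i in a), because the order in which for-in visits array indices is not specified.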


Example: process sales data
input file:

output:
◦ summary of category sales


Illustration: process each input line

Illustration: process each input line


Summary: awk program


Example: complete program
% cat sales.awk

{

deptSales[$2] += $3

}

END {

for (x in deptSales)

print x, deptSales[x]

}

% awk -f sales.awk sales


Delete Array Entry
The delete statement can be used to delete an element from an array.

Format:
delete array_name[index]

Example:
delete deptSales["supplies"]
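A minimal sketch of delete in context. The three-column input (id, department, amount) is a hypothetical layout chosen so that $2 and $3 match the deptSales example; the real sales file is not shown in the slides.

```shell
out=$(printf 'a clothing 10\nb supplies 5\nc clothing 7\n' |
  awk '{ deptSales[$2] += $3 }
       END {
         # Remove one department before reporting; the loop no longer visits it.
         delete deptSales["supplies"]
         for (x in deptSales)
           print x, deptSales[x]
       }')
echo "$out"
```

After the delete, the index "supplies" no longer exists in the array, so only the clothing total is printed.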


Awk control structures
Conditional

◦ if-else

Repetition
◦ for

◦ with counter

◦ with array index

◦ while

◦ do-while

◦ also: break, continue


if Statement
Syntax:
if (conditional expression)

statement-1

else

statement-2

Example:
if ( NR < 3 )
    print $2
else
    print $3

for Loop
Syntax:
for (initialization; limit-test; update)
    statement

Example:
for (i = 1; i <= NF; i++) {
    total += $i
    count++
}

for Loop for arrays
Syntax:
for (var in array)
    statement

Example:
for (x in deptSales) {
    print x, deptSales[x]
}

while Loop
Syntax:
while (logical expression)
    statement

Example:
i = 1
while (i <= NF) {
    print i, $i
    i++
}

do-while Loop
Syntax:
do
    statement
while (condition)

statement is executed at least once, even if condition is false at the beginning

Example:
i = 1
do {
    print $0
    i++
} while (i <= 10)

loop control statements
break
exits loop

continue
skips rest of current iteration, continues with next iteration

loop control example
for (x = 0; x < 20; x++) {
    if ( array[x] > 100 ) continue
    printf "%d ", x
    if ( array[x] < 0 ) break
}

Example: sensor data
1 Temperature
2 Rainfall
3 Snowfall
4 Windspeed
5 Winddirection

also: sensor readings

Plan: print average readings in descending order

Example: sensor readings
2008-10-01/1/68
2008-10-02/2/6
2007-10-03/3/4
2008-10-04/4/25
2008-10-05/5/120
2008-10-01/1/89
2007-10-01/4/35
2008-11-01/5/360
2008-10-01/1/45
2007-12-01/1/61
2008-10-10/1/32

Example: print sensor data
BEGIN {
    printf("id\tSensor\n")
    printf("----------------------\n")
}
{
    printf("%d\t%s\n", $1, $2)
}

Example: print sensor readings
BEGIN {
    FS = "/"
    printf(" Date\t\tValue\n")
    printf("---------------------\n")
}
{
    printf("%s %7.2f\n", $1, $3)
}

Example: print sensor summary
BEGIN {
    FS = "/"
}
{
    sum[$2] += $3;
    count[$2]++;
}
END {
    for (i in sum) {
        printf("%d %7.2f\n", i, sum[i]/count[i])
    }
}

Example: Remaining tasks
awk -f sense.awk sensors readings

 Sensor Average
-----------------------
Winddirection 240.00
Temperature 59.00
Windspeed 30.00
Rainfall 6.00
Snowfall 4.00

Callouts: sorted; sensor names; 2 input files

Example: print sensor averages
Remaining tasks:
◦ recognize nature of input data
  use: number of fields in record
◦ substitute sensor id with sensor name
  use: associative array
◦ sort readings
  use: sort -gr -k 2

Example: sense.awk
NF > 1 {
    name[$1] = $2
}
NF < 2 {
    split($0, fields, "/")
    sum[fields[2]] += fields[3];
    count[fields[2]]++;
}
END {
    for (i in sum) {
        printf("%15s %7.2f\n", name[i], sum[i]/count[i]) | "sort -gr -k 2"
    }
}

Example: print sensor averages
Remaining tasks:
◦ Sort
  use: sort -gr
◦ Substitute sensor id with sensor name
  1. use: join -j 1 sensor-data sensor-averages
  2. within awk

Example: solution 1 (1/3)
#! /bin/bash

trap '/bin/rm /tmp/report-*-$$; exit' 1 2 3

cat << HERE > /tmp/report-awk-1-$$

BEGIN {FS="/"}

{

sum[\$2] += \$3;

count[\$2]++;

}

END {

for (i in sum) {

printf("%d %7.2f\n", i, sum[i]/count[i])

}

}

HERE


Example: solution 1 (2/3)
cat << HERE > /tmp/report-awk-2-$$

BEGIN {

printf(" Sensor Average\n")

printf("-----------------------\n")

}

{

printf("%15s %7.2f\n", \$2, \$3)

}

HERE


Example: solution 1 (3/3)
awk -f /tmp/report-awk-1-$$ sensor-readings |
sort > /tmp/report-r-$$

join -j 1 sensor-data /tmp/report-r-$$ > /tmp/report-t-$$

sort -gr -k 3 /tmp/report-t-$$ |
awk -f /tmp/report-awk-2-$$

/bin/rm /tmp/report-*-$$


Example: output

Sensor Average

-----------------------

Winddirection 240.00

Temperature 59.00

Windspeed 30.00

Rainfall 6.00

Snowfall 4.00


Example: solution 2 (1/2)
#! /bin/bash

trap '/bin/rm /tmp/report-*$$; exit' 1 2 3

cat << HERE > /tmp/report-awk-3-$$

NF > 1 {

name[\$1] = \$2

}

NF < 2 {
    split(\$0, fields, "/")
    sum[fields[2]] += fields[3];
    count[fields[2]]++;
}

Example: solution 2 (2/2)
END {
    for (i in sum) {
        printf("%15s %7.2f\n", name[i], sum[i]/count[i])
    }
}
HERE

echo " Sensor Average"
echo "-----------------------"

awk -f /tmp/report-awk-3-$$ sensor-data sensor-readings |
sort -gr -k 2

/bin/rm /tmp/report-*$$