linux tools shell script.. computer science

awkprogram7.ppt

  • awk: created by Aho, Weinberger, and Kernighan
  • a scripting language used for manipulating data and generating reports
  • versions of awk: awk, nawk, mawk, pgawk, …
  • GNU awk: gawk

What can you do with awk?

  • awk operation:
      • scans a file line by line
      • splits each input line into fields
      • compares the input line or its fields to a pattern
      • performs action(s) on matched lines
  • Useful for:
      • transforming data files
      • producing formatted reports
  • Programming constructs:
      • formatted output lines
      • arithmetic and string operations
      • conditionals and loops
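
The scan, split, compare, act cycle can be sketched in one line. The names and scores below are made-up sample data, and the threshold of 50 is an arbitrary choice for illustration.

```shell
# Hypothetical input: a name and a score on each line, separated by whitespace.
# awk reads each line, splits it into $1 and $2, tests the pattern $2 >= 50,
# and runs the action only on the lines that match.
result=$(printf 'ann 72\nbob 41\ncy 50\n' |
  awk '$2 >= 50 { print $1, "passed with", $2 }')
echo "$result"
```

The line for bob is read and split like the others, but it fails the pattern, so the action is skipped for it.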

The Command: awk

Simple awk command

  • awk 'Pattern { Command }' inputFile

$ cat textfile

Line number 1

Line number 2

Line number 3

Line number 4

Line number 5

$ awk '/4/ {print }' textfile

Line number 4

Here /4/ is the condition (pattern) and {print } is the action.

Basic awk Syntax

  • awk [options] 'script' file(s)

$ awk '/4/ {print }' textfile

  • awk [options] -f scriptfile file(s)

Options:

-F to change the input field separator, for example -F: or -F,

-f to name a script file

Because an awk program can grow complex, you can store all the commands in a file and run it with the -f flag.
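
A small sketch of both options side by side. The data file, the script file name swap.awk, and the temporary directory are assumptions made for this illustration only.

```shell
# Hypothetical colon-separated data and a throwaway script file.
tmpdir=$(mktemp -d)
printf 'alice:30\nbob:25\n' > "$tmpdir/people"
printf '{ print $2, $1 }\n' > "$tmpdir/swap.awk"

# Inline script, with -F: setting the input field separator to a colon.
inline=$(awk -F: '{ print $2, $1 }' "$tmpdir/people")

# The same script stored in a file and loaded with -f.
fromfile=$(awk -F: -f "$tmpdir/swap.awk" "$tmpdir/people")

echo "$inline"
echo "$fromfile"
rm -r "$tmpdir"
```

Both invocations should print the same two lines, since only the place where the script is stored differs.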

The AWK/NAWK Utility

Copyright Department of Computer Science, Northern Illinois University, 2004

Basic awk Program

  • consists of pattern-action rules:

pattern {action}

  • if the pattern is missing, the action is applied to all lines
  • if the action is missing, the matched line is printed
  • each rule must have a pattern, an action, or both

Example:

$ awk '/for/' testfile

  • prints all lines containing the string "for" in testfile

$ awk '{print }' testfile

  • prints all lines in testfile

Basic Terminology: input file

  • A field is a unit of data in a line
  • Each field is separated from the other fields by the field separator
  • default field separator is whitespace
  • A record is the collection of fields in a line
  • A data file is made up of records

Example Input File

Buffers

  • awk supports two types of buffers: record and field
  • field buffer:
      • one for each field in the current record
      • names: $1, $2, …
  • record buffer:
      • $0 holds the entire current record
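
A quick way to see the field buffers in use, as a sketch. The input line is sample data, and $NF (the field whose number is NF, i.e. the last field) is standard awk that these slides do not otherwise introduce.

```shell
line='Tom Jones 4424 5/12/66 543354'

# NF is the number of fields, $1 the first field, $NF the last one.
summary=$(echo "$line" | awk '{ print NF ": first=" $1 ", last=" $NF }')
echo "$summary"
```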

Some System Variables

FS        Field separator (default = whitespace)
RS        Record separator (default = \n)
NF        Number of fields in the current record
NR        Number of the current record
OFS       Output field separator (default = space)
ORS       Output record separator (default = \n)
FILENAME  Current filename
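
As a sketch of how the output variables interact with print: setting OFS in a BEGIN block changes what print puts between comma-separated arguments. The two input lines are sample data.

```shell
# OFS is inserted wherever print has a comma between its arguments.
csv=$(printf 'Tom Jones 4424\nMary Adams 5346\n' |
  awk 'BEGIN { OFS = "," } { print NR, $2, $1 }')
echo "$csv"
```

NR supplies the running record number, so each output line starts with 1, 2, and so on.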

Example: Records and Fields

$ cat emps

Tom Jones 4424 5/12/66 543354

Mary Adams 5346 11/4/63 28765

Sally Chang 1654 7/22/54 650000

Billy Black 1683 9/23/44 336500

$ awk '{print NR, $0}' emps

1 Tom Jones 4424 5/12/66 543354

2 Mary Adams 5346 11/4/63 28765

3 Sally Chang 1654 7/22/54 650000

4 Billy Black 1683 9/23/44 336500

No pattern, just an action, applied to each record (line).

Example: Space as Field Separator

$ cat emps

Tom Jones 4424 5/12/66 543354

Mary Adams 5346 11/4/63 28765

Sally Chang 1654 7/22/54 650000

Billy Black 1683 9/23/44 336500

$ awk '{print NR, $1, $2, $5}' emps

1 Tom Jones 543354

2 Mary Adams 28765

3 Sally Chang 650000

4 Billy Black 336500

No pattern, just an action, applied to each record (line).

Example: Colon as Field Separator

$ cat em2

Tom Jones:4424:5/12/66:543354

Mary Adams:5346:11/4/63:28765

Sally Chang:1654:7/22/54:650000

Billy Black:1683:9/23/44:336500

$ awk -F: '/Jones/{print $1, $2}' em2

Tom Jones 4424

Pattern and action: the action runs only on records (lines) that match the pattern.

BEGIN And END Blocks

  • Two special patterns that can be matched:
      • BEGIN: commands are executed before any records are read
      • END: commands are executed after all records are processed

Example

$ cat textfile

Line number 1

Line number 2

Line number 3

Line number 4

Line number 5

$ awk '/4/ {print $0} BEGIN {print "hello"} END {print "goodbye"}' textfile

hello

Line number 4

goodbye

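Execution order: the BEGIN block runs first (step 1), then the /4/ rule is tested against each record (step 2), and the END block runs last (step 3), regardless of where the blocks appear in the script.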

$ ls | awk 'BEGIN {print "List of html files:"} /\.html$/ {print} END {print "There you go !"}'

List of html files:

as1.html

as2.html

index.html

There you go !

Awk Patterns

  • /regular expression/
  • Relational expression: >, <, >=, <=, ==
  • Pattern && pattern
  • Pattern || pattern
  • Pattern1 ? Pattern2 : Pattern3
      • if Pattern1 is true, then Pattern2 is tested, else Pattern3
  • (pattern)
  • ! Pattern
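
Two of the combined forms above, sketched on made-up data (a name followed by two numbers). The column meanings and thresholds are assumptions for illustration.

```shell
# Hypothetical input: name, hourly rate, hours worked.
data='John 13 25
Smith 14 30
Tom 9 13'

# && : both relational expressions must hold.
both=$(echo "$data" | awk '$2 > 10 && $3 >= 30 { print $1 }')

# !  : select the lines that do NOT match the regular expression.
negated=$(echo "$data" | awk '!/m/ { print $1 }')

echo "$both"
echo "$negated"
```

Only Smith satisfies both conditions, and only the John line contains no letter m.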

Example Patterns

$ cat textfile2

Just a text file Nothing to see here

Some lines have

More fields than others

And some

Are blank

$ awk 'NF > 3 {print $0}' textfile2

Just a text file Nothing to see here

More fields than others

$ awk 'NF > 3 || /^$/ {print $0}' textfile2

Just a text file Nothing to see here

More fields than others

(The blank lines in textfile2 also match /^$/ and are printed as empty lines.)

$ awk 'NF > 3 ? /file/ : /^And/ {print $0}' textfile2

Just a text file Nothing to see here

And some

Awk Actions

  • Enclosed in { }
  • Operators:
      • ()       Grouping
      • $        Field reference
      • ++ --    Increment, decrement
      • ^        Exponentiation
      • + - !    Unary plus, unary minus, logical not
      • * / %    Multiplication, division, and modulus
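
A short sketch that exercises several of these operators inside a BEGIN block, so no input file is needed. The variable n and its starting value are arbitrary.

```shell
ops=$(awk 'BEGIN {
    n = 7
    print n ^ 2      # exponentiation
    print n % 3      # modulus
    print n / 2      # division yields a floating-point result
    n++              # increment: n is now 8
    print n, -n, !n  # unary minus and logical not
}')
echo "$ops"
```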

$ awk '{print $1, $2 * $3}' empsrh

John 325

Smith 420

Tom 117

George 756

Sam 132

$ cat empsrh

John 13 25

Smith 14 30

Tom 9 13

George 21 36

Sam 11 12

$ awk '{print "total pay for", $1, "is", $2 * $3}' empsrh

total pay for John is 325

total pay for Smith is 420

total pay for Tom is 117

total pay for George is 756

total pay for Sam is 132

$ cat empsrh

John 13 25

Smith 14 30

Tom 9 13

George 21 36

Sam 11 12

Example: Computing with awk

$ cat empsrh

John 13 25

Smith 14 30

Tom 9 13

George 21 36

Sam 11 12

$ awk '$3 > 15 {emp = emp + 1} END {print emp, "employees worked more than 15 hours"}' empsrh

3 employees worked more than 15 hours

Example: Computing with awk

$ cat empsrh

John 13 25

Smith 14 30

Tom 9 13

George 21 36

Sam 11 12

$ awk '{pay = pay + $2 * $3} END {print NR, "employees"; print "total pay is", pay; print "average pay is", pay/NR}' empsrh

5 employees

total pay is 1750

average pay is 350

$ awk '$2 > maxrate {maxrate = $2; maxemp = $1}

END {print "Highest rate is:", maxrate, "for", maxemp}' emprate

awk

$ cat em2

Tom Jones:4424: 5/12/66:543354

Mary Adam:5346:11/4/63:28765

Sally Chang:1654:7/22/54:650000

Billy Black:1683:9/23/44:336500

$ awk -F: '{names=names $1 " "} END {print names}' em2

Tom Jones Mary Adam Sally Chang Billy Black

Example: calculating the average score on the Midterm

$ cat grades

Jason William:Midterm:100

Jane Smith:Quiz 1:45

Tom Ram:Final:78

Sarah George:Midterm:23

Franklin Rob:Midterm:46

$ awk -F: '/Midterm/ {count++; sum= sum + $3}

BEGIN {count=0 ; sum=0} END {print sum/count}' grades

56.3333

$

Another Example

Adding 12 points to everyone’s midterm score

$ cat grades

Jason William:Midterm:100

Jane Smith:Quiz 1:45

Tom Ram:Final:78

Sarah George:Midterm:23

Franklin Rob:Midterm:46

$ awk -F: '/Midterm/ {$3 = $3 + 12; print $0} /Quiz/ || /Final/ {print $0}' grades

Jason William Midterm 112

Jane Smith:Quiz 1:45

Tom Ram:Final:78

Sarah George Midterm 35

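Franklin Rob Midterm 58

Assigning to $3 makes awk rebuild $0 with the output field separator (a space by default), so the modified lines lose their colons while the untouched lines keep them.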

Example

$ colors=(red blue orange green purple)

$ echo ${colors[@]} | awk '{for (i=NF; i > 0; --i) print $i}'

purple

green

orange

blue

red

awk Versus bash $ arguments

  • Always enclose the awk program in single quotes, so that the shell does not expand $1 before awk sees it

  • $1 means something completely different to awk than to bash:
      • $1 in awk means the first field
      • $1 in bash means the first command-line argument
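
The difference can be sketched with a small shell function, which gives the shell its own $1 to compare against. The function name demo and the awk variable name arg are hypothetical; the -v option, which copies a value into an awk variable, is standard awk but is not covered elsewhere in these slides.

```shell
# demo is a hypothetical wrapper so that the shell has its own $1 to show.
demo() {
    line='field-one field-two'

    # Single quotes: the shell leaves $1 alone and awk prints its first field.
    single=$(echo "$line" | awk '{ print $1 }')

    # -v copies the shell value of $1 into an awk variable (the name arg is arbitrary).
    passed=$(echo "$line" | awk -v arg="$1" '{ print arg, $2 }')

    echo "$single"
    echo "$passed"
}
demo "shell-arg"
```

Passing shell values through -v keeps the awk program itself inside single quotes.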

User Defined Variables
Variable names can contain letters, digits, and underscores, but cannot begin with a digit.
You assign a variable much as in shell scripting (spaces around = are allowed, and no $ is needed to read the value):

$ cat script0

BEGIN {

test="This is a test"

print test

}

$ awk -f script0

This is a test

Example: script in a file

$ cat testfile

{print $1 "home at " $6}

$ awk -F: -f testfile /etc/passwd

kuskarhome at /home/STUDENTS/majors/kuskar

juswstahome at /home/STUDENTS/majors/juswsta

dusgmoohome at /home/STUDENTS/nonmajors/dusgmoo

jerpcamhome at /home/STUDENTS/majors/jerpcam

pralam6home at /home/STUDENTS/majors/pralam6

jonnrob1home at /home/STUDENTS/majors/jonnrob1

….

$ cat testfile2

{

text = $1 "home at " $6

print text

}

$ head /etc/passwd

root:x:0:0:root:/root:/bin/bash

daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin

bin:x:2:2:bin:/bin:/usr/sbin/nologin

sys:x:3:3:sys:/dev:/usr/sbin/nologin

sync:x:4:65534:sync:/bin:/bin/sync

games:x:5:60:games:/usr/games:/usr/sbin/nologin

man:x:6:12:man:/var/cache/man:/usr/sbin/nologin

lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin

mail:x:8:8:mail:/var/mail:/usr/sbin/nologin

news:x:9:9:news:/var/spool/news:/usr/sbin/nologin

$ cat testfile3

BEGIN {

print "users and their corresponding home"

print "UserName \t HomePath"

print "_________ \t __________"

FS=":"

}

{

print $1 " \t " $6

}

END {

print "The end"

}

$ awk -f testfile3 /etc/passwd

users and their corresponding home

UserName HomePath

_________ ________

root /root

daemon /usr/sbin

bin /bin

sys /dev

sync /bin

games /usr/games

$ head /etc/passwd

root:x:0:0:root:/root:/bin/bash

daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin

bin:x:2:2:bin:/bin:/usr/sbin/nologin

sys:x:3:3:sys:/dev:/usr/sbin/nologin

sync:x:4:65534:sync:/bin:/bin/sync

games:x:5:60:games:/usr/games:/usr/sbin/nologin

man:x:6:12:man:/var/cache/man:/usr/sbin/nologin

lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin

mail:x:8:8:mail:/var/mail:/usr/sbin/nologin

news:x:9:9:news:/var/spool/news:/usr/sbin/nologin

Sometimes the fields have fixed widths and no separator between them. In these cases, the FIELDWIDTHS variable (a gawk extension) solves the problem.

$ cat testfile4

1235.96521

927-8.3652

36257.8157

$ awk 'BEGIN {FIELDWIDTHS="3 4 3"} {print $1, $2, $3}' testfile4

123 5.96 521

927 -8.3 652

362 57.8 157

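Suppose that each record spans several lines, with records separated by a blank line. Setting RS to the empty string makes a blank line the record separator, and FS="\n" makes each line a field.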

$ cat testfile5

Jalal Omer

123 High Street

(222) 466-1234

James Smith

456 High Street

(333) 456-7890

$ awk 'BEGIN {FS="\n"; RS=""} {print $1," ", $2," ", $3}' testfile5

Jalal Omer 123 High Street (222) 466-1234

James Smith 456 High Street (333) 456-7890

$ cat testfile6

{

if ($1 > 30)

{

x = $1 * 3

print x

}

else

{

x = $1 / 2

print x

}

}

$ awk -f testfile6 numbers

5

7.5

3

99

135

10

4

11

$ cat numbers

10

15

6

33

45

20

8

22

While Loop
You can use the while loop to iterate over data with a condition.

$ cat testfile7

{

sum = 0

i = 1

while (i < 4)

{

sum += $i

i++

}

average = sum / 3

print "Average: ", average

}

$ awk -f testfile7 nums

Average: 127

Average: 129.667

Average: 192.667

Average: 165.333

$ cat nums

124 127 130

112 142 135

175 158 245

118 231 147

For each input line, awk runs the whole block, from the opening brace to the closing brace.

You can exit a loop early with the break statement:

$ cat testfile8

{

sum = 0

i = 1

while (i < 4)

{

sum += $i

i++

if (i == 3)

break

}

average = sum / 3

print "Average: ", average

}

$ awk -f testfile8 nums

Average: 83.6667

Average: 84.6667

Average: 111

Average: 116.333

$ cat nums

124 127 130

112 142 135

175 158 245

118 231 147

Wrong averages.

Why?
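
The break fires when i reaches 3, that is, after only $1 and $2 have been added, yet the sum is still divided by 3. One possible fix, sketched below, is to count the values that were actually summed and divide by that count. The counter name n is an assumption of this sketch, not part of testfile8.

```shell
avg=$(echo '124 127 130' | awk '{
    sum = 0; n = 0; i = 1
    while (i < 4) {
        sum += $i
        n++                 # count the values actually added
        i++
        if (i == 3)
            break           # leaves the loop after two fields
    }
    print "Average:", sum / n
}')
echo "$avg"
```

With the first line of nums this divides 251 by 2 rather than by 3.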

The for Loop

$ cat testfile9

{

sum = 0

for (i =1; i < 4; i++)

{

sum += $i

}

average = sum / 3

print "Average: ", average

}

$ awk -f testfile9 nums

Average: 127

Average: 129.667

Average: 192.667

Average: 165.333

$ cat nums

124 127 130

112 142 135

175 158 245

118 231 147
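
The loop bound 4 and the divisor 3 are tied to three-column input. A variation, sketched here under the assumption that every field on a line is numeric, uses NF for both so that lines of any width work. The NF > 0 pattern is an added guard that skips empty lines and so avoids a division by zero.

```shell
averages=$(printf '124 127 130\n10 20\n' | awk 'NF > 0 {
    sum = 0
    for (i = 1; i <= NF; i++)   # visit every field on the line
        sum += $i
    print "Average:", sum / NF
}')
echo "$averages"
```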

Mathematical Functions
sin(x) cos(x) sqrt(x)  exp(x)  log(x)  rand()

$ awk 'BEGIN{x=rand(); y= sqrt(16); print x, y}'

0.237788 4

String Functions

$ awk 'BEGIN{x="likegeeks"; print toupper(x)}'

LIKEGEEKS
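
A few more of the standard built-in functions, as a sketch. The functions length, substr, index, tolower, and int are standard awk but are not listed on the slide; the sample string is made up.

```shell
funcs=$(awk 'BEGIN {
    s = "Line number 4"
    print length(s)          # number of characters in s
    print substr(s, 6, 6)    # 6 characters starting at position 6
    print index(s, "num")    # 1-based position of the substring
    print tolower(s)         # lower-case copy of s
    print int(sqrt(17))      # int() drops the fractional part
}')
echo "$funcs"
```

String positions in awk start at 1, which is why substr(s, 6, 6) begins at the n of "number".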

User Defined Functions

$ cat testfile10

function myfunc()

{

printf "The user %s has home path at %s \n", $1, $6

}

BEGIN {FS=":"}

{

myfunc()

}

$ awk -f testfile10 /etc/passwd

The user root has home path at /root

The user daemon has home path at /usr/sbin

The user bin has home path at /bin

The user sys has home path at /dev

The user sync has home path at /bin

The user games has home path at /usr/games

For each input line, awk calls this function.
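
Functions can also take parameters and return a value, which the example above does not show. A minimal sketch follows; the function name describe and its parameter names are hypothetical, and the two input lines are copied from the /etc/passwd listing.

```shell
labels=$(printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:2:2:bin:/bin:/usr/sbin/nologin\n' |
  awk -F: '
    # describe() is a hypothetical helper: it takes two arguments
    # and returns a formatted string instead of printing directly.
    function describe(user, home) {
        return sprintf("%s -> %s", user, home)
    }
    { print describe($1, $6) }
')
echo "$labels"
```

Returning the string leaves the caller free to print it, store it, or concatenate it with something else.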

$ head /etc/passwd

root:x:0:0:root:/root:/bin/bash

daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin

bin:x:2:2:bin:/bin:/usr/sbin/nologin

sys:x:3:3:sys:/dev:/usr/sbin/nologin

sync:x:4:65534:sync:/bin:/bin/sync

games:x:5:60:games:/usr/games:/usr/sbin/nologin

man:x:6:12:man:/var/cache/man:/usr/sbin/nologin

lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin

mail:x:8:8:mail:/var/mail:/usr/sbin/nologin

news:x:9:9:news:/var/spool/news:/usr/sbin/nologin

Splitting Text into an Array

$ echo "12 23 11" | awk '{split($0, a); print a[3], a[2], a[1]}'

11 23 12

$ echo "12,23,11" | awk '{split($0, a,","); print a[3], a[2], a[1]}'

11 23 12

$ awk '{split($0, arr, ":"); print arr[4], arr[1]}' inputfile
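
split() also returns the number of pieces it produced, which is handy when the count is not known in advance. A small sketch with made-up input:

```shell
parts=$(echo 'red:blue:orange' | awk '{
    n = split($0, arr, ":")      # n is the number of elements stored in arr
    for (i = 1; i <= n; i++)
        print i, arr[i]
}')
echo "$parts"
```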

$ cat file1

Item1,200

Item2,500

Item3,900

Item2,800

Item1,600

$ awk -F, '{print > $1}' file1

$ cat Item1

Item1,200

Item1,600

$ cat Item3

Item3,900

$ cat Item2

Item2,500

Item2,800

$ awk -F, '{print > ($1 ".txt")}' file1

$ ls *.txt

Item1.txt Item2.txt Item3.txt

Splitting a file into several files: {print > $1} prints the entire line into a file named after $1.
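
When the input produces many different file names, some awk implementations may run out of open files. A common precaution, sketched below, is to append with >> and close each file after writing. The temporary directory and the awk variable names dir and out are assumptions of this sketch.

```shell
workdir=$(mktemp -d)
printf 'Item1,200\nItem2,500\nItem1,600\n' > "$workdir/file1"

awk -F, -v dir="$workdir" '{
    out = dir "/" $1 ".txt"   # build the output file name first
    print >> out              # append, so a reopened file keeps its earlier lines
    close(out)                # release the file before moving on
}' "$workdir/file1"

item1=$(cat "$workdir/Item1.txt")
echo "$item1"
rm -r "$workdir"
```

Because each file is closed after every write, >> is needed here: reopening with > would truncate the file and keep only the last line.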