Linux Tools: Shell Scripting (Computer Science)
- created by: Aho, Weinberger, and Kernighan
- scripting language used for manipulating data and generating reports
- versions of awk
- awk, nawk, mawk, pgawk, …
- GNU awk: gawk
What can you do with awk?
- awk operation:
- scans a file line by line
- splits each input line into fields
- compares input line/fields to pattern
- performs action(s) on matched lines
- Useful for:
- transforming data files
- producing formatted reports
- Programming constructs:
- format output lines
- arithmetic and string operations
- conditionals and loops
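The scan, split, compare, and act cycle described above can be sketched in one pipeline. The sample data and the threshold of 10 are made up for illustration.

```shell
# Hypothetical sample data: a name and a count on each line.
# Pattern: $2 > 10 (compare the second field). Action: print the first field.
result=$(printf 'apple 12\npear 7\nplum 30\n' |
  awk '$2 > 10 { print $1 }')
echo "$result"
```

Only the lines whose second field exceeds 10 reach the action, so the other line is skipped silently.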
The Command: awk
Simple awk command
- awk 'Pattern { Command }' inputFile
$ cat textfile
Line number 1
Line number 2
Line number 3
Line number 4
Line number 5
$ awk '/4/ {print }' textfile
Line number 4
Here /4/ is the condition (pattern) and {print } is the action.
Basic awk Syntax
- awk [options] 'script' file(s)
$ awk '/4/ {print }' textfile
- awk [options] -f scriptfile file(s)
Options:
-F to change input field separator
-F: or -F,
-f to name script file
Because an awk program can become long and complex, you can store its commands in a file and run that file with the -f option.
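As a minimal sketch of the two options together, the following stores a one-line program in a file and runs it with -f, using -F: to split on colons. The file names users.txt and names.awk and the sample data are assumptions made for this illustration.

```shell
# Work in a scratch directory; the file names below are made up for this sketch.
dir=$(mktemp -d)
printf 'alice:x:1001\nbob:x:1002\n' > "$dir/users.txt"

# Store the awk program in a file instead of on the command line.
cat > "$dir/names.awk" <<'EOF'
{ print NR ": " $1 }
EOF

# -F: sets the field separator to a colon; -f names the script file.
listing=$(awk -F: -f "$dir/names.awk" "$dir/users.txt")
echo "$listing"
rm -r "$dir"
```

Keeping the program in a file also avoids shell quoting problems once the script grows beyond a line or two.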
The AWK/NAWK Utility
Copyright Department of Computer Science, Northern Illinois University, 2004
Basic awk Program
- consists of patterns & actions:
pattern {action}
- if pattern is missing, action is applied to all lines
- if action is missing, the matched line is printed
- must have either pattern or action
Example:
$ awk '/for/' testfile
- prints all lines containing string “for” in testfile
$ awk '{print }' testfile
- print all lines in testfile
Basic Terminology: input file
- A field is a unit of data in a line
- Each field is separated from the other fields by the field separator
- default field separator is whitespace
- A record is the collection of fields in a line
- A data file is made up of records
Buffers
- awk supports two types of buffers:
record and field
- field buffers:
- one for each field in the current record
- names: $1, $2, …
- record buffer:
- $0 holds the entire current record
Some System Variables
FS        Field separator (default = whitespace)
RS        Record separator (default = \n)
NF        Number of fields in the current record
NR        Number of the current record
OFS       Output field separator (default = space)
ORS       Output record separator (default = \n)
FILENAME  Current filename
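A short sketch of how some of these variables behave in practice, using made-up input. It also uses $NF, which refers to the last field of the current record.

```shell
# OFS is placed between the comma-separated arguments of print.
# NR is the record number, NF the field count, $NF the last field.
vars=$(printf 'a b c\nd e\n' |
  awk 'BEGIN { OFS = "-" } { print NR, NF, $NF }')
echo "$vars"
```

The first line has three fields and the second has two, so NF and $NF change from record to record while NR simply counts up.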
Example: Records and Fields
$ cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
$ awk '{print NR, $0}' emps
1 Tom Jones 4424 5/12/66 543354
2 Mary Adams 5346 11/4/63 28765
3 Sally Chang 1654 7/22/54 650000
4 Billy Black 1683 9/23/44 336500
No pattern, just an action, applied to each record (line).
Example: Space as Field Separator
$ cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
$ awk '{print NR, $1, $2, $5}' emps
1 Tom Jones 543354
2 Mary Adams 28765
3 Sally Chang 650000
4 Billy Black 336500
No pattern, just an action, applied to each record (line).
Example: Colon as Field Separator
$ cat em2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
$ awk -F: '/Jones/{print $1, $2}' em2
Tom Jones 4424
Pattern and action: the action is applied only to each record (line) that matches.
BEGIN And END Blocks
- Two special patterns that can be matched
- BEGIN
- Commands are executed before any records are looked at
- END
- Commands are executed after all records are processed
Example
$ cat textfile
Line number 1
Line number 2
Line number 3
Line number 4
Line number 5
$ awk '/4/ {print $0} BEGIN {print "hello"} END {print "goodbye"}' textfile
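hello
Line number 4
goodbye
$
The order of the blocks in the script does not matter:
Step 1: the BEGIN block runs before any input is read
Step 2: the /4/ rule is tested against each record
Step 3: the END block runs after the last record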
$ ls | awk ' BEGIN {print "List of html files:" } /\.html$/ {print} END { print "There you go !" }'
List of html files:
as1.html
as2.html
index.html
There you go !
Awk Patterns
- /regular expression/
- Relational expression
- >, <, >=, <=, ==, !=
- pattern && pattern
- pattern || pattern
- pattern1 ? pattern2 : pattern3
- if pattern1 is true, then pattern2 is tested; otherwise pattern3 is tested
- (pattern)
- ! pattern
Example Patterns
$ cat textfile2
Just a text file Nothing to see here
Some lines have
More fields than others
And some
Are blank
$ awk 'NF > 3 {print $0}' textfile2
Just a text file Nothing to see here
More fields than others
$ awk 'NF > 3 || /^$/ {print $0}' textfile2
Just a text file Nothing to see here
More fields than others
(any blank lines in textfile2 are printed as well, because they match /^$/)
$ awk 'NF > 3 ? /file/ : /^And/ {print $0}' textfile2
Just a text file Nothing to see here
And some
Awk Actions
- Enclosed in { }
- () Grouping
- $ Field reference
- ++ -- Increment, decrement
- ^ Exponentiation
- + - ! Plus, minus, not
- * / % Multiplication, division, and modulus
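A few of the operators listed above, shown in a BEGIN block so that no input file is needed. The values are arbitrary and chosen only for illustration.

```shell
ops=$(awk 'BEGIN {
  n = 7
  print 2 ^ 3        # exponentiation
  print n % 3        # modulus
  print (n + 1) / 4  # grouping, then division
  n++                # increment
  print n
}')
echo "$ops"
```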
$ cat empsrh
John 13 25
Smith 14 30
Tom 9 13
George 21 36
Sam 11 12
$ awk '{print $1, $2 * $3}' empsrh
John 325
Smith 420
Tom 117
George 756
Sam 132
$ cat empsrh
John 13 25
Smith 14 30
Tom 9 13
George 21 36
Sam 11 12
$ awk '{print "total pay for", $1, "is", $2 * $3}' empsrh
total pay for John is 325
total pay for Smith is 420
total pay for Tom is 117
total pay for George is 756
total pay for Sam is 132
Example: Computing with awk
$ cat empsrh
John 13 25
Smith 14 30
Tom 9 13
George 21 36
Sam 11 12
$ awk '$3 > 15 {emp = emp + 1} END {print emp, "employees worked more than 15 hours"}' empsrh
3 employees worked more than 15 hours
Example: Computing with awk
$ cat empsrh
John 13 25
Smith 14 30
Tom 9 13
George 21 36
Sam 11 12
$ awk '{pay = pay + $2 * $3} END {print NR, "employees"; print "total pay is", pay; print "average pay is", pay/NR}' empsrh
5 employees
total pay is 1750
average pay is 350
$ awk '$2 > maxrate {maxrate = $2; maxemp = $1}
END {print "Highest rate is:", maxrate, "for", maxemp}' emprate
Example: Concatenating Strings
$ cat em2
Tom Jones:4424: 5/12/66:543354
Mary Adam:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
$ awk -F: '{names=names $1 " "} END {print names}' em2
Tom Jones Mary Adam Sally Chang Billy Black
Example: Calculating the average Midterm score
$ cat grades
Jason William:Midterm:100
Jane Smith:Quiz 1:45
Tom Ram:Final:78
Sarah George:Midterm:23
Franklin Rob:Midterm:46
$ awk -F: '/Midterm/ {count++; sum= sum + $3}
BEGIN {count=0 ; sum=0} END {print sum/count}' grades
56.3333
$
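A slightly more defensive variant of the same calculation, as a sketch rather than the only way to do it: it compares the second field exactly instead of matching /Midterm/ anywhere on the line, and it avoids dividing by zero when no row matches. The fallback message is made up for this illustration.

```shell
# Recreate the grades file from the example in a temporary location.
grades=$(mktemp)
cat > "$grades" <<'EOF'
Jason William:Midterm:100
Jane Smith:Quiz 1:45
Tom Ram:Final:78
Sarah George:Midterm:23
Franklin Rob:Midterm:46
EOF

# Uninitialized awk variables start as 0, so no BEGIN block is needed.
avg=$(awk -F: '
  $2 == "Midterm" { count++; sum += $3 }
  END { if (count > 0) print sum / count; else print "no Midterm rows" }
' "$grades")
echo "$avg"
rm "$grades"
```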
Another Example
Adding 12 points to everyone’s midterm score
$ cat grades
Jason William:Midterm:100
Jane Smith:Quiz 1:45
Tom Ram:Final:78
Sarah George:Midterm:23
Franklin Rob:Midterm:46
$ awk -F: '/Midterm/ {$3 = $3 + 12; print $0} /Quiz/ || /Final/ {print $0}' grades
Jason William Midterm 112
Jane Smith:Quiz 1:45
Tom Ram:Final:78
Sarah George Midterm 35
Franklin Rob Midterm 58
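Notice that the modified lines lost their colons: assigning to a field makes awk rebuild $0 using the output field separator OFS, which defaults to a space. One way to keep the original separator, sketched here on two of the sample rows, is to set OFS to a colon as well.

```shell
# FS splits the input on colons; OFS rejoins modified records with colons.
bumped=$(printf 'Jason William:Midterm:100\nJane Smith:Quiz 1:45\n' |
  awk 'BEGIN { FS = OFS = ":" } $2 == "Midterm" { $3 += 12 } { print }')
echo "$bumped"
```

The unmodified Quiz line is printed as read, and the Midterm line is rebuilt with colons instead of spaces.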
Example
$ colors=(red blue orange green purple)
$ echo ${colors[@]} | awk '{for (i=NF; i > 0; --i) print $i}'
purple
green
orange
blue
red
awk Versus bash $ arguments
- Always enclose the awk program in single quotes, so the shell does not expand $1, $2, … before awk sees them
- $1 to awk means something completely different than $1 to bash
- $1 in awk means first field
- $1 in bash means first command line argument
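Because the program is single-quoted, shell variables are not expanded inside it. A common way to hand a shell value to awk is the -v option, sketched below. The variable names limit and max and the sample rows are assumptions for this illustration.

```shell
limit=15   # a shell variable (hypothetical threshold)

# -v max="$limit" creates the awk variable max before the program runs.
# Inside the single quotes, $3 is the third awk field, not a shell argument.
over=$(printf 'John 13 25\nTom 9 13\nGeorge 21 36\n' |
  awk -v max="$limit" '$3 > max { print $1 }')
echo "$over"
```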
User Defined Variables
A variable name may contain letters, digits, and underscores, but it cannot begin with a digit.
You assign a value with =, much as in shell scripting:
$ cat script0
BEGIN {
    test="This is a test"
    print test
}
$ awk -f script0
This is a test
Example: script in a file
$ cat testfile
{print $1 "home at " $6}
$ awk -F: -f testfile /etc/passwd
kuskarhome at /home/STUDENTS/majors/kuskar
juswstahome at /home/STUDENTS/majors/juswsta
dusgmoohome at /home/STUDENTS/nonmajors/dusgmoo
jerpcamhome at /home/STUDENTS/majors/jerpcam
pralam6home at /home/STUDENTS/majors/pralam6
jonnrob1home at /home/STUDENTS/majors/jonnrob1
….
$ cat testfile2
{
    text = $1 "home at " $6
    print text
}
$ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
$ cat testfile3
BEGIN {
    print "users and their corresponding home"
    print "UserName \t HomePath"
    print "_________ \t __________"
    FS=":"
}
{
    print $1 " \t " $6
}
END {
    print "The end"
}
$ awk -f testfile3 /etc/passwd
users and their corresponding home
UserName HomePath
_________ ________
root /root
daemon /usr/sbin
bin /bin
sys /dev
sync /bin
games /usr/games
Sometimes fields occupy fixed column widths instead of being delimited by a separator. In that case, the FIELDWIDTHS variable (a gawk extension) gives the width of each field.
$ cat testfile4
1235.96521
927-8.3652
36257.8157
$ awk 'BEGIN {FIELDWIDTHS="3 4 3"}{print $1, $2, $3}' testfile4
123 5.96 521
927 -8.3 652
362 57.8 157
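FIELDWIDTHS is specific to gawk. Where another awk is in use, the same fixed-width split can be approximated with substr(), as in this sketch on two of the sample lines. The start positions 1, 4, and 8 follow from the widths 3, 4, and 3.

```shell
# substr(s, start, length) extracts each fixed-width column from the record.
fixed=$(printf '1235.96521\n927-8.3652\n' |
  awk '{ print substr($0, 1, 3), substr($0, 4, 4), substr($0, 8, 3) }')
echo "$fixed"
```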
Suppose each record spans several lines, with a blank line between records:
$ cat testfile5
Jalal Omer
123 High Street
(222) 466-1234

James Smith
456 High Street
(333) 456-7890
$ awk 'BEGIN {FS="\n"; RS=""} {print $1, $2, $3}' testfile5
Jalal Omer 123 High Street (222) 466-1234
James Smith 456 High Street (333) 456-7890
$ cat testfile6
{
    if ($1 > 30)
    {
        x = $1 * 3
        print x
    }
    else
    {
        x = $1 / 2
        print x
    }
}
$ awk -f testfile6 numbers
5
7.5
3
99
135
10
4
11
$ cat numbers
10
15
6
33
45
20
8
22
While Loop
You can use the while loop to iterate over data with a condition.
$ cat testfile7
{
    sum = 0
    i = 1
    while (i < 4)
    {
        sum += $i
        i++
    }
    average = sum / 3
    print "Average: ", average
}
$ awk -f testfile7 nums
Average: 127
Average: 129.667
Average: 192.667
Average: 165.333
$ cat nums
124 127 130
112 142 135
175 158 245
118 231 147
The block between the outer { and } runs once for each input line.
You can exit the loop early with the break command, like this:
$ cat testfile8
{
    sum = 0
    i = 1
    while (i < 4)
    {
        sum += $i
        i++
        if (i == 3)
            break
    }
    average = sum / 3
    print "Average: ", average
}
$ awk -f testfile8 nums
Average: 83.6667
Average: 84.6667
Average: 111
Average: 116.333
$ cat nums
124 127 130
112 142 135
175 158 245
118 231 147
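Wrong averages. Why?
The loop breaks as soon as i reaches 3, so $3 is never added to sum, yet sum is still divided by 3.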
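One way to keep the divisor in step with the loop, sketched here on the first two rows of nums, is to count the values as they are added and divide by that count instead of a hard-coded 3. The counter name n is an assumption for this illustration.

```shell
avgs=$(printf '124 127 130\n112 142 135\n' |
  awk '{
    sum = 0
    n = 0
    for (i = 1; i <= NF; i++) {   # visit every field on the line
      sum += $i
      n++
    }
    if (n > 0) print "Average:", sum / n
  }')
echo "$avgs"
```

If the loop were to stop early, n would reflect only the values actually summed, so the average would still be consistent with the sum.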
The for Loop
$ cat testfile9
{
    sum = 0
    for (i = 1; i < 4; i++)
    {
        sum += $i
    }
    average = sum / 3
    print "Average: ", average
}
$ awk -f testfile9 nums
Average: 127
Average: 129.667
Average: 192.667
Average: 165.333
$ cat nums
124 127 130
112 142 135
175 158 245
118 231 147
Mathematical Functions
sin(x) cos(x) sqrt(x) exp(x) log(x) rand()
$ awk 'BEGIN{x=rand(); y= sqrt(16); print x, y}'
0.237788 4
String Functions
$ awk 'BEGIN{x="likegeeks"; print toupper(x)}'
LIKEGEEKS
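A few more standard built-in functions, shown as a brief sketch in a BEGIN block. The sample string is reused from the example above.

```shell
funcs=$(awk 'BEGIN {
  s = "likegeeks"
  print length(s)          # number of characters
  print substr(s, 1, 4)    # 4 characters starting at position 1
  print index(s, "geeks")  # position of the substring, 0 if absent
  print int(7.9)           # truncates toward zero
}')
echo "$funcs"
```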
User Defined Functions
$ cat testfile10
function myfunc()
{
    printf "The user %s has home path at %s \n", $1, $6
}
BEGIN {FS=":"}
{
    myfunc()
}
$ awk -f testfile10 /etc/passwd
The user root has home path at /root
The user daemon has home path at /usr/sbin
The user bin has home path at /bin
The user sys has home path at /dev
The user sync has home path at /bin
The user games has home path at /usr/games
For each input line, awk calls this function.
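Functions can also take parameters and return a value. The sketch below uses a hypothetical helper named pay() on two rows in the style of the empsrh data.

```shell
paid=$(printf 'John 13 25\nSam 11 12\n' |
  awk '
    # pay() is a hypothetical helper: rate times hours.
    function pay(rate, hours) {
      return rate * hours
    }
    { print $1, pay($2, $3) }
  ')
echo "$paid"
```

Passing the fields as arguments, rather than reading $2 and $3 inside the function, keeps the function usable with values from any source.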
Splitting Text into an Array with awk
$ echo "12 23 11" | awk '{split($0, a); print a[3], a[2], a[1]}'
11 23 12
$ echo "12,23,11" | awk '{split($0, a,","); print a[3], a[2], a[1]}'
11 23 12
$ awk '{split($0, arr, ":"); print arr[4], arr[1]}' inputfile
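split() also returns the number of elements it created, which is handy when the number of pieces is not known in advance. A small sketch, using one line in /etc/passwd format as made-up input:

```shell
parts=$(echo "root:x:0:0:root:/root:/bin/bash" |
  awk '{
    n = split($0, arr, ":")   # n is the number of elements created
    print n
    print arr[1], arr[n]      # first and last element
  }')
echo "$parts"
```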
$ cat file1
Item1,200
Item2,500
Item3,900
Item2,800
Item1,600
$ awk -F, '{print > $1}' file1
$ cat Item1
Item1,200
Item1,600
$ cat Item3
Item3,900
$ cat Item2
Item2,500
Item2,800
$ awk -F, '{print > ($1 ".txt")}' file1
$ ls *.txt
Item1.txt Item2.txt Item3.txt
Splitting a file into several files: print > $1 writes the entire line into a file named by $1.
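When the first field takes many distinct values, awk may run out of open file handles. A common precaution, sketched below under that assumption, is to append with >> and close each file after writing. The scratch directory and the reduced sample data are made up for this illustration.

```shell
dir=$(mktemp -d)
printf 'Item1,200\nItem2,500\nItem1,600\n' > "$dir/file1"

# Run inside the scratch directory so the output files land there.
( cd "$dir" && awk -F, '{
    out = $1 ".txt"
    print >> out     # append, because the file is reopened after each close
    close(out)
  }' file1 )

item1=$(cat "$dir/Item1.txt")
echo "$item1"
rm -r "$dir"
```

Appending means a second run adds to any existing output files, so start from an empty directory or remove old output first.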