awk In Bash Scripting

Up until now, we have covered a couple of text processing tools, from sed to grep to sort, and now we come to the big daddy: awk.

awk (which stands for Aho, Weinberger, and Kernighan – its creators) is a programming language that is geared towards textual data manipulation.

Unlike the other text processing tools we covered, which are somewhat limited, awk is a full programming language: it has if statements, loops, functions, and most of the other major features of a procedural language, so it can handle text processing as well as general programming tasks.

While awk can be used for general procedural programming, we will focus only on its text processing features, as that is what it is best known for.

awk usage is similar to sed. You use it in the following way:

awk action [files]

The action is a sequence of statements enclosed in curly braces { }, with the statements separated by semicolons (;). For example:

lastlog | awk '{print $1, "is a user"}'

The above example takes the output of the lastlog program and pipes it to awk. Notice that awk receives a single argument: an action, enclosed in curly braces, that prints $1 followed by a text string.
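To illustrate the action format described above, here is a minimal sketch with two statements in one action, separated by a semicolon. The input line is made-up sample data supplied with printf rather than real lastlog output, and the names `line`, `msg`, and `greeting` are only illustrative.

```shell
# Sample input (hypothetical): one line with two whitespace-separated tokens.
line='pascal pts/0'

# Two statements inside one action, separated by a semicolon:
# the first builds a string, the second prints it.
greeting=$(printf '%s\n' "$line" | awk '{ msg = $1 " is a user"; print msg }')

printf '%s\n' "$greeting"
```

Running this should print a single line that starts with the first token of the input.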

One more important thing to note is that the action is enclosed in single quotes. The quotes stop the shell from interpreting special characters such as $ and the braces, so the action reaches awk exactly as written.

This is the output of the above code:

user@blog:~$ lastlog | awk '{print $1, "is a user"}'
sys is a user
sync is a user
games is a user
man is a user
lp is a user
mail is a user
proxy is a user
www-data is a user
backup is a user
gnats is a user
nobody is a user
systemd-network is a user
systemd-resolve is a user
syslog is a user
messagebus is a user
_apt is a user
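As a side note on the quoting mentioned earlier, this sketch contrasts single and double quotes around the action. It assumes a Bash-like shell and clears the shell's own positional parameters first, so the shell's $1 is empty; the input line is made up for illustration.

```shell
# Clear the shell's positional parameters so "$1" is empty for the shell.
set --

line='pascal pts/0'

# Single quotes: the shell passes $1 to awk untouched, so awk prints token 1.
single=$(printf '%s\n' "$line" | awk '{print $1}')

# Double quotes: the shell expands $1 (empty here) before awk runs,
# so awk receives {print } and prints the whole line instead.
double=$(printf '%s\n' "$line" | awk "{print $1}")

printf 'single quotes: %s\n' "$single"
printf 'double quotes: %s\n' "$double"
```

This is why awk actions are almost always written inside single quotes.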

To understand what’s happening, I’ll copy and paste the code below again:

lastlog | awk '{print $1, "is a user"}'

Ordinarily when we run the lastlog command, we get something as follows:

Username         Port     From             Latest
root                                       **Never logged in**
lxd                                        **Never logged in**
sshd                                       **Never logged in**
pascal           pts/0   Mon Sep 21 10:00:17 +0000 2020
vboxadd                                    **Never logged in**
thisisme                                   **Never logged in**
dumm                                       **Never logged in**
james                                      **Never logged in**

Each line consists of a username, followed by the port they logged in on (if they have ever logged in), the host or IP address they logged in from, and the date of their latest login.

This is where things get interesting. awk splits each line into tokens (awk calls them fields), and each token is separated from the next by a space, a tab, or a run of spaces and tabs. So, in the header line above, we can fairly say we have 4 tokens: token one ($1) is “Username”, two ($2) is “Port”, three ($3) is “From”, and token four ($4) is “Latest”. Keep in mind that the numbering is purely positional: on a row where the Port and From columns are empty, $2 is simply the next word on the line (here “**Never”), not the port.
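The whitespace splitting can be checked on two lines copied from the lastlog output above. This sketch uses awk's built-in NF variable (the number of tokens on the current line), which this article has not introduced, and it also shows which value $2 takes on a row whose middle columns are empty.

```shell
# Two rows taken from the lastlog output shown above.
rows='Username         Port     From             Latest
root                                       **Never logged in**'

# Runs of spaces count as a single separator; NF is the token count.
counts=$(printf '%s\n' "$rows" | awk '{print $1, "has", NF, "fields; $2 is", $2}')

printf '%s\n' "$counts"
```

Both lines report four tokens, but only on the header line does $2 correspond to the Port column.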

awk numbers the tokens so that you can refer to them in an action: when you write $1 here, it stands for the username. Let’s see another example:

devsrealm@blog:~$ who | awk '{print $1, "is logged in on terminal", $2}'
pascal is logged in on terminal pts/0
devsrealm is logged in on terminal pts/1

I am simply combining the tokens with my own text to produce more meaningful output. You can run the who command on its own to see the raw output I am working from.

print is an awk statement, not a separate command-line program. It takes a list of arguments separated by commas; in the above command there are three: $1, a text string, and $2. print writes the arguments to the screen in order, separated by single spaces, and ends the line with a newline.
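The role of the commas in print can be seen in a small sketch. With commas, awk joins the arguments with its output separator (a single space by default); without commas, adjacent values are concatenated directly. The input line is made up for illustration.

```shell
line='pascal pts/0'

# Commas: arguments are joined with a space (the default output separator).
with_commas=$(printf '%s\n' "$line" | awk '{print $1, "on", $2}')

# No commas: the values are concatenated with nothing in between.
without_commas=$(printf '%s\n' "$line" | awk '{print $1 "on" $2}')

printf '%s\n' "$with_commas"      # pascal on pts/0
printf '%s\n' "$without_commas"   # pascalonpts/0
```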

The beautiful thing about awk is that you can use the -F option to specify the character(s) used to separate tokens, for example:

devsrealm@blog:~$ awk -F : '{print $1, "home:", $6}' /etc/passwd
root home: /root
daemon home: /usr/sbin
bin home: /bin
sys home: /dev
sync home: /bin
games home: /usr/games
man home: /var/cache/man
lp home: /var/spool/lpd
mail home: /var/mail
news home: /var/spool/news
uucp home: /var/spool/uucp
proxy home: /bin
www-data home: /var/www

I am using -F to tell awk that I want the token separator to be a colon instead of spaces and tabs. The capital F stands for field separator, and the colon is the character that separates the entries on each line of /etc/passwd.
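-F accepts other separators as well. As a minimal sketch, the following splits a comma-separated line; the record is invented for illustration and does not come from any real file.

```shell
# Hypothetical comma-separated record: name, shell, home directory.
record='pascal,/bin/bash,/home/pascal'

# -F , tells awk to split tokens on commas instead of whitespace.
home_line=$(printf '%s\n' "$record" | awk -F , '{print $1, "home:", $3}')

printf '%s\n' "$home_line"
```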

As you can see, you can do all sorts of wild stuff with awk. Let’s proceed…

With awk, it is possible to perform different actions on lines that match certain (regular expression) patterns, for example:

awk '/Devsrealm/ {print $1}' data.txt

In the above example, awk finds every line in the data.txt file that matches “Devsrealm” and prints the first token of each such line. For example, if you have the following text:

This is Devsrealm blog, and we talk about system admin, 
and programing language in general.

It would only print “This”, because only the first line contains “Devsrealm”.
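The same behavior can be reproduced without creating data.txt by feeding the two sample lines to awk on standard input. This is only a sketch; it also shows that the match is case-sensitive by default, so a lowercase pattern finds nothing.

```shell
# The two sample lines from the text above.
text='This is Devsrealm blog, and we talk about system admin,
and programing language in general.'

# Only lines matching the regular expression /Devsrealm/ run the action.
matched=$(printf '%s\n' "$text" | awk '/Devsrealm/ {print $1}')

# A lowercase pattern matches no line here, so nothing is printed.
lowercase=$(printf '%s\n' "$text" | awk '/devsrealm/ {print $1}')

printf 'matched: %s\n' "$matched"
printf 'lowercase pattern matched: [%s]\n' "$lowercase"
```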

It is also possible to perform arithmetic on variables within awk, for example:

awk '{print $1, ($3+$4)/$5}' database

This assumes that each line of the database file contains at least 5 tokens, and that the third, fourth, and fifth are numbers. For every line, awk prints the first token, followed by the sum of the third and fourth tokens divided by the fifth.
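As a worked example of the arithmetic, assume a hypothetical database with a name and four numbers per line (the names and values are invented). The sketch also adds a `$5 != 0` condition in front of the action, which is an addition not used in the command above: it skips lines whose fifth token is zero, since dividing by zero makes awk stop with an error.

```shell
# Hypothetical records: name, id, two values to add, and a divisor.
data='alice 1 10 20 5
bob 2 7 8 3'

# (10+20)/5 = 6 and (7+8)/3 = 5; the $5 != 0 condition skips zero divisors.
results=$(printf '%s\n' "$data" | awk '$5 != 0 {print $1, ($3+$4)/$5}')

printf '%s\n' "$results"
```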

If you have a complex awk script, you can put it in a separate file, and run it as follows:

awk -f scriptfile inputfile

Note that you don’t need to give the script file execute permission, nor do you need to add an interpreter (shebang) line to it, because awk reads the file itself.
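A minimal sketch of the -f workflow follows. It writes a one-line awk program to a temporary file created with mktemp (the file and the sample input are made up for illustration), runs it with awk -f, and then removes the file.

```shell
# Create a temporary script file holding the awk program.
scriptfile=$(mktemp)
printf '%s\n' '{print $1, "is logged in on terminal", $2}' > "$scriptfile"

# Run the program from the file; the input comes from standard input here.
report=$(printf 'pascal pts/0\n' | awk -f "$scriptfile")

rm -f "$scriptfile"
printf '%s\n' "$report"
```

The script file needs neither execute permission nor a shebang line, because awk itself opens and reads it.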

It is also possible to create a standalone awk script, and you do it like this:

#!/bin/awk -f
# Split tokens on colons, which is what /etc/passwd uses
BEGIN { FS = ":" }
{ print $1, "home:", $6 }

To run the script, make it executable (for example with chmod +x scriptname.awk), then execute it with the input file as an argument:

./scriptname.awk /etc/passwd

It would then run the awk action in the script on the /etc/passwd file. Those are the basics of awk; we will cover more in a later guide.
