
awk In Bash Scripting

Up until now, we have covered several text-processing tools, from sed to grep to sort. Now we come to the most capable of them all: awk.

awk (which stands for Aho, Weinberger, and Kernighan - its creators) is a programming language that is geared towards textual data manipulation.

Unlike the other text processing tools we covered, which are somewhat limited, awk is a full programming language. It has if statements, loops, functions, and most other major features of a procedural language, so it can handle almost any text-processing task and many general programming tasks as well.

Although awk can be used as a general procedural language, we will focus only on its text-processing features, as that is what it is best known for.

awk usage is similar to sed. You use it in the following way:

awk action [files]

The action is a sequence of statements enclosed in curly braces { }, with the statements separated by semicolons (;). For example:

lastlog | awk '{print $1, "is a user"}'

The above example pipes the output of the lastlog program to awk. awk is given a single argument: a print statement, enclosed in curly braces, that prints the first token of each line followed by some text.

One more important point is that the action is enclosed in single quotes. The single quotes stop the shell from interpreting characters that are special to it, such as $, spaces, and double quotes, so the whole action reaches awk unchanged as a single argument.
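As a rough illustration of why the quoting matters, the sketch below runs the same action with single and with double quotes. The input line is made up for the example and is not real lastlog output. With double quotes, the shell expands $1 itself (here to nothing) before awk ever sees the action.

```shell
# Made-up sample line standing in for real command output.
line='pascal pts/0'

# Single quotes: the shell passes $1 through untouched, so awk prints token 1.
single=$(printf '%s\n' "$line" | awk '{print $1}')

# Double quotes: the shell expands $1 first. With no positional parameters
# set, awk receives the action {print }, which prints the whole line.
set --    # clear the shell's own positional parameters for the demonstration
double=$(printf '%s\n' "$line" | awk "{print $1}")

echo "single quotes: $single"
echo "double quotes: $double"
```

The first echo shows only the username, while the second shows the entire line, because awk never received the $1.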

This is the output of the above code:

user@blog:~$ lastlog | awk '{print $1, "is a user"}'
sys is a user
sync is a user
games is a user
man is a user
lp is a user
mail is a user
proxy is a user
www-data is a user
backup is a user
gnats is a user
nobody is a user
systemd-network is a user
systemd-resolve is a user
syslog is a user
messagebus is a user
_apt is a user

To understand what's happening, here is the command again:

lastlog | awk '{print $1, "is a user"}'

Ordinarily, when we run the lastlog command, we get something like the following:

Username         Port     From             Latest
root                                       **Never logged in**
lxd                                        **Never logged in**
sshd                                       **Never logged in**
pascal           pts/0   Mon Sep 21 10:00:17 +0000 2020
vboxadd                                    **Never logged in**
thisisme                                   **Never logged in**
dumm                                       **Never logged in**
james                                      **Never logged in**

Each line consists of a username, followed by the port they logged in on (if they have ever logged in), the address they logged in from, and the date of their latest login.

This is where things get interesting. awk treats each entry on a line as a token (awk itself calls these fields), and each token is separated from the next by a space, a tab, or a run of spaces and tabs. So, in the header line above, we can fairly say we have 4 tokens: token one ($1) is "Username", two ($2) is "Port", three ($3) is "From", and token four ($4) is "Latest". Because the split is made purely on whitespace, a line with an empty column has its later tokens shifted to the left.

awk numbers the tokens so that you can refer to them in an action: $1 is the first token on the line, which here is the username. Let's see another example:
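A minimal sketch of the whitespace splitting described above, using made-up lines. NF is awk's built-in count of tokens on the current line. The last command shows the left shift on a line shaped like the "Never logged in" entries.

```shell
# Made-up line with irregular spacing: several spaces, then a tab.
line=$(printf 'pascal    pts/0\t10.0.0.5')

# NF holds the number of tokens awk found on the current line.
count=$(printf '%s\n' "$line" | awk '{print NF}')
third=$(printf '%s\n' "$line" | awk '{print $3}')

# With empty columns, $2 is simply the next run of non-blank characters.
never=$(printf 'root      **Never logged in**\n' | awk '{print $2}')

echo "token count: $count"
echo "third token: $third"
echo "second token on the 'never' line: $never"
```

Runs of spaces and tabs count as one separator, so the first line has three tokens. On the second line, $2 is "**Never" rather than a port.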

devsrealm@blog:~$ who | awk '{print $1, "is logged in on terminal", $2}'
pascal is logged in on terminal pts/0
devsrealm is logged in on terminal pts/1

I am simply combining the tokens with my own text to produce more meaningful output. You can run the who command on its own to see the raw output I am working from.

print is a statement built into awk, not a separate command-line program. It takes a series of parameters separated by commas; in the above command there are three: $1, a text string, and $2. print writes the value of each parameter to the screen, separated by a single space by default, and ends the line with a newline.
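The role of the commas can be seen in a small sketch with a made-up input line. With a comma, print puts a space between the two values. Without one, awk joins them into a single string.

```shell
# Made-up sample line.
line='pascal pts/0'

# Comma between parameters: values are separated by a space on output.
with_comma=$(printf '%s\n' "$line" | awk '{print $1, $2}')

# No comma: awk concatenates the two values with nothing in between.
without_comma=$(printf '%s\n' "$line" | awk '{print $1 $2}')

echo "with comma:    $with_comma"
echo "without comma: $without_comma"
```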

The beautiful thing about awk is that you can use the -F option to specify the character(s) used to separate tokens, e.g.:

devsrealm@blog:~$ awk -F : '{print $1, "home:", $6}' /etc/passwd
root home: /root
daemon home: /usr/sbin
bin home: /bin
sys home: /dev
sync home: /bin
games home: /usr/games
man home: /var/cache/man
lp home: /var/spool/lpd
mail home: /var/mail
news home: /var/spool/news
uucp home: /var/spool/uucp
proxy home: /bin
www-data home: /var/www

I am using -F to tell awk that I want the token separator to be a colon instead of spaces and tabs. The capital F stands for field separator, and a colon is what separates the entries on each line of /etc/passwd.

As you can see, you can do all sorts of wild stuff with awk. Let's proceed...
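To see what -F changes, the sketch below counts the tokens in one record with and without it. The record is a made-up line in /etc/passwd format, not taken from a real system.

```shell
# Made-up record in /etc/passwd format (seven colon-separated entries).
record='pascal:x:1000:1000:Pascal:/home/pascal:/bin/bash'

# Default separator: the record contains no spaces, so it is one token.
default_count=$(printf '%s\n' "$record" | awk '{print NF}')

# Colon separator: the same record now splits into seven tokens.
colon_count=$(printf '%s\n' "$record" | awk -F : '{print NF}')
home=$(printf '%s\n' "$record" | awk -F : '{print $1, "home:", $6}')

echo "tokens by default: $default_count"
echo "tokens with -F : : $colon_count"
echo "$home"
```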

With awk, it is possible to perform different actions on lines that match certain (regular expression) patterns, e.g.:

awk '/Devsrealm/ {print $1}' data.txt

In the above example, awk finds every line in the data.txt file that matches "Devsrealm" and prints the first token of each one. For example, if you have the following text:

This is Devsrealm blog, and we talk about system admin, 
and programing language in general.

It would only print "This", because only the first line contains "Devsrealm".

It is also possible to perform arithmetic on tokens within awk, for example:
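The same match can be tried without creating data.txt by piping the sample text into awk. This sketch also shows that the pattern is case-sensitive by default, so a lowercase pattern matches nothing here.

```shell
# The two sample lines from the text above, held in a shell variable.
text='This is Devsrealm blog, and we talk about system admin,
and programing language in general.'

# Only the first line contains "Devsrealm", so only its first token prints.
matched=$(printf '%s\n' "$text" | awk '/Devsrealm/ {print $1}')

# Patterns are case-sensitive: no line contains lowercase "devsrealm".
lowercase=$(printf '%s\n' "$text" | awk '/devsrealm/ {print $1}')

echo "matched: '$matched'"
echo "lowercase pattern matched: '$lowercase'"
```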

awk '{print $1, ($3+$4)/$5}' database

This assumes that every line of the database file contains at least 5 tokens, and that the third, fourth, and fifth are numbers. For each line, awk prints the first token, then adds the third and fourth tokens together and divides the sum by the fifth.
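Here is a worked sketch of that calculation on hypothetical data. The names and numbers are invented, and the layout of five tokens per line is an assumption. The pattern $5 != 0 is an extra guard, not part of the original command, that skips rows where the division would be by zero.

```shell
# Hypothetical rows: name, id, two values to add, and a divisor.
data='alice 1 10 20 3
bob 2 7 5 4
carol 3 1 1 0'

# For each row with a non-zero divisor, print the name and ($3+$4)/$5.
result=$(printf '%s\n' "$data" | awk '$5 != 0 {print $1, ($3+$4)/$5}')

echo "$result"
```

For alice this is (10+20)/3 = 10 and for bob (7+5)/4 = 3. The carol row is skipped by the guard.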

If you have a complex awk script, you can put it in a separate file, and run it as follows:

awk -f scriptfile inputfile

Note that you don't need to give the script file execute permission, nor do you need to add an interpreter (shebang) line to it.
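A minimal sketch of the -f form, using temporary files so nothing is left behind. The file contents are made up for illustration. Note that inside the script file the action is written without the surrounding single quotes, since the shell never sees it.

```shell
# Temporary files for the awk program and its input.
script=$(mktemp)
input=$(mktemp)

# The awk program: set the separator to a colon, then print two tokens.
cat > "$script" <<'EOF'
BEGIN { FS = ":" }
{ print $1, "home:", $6 }
EOF

# Made-up record in /etc/passwd format.
printf 'pascal:x:1000:1000:Pascal:/home/pascal:/bin/bash\n' > "$input"

# Run the program from the file; no execute permission is needed on it.
out=$(awk -f "$script" "$input")
echo "$out"

rm -f "$script" "$input"
```

Setting FS in a BEGIN block has the same effect as passing -F : on the command line, which keeps the separator together with the rest of the program.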

It is also possible to create a standalone awk script, and you do it like this:

#!/bin/awk -f
# Split each line on colons, as /etc/passwd requires
BEGIN { FS = ":" }
{ print $1, "home:", $6 }

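To run the script, first make it executable with chmod +x scriptname.awk, and then execute it with the input file as an argument. If awk lives elsewhere on your system (for example, /usr/bin/awk), adjust the first line of the script to match:

./scriptname.awk /etc/passwd

It would then run the awk action on the /etc/passwd file. That covers the basics of awk; we will cover more in a later guide.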
