
Text Processing (Filters) In Bash Scripting

Ever wondered if you could quickly do some text processing in a bash script? For example, you can analyze a text or sort a program's output and present it in a more meaningful way.

The good thing about doing things this way is that you can search for a single item in a mammoth document. Say you are looking for a specific name in a document and you need to return all the lines that name appears in; without a filter program, that would be tedious to do.

Before we go into the program that can carry out the aforementioned operation, let's understand what a filter is all about...

A filter is a *nix or GNU/Linux command-line program that has the following properties:

  • It takes input from standard input (your keyboard by default) or from the contents of one or more files. The input has to come from somewhere, be it the keyboard, a file, or the output of another program.
  • It performs some processing on that data. The kind of processing is what separates one filter program from another.
  • It produces output based on that input: you give it an input, you specify what you want done with it, and it gives you the result of that operation.
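The three input sources listed above can be sketched with a single filter. This is a minimal illustration, assuming a throwaway sample file created with mktemp; the file contents and variable names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$sample"

# 1. Input from a file named on the command line
#    (wc also prints the file name, so keep only the first field).
from_file=$(wc -l "$sample" | awk '{print $1}')

# 2. Input from standard input, redirected from the file.
from_stdin=$(wc -l < "$sample")

# 3. Input from another program's output, through a pipe.
from_pipe=$(printf 'alpha\nbeta\ngamma\n' | wc -l)

echo "file: $from_file, stdin: $from_stdin, pipe: $from_pipe"
rm -f "$sample"
```

Whichever way the data arrives, the filter does the same processing and reports the same count.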

It might get a bit confusing here, as you might think that all GNU/Linux programs are filters, which kind of makes sense since they all produce output. This is not entirely true, and to clarify what I am saying, I'll show you an example of what a filter looks like and what it doesn't look like.

wc is a filter. The processing it performs is counting the lines, words, and characters in standard input or in one or more files. The output it produces is the set of numbers representing those counts. Making sense?

Let's take an example:

devsrealm@server:~/bin$ wc data.txt
  7  58 305 data.txt
devsrealm@server:~/bin$

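The above prints the number of lines, words, and characters in the file "data.txt" (strictly speaking, the third number is a byte count, which matches the character count for plain ASCII text).

To run it on all the files in a given directory, you do: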
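If you only need one of those numbers, wc can print each count on its own. A small sketch, assuming a hypothetical two-line sample file; -l, -w, and -c are standard wc options (-c counts bytes, while -m counts characters).

```shell
#!/usr/bin/env bash
# Hypothetical sample file with two lines and five words.
sample=$(mktemp)
printf 'one two\nthree four five\n' > "$sample"

lines=$(wc -l < "$sample")   # newline count
words=$(wc -w < "$sample")   # word count
bytes=$(wc -c < "$sample")   # byte count

echo "lines=$lines words=$words bytes=$bytes"
rm -f "$sample"
```

Reading the file through standard input (with <) keeps the file name out of the output, which is handy when the number is stored in a variable.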

wc *
devsrealm@server:~/bin$ wc *
wc: backup: Is a directory
      0       0       0 backup
wc: book: Is a directory
      0       0       0 book
wc: cat,: Is a directory
      0       0       0 cat,
      0       0       0 ch1.sh
      7      58     305 data.txt
     71     450    2864 dr_optimization.sh
     25      78     482 entrance.sh
      2       2      18 example1.sh

You can see how it gives the lines, words, and characters of each file. For a directory, it prints an "Is a directory" message and reports zeros, since there is no text to count.

Now, if you type wc on its own, you'll see it waiting for input. Type any words or characters, hit Enter, rinse and repeat, and once you are done, hit CTRL + D to indicate you have finished typing. wc will then process all the input and give you the counts for whatever you typed, e.g.:
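If the "Is a directory" messages are just noise for you, one way around them is to hand wc only regular files. This is a sketch under assumed names: the temporary directory, "backup", "data.txt", and "notes.txt" are created here purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical working directory with one subdirectory and two files.
workdir=$(mktemp -d)
mkdir "$workdir/backup"
printf 'hello world\n' > "$workdir/data.txt"
printf 'a\nb\n' > "$workdir/notes.txt"

report=$(
  cd "$workdir" || exit 1
  for entry in *; do
    # -f is true only for regular files, so directories are skipped.
    if [ -f "$entry" ]; then
      wc "$entry"
    fi
  done
)

echo "$report"
rm -rf "$workdir"
```

Running wc once per file means there is no "total" line at the end; that trade-off is fine for a quick report.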

devsrealm@server:~/bin$ wc
Things in Life Aren't That Black and White
Use your Brain With Whatver You Do
You Only Have A Time In Life
Use It Well
Gracias      4      26     126
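In the session above, the counts show up on the same line as "Gracias", which suggests Enter was not pressed after that last word, so it has no trailing newline and wc counts only 4 lines. The same input can be fed to wc non-interactively; this sketch simply replays the typed text (spelling included) through a pipe.

```shell
#!/usr/bin/env bash
typed=$(
  {
    # Four lines that each end with a newline.
    printf '%s\n' "Things in Life Aren't That Black and White" \
                  "Use your Brain With Whatver You Do" \
                  "You Only Have A Time In Life" \
                  "Use It Well"
    # Last word without a trailing newline, as in the session above.
    printf '%s' "Gracias"
  } | wc
)

echo "$typed"   # lines, words, and bytes: 4 26 126 (column padding varies)
```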

Having said that, let's take a look at programs that are not filters. Actually, most programs on GNU/Linux aren't filters, e.g.:

ls - This is not a filter because it reads no input. The only thing ls does is list files and directories. You can't usefully pipe anything to ls: if you try, it completely ignores whatever you piped to it.

Again, you can't pipe a program's output to ls, but you can pipe the output of ls to another program. The only thing ls does is produce output, nothing more.
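You can check this behaviour yourself. The sketch below assumes a temporary directory holding a single hypothetical file, "report.txt", and compares ls with and without something piped into it.

```shell
#!/usr/bin/env bash
# Hypothetical directory containing one file.
workdir=$(mktemp -d)
touch "$workdir/report.txt"

# ls never reads standard input, so the piped text is discarded
# and ls lists the current directory as usual.
piped=$(cd "$workdir" && echo "/etc" | ls)
plain=$(cd "$workdir" && ls)

echo "piped: $piped"
echo "plain: $plain"
rm -rf "$workdir"
```

Both lines print the same listing, because the "/etc" sent through the pipe never reaches ls as an argument.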

A good way to know if a program is a filter is to ask yourself whether you can pipe information from another program into it and have it use that information in some meaningful way. If the answer is yes, it is a filter; otherwise, it isn't.

To summarize, a filter is a program or text processing tool used to process the data produced by other programs or the data in files.

Here is a list of major filters that are used regularly in shell scripting:

 

Filter   The processing done
cat      Displays whatever input it takes
more     Similar to cat, just with pagination
grep     Displays the lines of its input that contain a certain pattern
wc       Counts lines, words, and characters
sort     Sorts text
tee      Duplication: writes to files and to the screen
sed      Basic editing
awk      Anything
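Because each of these filters reads input and writes output, they can be chained with pipes. Here is a rough sketch of the "find every line that mentions a name" task from the start of this guide; the document text, the name "Pascal", and the temporary files are all made up for the example.

```shell
#!/usr/bin/env bash
doc=$(mktemp)    # hypothetical document to search
copy=$(mktemp)   # file that tee writes a copy of the matches to

cat > "$doc" <<'EOF'
Pascal joined the team in March
The server was rebooted
Ada reviewed the patch from Pascal
Nothing to report
EOF

# grep keeps the lines containing the name, sort orders them,
# and tee saves a copy while still passing the lines along.
matches=$(grep 'Pascal' "$doc" | sort | tee "$copy")

# wc then counts the matching lines from the saved copy.
count=$(wc -l < "$copy")

echo "$matches"
echo "matching lines: $count"
rm -f "$doc" "$copy"
```

Each program in the chain only does its own small job; the pipe is what turns them into one text processing tool.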
