Text Processing (Filters) In Bash Scripting

Ever wondered if you could quickly do some text processing in a bash script? For example, you can analyze a text, or sort the output of a command, and present it in a more meaningful way.

The good thing about doing things this way is that you can search for a single query in a mammoth document. Say you are looking for a specific name in a document, and you need to return all the lines that mention that name. Without a filter program, that would be tedious to do by hand.

Before we go into the program that can carry out the aforementioned operation, let’s understand what a filter is all about…

A filter is a *nix or GNU/Linux command-line program that has the following properties:

  • It takes its input either from standard input (your keyboard, by default) or from the contents of one or more files. The input has to come from somewhere, be it the keyboard, a file, or the output of another program.
  • It performs some processing on that data. The kind of processing is what separates one filter program from another.
  • It produces output based on that input. You give it an input, you specify what you want done with it, and it gives you an output based on the operation it performs on the input.
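
To make the three input sources concrete, here is a minimal sketch that feeds the same three lines to wc in three ways. The sample file is hypothetical and is created only for this demonstration; it is not part of the original example.

```shell
#!/usr/bin/env bash
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$sample"

# 1. Input from a file named as an argument (wc also prints the file name).
from_file=$(wc -l "$sample")

# 2. Input from standard input, redirected from the file.
from_stdin=$(wc -l < "$sample")

# 3. Input from the output of another program, through a pipe.
from_pipe=$(printf 'alpha\nbeta\ngamma\n' | wc -l)

echo "file argument:    $from_file"
echo "redirected stdin: $from_stdin"
echo "pipe:             $from_pipe"

rm -f "$sample"
```

In all three cases wc does the same processing; only the place the input comes from changes.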

Listen, it might get a bit confusing, as you might think that all GNU/Linux programs are filters, which kinda makes sense since they all produce an output. This is not entirely true, and to clarify what I am saying, I'll show you an example of a program that is a filter and one that isn't.

wc is a filter. The processing it performs is counting the lines, words, and characters (bytes, by default) in standard input or in one or more files. The output it produces is the set of numbers representing those counts. Making sense?

Let’s take an example:


devsrealm@server:~/bin$ wc data.txt
  7  58 305 data.txt

The above prints the number of lines, words, and characters in the file “data.txt”: 7 lines, 58 words, and 305 characters.
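
If you only need one of those counts, wc accepts the standard options -l (lines), -w (words), and -c (bytes). The sketch below uses a short made-up text instead of data.txt, so the numbers are easy to check by hand.

```shell
#!/usr/bin/env bash
# Made-up sample text: 2 lines, 5 words, 24 bytes (newlines included).
sample_text() {
  printf 'one two three\nfour five\n'
}

line_count=$(sample_text | wc -l)   # lines only
word_count=$(sample_text | wc -w)   # words only
byte_count=$(sample_text | wc -c)   # bytes only

echo "lines: $line_count"
echo "words: $word_count"
echo "bytes: $byte_count"
```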

To perform the operation on every file in the current directory, you do:

wc *

devsrealm@server:~/bin$ wc *
wc: backup: Is a directory
      0       0       0 backup
wc: book: Is a directory
      0       0       0 book
wc: cat,: Is a directory
      0       0       0 cat,
      0       0       0 ch1.sh
      7      58     305 data.txt
     71     450    2864 dr_optimization.sh
     25      78     482 entrance.sh
      2       2      18 example1.sh

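You can see how it filters and gives the lines, words, and characters of each file. It cannot count the contents of a directory, though: for each directory it prints an "Is a directory" error followed by a row of zeros.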
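
If the "Is a directory" noise bothers you, one possible approach is to hand wc only the regular files. This is a sketch under assumptions: the directory and file names below are made up and created in a temporary location, not the ~/bin directory from the session above.

```shell
#!/usr/bin/env bash
# Hypothetical working directory with one subdirectory and two files.
workdir=$(mktemp -d)
mkdir "$workdir/backup"
printf 'a b\nc d\n' > "$workdir/data.txt"
printf 'hello\n' > "$workdir/example1.sh"

# Collect only regular files; directories are skipped.
regular_files=()
for entry in "$workdir"/*; do
  if [ -f "$entry" ]; then
    regular_files+=("$entry")
  fi
done

# Guard against an empty list, otherwise wc would wait on standard input.
if [ "${#regular_files[@]}" -gt 0 ]; then
  wc "${regular_files[@]}"
fi

rm -rf "$workdir"
```

The [ -f ] test is true only for regular files, so the backup directory never reaches wc.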

Now, if you type wc on its own, you'll see that it waits for input. Type any words or characters, hit Enter, rinse and repeat, and once you are done, press CTRL + D to signal the end of input. wc then processes everything you typed and prints the counts, e.g:


devsrealm@server:~/bin$ wc
Things in Life Aren't That Black and White
Use your Brain With Whatver You Do
You Only Have A Time In Life
Use It Well
Gracias      4      26     126
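
Notice that the counts landed on the same line as "Gracias" and that wc reported 4 lines, not 5. That suggests the last word was typed without pressing Enter, so it has no trailing newline, and wc counts newline characters. A minimal sketch that reproduces the same input without typing it by hand:

```shell
#!/usr/bin/env bash
# Same text as the session above; the last word has no trailing newline.
counts=$(printf '%s\n%s\n%s\n%s\n%s' \
  "Things in Life Aren't That Black and White" \
  "Use your Brain With Whatver You Do" \
  "You Only Have A Time In Life" \
  "Use It Well" \
  "Gracias" | wc)

# Split the three numbers wc printed into separate variables.
read -r line_total word_total byte_total <<< "$counts"
echo "lines=$line_total words=$word_total bytes=$byte_total"
# prints: lines=4 words=26 bytes=126
```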

Having said that, let’s take a look at programs that are not filters. Actually, most programs on GNU/Linux aren’t filters, e.g:

ls is not a filter because it takes no input. The only thing ls does is list files and directories. You can’t meaningfully pipe anything to ls; if you try, it completely ignores whatever you piped to it.

Again, you can’t pipe a program's output to ls, but you can pipe the output of ls to another program. The only thing ls does is produce output, nothing more.
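
Here is a small sketch of both directions. The directory and its three files are hypothetical and created just for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical directory holding three empty files.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.txt"

# Piping INTO ls: the text on standard input is never read,
# so ls just lists the directory as usual.
listing=$(echo "this text is ignored" | ls "$workdir")
echo "$listing"

# Piping OUT of ls: wc, being a filter, happily counts the lines.
file_count=$(ls "$workdir" | wc -l)
echo "files: $file_count"

rm -rf "$workdir"
```

The first pipeline prints only the three file names; the echoed text never shows up anywhere.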

A good way to tell whether a program is a filter is to ask yourself whether you can pipe the output of another program into it and have it use that information in some meaningful way. If the answer is yes, it is a filter; otherwise, it isn’t.

To summarize, a filter is a program or text-processing tool used to process the data produced by other programs or the data in files.

Here is a list of major filters that are used regularly in shell scripting:


Filter    The Processing Done
cat       Displays whatever input it takes
more      Similar to cat, but with pagination
grep      Displays the lines of its input that contain a certain pattern
wc        Counts lines, words, and characters
sort      Sorts text
tee       Duplicates its input, writing to files and to the screen
sed       Basic editing
awk       Anything
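
To see several of these filters working together, here is a rough sketch that does the name search described at the start of this post. The document text and the name "Zara" are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical document and log file, created only for this demonstration.
doc=$(mktemp)
log=$(mktemp)
cat > "$doc" <<'EOF'
Zara joined the team in March
The server was rebooted twice
Ada fixed the login bug
Zara reviewed the backup script
EOF

# grep: keep only the lines that mention the name
# sort: put those lines in alphabetical order
# tee:  save a copy to the log file while passing the lines along
# sed:  put a label in front of the name
matches=$(grep 'Zara' "$doc" | sort | tee "$log" | sed 's/Zara/[user] Zara/')
echo "$matches"

# awk: print the first word and the number of words of each saved line
summary=$(awk '{ print $1, NF }' "$log")
echo "$summary"

# wc: count how many lines matched
match_count=$(wc -l < "$log")
echo "matching lines: $match_count"

rm -f "$doc" "$log"
```

Each program in the pipeline reads what the previous one wrote, which is exactly the property that makes them filters.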
