Bash 101 Part 5: Regular Expressions in Conditional Statements


I've come to rely on regular expressions (regex) for most of my text analysis needs. They are powerful, flexible and precise, and they come in handy when you need to analyze the output of CLI commands.

This article aims to:

  • Briefly cover what a regular expression is, and where you can go to learn more
  • Cover the Bash 3.2+ syntax for using regex in conditional statements

Notes:

Regular Expressions

A regular expression is a search pattern for text. Using special symbols, you can specify exactly what you are looking for in text strings of any length. See Regular-Expressions.info, Wikipedia and the rest of the references in the Notes section (above) for more detailed explanations of what regex is and is not.

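Bash's =~ operator uses POSIX Extended Regular Expressions (ERE), a dialect that is different from the one used by PowerShell / .NET. For example, POSIX ERE doesn't define shorthands like \d or features like lookaheads. This dialect is part of the POSIX specification. For more details, see the list of references above.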
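To make the ERE point concrete, here's a small sketch (the log line and variable names are made up for illustration). Instead of \d you write [0-9] or the POSIX class [[:digit:]], and ERE interval syntax like {4} ("exactly four") works without backslashes:

```bash
#!/bin/bash
#
# Sketch: ERE has no \d shorthand, so digits are written as [0-9]
# or with the POSIX character class [[:digit:]]. {4} means "exactly 4".
# The log line below is made up for illustration.

log_line="backup finished in 42 seconds on 2024-05-01"
date_regex="[[:digit:]]{4}-[0-9]{2}-[0-9]{2}"

if [[ $log_line =~ $date_regex ]]
then
    found_date="${BASH_REMATCH[0]}"
    echo "found a date: $found_date"
else
    echo "didn't match!"
fi
```

The "42" in the middle of the line is skipped because it isn't four digits followed by a dash, so the match lands on the date.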

In this article I'm focusing on one particular use of regular expressions. If I tried to cover the entire scope of regex in a single article, it would probably kill me.

Regular Expressions in Bash Conditional Statements

Bash lets you compare literal strings and variables against a regular expression when you use the =~ operator (the regex comparator: equals sign, tilde) inside a [[ ]] (double bracket) test in your if statements. I'll walk through several examples and explain each one.
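Because [[ ]] is a test command, the result of =~ also shows up in the exit status ($?), which is handy when you aren't inside an if statement. A minimal sketch (the variable names are just for illustration):

```bash
#!/bin/bash
#
# Sketch: [[ ]] is a test command, so =~ also reports its result in $?
#   0 -> the regex matched somewhere in the string
#   1 -> no match
#   2 -> the regex itself is invalid (here, an unclosed bracket)

regex="[0-9]+"
bad_regex="["

[[ "abc123" =~ $regex ]]
match_status=$?

[[ "abcdef" =~ $regex ]]
nomatch_status=$?

[[ "abcdef" =~ $bad_regex ]]
invalid_status=$?

echo "match: $match_status  no match: $nomatch_status  invalid regex: $invalid_status"
```

The status of 2 is worth knowing about: a typo in a regex doesn't look like "no match" if you check for it explicitly.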

Example 1: A simple match

#!/bin/bash
#
# This script demonstrates regular expressions in bash

# String we want to analyze
var="this is A text9 strin7g with numb3rs in it"
echo "this is the string we are going to analyze:"
echo "$var"
echo "  "

# Regular expression we'll use to analyze the string
regex="[a-zA-Z]+"
echo "This is the regex we will use to try analyzing the text:"
echo "$regex"
echo "  "

# Perform regex check and determine match / no match
if [[ $var =~ $regex ]]
then
    echo "matched the following text:"
    echo "${BASH_REMATCH[0]}"
else
    echo "didn't match!"
fi

Here's a screen capture of the script output:

[Screenshot: 1-First_regex_match_example.png]

Key points:

  • The =~ operator is used for regular expression testing in bash conditionals
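    • It was introduced in Bash 3.0. Since Bash 3.2, any quoted part of the regex on the right of =~ is matched as a literal string, which is why these examples keep the regex in a variable and leave $regex unquoted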
  • Regex testing is only available when used within the double bracket expression [[ ]]
  • BASH_REMATCH[0] shows the part of the text string that matched regex (if there is a match)
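Every example in this article stores the regex in a variable and leaves $regex unquoted inside [[ ]]. The sketch below (with a made-up test string) shows why: with Bash 3.2 or later, quoting the right-hand side of =~ makes Bash search for the literal text instead of treating it as a pattern.

```bash
#!/bin/bash
#
# Sketch: unquoted vs. quoted regex on the right-hand side of =~ (Bash 3.2+)

var="abc123"
regex="[0-9]+"

# Unquoted: [0-9]+ is treated as a regex and matches "123"
if [[ $var =~ $regex ]]
then
    unquoted_result="matched: ${BASH_REMATCH[0]}"
else
    unquoted_result="didn't match!"
fi

# Quoted: Bash looks for the literal text "[0-9]+", which isn't in $var
if [[ $var =~ "$regex" ]]
then
    quoted_result="matched: ${BASH_REMATCH[0]}"
else
    quoted_result="didn't match!"
fi

echo "unquoted: $unquoted_result"
echo "quoted:   $quoted_result"
```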

Example 2: Check for a string which can vary slightly

#!/bin/bash
#
# Example 2: Use regex to check for a string that could vary a bit

# String we want to analyze
var="this is A text9 strin7g with numb3rs in it"
echo "this is the string we are going to analyze:"
echo "$var"
echo "  "

# Regular expression we'll use to analyze the string
regex="text9 strin[0-9]g with numb[0-9]rs"
echo "This is the regex we will use to try analyzing the text:"
echo "$regex"
echo "  "

# Perform regex check and determine match / no match
if [[ $var =~ $regex ]]
then
    echo "matched the following text:"
    echo "${BASH_REMATCH[0]}"
else
    echo "didn't match!"
fi

Screen capture of the script output:

[Screenshot: 2-Second_regex_example.png]

Key points:

  • Bracketed expressions can be used to define a range of possible characters.
    • In this case we want to find out whether the string we are testing meets our criteria for acceptability (are there single-digit numbers in the expected places in the string?)
    • You don't have to limit the bracketed expression to numbers. Single characters and ranges of characters are accepted as well (as we'll demonstrate below).
  • For more complete information on Bracketed Expressions, see the Notes section above
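Here's a short sketch of the single-character and narrower-range forms, reusing pieces of the sample string (the candidate strings are made up for illustration). [5-8] accepts only the digits 5 through 8, and [3e] accepts exactly the character 3 or the character e:

```bash
#!/bin/bash
#
# Sketch: bracketed expressions with a narrower range and a character list
#   [5-8] -> any one digit from 5 through 8
#   [3e]  -> either the character 3 or the character e

regex="strin[5-8]g with numb[3e]rs"
results=()

for candidate in "text9 strin7g with numb3rs" \
                 "text9 strin7g with numbers" \
                 "text9 strin9g with numb3rs"
do
    if [[ $candidate =~ $regex ]]
    then
        match="${BASH_REMATCH[0]}"
    else
        match="didn't match!"
    fi
    results+=("$match")
    echo "$candidate -> $match"
done
```

The third candidate fails because 9 is outside the [5-8] range.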

Example 3: Not a Match

#!/bin/bash
#
# Example 3: Not a match

# String we want to analyze
var="this is A text9 strin7g with numb3rs in it"
echo "this is the string we are going to analyze:"
echo "$var"
echo "  "

# Regular expression we'll use to analyze the string
regex="text9 strin[0-9][0-9]g with numb[0-9]rs"
echo "This is the regex we will use to try analyzing the text:"
echo "$regex"
echo "  "

# Perform regex check and determine match / no match
if [[ $var =~ $regex ]]
then
    echo "matched the following text:"
    echo "${BASH_REMATCH[0]}"
else
    echo "didn't match!"
fi

Screen capture of the script output:

[Screenshot: 3-Not_a_match.png]

Key points:

  • The =~ test only succeeds (returns true) if it can find a stretch of text that matches every part of the pattern.
  • We altered the pattern to look for two digits in a row where our sample string only has one digit. This causes the regex check to fail.
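One detail worth spelling out: the pattern doesn't have to match the whole string, only some part of it. If you need the entire string to fit the pattern, anchor it with ^ (start of string) and $ (end of string). A sketch using the sample string (single quotes keep the $ anchor from being read as a variable):

```bash
#!/bin/bash
#
# Sketch: =~ succeeds if the pattern matches ANY part of the string.
# ^ anchors the pattern to the start of the string, $ to the end.

var="this is A text9 strin7g with numb3rs in it"

partial='strin[0-9]g'
anchored='^strin[0-9]g$'
whole='^this is A text[0-9] strin[0-9]g with numb[0-9]rs in it$'

results=()
for regex in "$partial" "$anchored" "$whole"
do
    if [[ $var =~ $regex ]]
    then
        outcome="matched"
    else
        outcome="didn't match!"
    fi
    results+=("$outcome")
    echo "$regex -> $outcome"
done
```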

Example 4: Bracketed expressions and Character Classes

#!/bin/bash
#
# Example 4: Bracketed Expressions and Character classes

# String we want to analyze
var="this is A text9 strin7g with numb3rs in it"
echo "this is the string we are going to analyze:"
echo "$var"
echo "  "

# Regular expression we'll use to analyze the string
regex="[a-z]+[[:blank:]][a-z]+"
echo "This is the regex we will use to try analyzing the text:"
echo "$regex"
echo "  "

# Perform regex check and determine match / no match
if [[ $var =~ $regex ]]
then
    echo "matched the following text:"
    echo "${BASH_REMATCH[0]}"
else
    echo "didn't match!"
fi

Script execution output:

[Screenshot: 4-Bracketed_expressions_and_character_classes.png]

Key points:

  • [a-z] is a bracketed expression. It matches any single character in the specified range (here, any lowercase letter from a to z).
  • The plus sign ( [a-z]+ ) means "one or more of the preceding item", so [a-z]+ matches a run of one or more consecutive lowercase letters.
  • [:blank:] is an example of a POSIX character class. It only has its special meaning inside a bracketed expression, which is why it is written [[:blank:]]. [[:blank:]] matches a single space or tab character.
    • In our example above we did NOT put a plus sign after [[:blank:]], so the regex requires exactly one space or tab between the two words
  • You can use as many bracketed expressions and character classes in a regular expression as you need to make your match.

    Note: Smaller, easy-to-read regexes are preferable to jumbo-sized "one size fits all" varieties.
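To see the difference between [[:blank:]] and [[:blank:]]+, here's a sketch with made-up strings: one with several spaces between the words, and one with a single tab. The helper name first_match is hypothetical, defined only for this example:

```bash
#!/bin/bash
#
# Sketch: [[:blank:]] matches exactly one space OR tab;
# [[:blank:]]+ matches a run of one or more of them.

single_blank="[a-z]+[[:blank:]][a-z]+"
many_blanks="[a-z]+[[:blank:]]+[a-z]+"

spaced="alpha    beta"     # four spaces between the words
tabbed=$'alpha\tbeta'      # $'...' turns \t into a real tab character

# Hypothetical helper: print the matched text, or "didn't match!"
first_match() {
    if [[ $1 =~ $2 ]]
    then
        echo "${BASH_REMATCH[0]}"
    else
        echo "didn't match!"
    fi
}

r1=$(first_match "$spaced" "$single_blank")  # only one blank allowed
r2=$(first_match "$spaced" "$many_blanks")   # the whole "alpha    beta"
r3=$(first_match "$tabbed" "$single_blank")  # a tab counts as a blank

echo "[$r1]"
echo "[$r2]"
echo "[$r3]"
```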

Example 5: Matching groups and finding zero or more occurrences

#!/bin/bash
#
# Example 5: Matching groups and finding zero or more occurrences

# Strings we want to analyze
var="this is A text9 strin7g with numb3rs in it"
var2="this isWHQC A text9 strin7g with numb3rs in it"

# Regular expression we'll use to analyze the strings
regex="(zzzz)*[a-z]+[[:blank:]][a-z]+(WHQC)*[[:blank:]][a-zA-Z]"
echo "This is the regex we will use to try analyzing the text:"
echo "$regex"
echo "  "

# Perform regex check and determine match / no match
echo "First Check"
echo "this is the string we are going to analyze:"
echo "$var"
echo "  "
if [[ $var =~ $regex ]]
then
    echo "matched the following text:"
    echo "${BASH_REMATCH[0]}"
else
    echo "didn't match!"
fi

# Perform the second check
echo "  "
echo "Second Check"
echo "this is the string we are going to analyze:"
echo "$var2"
echo "  "
if [[ $var2 =~ $regex ]]
then
    echo "matched the following text:"
    echo "${BASH_REMATCH[0]}"
else
    echo "didn't match!"
fi

Screen capture of the script output:

[Screenshot: 5-groups_and_zero_or_more_occurances.png]

Key points:

  • Parentheses ( ) define a group
    • A group is treated as a single unit, so a quantifier placed after it applies to the whole group: (zzzz)* means zero or more repetitions of "zzzz", while zzzz* means "zzz" followed by zero or more "z" characters
  • The asterisk ( * ) denotes zero or more occurrences of the preceding item.
    • It can be used after any single character, bracketed expression or group
  • In this example, we use groups and asterisks twice: (zzzz)* and (WHQC)*
  • To better illustrate the concept of zero-or-more, we have two test strings in this example.
    • In the first check, the regex matches "this is A".
    • In the second check, the regex matches "this isWHQC A".

      Both strings are matched with the same regular expression.
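Groups do one more useful thing: each ( ) group captures the text it matched. BASH_REMATCH[0] holds the whole match, and BASH_REMATCH[1], [2], and so on hold each group's text, numbered by the order of their opening parentheses. A group that matched zero times, like (zzzz)* in this example, leaves its slot empty. Here's a sketch; the first regex is a variation on Example 2 made up for illustration:

```bash
#!/bin/bash
#
# Sketch: reading capture groups out of BASH_REMATCH

var="this is A text9 strin7g with numb3rs in it"
regex="text([0-9]) strin([0-9])g with numb([0-9])rs"

if [[ $var =~ $regex ]]
then
    echo "whole match: ${BASH_REMATCH[0]}"
    # Indexes 1..n are the groups, in order of their opening "("
    for i in "${!BASH_REMATCH[@]}"
    do
        (( i == 0 )) && continue
        echo "group $i: ${BASH_REMATCH[i]}"
    done
    digits="${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[3]}"
fi

# Example 5's regex: (zzzz)* repeats zero times, so group 1 is empty
regex5="(zzzz)*[a-z]+[[:blank:]][a-z]+(WHQC)*[[:blank:]][a-zA-Z]"
var2="this isWHQC A text9 strin7g with numb3rs in it"
if [[ $var2 =~ $regex5 ]]
then
    zzzz_group="${BASH_REMATCH[1]}"
    whqc_group="${BASH_REMATCH[2]}"
    echo "group 1: '$zzzz_group'  group 2: '$whqc_group'"
fi
```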

You might ask, "What good is this?" In the QA automation environment I'm setting up, I need to make sure the output of our executable is consistent without writing a case statement or an if-then-else for every possible output. A regex lets me write one check that accepts all of the valid outputs.
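Here's a rough sketch of that idea. The output format is hypothetical (a made-up "Status: OK (12 items)" style line), and check_output is a made-up helper name, but it shows the approach: one anchored regex, using alternation (OK|WARN) and an optional s, covers every valid variant.

```bash
#!/bin/bash
#
# Sketch: one regex describes every acceptable output line, so a single
# check replaces a case statement or a chain of if-then-else tests.
# HYPOTHETICAL output format: "Status: OK (12 items)" or "Status: WARN (1 item)"
#   (OK|WARN)  -> either word
#   \( and \)  -> literal parentheses (unescaped ones would form a group)
#   items?     -> "item" or "items"

valid_output_regex='^Status: (OK|WARN) \(([0-9]+) items?\)$'

# Hypothetical helper: prints PASS/FAIL and returns 0/1
check_output() {
    local output=$1
    if [[ $output =~ $valid_output_regex ]]
    then
        echo "PASS: status=${BASH_REMATCH[1]} count=${BASH_REMATCH[2]}"
        return 0
    else
        echo "FAIL: unexpected output: $output"
        return 1
    fi
}

check_output "Status: OK (12 items)"
check_output "Status: WARN (1 item)"
check_output "Status: ERROR (3 items)" || echo "(that failure was expected)"
```

Against a real program you'd pass in its captured output, e.g. check_output "$(./your_program)", where ./your_program is a placeholder for whatever executable you're testing.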

This has been a long post, and it could have been longer if I had covered more topics. As it is, it only gets you started with the basics of using regex in Bash. Hopefully the reference material linked in the Notes section helps you take the next steps.