PLACE
spec
solutions
Bash

RPN Calculator in Bash

bash has been chosen, among other shells, for being the default shell on all versions of Linux, as well as other Unices.

The program, or script, starts with two necessary settings. The contents of the IFS variable define the set of characters that are treated as delimiters in splitting strings into ‘words’. set -f disables the so called filename expansion – a process in which certain characters in command arguments, such as * and ?, are considered patterns and implicitly replaced by matching filenames.

The rest of the program repeatedly executes the read command to obtain text lines from the standard input. The -r option is for ‘raw mode’, meaning that an input line cannot be continued (by default, \ is treated as a continuation character). The -a option instructs to split the input into words and store them as an array. Consequently, tks becomes an array of strings rather than a single string.

If the array turns to be 0-sized, resulting from an empty or whitespace-only input line, the rest of the loop is skipped. If not, tks's entries are processed one by one in an inner loop, where each string tk is pattern-matched to find out whether it represents a number, an operator, or is something else. The variable n increases or decreases according to encountering a number or an operator, and is set to 0 on an incorrect token. An expression is syntactically correct if n>0 at each step, and n=1 finally holds.

As bash itself allows computing only with integers, and modelling fractional arithmetic in it is relatively tedious, an expression that is found correct is evaluated by filtering it through the dc program. The 6k command sent to dc before the expression tells it to perform calculations with a precision of 6 digits after the decimal point. The p after the expression is the dc's printing command. The expression itself is specified as the contents of the entire tks array, ${tks[*]}.

[ dc is an RPN calculator language and interpreter that can be used in a terminal interactively or by writing scripts in it. It is one of the oldest Unix utilities, in fact – ‘the senior language on Unix systems’ (in the words of M. D. McIlroy) – and has been ubiquitous on Unix-like systems ever since. ]

The expressions in double parentheses in the program are a means of bash to do arithmetics and numeric assignments, which otherwise would be written in a more cumbersome way. Similarly, [ ... ] and [[ ... ]] are a syntactic convenience for writing down conditions of all sorts. They are similar in function, but the latter is more powerful: some operators, such as =~ here, are allowed in [[ ... ]] but not in [ ... ].

Throughout the program ; is used for command separation. If not present, an end of line is required at the same place.

IFS=$' \t'
set -f
while read -r -a tks; do
  if [ ${#tks[*]} -eq 0 ]; then continue; fi
  ((n = 0))
  for tk in ${tks[*]}; do
    if [[ $tk =~ ^[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)$ ]]; then
      ((++n))
    elif [[ $tk =~ ^[-+*/]$ ]]; then
      ((--n))
      if [ $n -le 0 ]; then break; fi
    else
      ((n = 0))
      break
    fi
  done
  if [ $n -eq 1 ]; then
    echo '6k' ${tks[*]} 'p' | dc
  else
    echo 'error'
  fi
done

That it is possible to pattern-match against regular expressions within bash is not essential. The grep utility, standard in a Unix environment, could have been used instead. In order to check whether tk reads as a number, one would replace the corresponding [[ … ]] expression with
    echo $tk | grep -q -E -e '^[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)$',
which runs grep with the contents of tk piped into its standard input. Similarly for checking whether tk is an opertaor.

Also inessential is that dc which is used to evaluate an expression understands reverse Polish notation. As the expression is already syntax checked and stored token by token in an array, it is easy to carry on the computations by following any of the forward, backward, or reduction strategies. What is indeed essential is implementing real-valued arithmetic.

The following bash script is derived from the previous one. Instead of feeding all the expression into dc, it evaluates each single operation, as it is encountered, with its two operands and puts back the result in place of the three respective entries in the tks array. Thus it gradually reduces the expression to a single numeric result. If insufficient number of operands is found for some operation, evaluation is abandoned and eventually an error message is printed. Evaluating an expression is successful if tks is processed entirely (i equals n) and there is a single entry, a number, left in it (n equals 1). That number is then displayed. Otherwise, i.e. in case of a wrong token, or incorrectly constructed sequence of numbers and arithmetic operators, an error message gets printed.

The inner loop is nominally infinite – a while loop with an always true condition denoted : – and is really a post-test loop terminating upon i becoming equal to n. bash has no construct for expressing post-test loops directly. Oddly, it has an until-do-done loop which only differs from while-do-done by inverting the meaning of the condition: the latter defines when to terminate the loop but is still tested before executing the loop's body.

Special care is taken to compact tks after an operator is applied. As only one entry must be left where there have been three, two of the array elements are given null values, and then tks=(${tks[*]}) ensures that the so created holes are removed from the array by actually constructing it anew.

For nulling variables bash offers the unset command, so tks[i]= could be replaced by unset tks[i]. The difference between assigning a null value and unsetting is that the latter makes bash believe that the array's length (${#tks[*]}) has decreased. But that is only confusing, as the ‘unset’ indices will still point at null values, and all other indices will still point at where they pointed before unset. In other words, unsetting array entries makes them void but still present, and compacting is still necessary in order to truly eliminate them.

In this script the use of dc has been restricted to doing single operations, and of course any other program that can perform +, -, *, and / would do just as well. An external program will not be needed at all if a shell such as ksh or zsh with built-in support for fractional arithmetic is used.

IFS=$' \t'
set -f
while read -r -a tks; do
  ((n = ${#tks[*]}))
  if [ $n -eq 0 ]; then continue; fi
  ((i = 0))
  while :; do
    tk=${tks[i]}
    if [[ $tk =~ ^[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)$ ]]; then
      ((++i))
    elif [[ $tk =~ ^[-+*/]$ ]]; then
      if [ $i -lt 2 ]; then break; fi
      tks[i-2]=$(echo '6k' ${tks[*]:i-2:3} 'p' | dc)
      tks[i]=
      ((--i))
      tks[i]=
      ((n -= 2))
      tks=(${tks[*]})
    else
      break
    fi
    if [ $i -eq $n ]; then break; fi
  done
  if [ $i -eq $n -a $n -eq 1 ]; then
    echo ${tks}
  else
    echo 'error'
  fi
done

boykobbatgmaildotcom