RPN Calculator in AWK

Reading the standard input a line at a time and splitting it into blank-separated tokens is default in AWK, and this implementation of the calculator makes use of that. It consists of a rule and a function. The rule's pattern filters out empty and blank lines, while letting the rest be handled by the action part, in effect passing them to the function evalrpn. The latter tries to evaluate an RPN expression and returns a numeric result or an error message for printing.

When evalrpn commences execution, the input line is already broken into a series of tokens, accessible through the $ operator by their numbers, starting from 1 and ending at NF. (NF is a system variable holding the number of fields in the currend record; in our case, the record is a line, and the fields are the tokens it is split into.) evalrpn works recursively on the list of tokens, starting from the last one and proceeding backwards. Each token is matched against regular expressions, attempting to make sense of it as either a number or an operator. For an operator, two recursive calls are made to find the values of the needed arguments, and then the result is computed accordingly.

The global variable n keeps track of where the computation currently is along the sequence of tokens. It decreases by 1 with each call of evalrpn. A value of 0 at the start of any of the calls signals an error: there are too few arguments for some operation then. Once an error message is returned from evalrpn instead of a numeric result, it is propagated to the top level and eventually gets printed.

Another global variable, k, has the same initial value as n, but decreases by 1 as the function is invoked and increases as it returns. Thus the topmost invocation of evalrpn can be distinguished from the others by holding k==NF when it is about to return. Before returning from the top level call of evalrpn, we have to check whether all the tokens have been processed. If some tokens are left (n>0), then the input expression is incorrect by having an insufficient number of operations.

The variables tk, x, and y are local to evalrpn. In AWK, for a variable to be local it must be listed in the header of the corresponding function, after the names of its parameters. As evalrpn has no parameters, only the three mentioned local names are listed in its header.

/[^ \t]/  {k = n = NF; print evalrpn()}

function evalrpn(tk,x,y) {
  tk = $n
  x = "error"
  if (n-- > 0)
    if (tk ~ "^[+-]?(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)$")
      x = tk
    else if (tk ~ "^[-+*/]$")
      if ((x=evalrpn())!="error") {
        y = x
        if ((x=evalrpn())!="error")
          if      (tk == "+")  x += y
          else if (tk == "-")  x -= y
          else if (tk == "*")  x *= y
          else                 x /= y
  if (++k==NF && n>0) x = "error"
  return x

The resulting program is concise and not awfully unreadable, yet a bit of annoyance remains in me. Given that AWK is a match-and-transform based language, it is tempting to think of an implementation where, instead of matching whole lines only to process them in a function, token triples of the kind number-number-operator are repeatedly searched for and replaced with the result of evaluating the operator until no more possible. That would have been a truly transformational approach where parsing is coded entirely as matching against rule patterns and computation – as applying rule actions. All work would have been done straight in the rules, with no need for functions, and properly separating parsing (patterns) from computation (actions). Alas, AWK cannot be made to apply a rule more than once on a record (line). That is why we were forced to devise a function to which both parsing and computation were delegated.