RPN Calculator in BCPL

Here is a rather lengthy implementation of the calculator. BCPL offers no convenience for string processing, and supports no floating-point or other non-integer arithmetic. Parsing the input, performing numeric calculations, and output formatting of numbers are programmed from ground up, using integers and vectors of ones. Numbers are represented as common fractions, and the respective arithmetic is implemented rather simple-mindedly; certain inputs will lead to erroneous results due to loss of significance or integer overflow, which a less naive approach would be able to avoid.

The language dictates that the main procedure is named start. It was my choice to nest all other procedures within it.

Two of them do most of the work. readln() reads a line of text from the standard input and breaks it down into tokens which it stores in the vector tks, while evalrpn() evaluates the RPN expression by traversing the token sequence backwards, starting from its end. All kinds of syntax errors are catched within readln(), so that evalrpn() only has to do the computations proper.

To read and parse the input, readln() makes use of a number of locally declared procedures. At the lowest level, getch is called to read characters one at a time. The actual reading is done by the library function sardch(), and tab characters are transparentized by changing them to spaces. (* is the ‘escape’ using which special characters are typed, so that e.g. *t and *n designate tab and new line characters.)

It ought to be noted that sardch() is a recent adition to the language. In the classical BCPL programs, sardch()'s job used to be done by rdch(), but the latter function currently returns immediately rather than waiting until a character is actually entered.

The function gettk() makes sure that a token that it assembles from individual characters is either one of the four arithmetic operators or a number. It returns a Boolean value (Boolean constants being only special syntax for two distinguished integer values), indicating whether a token is a correct one. If it is, that token is stored, by calling addtk(), in the token sequence tks.

gettk() delegates the recognition, representation and storing of numeric values to getnum(), which also discriminates between success and failure by returning a Boolean. getnum() computes a number by its digits and takes into account the position of the decimal dot if one is present. A successfully parsed number is represented as a pair of numerator and denominator. A fraction is always stored in an irreducible form which is being obtained by calling the nrz() procedure. The said procedure exemplifies the use of parameters whose values are pointers, so that the values referred to can be set in the procedure. The technique is similar to that used in the BCPL's descendant C, but in BCPL, @ gets a pointer, and monadic ! dereferences one.

The counter tcnt is initialized in readln() and maintained within gettk() in order to track the syntactic correctness of the token sequence as an RPN expression. Upon completion of reading a line, readln() does a final check (tcnt=1) to ensure that the entire expression is correctly formed.

The tks vector gets an initial size (whose value is unimportant) as soon as the program starts, and is being extended whenever the number of tokens exceeds the current size. In fact, a new vector is created, where the contents of the exhausted one are copied, the old vector is removed, and the value of tks is pointed at the start of the new vector. The 0-th element of the token vector is used throughout the program for storing the actual volume in tks occupied by tokens. Yet another element – the one at offset -1 – automatically (by language definition) stores the size of the vector as allocated. This helps keeping the entire information regarding the use of tks in a single data object, tks itself.

Each token occupies two consequtive elements of tks. For a numeric token these are the numerator and the denominator, while for an operator one of the elements stores a code of 1, 2, 3, or 4, and the second element is 0. That makes possible to tell the kind of a token at any position without need for additional information.

readln() returns to the main program one of the values defined as manifest constants of the program, signifying end-of-input (END), normal return with a correct expression that is to be evaluated (RES), empty input line (EMP), and a syntax error (ERR). Each case is treated accordingly, in particular, evalrpn() is called as needed.

Each call of evalrpn() decreases by 2 (one token) the value stored at the 0-th subscript: this way the function keeps track of the location of the current sub-expression to be evaluated – it always ends where the entire expression ends according to tks's 0-th item. Also, each call of evalrpn() returns a pointer to within tks where the result of the (sub)computation that that call performs is stored. When the sub-expression that is being evaluated is a single number, the pointer to that number is returned. When it is (i.e., it ends at) an operator, that operator is eventually replaced by the result of applying it to its operands, and a pointer to it is returned. All such results are, again by calling nrz, stored as irreducible fractions.

After the main program obtains the result from evalrpn(), it does some computations in order to print it as a decimal fraction with at most four digits after the decimal dot: first it prints the integer part, then scales and rounds up the fractional part. If the fractional part appears to end in some number of 0s, those 0s are removed. This may lead to removing the whole fractional part along with the decimal dot.

CASE statements in BCPL are similar to C's (or vice versa) in that they normally have to be exited explicitly, because otherwise the execution passes to the next clause, which is usually not wanted. One way to exit a CASE is through ENDCASE, as in evalrpn(). In one place of the main program, namely at CASE RES:, we do allow control to reach the body of the next clause, because we indeed have to execute LOOP, i.e., jump to a next iteration of REPEAT.

Most local variables in the program are dynamic, stack-based. Such variables, similarly to procedures, are declared with the LET keyword and must be given an initial value. Whenever the initial value is immaterial, the indeterminate (?) pseudo-value comes in handy. Several variables local to start() and readln() are declared STATIC. This is so that the procedures nested at the same or deeper level as those variables are able to access them (BCPL prohibits referring to dynamic local variables of outer procedures).

Finally, several remarks on BCPL's syntax are in order.

As this program demonstrates, curly braces { and } are used as block delimiters. However, the braces are a recent adoption. The more traditional pair, which is still accepted, is $( and $). These character combinations were chosen back in the 1960s when many computer keyboards had neither braces nor square brackets. At that time, there was also a separate pair of characters for printed representation of BCPL programs: § (section character) and the same character vertically overstroke. The § is a part of the syntax heritage of BCPL from CPL.

That the vector and string subscript operators are single characters (dyadic ! and %) rather than a pair puts BCPL in the very small group of languages having this property. Most other languages introduce unnecessary syntactic idiosyncrasy by forcefully parenthesizing subscript expressions.

The construct …->…,… is the conditional expression known to C, C++ etc. programmers as …?…:….

One may observe that the bodies of IF, TEST, FOR, and other statements are preceded by a DO keyword, but not always. The rule is that DO may be omitted if the immediately following token is a { or a keyword. I tend to find this convenient and like to make use of it.

GET "libhdr"

MANIFEST {RES = 1; EMP = 0; END = -1; ERR = -2}

LET start() BE {
  STATIC {digs; ops; tks}

  LET nrz(x,y,a,b) BE
    TEST x=0 DO !a, !b := 0, 1
    OR {
      LET s, t = ((x>0)=(y>0) -> 1, -1), ?
      x, y := ABS x, ABS y
      !a, !b := x, y
      IF x<y DO t, x, y := x, y, t
      UNTIL y=0 DO t, x, y := x MOD y, y, t
      !a, !b := s*!a/x, !b/x
    }  // nrz

  LET readln() = VALOF {
    STATIC {tcnt; ch}

    LET getch() = VALOF {
      LET c = sardch()
      RESULTIS c='*t' -> ' ', c
    }  // getch

    LET index(c,cs) = VALOF {
      FOR i=1 TO cs%0
        IF c=cs%i RESULTIS i
      RESULTIS 0
    }  // index

    LET gettk() = VALOF {
      LET sgn, o = 1, ?

      LET addtk(v,va) BE {
        LET sz, m = tks!-1, tks!0
        m := m+2
        IF sz<=m {
          LET p = getvec(2*sz)
          FOR i=1 TO sz DO p!i := tks!i
          tks := p
        tks!0, tks!(m-1), tks!m := m, v, va
      }  // addtk

      LET getnum() = VALOF {
        LET r, d, p, i = 0, FALSE, 0, ?
          i := index(ch,digs)
          IF i=0 BREAK
          TEST i<=10 {
            r := 10*r+i-1
            d := TRUE
            IF p>0 DO p := 10*p
          } OR {
            IF p>0 RESULTIS FALSE
            p := 1
          ch := getch()
        } REPEAT
        UNLESS d & (ch=' ' | ch='*n') RESULTIS FALSE
        IF r=0 | p=0 DO p := 1
        {LET a, b = ?, ?;  nrz(r,p,@a,@b);  addtk(a,b)}
      }  // getnum

   // begin gettk
      o := index(ch,ops)
      IF o>0 {
        ch := getch()
        IF o>2 | index(ch,digs)=0 {
          tcnt := tcnt-1
          TEST tcnt>0 & (ch=' ' | ch='*n') {addtk(o,0); RESULTIS TRUE}
        IF o=2 DO sgn := -1
      IF sgn=-1 DO tks!(!tks-1) := -tks!(!tks-1)
      tcnt := tcnt+1
    }  // gettk

 // begin readln
    tcnt, ch := 0, ' '
      WHILE ch=' ' DO ch := getch()
      SWITCHON ch INTO {
        CASE endstreamch: RESULTIS !tks=0 -> END, ERR
        CASE '*n':        RESULTIS !tks=0 -> EMP, tcnt=1 -> RES, ERR
        DEFAULT:          UNLESS gettk() {
                            UNTIL ch='*n' DO ch := getch()
                            RESULTIS ERR
    } REPEAT
  }  // readln

  LET evalrpn() = VALOF {
    LET n = tks+!tks
    !tks := n-tks-2
    IF !n=0 {
      LET q = evalrpn();  LET c, d = !(q-1), !q
      LET p = evalrpn();  LET a, b = !(p-1), !p
      SWITCHON !(n-1) INTO {
        CASE 1: nrz(a*d+b*c,b*d,@p,@q) ENDCASE
        CASE 2: nrz(a*d-b*c,b*d,@p,@q) ENDCASE
        CASE 3: nrz(a*c,b*d,@p,@q) ENDCASE
        CASE 4: nrz(a*d,b*c,@p,@q)
      !(n-1), !n := p, q
  }  // evalrpn

// start of the main program
  digs, ops := "0123456789.", "+-**/"
  tks := getvec(101)
    !tks := 0
    SWITCHON readln() INTO {
      CASE RES: {LET r = evalrpn();  LET n, d = !(r-1), !r
                 n := muldiv(n MOD d,100000,d)
                 d, n := n MOD 10, n/10
                 IF d>4 DO n := n+1 MOD 10000
                 FOR i=1 TO 3 IF n MOD 10 = 0 DO n := n/10
                 IF n>0 {wrch('.'); writen(n)}
      CASE ERR: writes("error*n")