Refal

Refal (Рефал) was created in Russia, then USSR, in the second half of the 1960s. It is a functional programming language in which computation is driven by pattern matching on textual and symbolic information.

Computation Model and Program Structure

A program in Refal is a set of function definitions, among which one is designated as main. Program execution starts with the main function and consists of performing matching operations and evaluating expressions. Expression evaluation usually involves calling functions from the standard library or defined in the program. The outcomes of the matching operations determine which expressions are being evaluated and supply data for the evaluation. In turn, function calls initiate new matchings etc.

A function is a sequence of pattern-matching clauses. A pattern-matching clause consists of a pattern and a result expression. When a function is invoked, the argument of the call is matched against the patterns, in the order of their appearance in the function definition, until a match succeeds. The corresponding expression is then evaluated and its value is returned as the result of the function call. The expression may refer to variables contained in the matching pattern. In the course of matching, such variables get bound to respective parts of the function argument, and the bindings are available within the expression evaluation. Thus, the matching mechanism not only controls which expression to evaluate, but it also provides the actual computation with values according to the structure and content of the function argument for that function's current invocation.

Pattern structure and matching are based on the Refal's specific view of data. A data object is either a symbol or a sequence, possibly empty, of symbols and other sequences. Those other sequences must be put in parentheses and by their inclusion are effectively nested in the larger sequence. A sequence is therefore also a tree-like structure with symbols at the leaf positions. Sequencing – and the nesting that is being involved – is the only data structuring mechanism in Refal.

A symbol is a character, a ‘word’, or a number – these are the language's primitive values. Examples are 'f' (a character symbol), ABC and "that" (word symbols), and 19860725 (a number symbol). If the content between the quotes of a double-quoted word looks like an identifier (i.e. does not contain spaces or other special characters), the quotes can be omitted, e.g. "xyz" is the same as xyz. In any case, words are atomic values, not sequences of characters. A sequence of characters can be written 'xyz' – short for 'x' 'y' 'z'.

Almost none of the Refal implementations includes floating-point numbers. Only integers are considered. On the other hand, integers are unlimited (bignums). However, except in Refal+, really large integers are not symbols but sequences of symbols. There are non-negative integers of limited size called ‘macrodigits’ which are symbols. A large positive integer is represented as a sequence of ‘macrodigits’. Negative numbers are sequences with the character symbol '-' as the first element. Consequently arithmetic functions have to accept both atomic and non-atomic arguments.

The following examples illustrate the use of data structuring and pattern matching in Refal programming. The first function constructs a ‘flat’ sequence of all the symbols – at all levels of nesting – in a given sequence. (Semicolons separate clauses, similarly to separating statements in many other languages.)

Flatten { =
        ;s.h e.t   = s.h <Flatten e.t>
        ;(e.h) e.t = <Flatten e.h> <Flatten e.t>}

The first clause says that for an empty sequence the value returned must be also an empty sequence. The second clause matches an argument sequence whose initial element is a symbol. If the pattern, s.h e.t, of that clause matches, the symbol is stored in the variable s.h, while the rest of the sequence is stored in e.t. Similarly, the third clause matches a sequence beginning with a nested sequence, and stores that nested sequence in e.h, and the rest in e.t. The value to be returned in the latter two cases is obtained by concatenating two expressions and involves calling the same function recursively. Example invocation is
<Flatten a be (ce (de 13 (()) se)) que nce>
and the corresponding return value is
a be ce de 13 se que nce.

Function calls are written in prefix notation and are delimited by angle brackets. This is superficially similar to Lisp but the meaning is very different: a function in Refal can only have one argument, but its value may be specified as a concatenation of parts, as in the example call. So, brackets are used to separate an argument expression from the surrounding context.

As can be observed, the names of the variables are prefixed by a letter and a dot. The letter is a structural type designator and cannot be omitted, as it reflects the structure of the value that can match the corresponding variable. All in all, there are three structural types. Two of them are symbols and sequences, denoted s and e (e is for expression, as sequences are the general kind of expression in Refal). The third one is called term and is not used in the examples. The part after the dot must be an ordinary identifier or a number.

(For the sake of convenience both a symbol and a sequence in parentheses are considered terms. The notion of term, although not essential to the language, unites symbols, i.e. atomic data, with sequences that represent arbitrarily complex hierarchies. Note that a sequence may be parenthesized regardless of it being nested in another sequence. This has two useful implications. First, a parenthesized sequence is like a symbol in that it can be treated as an indivisible value – and saying ‘term’ emphasizes this by not distinguishing between the two kinds of data. Second, when nesting does take place, a sequence may be viewed, as need dictates, as linear and homogeneous – a succession of terms, or as a hierarchy of terms that is anything between the linear succession and the tree of symbols ultimately making up the sequence: for any individual node in that tree, by saying ‘term’ one expresses lack of interest in its internal structure.)

Notice that symbols are a subset of terms, and the latter, in turn, are a subset of expressions (sequences). (A term is a one-element sequence.) Accordingly, s-variables can only match symbols, t-variables match symbols and parenthesized expressions, and e-variables match arbitrary expressions.

Assigning types to variables is a way, along with sequencing and parentheses, of specifying the structure of what is being matched by a pattern. Besides structural constraints, specific content may be required. This is done by including constant symbols in the pattern, as in the function below. It replaces a comma in a (say, character) sequence with a semicolon, but only if there is a dot further on in that sequence:


f {e.1 ',' e.2 '.' e.3 = e.1 ';' e.2 '.' e.3; e.x = e.x}

The first clause does the matching and the corresponding replacement. The second ensures that any other sequence is returned unchanged.

The next example shows how constraining the matching argument by content is also possible by using the same variable more than once in a pattern. The following function searches a given sequence for the presence of a symbol or a sub-sequence equal to some given one.

In-seq {s.x e.1 s.x e.2   = T e.2
       ;(e.x) e.1 e.x e.2 = T e.2
       ;e.x               = F}

The pattern of the first clause matches a sequence in which the first item is a symbol, and the same symbol is encountered further in the sequence. This is specified by using the same variable s.x twice in the pattern. Similarly, the pattern of the second clause matches a sequence starting with an expression within parentheses, to which the variable e.x is bound, and that expression is encountered (without nesting) once more in the argument sequence. In both cases, the variables e.1 and e.2 can match anything, including empty sequences, as also does e.x in the second and third clauses. The third clause is necessary in order to catch, through a most general pattern, all other sequences.

When the search is successful, the function returns a sequence of the symbol T followed by the part of the argument sequence after the matching occurrence of s.x or e.x. If the search fails, the symbol F is returned. Labeling the value being returned with a symbol is a common technique for distinguishing between different cases that cannot be structurally discriminated. The particular case of modelling Booleans (absent from Refal) is typical.

Example invocations are
<In-seq co cor nu co pi a>             returning T pi a
<In-seq a cor nu co pi a>              returning T
<In-seq (co pi) cor nu co pi a>        returning T a
<In-seq (co pi) cor nu (co pi) a>      returning F
<In-seq ((co pi)) cor nu (co pi) a>    returning T a
<In-seq ((co pi)) cor nu ((co pi) a)> returning F

Considering a function to have two or more arguments is merely a convention; as said above, a function can only have one argument. What we really mean is structuring that argument into identifyable parts. To achieve that, it usually suffices to represent each argument, with the possible exception of the last one, as a term within the argument sequence.

This is indeed the case with our example above: the ‘first argument’ of the function – the value to search for – is actually the first term of the argument, and the ‘second argument’ – the sequence to search in – is the remainder of the one actual argument. Moreover, the function makes use of both kinds of term for the first argument, thus exhibiting a specific form of polymorphism.

Functions and pattern variables are the only program objects that can have names. The former are always global, and the latter local. One cannot have, e.g., a global constant or another globally defined value. One can, however, define functions that act as global values. For instance,

Holmes-address {= (("221b" Baker Street) London) ("NW1 6XE" England)}

defines a function whose only pattern-matching clause matches an empty value. Calling it with no (i.e. empty) argument is guaranteed to succeed and produces a certain sequence; e.g., <Prout <Holmes-address>> prints out ((221b Baker Street) London) (NW1 6XE England).

Evolution

The language described so far is nearly all of what has become known as basic Refal, which is, more or less, the original definition of the language. As several dialects of Refal came to life with its use, the name ‘basic Refal’ was introduced to denote their ‘common denominator’.

The major dialects are the following, all with available implementations.

Refal-2 emerged as an implementation that appeared soon after the original one, and is rather close to it. For years, it was the only one in use, but its syntax differs significantly from the more modern dialects and should be considered obsolete now. This dialect is no more developed and seems to be almost out of use.

Refal-5 is a major revision of the original language by its author, introducing several new constructs. It, too, is no more developed, but continues to be in active use and is, by its pedigree, ‘the classic’ version of the language.

Refal-6 started as an implementation of Refal-5 but later evolved enough to be considered a dialect of its own. There has been some mutual influence between Refal-6 and Refal+ in the course of their development. With respect to Refal-5, Refal-6 has some additional datatypes and other innovations, but its development seems to have stalled.

Refal-6 is the only one of the dialects of Refal that has some provision for using floating-point numbers, but the provision is anomalous. Notably, there is no means for reading such numbers.

Refal+ (Refal Plus) is the most advanced dialect, and the only one that is still evolving. Although not changing the language in any way that can be considered radical, Refal+ introduces a number of improvements, such as: true, atomic integers regardless of size; controllable and composable outcome (success/failure) of the matching operations; interceptable errors (exception handling); more complex syntax aiming at more versatile expression of complex evaluations; library functions implementing vectors and tables, and other additions to the standard library; mandatory function type declaration.

Historical Perspective and Use

At the time of its creation, Refal was innovative in a number of respects:

it was one of the first and very few languages to promote a genuinely non-imperative, stateless style of programming;
it was of the very first examples of a language using pattern matching to define functions and drive computation;
defining functions as one-argument, one-result was also a very early examlpe of incorporating the concept, which only much later became dominant through other functional programming languages;
Refal's sequences were more natural to process than analogous structures in other languages (see below);
it offered facilities for metaprogramming;
it was a garbage-collected language, and was implemented on top of a virtual machine – both rare features at the time.

Even if Refal was created today, forty years later, with this set of features it would have ranked among the most advanced languages!

Refal compares favourably to more widely accepted languages for symbolic computing, such as Lisp and Prolog. Unlike them, Refal's sequences are symmetric rather than ‘consed up’ in one direction. They are equally well modified, traversed, and pattern-matched not only from both ends at the top level, but also to any level of nesting. Moreover, the forward-directed pattern-matching-and-processing (from the given data towards achieving the goal) of Refal is conceptually simpler for general program constructon than the backward-directed (starting from the goal and working backwards) model of computation in Prolog.

There is a vague similarity between Refal and Snobol, whose model of computation is also based on the Markov Normal Algorithms. The similarity is particularly notable with respect to Refal+'s success/failure semantics, e.g. a predicate either fails, or is successful and returns an empty result – just as in Snobol. In most other respects the two languages differ.

There is also similarity, with respect to defining functions through pattern matching, between Refal and modern functional programming languages such as Haskell and ML. However, pattern formation in those languages is founded on their strong, constructive typing and is more abstract, while in Refal – an (almost) typeless language – patterns reflect the immediate structure and content that text sequences have.

Refal's simplicity of concept, together with practical ability for programming from simple to large and sophisticated programs, makes it an attractive tool for text – including natural language – processing, and manipulating symbolic structures as in artificial intelligence. One area of application is XML document manipulation, where using Refal has been shown to be advantageous to using XSLT (see the corresponding reference in the Links section).

Worth mentioning is that Refal's author V. Turchin (В. Турчи́н), beside the language itself, also invented a technique for globally optimizing program transformation that he called ‘supercompilation’. The latter can, in certain cases, dramatically increase the efficiency of programs. Until very recently, the technique has only been successfully applied to Refal, but work on Haskell, Scala and other functional languages seems promising.

Criticism

From today's perspective, Refal has certain deficiencies, which can be systematized in three groups.

Deficiencies for general programming.
Sequencing – and the nesting that is being involved – is the only data structuring mechanism in Refal. Moreover, other than distinguishing between the kinds of symbols and putting them in sequences (effectively trees) Refal has no notion of datatype. In particular, there is no provision for (explicit) user-defined datatypes or structures.

There are no Boolean values, with the respective arithmetic on them, or anything of the kind. Succeeding/failing is being used in place of true/false, but this is not always sufficient; besides, true failure control is only present in some dialects.

Refal has no control structures, even explicit conditional evaluation is absent. The only form of choice making is through pattern matching against alternative patterns within a function. The only form of repetition is recursion. (And – as there is no support for higher-order functions – recursion is always explicit.)

There is no provision for local definitions of variables or functions, other than the one built-in in the pattern-matching mechanism. This is partially compensated for in some dialects, esp. Refal+, through elaboration of the evaluation model, but that complicates the syntax a lot without completely solving the problem.

Working with sequences of ‘macrodigits’ rather than atomic integers is prone to inconvenience and errors. The lack of real (and why not fractional and complex) numbers is vexing.

The pattern language cannot express alternatives or repetition, so these have to be explicitly programmed. Things that would be easy to do by trivial application of regexes can be very inconvenient to express in Refal.

(Perhaps not very important) Patterns are not data objects.

Refal's standard library is much wanting. It lacks even frequently needed functions, such as absolute value, arithmetic negation, and min/max. Also striking is the absence of functions for processing text and symbol structures. The library is both generally deficient and mostly unrelated to the language.

Refal lacks openness, such as Internet connectivity, foreign-function interface, etc., and that certainly limits its scope of applicability.
Deficiencies for functional programming.
Refal is almost exclusively a functional programming language because it is, in essence, an expression evaluator: side effects are very restricted and their use discouraged. But – unlike any other functional programming language – functions are not first-class objects in Refal. First of all, they are not data objects, and cannot be parts of other data objects. (The closest to functions-as-data that one can get is using a function's name as a value for invoking the function indirectly, like with Lisp's funcall or apply.) Anonymous functions cannot be created, let alone closures. Function composition forms are not provided. Local functions are also not allowed, although this problem is partly circumvented by other means.

The standard library provides no higher-order functions and has nothing to do with functional programming at all.
Problems with the definition.
As has become clear by now, there is no single definition of the language. Rather, like Lisp, Forth and others, Refal is the name of a family of very similar languages with common ancestry. There has never been a standard of a kind.

In addition, the definitions of the available dialects/implementations are partly outdated and incomplete.

Links of Relevance

A community page for Refal

Major dialects:: Refal-2; Refal-5
      a textbook in Russian (somewhat outdated) and in English
      Refal-SciTE: an IDE for Refal-5 based on SciTE
      Refal-PHP: a programming system integrating the two languages; Refal-6; a user manual (in Russian); Refal+, also here; documentation in Russian

Two sets of lecture notes (in Russian)

An article showing Refal as an XML processor

An article showing possible uses of Refal in school (in Russian): .pdf, .ps

Three articles that initiated the field of supercompilation

boykobbatgmaildotcom