Pattern matching examines a "subject" string for some combination of characters, called a "pattern." The matching process may be very simple, or extremely complex. For example:
Most programming languages provide rudimentary facilities to examine a string for a specific character sequence. SNOBOL4 patterns are far more powerful, because they can specify complex (and convoluted) interrelationships. The colors of a painting, the words of a sentence, the notes of a musical score have limited significance in isolation. It is their "relationship" with one another which provides meaning to the whole. Likewise, SNOBOL4 patterns can specify "context;" they may be qualified by what precedes or follows them, or by their position in the subject.
Patterns are composed of "known" and "unknown" components.
"Knowns" are specific character strings, such as the string "BLUE" in the first example above. We are looking for a yes/no answer to the question: "Does this known item appear in the subject string?"
"Unknowns" specify the "kind" of subject characters we are looking for; the specific characters are not identifiable in advance. We might want to match only characters from a restricted alphabet, or any substring of a certain length, or some arbitrary number of repetitions of a string. If the pattern matches, we can then "capture" the particular subject substring matched.
A pattern match requires a subject string and a pattern. The subject is the first statement element after the label field (if any). The pattern appears next, separated from the subject by white space (blank or tab). If SUBJECT is the subject string, and PATTERN is the pattern, it looks like this:
label SUBJECT PATTERNThe pattern match "succeeds" if the pattern is found in the subject string; otherwise it fails. This success or failure may be tested in the GOTO field:
label SUBJECT PATTERN :S(label1) F(label2)A real point of confusion is the distinction between pattern matching and concatenation. How do you tell the difference? Where does the subject end and the pattern begin? In this case, parentheses should be placed around the subject, since SNOBOL4 always uses the first "complete" statement element as the subject. In the statement
X Y ZX is the subject, and Y concatenated with Z is the pattern. Whereas
(X Y) Zindicates the subject is string X concatenated with string Y, and the pattern is Z.
The subject string may be a literal string, a variable, or an expression. If it is not a string, its string equivalent will be produced before pattern matching begins. For example, if the subject is the integer 48, integer to string conversion produces the character string "48". Remember, if the subject includes concatenated elements, they should be enclosed in parentheses.
Arithmetic expressions are composed of elements and simpler subexpressions. Similarly, patterns are composed of simpler subpatterns which are joined together as "subsequents" and "alternates." If P1 and P2 are two subpatterns, the expression
P1 P2is also a pattern. The subject must contain whatever P1 matches, immediately followed by whatever P2 matches. P2 is the "subsequent" of P1. The white space (blank or tab) between P1 and P2 is the same binary concatenation operator previously used to join strings; its use with patterns is completely analogous. The preceding pattern matches pattern P1 "followed by pattern" P2.
The binary "alternation" operator is the vertical bar (|). As it is a binary operator, it must have white space on each side. The pattern
P1 | P2matches whatever P1 matches, "or" whatever P2 matches. SNOBOL4 tries the various alternatives from left to right.
Normally, concatenation is performed before alternation, so the pattern
P1 | P2 P3matches P1 alone, or P2 "followed by" P3. Parentheses can be used to alter the grouping of subpatterns. For example:
(P1 | P2) P3matches P1 "or" P2, followed by P3.
When a pattern successfully matches a portion of the subject, the matching subject characters are "bound" to it. The next pattern in the statement must match beginning with the very next subject character. If a subsequent fails to match, SNOBOL4 backtracks, unbinding patterns until another alternative can be tried. A pattern match fails when SNOBOL4 cannot find an alternative that matches.
The null string may appear in a pattern. It always matches, but does not bind any subject characters. We can think of it as matching the invisible space "between" two subject characters. One possible use is as the last of a series of alternatives. For example, the pattern
ROOT ('S' | 'ES' | '')matches the pattern in ROOT, with an optional suffix of 'S' or 'ES'. If ROOT matches, but is not followed by 'S' or 'ES', the null string matches and successfully completes the clause. Its presence gives the pattern match a successful escape.
The conditional functions of the previous chapter may appear in patterns. If they fail when evaluated, the current alternative fails. If they succeed, they match the null string, and so do not consume any subject characters. They behave like a gate, allowing the match to proceed beyond them only if they are true. This pattern will match 'FOX' if N is 1, or 'WOLF' if N is 2:
EQ(N,1) 'FOX' | EQ(N,2) 'WOLF'Parentheses may be used to factor a pattern. The strings 'COMPATIBLE', 'COMPREHENSIBLE', and 'COMPRESSIBLE' are matched by the pattern:
'COMP' ('AT' | 'RE' ('HEN' | 'S') 'S') 'IBLE'
Here are examples of pattern matches using a string literal or variable for the subject. The patterns consist entirely of known elements. Use the CODE.SNO program to experiment with them:
? 'BLUEBIRD' 'BIRD' Success ? 'BLUEBIRD' 'bird' Failure ? B = 'THE BLUEBIRD' ? B 'FISH' Failure ? B 'FISH' | 'BIRD' Success ? B ('GOLD' | 'BLUE') ('FISH' | 'BIRD') SuccessThe first statement shows that the matching substring ('BIRD') need not begin at the start of the subject string. This is called "unanchored" matching. The second statement fails because strings are case sensitive, unlike names and labels. The third statement creates a variable to be used as the subject. The fifth statement employs an alternate: we are matching for 'FISH' or 'BIRD'.
The last statement uses subsequents and alternates. We are looking for a substring in B that contains 'GOLD' or 'BLUE', followed by 'FISH' or 'BIRD'. It will match 'GOLDFISH', 'GOLDBIRD', 'BLUEFISH' or 'BLUEBIRD'. If the parentheses were omitted, concatenation of 'BLUE' and 'FISH' would be performed before alternation, and the pattern would match 'GOLD', 'BLUEFISH', or 'BIRD'.
If we execute the statement
? COLOR = 'BLUE'the variable COLOR contains the string 'BLUE', and could appear in the pattern portion of a statement:
? B COLOR SuccessEven though it is used as a pattern, COLOR has the "string" data type. However, complicated patterns may be stored in a variable just like a string or numeric value. The statement
? COLOR = 'GOLD' | 'BLUE'will create a "structure" describing the pattern, and store it in the variable COLOR. COLOR now has the "pattern" data type. The preceding example can now be written as:
? CRITTER = 'FISH' | 'BIRD' ? BOTH = COLOR CRITTER ? B BOTH Success
If the pattern match
B BOTHsucceeds, we may want to know which of the many pattern alternatives were used in the match. The binary operator "conditional assignment" assigns the matching subject substring to a variable. The operator is called conditional, because assignment occurs ONLY if the entire pattern match is successful. Its graphic symbol is a period (.). It assigns the matching substring on its left to the variable on its right. Note that the direction of assignment is just the opposite of the statement assignment operator (=). Continuing with the previous example, we'll redefine COLOR and CRITTER to use conditional assignment:
? COLOR = ('GOLD' | 'BLUE') . SHADE ? CRITTER = ('FISH' | 'BIRD') . ANIMAL ? BOTH = COLOR CRITTER ? B BOTH Success ? OUTPUT = SHADE BLUE ? OUTPUT = ANIMAL BIRDThe substrings that match the subpatterns COLOR and CRITTER are assigned to SHADE and ANIMAL respectively. The statement
BOTH = COLOR CRITTERhad to be reexecuted because its previous execution captured the old values of COLOR and CRITTER, without the conditional assignment operators. The redefinition of COLOR and CRITTER was not reflected in BOTH until the statement was reexecuted.
Conditional assignment may appear at any level of pattern nesting, and may include other conditional assignments within its embrace. The pattern
(('B' | 'F' | 'N') . FIRST 'EA' ('R' | 'T') . LAST) . WORDmatches 'BEAR', 'FEAR', 'NEAR', 'BEAT', 'FEAT', or 'NEAT', assigning the first letter matched to FIRST, the last letter to LAST, and the entire result to WORD.
The variable OUTPUT may be used as the target of conditional assignment. Try:
? 'B2' ('A' | 'B') . OUTPUT (1 | 2 | 3) . OUTPUT B 2 Success
All of the previous examples used patterns created from literal strings. We may also want to specify the "qualities" of a match component, rather than its specific characters. Using unknowns greatly increases the power of pattern matching. There are two types, primitive patterns and pattern functions.
There are seven primitive patterns built into the SNOBOL4 system. The two used most frequently will be discussed here. Chapter 7, "Advanced Topics," introduces the remaining five.
REM is short for the REMainder pattern. It will match zero or more characters at the end of the subject string. Try the following:
? 'THE WINTER WINDS' 'WIN' REM . OUTPUT TER WINDS SuccessThe subpattern 'WIN' matched its first occurrence in the subject, at the beginning of the word 'WINTER'. REM matched from there to the end of the subject string -- the characters 'TER WINDS' -- and assigned them to the variable OUTPUT. If we change the pattern slightly, to:
? 'THE WINTER WINDS' 'WINDS' REM . OUTPUT Successthen 'WINDS' matches at the end of the subject string, leaving a null remainder for REM. REM matches this null string, assigns it to OUTPUT, and a blank line is displayed.
The pattern components to the left of REM must successfully match some portion of the subject string. REM begins where they left off, matching all subject characters through the end of string. There are no restrictions on the particular characters matched.
ARB matches an ARBitrary number of characters from the subject string. It matches the shortest possible substring, including the null string. The pattern components on either side of ARB determine what is matched. Try the statements
? 'MOUNTAIN' 'O' ARB . OUTPUT 'A' UNT Success ? 'MOUNTAIN' 'O' ARB . OUTPUT 'U' SuccessIn the first statement, the ARB pattern is constrained on either side by the known patterns 'O' and 'A'. ARB expands to match the subject characters between, 'UNT'. In the second statement, there is nothing between 'O' and 'U', so ARB matches the null string. ARB behaves like a spring, expanding as needed to fill the gap defined by neighboring patterns.
During a pattern match, the "cursor" is SNOBOL4's pointer into the subject string. It is integer valued, and points "between" two subject characters. The cursor is set to zero when a pattern match begins, corresponding to a position immediately to the left of the first subject character. As the pattern match proceeds, the cursor moves right and left across the subject to indicate where SNOBOL4 is attempting a match. The value of the cursor will be used by some of the pattern functions that follow.
The "cursor position" operator assigns the current cursor value to a variable. It is a unary operator whose graphic symbol is the "at sign" (@). It appears within a pattern, preceding the name of a variable. By using OUTPUT as the variable, we can display the cursor position on the screen. For instance:
? 'VALLEY' 'A' @OUTPUT ARB 'E' @OUTPUT 2 5 Success ? 'DOUBT' @OUTPUT 'B' 0 1 2 3 Success ? 'FIX' @OUTPUT 'B' 0 1 2 FailureCursor assignment is performed whenever the pattern match encounters the operator, including retries. It occurs even if the pattern ultimately fails. The element @OUTPUT behaves like the null string -- it doesn't consume subject characters or interfere with the match in any way.
These functions return a pattern based on their integer argument. The pattern produced can be used directly in a pattern match statement, or stored in a variable for later retrieval.
LEN(I) produces a pattern which matches a string exactly I characters long. I must be an integer greater than or equal to zero. Any characters may appear in the matched string. For example, LEN(5) matches any 5-character string, and LEN(0) matches the null string. LEN may be constrained to certain portions of the subject by other adjacent patterns:
? S = 'ABCDA' ? S LEN(3) . OUTPUT ABC ? S LEN(2) . OUTPUT 'A' CDThe first pattern match had only one constraint -- the subject had to be at least three characters long -- so LEN(3) matched its first three characters. The second case imposes the additional restriction that LEN(2)'s match be followed immediately by the letter 'A'. This disqualifies the intermediate match attempts 'AB' and 'BC'.
Using keyword &ALPHABET as the subject provides a simple way to convert a decimal character code between 0 and 255 to its one character equivalent. For example, by consulting an ASCII character code chart we find that the BEL character is decimal 7. We can load that character into variable BEEP with one statement:
? &ALPHABET LEN(7) LEN(1) . BEEPand produce five beeps on the speaker with:
? OUTPUT = DUPL(BEEP,5)&ALPHABET contains all 256 members of the computer's character set, in ascending order. LEN(7) matches the first seven characters (codes 0 - 6), leaving BEL as the next match position for LEN(1). This operation is analogous to the CHR$ function in BASIC.
The inverse operation, obtaining the numerical value of a character code, is also possible. If variable CHAR contains a one character string, variable N will be set to its decimal equivalent with the second statement below:
? CHAR = 'A' ? &ALPHABET @N CHAR ? OUTPUT = N 65
In Chapter 6, "Program Defined Objects," I'll demonstrate how you can define your own functions to encapsulate each of these operations.
The POS(I) and RPOS(I) patterns do not match subject characters. Instead, they succeed only if the "current" cursor position is a specified value. They often are used to tie points of the pattern to specific character positions in the subject.
POS(I) counts from the left end of the subject string, succeeding if the current cursor position is equal to I. RPOS(I) is similar, but counts from the right end of the subject. If the subject length is N characters, RPOS(I) requires the cursor be (N - I). If the cursor is not the correct value, these functions fail, and SNOBOL4 tries other pattern alternatives, perhaps extending a previous substring matched by ARB, or beginning the match further along in the subject.
Continuing with CODE.SNO:
? S = 'ABCDA' ? S POS(0) 'B' Failure ? S LEN(3) . OUTPUT RPOS(0) CDA ? S POS(3) LEN(1) . OUTPUT D ? S POS(0) 'ABCD' RPOS(0) FailureThe first example requires a 'B' at cursor position 0, and fails for this subject. POS(0) "anchors" the match, forcing it to begin with the first subject character. Similarly, RPOS(0) anchors the end of the pattern to the tail of the subject. The next example matches at a specific mid-string character position, POS(3). Finally, enclosing a pattern between POS(0) and RPOS(0) forces the match to use the ENTIRE subject string.
At first glance these functions appear to be "setting" the cursor to a specified value. Actually, they never alter the cursor, but instead wait for the cursor to "come to them" as various match alternatives are attempted. This, in turn, allows other patterns in the statement to be processed in an orderly fashion. You can demonstrate this "waiting for the cursor" behavior like this:
? S @OUTPUT POS(3) 0 1 2 3 Success
These patterns are hybrids of ARB, POS(), and RPOS(). They use specific cursor positions, like POS and RPOS, but bind (match) subject characters, like ARB. TAB(I) matches any characters from the current cursor position up to the specified position I. RTAB(I) does the same, except, as in RPOS(), the target position is measured from the end of the subject.
TAB and RTAB will match the null string, but will fail if the current cursor is to the right of the target. They also fail if the target position is past the end of the subject string.
These patterns are useful when working with tabular data. For example, if a data file contains name, street address, city and state in columns 1, 30, 60, and 75, this pattern will break out those elements from a line:
P = TAB(29) . NAME TAB(59) . STREET TAB(74) . CITY REM . STATEThe pattern RTAB(0) is equivalent to primitive pattern REM. One potential source of confusion is just what it is that RTAB matches. It counts from the right end of the subject, but matches to the left of its target cursor. Try:
? 'ABCDE' TAB(2) . OUTPUT RTAB(1) . OUTPUT AB CD SuccessTAB(2) matches 'AB', leaving the cursor at 2, between 'B' and 'C'. The subject is 5 characters long, so RTAB(1) specifies a target cursor of 5 - 1, or 4, which is between the 'D' and 'E'. RTAB matches everything from the current cursor, 2, to the target, 4.
These functions produce a pattern based on a string-valued argument. Once again, the pattern may be used directly or stored in a variable.
Each function produces a pattern which matches one character based upon the subject string. ANY(S) matches the next subject character IF it appears in the string S, and fails otherwise. NOTANY(S) matches a subject character only if it does NOT appear in S. Here are some sample uses of each:
? VOWEL = ANY('AEIOU') ? DVOWEL = VOWEL VOWEL ? NOTVOWEL = NOTANY('AEIOU') ? 'VACUUM' VOWEL . OUTPUT A ? 'VACUUM' DVOWEL . OUTPUT UU ? 'VACUUM' (VOWEL NOTVOWEL) . OUTPUT ACThe argument string specifies a set of characters to be used in creating the ANY or NOTANY pattern. It may contain duplicate characters, and the order of characters in S is immaterial.
These are multicharacter versions of ANY and NOTANY. Each requires a nonnull argument to specify a set of characters.
SPAN(S) matches one or more subject characters from the set in S. SPAN must match at least one subject character, and will match the LONGEST subject string possible.
BREAK(S) matches "up to but not including" any character in S. The string matched must always be followed in the subject by a character in S. Unlike SPAN and NOTANY, BREAK will match the null string.
These two functions are called "stream" functions because each streams by a series of subject characters. SPAN is most useful for matching a group of characters with a common trait. For example, we can say an English word is composed of one or more alphabetic characters, apostrophe, and hyphen. The statements
? LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ'-" ? WORD = SPAN(LETTERS)produce a suitable pattern in WORD. To match the material between words (white space, punctuation, etc.), use the pattern:
? GAP = BREAK(LETTERS)SPAN and BREAK are two of the most useful SNOBOL4 functions. Try some examples using CODE.SNO:
? 'SAMPLE LINE' WORD . OUTPUT SAMPLE ? 'PLUS TEN DEGREES' ' ' WORD . OUTPUT TEN ? GAPO = GAP . OUTPUT ? WORDO = WORD . OUTPUT ? ': ONE, TWO, THREE' GAPO WORDO GAPO WORDO : ONE , TWO DIGITS = '0123456789' ? INTEGER = (ANY('+-') | '') SPAN(DIGITS) ? 'SET -43 VOLTS' INTEGER . OUTPUT -43 ? REAL = INTEGER '.' (SPAN(DIGITS) | '') ? 'SET -43.625 VOLTS' REAL . OUTPUT -43.625 ? S = '0ZERO,1ONE,2TWO,3THREE,4FOUR,5FIVE,' ? S 4 BREAK(',') . OUTPUT FOURIf you require a version of SPAN which WILL match the null string, or a BREAK which will NOT match the null string, you can use the following constructions:
(SPAN(S) | '') (NOTANY(S) BREAK(S))
We need to introduce one more fundamental concept -- replacement -- before we can write some meaningful programs.
Pattern matching identifies a subject substring with a particular trait, specified by the pattern. We used conditional assignment to copy that substring to a variable. Replacement moves in the other direction, letting you alter the substring in the subject. The space occupied by the matching substring may be enlarged or contracted (or removed entirely), leaving adjacent subject characters undisturbed. If the pattern matched the entire subject, replacement behaves like a simple assignment statement.
Replacement appears in a form similar to assignment:
SUBJECT PATTERN = REPLACEMENTFirst, the pattern match is attempted on the subject. If it fails, execution of the statement ends immediately, and replacement does not occur. If the match succeeds, any conditional assignments within the pattern are performed. The replacement field is then evaluated, converted to a string, and inserted in the subject in place of the matching substring. If the replacement field is empty, the null string replaces the matched substring, effectively deleting it. Try a few examples with CODE.SNO:
? T = 'MUCH ADO ABOUT NOTHING' ? T 'ADO' = 'FUSS' Success ? OUTPUT = T MUCH FUSS ABOUT NOTHING ? T 'NOTHING' = Success ? OUTPUT = T MUCH FUSS ABOUT ? 'MASH' 'M' = 'B' Execution error #8, Variable not present where required FailureThe first replacement searches for 'ADO' in the subject string, replacing it with 'FUSS'. The second replacement has a null string replacement value, and deletes the matching substring. The last example demonstrates that a variable must be the subject of replacement. Variables can be changed; string literals -- like 'MASH' -- cannot.
The following will replace the 'M' in 'MASH' with a 'B':
? VERB = 'MASH' ? VERB 'M' = 'B' ? OUTPUT = VERB BASHIf the matched substring appears more than once in the subject, only the first occurrence is changed. The remaining substrings must be found with a program loop. For example, a statement to eliminate all occurrences of the letter 'A' from the subject looks like this:
ALOOP SUBJECT 'A' = :S(ALOOP)Here ALOOP is the statement label, SUBJECT is some variable containing the subject string, 'A' is the pattern, and the replacement field is empty. If an 'A' is found, it is deleted by replacing it with the null string, and the statement succeeds. The success GOTO branches back to ALOOP, and another search for 'A' is performed. The loop continues until no 'A's remain in the subject, and the pattern match fails. Of course, the pattern and replacement can be as complex as desired.
Simple loops like this can be tried in CODE.SNO by appending a semicolon after the GOTO field. (Semicolon is used with GOTO in CODE.SNO only; you would not use it in normal programs.) Continuing with the previous example:
? VOWEL = ANY('AEIOU') ?VL T VOWEL = '*' :S(VL); ? OUTPUT = T M*CH F*SS *B**TSince conditional assignment is performed before replacement, its results are available for use in the replacement field of the same statement. Here's an example of removing the first item from a list, and placing it on the end:
? RB = 'RED,ORANGE,YELLOW,GREEN,BLUE,INDIGO,VIOLET,' ? CYCLE = BREAK(',') . ITEM LEN(1) REM . REST ? RB CYCLE = REST ITEM ',' Success ? OUTPUT = ITEM RED ? OUTPUT = RB ORANGE,YELLOW,GREEN,BLUE,INDIGO,VIOLET,RED,Pattern CYCLE matches the entire subject, placing the first color into ITEM, bypassing the comma with LEN(1), and placing the remainder of the subject into REST. REST and ITEM are then transposed in the replacement field, and stored back into RB.
I've introduced a lot of concepts in this chapter; it's time to see how they fit together into programs. They're supplied on the Vanilla SNOBOL4 diskette.
The first program counts the number of words in the input file. Lines with an asterisk in the first column are comment lines -- their contents are ignored by SNOBOL4.
* Simple word counting program, WORDS.SNO. * * A word is defined to be a contiguous run of letters, * digits, apostrophe and hyphen. This definition of * legal letters in a word can be altered for specialized * text. * * If the file to be counted is TEXT.IN, run this program * by typing: * B>SNOBOL4 WORDS /I=TEXT * &TRIM = 1 WORD = "'-" '0123456789' &UCASE &LCASE WPAT = BREAK(WORD) SPAN(WORD) NEXTL LINE = INPUT :F(DONE) NEXTW LINE WPAT = :F(NEXTL) N = N + 1 :(NEXTW) DONE OUTPUT = +N ' words' ENDAfter defining the acceptable characters in a word, the real work of the program is performed in the three lines beginning with label NEXTL. A line is read from the input file, and stored in variable LINE. The next statement attempts to find the next word with pattern WPAT. BREAK streams by any blanks and punctuation, stopping just short of the word, which SPAN then matches. Both the word and any preceding punctuation are removed from LINE by replacement with the null string.
When no more words remain in LINE, the failure transfer to NEXTL reads the next line. If the match succeeds, N is incremented, and the program goes back to NEXTW to search for another word. When the End-of-File is encountered, control transfers to DONE and the number of words is displayed.
It's simple to alter pattern WPAT to search for other things. For instance, if we wanted to count occurrences of double vowels, we could use:
WPAT = ANY('AEIOUaeiou') ANY('AEIOUaeiou')To count the occurrences of integers with an optional sign character, use:
WPAT = (ANY('+-') | '') SPAN('0123456789')Perhaps we want to count violations of simple punctuation rules: period with only one blank, or comma and semicolon followed by more than one blank:
WPAT = '. ' NOTANY(' ') | ANY(',;') ' ' SPAN(' ')Notice how closely WPAT parallels the English language description of the problem.
This program asks for two words, and displays all intersecting letters between them. For example, given the words LOOM and HOME, the program output is:
H LOOM M E H LOOM M E H O LOOM EA pattern match like this would find the first intersecting character:
HORIZONTAL ANY(VERTICAL) . CHARHowever, we want to find all intersections, so will have to iterate our search. In conventional programming languages, we might use numerical indices to remember which combinations were tried. Here, we'll use place-holding characters like '*' and '#' to remove solutions from future consideration. As seems to be the case with SNOBOL4, there are more comments than program statements:
* CROSS.SNO - Print all intersections between two words &TRIM = 1 * Get words from user * AGAIN OUTPUT = 'ENTER HORIZONTAL WORD:' H = INPUT :F(END) OUTPUT = 'ENTER VERTICAL WORD:' V = INPUT :F(END) * Make copy of horizontal word to track position. HC = H * Find next intersection in horizontal word. Save * the number of preceding horizontal characters in NH. * Save the intersecting character in CROSS. * Replace with '*' to remove from further consideration. * Go to AGAIN to get new words if horizontal exhausted. * NEXTH HC @NH ANY(V) . CROSS = '*' :F(AGAIN) * For each horizontal hit, iterate over possible * vertical ones. Make copy of vertical word to track * vertical position. * VC = V * Find where the intersection was in the vertical word. * Save the number of preceding vertical characters in NV. * Replace with '#' to prevent finding it again in that * position. When vertical exhausted, try horizontal again. * NEXTV VC @NV CROSS = '#' :F(NEXTH) * Now display this particular intersection. * We make a copy of the original vertical word, * and mark the intersecting position with '#'. * OUTPUT = PRINTV = V PRINTV POS(NV) LEN(1) = '#' * Peel off the vertical characters one-by-one. Each will * be displayed with NH leading blanks to get it in the * correct column. When the '#' is found, display the full * horizontal word instead. * When done, go to NEXTV to try another vertical position. * PRINT PRINTV LEN(1) . C = :F(NEXTV) OUTPUT = DIFFER(C,'#') DUPL(' ',NH) C :S(PRINT) OUTPUT = H :(PRINT) END
Most of the examples above match substrings which do not begin at the first subject character. This is the "unanchored" mode of pattern matching. Alternately, we can "anchor" the pattern match by requiring it to include the first subject character. Setting keyword &ANCHOR to a nonzero value produces anchored matching. Anchored matching is usually faster than unanchored, because many futile attempts to match are eliminated.
Even when the desired item is not at the beginning of the subject, it is often possible to simulate anchored matching by prefixing the pattern with a subpattern which will stream out to the desired object. The stream function spans the gap from the first subject character to the desired item. Use CODE.SNO to experiment with &ANCHOR:
? DIGITS = '0123456789' ? &ANCHOR = 1 ? 'THE NEXT 43 DAYS' BREAK(DIGITS) SPAN(DIGITS) . NThis will assign substring '43' to N, even in anchored mode. In unanchored mode, the test lines:
? &ANCHOR = 0 ? 'THE NEXT 43 DAYS' SPAN(DIGITS) . Nwould ultimately succeed, but only after SPAN failed on each of the characters preceding the '4'. The efficiency difference is more pronounced if the subject does not contain any digits. In the first formulation, BREAK(DIGITS) fails and the anchored match then fails immediately. The second construction fails only after SPAN is tried at each subject character position.
When your program first begins execution, SNOBOL4 sets keyword &ANCHOR to zero, the unanchored mode. If you can construct all your patterns as anchored patterns, you should set &ANCHOR nonzero for anchored matching. Setting and resetting &ANCHOR throughout your program is error prone and not advised. Another alternative is to leave &ANCHOR set to 0, but to 'pseudo-anchor' patterns by using POS(0) as the first pattern element.
It always takes less time for a pattern to succeed than to fail. Failure implies an exhaustive search of all combinations, whereas success stops the pattern match early. You should try to construct patterns with direct routes to success, such as the use of BREAK above. Wherever possible, impose restrictions on the number of alternatives to be tried. Combinatorial explosion is the price of loose pattern programming.