Idioms for programming in Arc

The code that implements Arc illustrates many useful programming techniques. Some of the techniques occur multiple times, and can be considered idioms of Arc programming; techniques that can be applied to many problems. This article describes some of the most common idioms, and how they can be used in Arc programming. The techniques described focus largely on macros and recursion.

Macros

Not surprisingly, given Paul Graham's enthusiasm for macros, Arc makes heavy use of macros. A few idioms appear commonly in the code.

"Decorated body"

The most common use of macros in Arc is something I call the "decorated body", which wraps a chunk of body code with something to extend it, analogous to the decorator design pattern. Typically the decorator will set something up before executing the body code, as in w/outstring, w/table, and fromstring. (Note: the link for each operation goes to the documentation. Clicking on the code icon in the documentation will show the code for the operation.) The pattern is also used to set something up before execution and then clean up afterwards. This is especially useful for file operations, as in w/infile, w/outfile, and w/instring. These macros often have names starting with "w/", short for "with/", indicating what they are providing to the body. The HTML library makes heavy use of this idiom, for example in tag, form, and cdata, which wrap the body in various HTML opening and closing tags.

To examine the implementation of this idiom, consider let, which is defined as:

(mac let (var val . body)
  `(with (,var ,val) ,@body))
The typical features of this idiom are a macro definition with argument . body to pick up the remainder of the arguments, a macro definition starting with quasiquote `, and unquote-splicing ,@body to substitute the body code into the macro. The macro performs simple substitution:
arc> (macex1 '(let foo 42 body1 body2))
(with (foo 42) body1 body2)

Macro for control of evaluation

Unlike functions, macros receive their arguments unevaluated. This allows macros to be used for conditional execution of code, or to execute code multiple times iteratively. Some examples are when, unless, while, and for.

(You might expect if to be a macro. It is actually a built-in special form, since conditional macros need to be based on some underlying conditional form.)

The code for repeat illustrates that this idiom also uses the ,@body structure as "decorated body":

(mac repeat (n . body)
  `(for ,(uniq) 1 ,n ,@body))

Macro as lambda adapter

A variant of the body decorator is to use a macro to convert body code into a lambda function, which is similar to an adapter design pattern. This idiom is used when a function expects to receive a lambda function, but passing in body code is more convenient, especially when the base function is a Scheme primitive. Examples are thread, atomic, w/stdin, and w/stdout:
(mac w/stdout (str . body)
  `(call-w/stdout ,str (fn () ,@body)))
arc> (macex '(w/stdout foo body1 body2))
(call-w/stdout foo (fn nil body1 body2))

Macro to create functions with parameters

Another common macro idiom is to take a list of parameters and body code, and build a function out of these. This idiom is typically used to define a function that has special properties, and is often given a name that starts with "def". Some examples are rfn, afn, defmemo, defop, defopr, linkf, and rlinkf.

The code for defmemo shows the key component of this idiom: (fn ,parms ,@body) to build the function out of the parameters and body.

(mac defmemo (name parms . body)
  `(safeset ,name (memo (fn ,parms ,@body))))

Complex macros

One irony is that even though though macros are extremely powerful, most of the macros in Arc perform simple substitutions. There are, however, some macros that perform complex manipulation of their arguments. Assignment is built from the setform macro, which implements generalized variables through intricate macro processing of the arguments. Slightly less complex is and, which implements short-circuit evaluation by turning its arguments into nested if statements. Another interesting example is with; the macro splits up the arguments into variables and values:
arc> (macex '(with (a 1 b 2) (prn a b)))
((fn (a b) (prn a b)) 1 2)
The litmatch macro performs string matching by generating code to perform a sequence of character comparisons.
arc> (macex '(litmatch "abc" foo))
((fn (gs1442 gs1443) (unless (> (+ gs1443 3) (len gs1442))
  (and (is #\a (gs1442 (+ gs1443 0))) (is #\b (gs1442 (+ gs1443 1)))
       (is #\c (gs1442 (+ gs1443 2)))))) foo 0)

Recursion

Tail recursion is a standard Scheme technique for efficient recursion. If the last operation of a function is a recursive call, the compiler will use tail-call optimization to turn the recursion into iteration, avoiding creation of a stack frame on each call.

Iteration in Arc is built on top of tail recursion. Basic loop operations such as while, loop, and whilet use tail recursion on a function built from rfn:

(mac while (test . body)
  (w/uniq (gf gp)
    `((rfn ,gf (,gp)
        (when ,gp ,@body (,gf ,test)))
      ,test)))

cdr-scanning

A typical idiom for scanning a list is to tail-recursively call the function on the cdr of the list. Some examples are reclist, nthcdr, last, and rev.

The code for assoc illustrates this idiom. If there is no match, assoc is called on the cdr, until the whole list has been scanned:

(def assoc (key al)
  (if (atom al)
       nil
      (and (acons (car al)) (is (caar al) key))
       (car al)
      (assoc key (cdr al))))

Tail-recursive string functions

One idiom in Arc for using tail recursion with strings is to add an index parameter to the function. This index parameter indexes into the string, and by incrementing it in each call, the function can be made tail-recursive and doesn't require modifying the string. The index parameter is often made an optional parameter so "normal" uses of the function don't need to specify the index. Some examples of this idiom are recstring, findsubseq, indented-code, parabreak, and urlend.

The code for headmatch illustrates this idiom; in this case the tail-recursion uses an anaphoric function:

(def headmatch (pat seq (o start 0))
  (let p (len pat) 
    ((afn (i)      
       (or (is i p) 
           (and (is (pat i) (seq (+ i start)))
                (self (+ i 1)))))
     0)))

Recursion with cons

Many operations in Arc need to perform operations on a sequence and create a result list. The common Arc idiom is to recurse through a sequence, building up the result with cons. Specifically, a macro (op seq) will return the cons of something done to (car seq) and (op (cdr seq)). Thus, op recursively steps through seq, processing an element at a time and building up the result. This idiom provides an interesting contrast to the previous recursion examples because this idiom is not tail-recursive. Because the cons is applied to the result of the recursive function call, tail-call optimization cannot be applied. Several examples of this idiom are join, rem, reinsert-sorted, pair, firstn, firstn-that, tuples, and range.

The code for map1 illustrates this idiom:

(def map1 (f xs)
  (if (no xs) 
      nil
      (cons (f (car xs)) (map1 f (cdr xs)))))
Note that f is applied to (car xs), map1 is called recursively on (cdr xs), and the two are cons'd together.

Miscellaneous

The majority of the idioms I discuss are based on macros or recursion. This secion discusses some miscellaneous idioms that don't fit those categories.

Push and reverse

The "push and reverse" idiom is an efficient, but somewhat surprising way to generate a list. In this idiom, successive elements are pushed onto the front of the list, and when the entire list is generated, it is reversed to obtain the desired order. This is a well-known Lisp idiom and is more efficient than the obvious method of appending elements to the end of the list. Several examples of "push and reverse" are drain, dedup, parse-format, litmatch, endmatch, and handle-post.

The code for n-of illustrates the idiom. Items are added with push to an accumulator (ga in this case), and the accumulator is reversed with rev at the end:

(mac n-of (n expr)
  (w/uniq ga
    `(let ,ga nil     
       (repeat ,n (push ,expr ,ga))
       (rev ,ga))))

String stdout

Arc makes extensive use of an idiom I'll call "string stdout" because a string is used as stdout. The code writes multiple chunks to stdout, and the results are gathered into a string. This idiom avoids having to manually concatenate the components into the result string. The tostring macro is used to implement this idiom. Arc has a surprisingly large number of macros that use this idiom, including readline, code-block, unmarkdown, varfield, markdown, eschtml, striptags, parafy, subst, multisubst, and num.

Typically, code using this idiom will use produce output with pr or writec, and use tostring to collect the output. The code for multisubst illustrates this:

(def multisubst (pairs seq)
  (tostring 
    (forlen i seq
      (iflet (old new) (find [begins seq (car _) i] pairs)
        (do (++ i (- (len old) 1))
            (pr new))
        (pr (seq i))))))

The flip side of this idiom is the "string stdin" idiom, where a string is used as stdin. The "string stdin" idiom is rarely used; one example is parse-format.

Summary

The Arc code illustrates several common Arc idioms, some of which are well-known Scheme or Lisp idioms. Since Paul Graham is a Lisp expert, carefully studying his Arc code is a rewarding way to learn more about programming Lisp-like languages.

Undoubtedly I've missed some important idioms. Please let me know if your favorite idiom is missing, or if I've mangled the descriptions of any idioms.

2 comments:

Ramon Leon said...

Excellent post, there aren't enough blog posts dissecting someone else's style, I like it.

Daniel Werner said...

When I first visited this post, it sounded like a piece of official documentation to me. Surprisingly insightful, thank you!