The code that implements Arc illustrates many useful programming techniques. Some of the techniques occur multiple times, and can be considered idioms of Arc programming; techniques that can be applied to many problems. This article describes some of the most common idioms, and how they can be used in Arc programming. The techniques described focus largely on macros and recursion.
Macros
Not surprisingly, given Paul Graham's enthusiasm for macros, Arc makes heavy use of macros. A few idioms appear commonly in the code.
"Decorated body"
The most common use of macros in Arc is something I call the "decorated body", which wraps a chunk of body code with something to extend it, analogous to the
decorator design pattern. Typically the decorator will set something up before executing the body code, as in
w/outstring,
w/table, and
fromstring.
(Note: the link for each operation goes to the documentation. Clicking on the code icon in the documentation will show the code for the operation.)
The pattern is also used to set something up before execution and then clean up afterwards. This is especially useful for file operations, as in
w/infile,
w/outfile, and
w/instring.
These macros often have names starting with "w/", short for "with/", indicating what they are providing to the body.
The
HTML library makes heavy
use of this idiom, for example in
tag,
form,
and
cdata,
which wrap the body in various HTML opening and closing tags.
To examine the implementation of this idiom, consider
let,
which is defined as:
(mac let (var val . body)
`(with (,var ,val) ,@body))
The typical features of this idiom are a macro definition with argument
. body to pick up the remainder of the arguments, a macro definition starting with quasiquote
`, and unquote-splicing
,@body to substitute the body code into the macro. The macro performs simple substitution:
arc> (macex1 '(let foo 42 body1 body2))
(with (foo 42) body1 body2)
Macro for control of evaluation
Unlike functions, macros receive their arguments unevaluated. This allows macros to be used for
conditional execution of code, or to execute code multiple times
iteratively. Some examples are
when,
unless,
while,
and
for.
(You might expect if to be a macro. It is actually a built-in special form, since conditional macros need to be based on some underlying conditional form.)
The code for repeat illustrates that this idiom also uses the ,@body structure as "decorated body":
(mac repeat (n . body)
`(for ,(uniq) 1 ,n ,@body))
Macro as lambda adapter
A variant of the body decorator is to use a macro to convert body code into a lambda function, which is similar to
an
adapter design pattern. This idiom is used when a function expects to receive a lambda function, but passing in body code is more convenient, especially when the base function is a Scheme primitive. Examples are
thread,
atomic,
w/stdin,
and
w/stdout:
(mac w/stdout (str . body)
`(call-w/stdout ,str (fn () ,@body)))
arc> (macex '(w/stdout foo body1 body2))
(call-w/stdout foo (fn nil body1 body2))
Macro to create functions with parameters
Another common macro idiom is to take a list of parameters and body code, and build a function out of these. This idiom is typically used to define a function that has special properties, and is often given a name that starts with "def". Some examples are
rfn,
afn,
defmemo,
defop,
defopr,
linkf,
and
rlinkf.
The code for defmemo shows the key component of this idiom: (fn ,parms ,@body) to build the function out of the parameters and body.
(mac defmemo (name parms . body)
`(safeset ,name (memo (fn ,parms ,@body))))
Complex macros
One irony is that even though though macros are extremely powerful, most of the macros in Arc perform simple substitutions.
There are, however, some macros that perform complex manipulation of their arguments.
Assignment is built from the
setform macro, which implements generalized variables through intricate macro processing of the arguments. Slightly less complex is
and, which implements short-circuit evaluation by turning its arguments into nested
if statements. Another interesting example is
with; the macro splits up the arguments into variables and values:
arc> (macex '(with (a 1 b 2) (prn a b)))
((fn (a b) (prn a b)) 1 2)
The
litmatch macro performs string matching by generating code to perform a sequence of character comparisons.
arc> (macex '(litmatch "abc" foo))
((fn (gs1442 gs1443) (unless (> (+ gs1443 3) (len gs1442))
(and (is #\a (gs1442 (+ gs1443 0))) (is #\b (gs1442 (+ gs1443 1)))
(is #\c (gs1442 (+ gs1443 2)))))) foo 0)
Recursion
Tail recursion is a standard Scheme technique for efficient recursion. If the last operation of a function is a recursive call, the compiler will use tail-call optimization to turn the recursion into iteration, avoiding creation of a stack frame on each call.
Iteration in Arc is built on top of tail recursion. Basic loop operations
such as
while,
loop, and
whilet use tail recursion on a function built from
rfn:
(mac while (test . body)
(w/uniq (gf gp)
`((rfn ,gf (,gp)
(when ,gp ,@body (,gf ,test)))
,test)))
cdr-scanning
A typical idiom for scanning a list is to tail-recursively call the function on the cdr of the list. Some examples are
reclist,
nthcdr,
last,
and rev.
The code for
assoc
illustrates this idiom. If there is no match, assoc is called on the cdr, until the whole list has been scanned:
(def assoc (key al)
(if (atom al)
nil
(and (acons (car al)) (is (caar al) key))
(car al)
(assoc key (cdr al))))
Tail-recursive string functions
One idiom in Arc for using tail recursion with strings is to add an index parameter to the function. This index parameter indexes into the string, and by incrementing it in each call, the function can be made tail-recursive and doesn't require modifying the string. The index parameter is often made an optional parameter so "normal" uses of the function don't need to specify the index. Some examples of this idiom are
recstring,
findsubseq,
indented-code,
parabreak,
and
urlend.
The code for headmatch illustrates this idiom; in this case the tail-recursion uses an anaphoric function:
(def headmatch (pat seq (o start 0))
(let p (len pat)
((afn (i)
(or (is i p)
(and (is (pat i) (seq (+ i start)))
(self (+ i 1)))))
0)))
Recursion with cons
Many operations in Arc need to perform operations on a sequence and create a result list. The common Arc idiom is to recurse through a sequence, building up the result with
cons.
Specifically, a macro
(op seq) will return the cons of something done to
(car seq) and
(op (cdr seq)). Thus,
op recursively steps through
seq, processing an element at a time and building up the result.
This idiom provides an interesting contrast to the previous recursion examples because this idiom is not tail-recursive. Because the
cons is applied to the result of the recursive function call, tail-call optimization cannot be applied.
Several examples of this idiom are
join,
rem,
reinsert-sorted,
pair,
firstn,
firstn-that,
tuples,
and
range.
The code for
map1
illustrates this idiom:
(def map1 (f xs)
(if (no xs)
nil
(cons (f (car xs)) (map1 f (cdr xs)))))
Note that
f is applied to
(car xs),
map1 is called recursively on
(cdr xs), and the two are cons'd together.
Miscellaneous
The majority of the idioms I discuss are based on macros or recursion. This secion discusses some miscellaneous idioms that don't fit those categories.
Push and reverse
The "push and reverse" idiom is an efficient, but somewhat surprising way to generate a list. In this idiom, successive elements are pushed onto the front of the list, and when the entire list is generated, it is reversed to obtain the desired order. This is a well-known
Lisp idiom and is more efficient than the obvious method of appending elements to the end of the list.
Several examples of "push and reverse" are
drain,
dedup,
parse-format,
litmatch,
endmatch,
and
handle-post.
The code for n-of illustrates the idiom. Items are added with
push
to an accumulator (ga in this case), and the accumulator is reversed with
rev
at the end:
(mac n-of (n expr)
(w/uniq ga
`(let ,ga nil
(repeat ,n (push ,expr ,ga))
(rev ,ga))))
String stdout
Arc makes extensive use of an idiom I'll call "string stdout" because a string is used as stdout. The code writes multiple chunks to stdout, and the results are gathered into a string. This idiom avoids having to manually concatenate the components into the result string.
The
tostring
macro is used to implement this idiom.
Arc has a surprisingly large number of macros that use this idiom, including
readline,
code-block,
unmarkdown,
varfield,
markdown,
eschtml,
striptags,
parafy,
subst,
multisubst,
and
num.
Typically, code using this idiom will use produce output with pr or writec, and use tostring to collect the output.
The code for multisubst illustrates this:
(def multisubst (pairs seq)
(tostring
(forlen i seq
(iflet (old new) (find [begins seq (car _) i] pairs)
(do (++ i (- (len old) 1))
(pr new))
(pr (seq i))))))
The flip side of this idiom is the "string stdin" idiom, where a string is used as stdin. The "string stdin" idiom is rarely used; one example is
parse-format.
Summary
The Arc code illustrates several common Arc idioms, some of which are well-known Scheme or Lisp idioms. Since Paul Graham is a Lisp expert, carefully studying his Arc code is a rewarding way to learn more about programming Lisp-like languages.
Undoubtedly I've missed some important idioms. Please let me know if your favorite idiom is missing, or if I've mangled the descriptions of any idioms.