Arc Internals, Part 1

This article is the first part of a description of the implementation of the Arc language, based on my examination of the code.

The core of Arc is implemented in Scheme (in ac.scm), and runs in the mzscheme interpreter. The remainder of the Arc language is implemented in Arc itself (arc.arc), and is loaded into the running Arc implementation. In addition, the Arc distribution includes libraries for a web server, string handling, and a few others. Arc runs a standard Read-Evaluate-Print Loop (REPL), allowing commands to be entered and executed interactively.

In a bit more detail, when an Arc expression is input to the REPL, the Scheme reader reads and parses it. The Arc expression is converted to an analogous Scheme expression, and eval is applied to execute the Scheme code. The result is converted from the internal Scheme representation to a displayable form, and displayed as the output from the REPL.

Interestingly, Arc's [ _ ] syntax is implemented entirely independently from the rest of Arc; the code in brackets.scm extends the reader so the bracket syntax is converted to standard syntax as the input is parsed. The bracket support could easily be run on Scheme without Arc, or Arc could be run without bracket support.

Entering :a into the Arc REPL drops execution into Scheme, allowing easy examination of the internals:

> (= x 3)
3
arc> (= y '(1 (2 3) 4))
(1 (2 3) 4)
arc> (def z () (pr "hello"))
#<procedure: z>
arc> :a
> _x
3
> _y
(1 (2 3 . nil) 4 . nil)
> _z
#<procedure: z>
Arc's data and procedures are stored in the Scheme environment, with a few modifications. Arc symbols have an underscore prepended internally to avoid collisions with Scheme names. In the Arc language, the empty list is nil, while the Scheme empty list is (). Thus, Arc lists are stored in Scheme terminated by the (arbitrary) symbol 'nil, as can be seen by the dotted notation above. Arc macros are stored as a vector of the symbol 'tagged, the symbol 'mac, and the procedure.

The REPL

Arc runs from a Read-Evaluate-Print Loop (REPL), which is started by executing the simple procedure tl:
(define (tl)
  (display "Use (quit) to quit, (tl) to return here after an interrupt.\n")
  (tl2))
Note that the REPL runs in Scheme, not in Arc. tl is a wrapper around tl2, which is the real REPL implementation:
(define (tl2)
  (display "arc> ")
  (on-err (lambda (c) 
            (set! last-condition* c)
            (display "Error: ")
            (write (exn-message c))
            (newline)
            (tl2))
    (lambda ()
      (let ((expr (read)))
        (if (eqv? expr ':a)
            'done
            (let ((val (arc-eval expr)))
              (write (ac-denil val))
              (namespace-set-variable-value! '_that val)
              (namespace-set-variable-value! '_thatexpr expr)
              (newline)
              (tl2)))))))
The arguments to on-err are an error procedure and the body procedure. The error procedure is executed if the main procedure encounters an exeception, similar to a try/catch block, but with the catch procedure first. The on-err procedure is implemented with continuations. If you're looking for the "mind-expanding" parts of Lisp, continuations will definitely interest you, but I will ignore on-err for now.

The meat is the second lambda function. The Scheme read procedure reads the input and creates an object using the Scheme parser. The input is passed to arc-eval, which evaluates the input as an Arc form. The result is converted by arc-denil from the internal 'nil-terminated form to displayable form and written out. The tl2 procedure then calls itself; to the C programmer, this may look like a stack overflow waiting to happen. However, Scheme is tail-recursive so the stack doesn't grow, and the call acts like a simple loop. The Scheme variables _that and _thatexpr record the expression to help debugging.

arc-eval and ac

The arc-eval procedure is the main entry point for executing an Arc form. It calls ac to convert the Arc form to a quoted Scheme form, and then does an eval on the Scheme expression:
(define (arc-eval expr)
  (eval (ac expr '()) (interaction-environment)))
The arc-eval procedure can be executed directly from inside Scheme:
> (arc-eval '((fn (x) (+ x 1)) 42))
43
The ac procedure is the real meat of Arc, as it translates Arc to Scheme. Its second argument is the "environment", a list of symbols that are currently bound. At the REPL, this list is empty.
(define (ac s env)
  (cond ((string? s) (string-copy s))  ; to avoid immutable strings
        ((literal? s) s)
        ((eqv? s 'nil) (list 'quote 'nil))
        ((ssyntax? s) (ac (expand-ssyntax s) env))
        ((symbol? s) (ac-var-ref s env))
        ((ssyntax? (xcar s)) (ac (cons (expand-ssyntax (car s)) (cdr s)) env))
        ((eq? (xcar s) 'quote) (list 'quote (ac-niltree (cadr s))))
        ((eq? (xcar s) 'quasiquote) (ac-qq (cadr s) env))
        ((eq? (xcar s) 'if) (ac-if (cdr s) env))
        ((eq? (xcar s) 'fn) (ac-fn (cadr s) (cddr s) env))
        ((eq? (xcar s) 'set) (ac-set (cdr s) env))
        ((pair? s) (ac-call (car s) (cdr s) env))
        (#t (err "Bad object in expression" s))))
Arc strings are copied to Scheme strings. Arc literals are unchanged. The Arc symbol 'nil is unchanged. Input with ssyntax (i.e. : or ~) is expanded and re-evaluated. Symbols are handled by ac-var-ref. Special operators quote, quasiquote, if, fn, and set are handled by separate procedures. Procedures are handled by ac-call.

xdef

Arc primitives are created with xdef, which enters a Scheme procedure into the namespace:
(define (xdef a b)
  (namespace-set-variable-value! (ac-global-name a) b)
  b)
For example, the Arc newstring procedure is just the Scheme make-string procedure:
(xdef 'newstring make-string)
Note that namespace-set-variable-value! is somewhat similar to define, except it takes a symbol such as 'a, rather than a variable such as a.

Conclusion

Many more interesting aspects of the Arc implementation remain, such as procedures, scoping, and macros. I hope to do more analysis later. Much of the above is based on discussions in the Arc forum. The code snippets from the Arc distribution are copyright Paul Graham and Robert Morris; the distribution is available at http://arclanguage.org/install.

1 comment:

Unknown said...

Thanks for the explain of how some of this black magic is working. Very helpful to those of us new to Scheme.

One thing - brackets may be independent of arc, but arc relies on the brackets syntax throughout. Of course those could all be re-written in the expanded form...