Simple Cryptanalysis with Arc

I came across a thread on reddit about why you shouldn't write your own encryption algorithm. In it, one person presented a homegrown algorithm with a challenge to break it. The algorithm turned out to be the weak Vigenère cipher, so I thought it would be interesting to break it with Arc.

In a Caesar cipher, each character in the plaintext is shifted by a fixed amount. For instance, if the shift is 3, "a" becomes "d", "b" becomes "e", and so on. In the Vigenère cipher, multiple shifts are used. For instance, if the key is (2, 4, 9), the first character is shifted by 2, the second character is shifted by 4, the third character is shifted by 9. The pattern then repeats with shifts of 2, 4, 9, 2, 4, 9, and so on until the entire plaintext is encrypted. Note that the length of the key is arbitrary.

The encrypted string of the reddit cipher can be expressed in Arc as:

(= s "t/GlC/,@RiG.^(xBQ2.MslF27@w#xXw,KHx,/9xiU-(!\"ixf%\";iA-=@wiW-=6EnRX#,stL-\"6Mqy!6:HDf#,6LDA0,)LoS9@:slPY9!xmx!?/LiM,,MsjL16@xCx.,6zrT26:HDx-$,svM/,6MnQ!P6sXz#/%NBJ^6jsly,6#ttCX/+sjx9/+MuCX8/MiFY(.xAx$/+AxS!69AjL4/$ziR5,6tuE-(/MqKC")
The first thing we can do is determine how long the key is. Since the key repeats, it is likely that characters in the ciphertext will also repeat with the same frequency. We can apply a simple autocorrelation function:
(def autocorrelate (s offset)
  (let total 0
    (for i 0 (- (len s) offset 1)
      (if (is (s i) (s (+ i offset)))
        (++ total)))
    total))
This function will compare each character with the character offset later in the string, and count the number of matches. Let's find the best value for offset:
(for i 1 20 (prn i " " (autocorrelate s i)))
1 2
2 3
3 1
4 6
5 6
6 16
7 6
8 3
9 0
10 1
11 3
12 16
13 7
14 5
15 0
16 3
17 5
18 17
19 1
20 3
Note that comparing each character with the character 6 characters later has 16 matches, which is much better than the other offsets. So it's pretty likely that the key is length 6. (There are also spikes at multiples of 6, as you'd expect.)

Now let's make a function to decode an encrypted character given an offset. We need the algorithm's character set mapping m used to shift the characters. The typical Vigenère uses a character set of 26 letters, but the challenge algorithm uses a much larger character set. To decode a character, we look it up in the mapping to get an integer index, shift by the offset, and convert back to a character.

(= m " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,-\'?/\\!@#$%^*()+=~\";:")
(def decodechar (ch offset)
  (m (mod (- (pos ch m) offset) (- (len m) 1))))
For example, if our characters were shifted by 3, "D" should decode as "A":
(decodechar #\D 3)
#\A
At this point, there are multiple approaches to break the cipher. If you have any known plaintext, you can immediately get the key. Without a known plaintext, you can take an expected plaintext such as " the " and drag it through the ciphertext, generating a presumtive key at each point. If the same pattern pops out in several places, you may have the key.

Alternatively, if we take every 6th character in the ciphertext, we have 6 Caesar cipers, and we can break each one separately. Breaking a Caesar cipher is easy. One approach is frequency analysis; assuming there are more e's and t's than q's and z's in the plaintext, you can compute the likelihood of each key. Log-likelihood is a typical way, but Arc doesn't have a log function, annoyingly enough.

I decided to take a more brute force approach. I made a function that prints out the decoding of one of the Caesar substrings given an offset value.

(def decodesubstring (s startpos skip offset)
    (loop (= i startpos) (< i (len s)) (= i (+ i skip))
      (pr (decodechar (s i) offset)))
    (prn))
For instance, to take every sixth character, starting with the second, and shift it by 35:
(decodesubstring s 2 6 35)
"i$/#b$UV#=/d;cc/$c/$^;/d/e/\/dd$**^\d
That deciphering doesn't look too good if we're looking for English text. Either our plaintext is a Perl program or 35 is the wrong offset. Let's brute-force try all the offsets:
(def testall (s startpos skip)
  (for i 0 (len m)
    (pr i " ")
    (decodesubstring s startpos skip i)))

(testall s 2 6)
0 GRxswKx";wEsMHLLsxLsxzHsMsNstsMMxAAztM
1 FQwrvJw~"vDrLGKKrwKrwyGrLrMrsrLLwzzysL
2 EPvquIv=~uCqKFJJqvJqvxFqKqLqrqKKvyyxrK
3 DOuptHu+=tBpJEIIpuIpuwEpJpKpqpJJuxxwqJ
4 CNtosGt)+sAoIDHHotHotvDoIoJopoIItwwvpI
5 BMsnrFs()rznHCGGnsGnsuCnHnInonHHsvvuoH
6 ALrmqEr*(qymGBFFmrFmrtBmGmHmnmGGruutnG
7 zKqlpDq^*pxlFAEElqElqsAlFlGlmlFFqttsmF
8 yJpkoCp%^owkEzDDkpDkprzkEkFklkEEpssrlE
9 xIojnBo$%nvjDyCCjoCjoqyjDjEjkjDDorrqkD
10 wHnimAn#$muiCxBBinBinpxiCiDijiCCnqqpjC
11 vGmhlzm@#lthBwAAhmAhmowhBhChihBBmppoiB
12 uFlgkyl!@ksgAvzzglzglnvgAgBghgAAloonhA
13 tEkfjxk\!jrfzuyyfkyfkmufzfAfgfzzknnmgz
14 sDjeiwj/\iqeytxxejxejlteyezefeyyjmmlfy
15 rCidhvi?/hpdxswwdiwdiksdxdydedxxillkex
16 qBhcguh'?gocwrvvchvchjrcwcxcdcwwhkkjdw
17 pAgbftg-'fnbvquubgubgiqbvbwbcbvvgjjicv
18 ozfaesf,-emaupttaftafhpauavabauufiihbu
19 nye dre.,dl toss es ego t u a ttehhgat
20 mxd;cqd9.ck;snrr;dr;dfn;s;t; ;ssdggf s
21 lwc"bpc89bj"rmqq"cq"cem"r"s";"rrcffe;r
22 kvb~aob78ai~qlpp~bp~bdl~q~r~"~qqbeed"q
23 jua= na67 h=pkoo=ao=ack=p=q=~=ppaddc~p
24 it +;m 56;g+ojnn+ n+ bj+o+p+=+oo ccb=o
25 hs;)"l;45"f)nimm);m);ai)n)o)+)nn;bba+n
26 gr"(~k"34~e(mhll("l(" h(m(n()(mm"aa )m
27 fq~*=j~23=d*lgkk*~k*~;g*l*m*(*ll~  ;(l
28 ep=^+i=12+c^kfjj^=j^="f^k^l^*^kk=;;"*k
29 do+%)h+01)b%jeii%+i%+~e%j%k%^%jj+""~^j
30 cn)$(g)Z0(a$idhh$)h$)=d$i$j$%$ii)~~=%i
31 bm(#*f(YZ* #hcgg#(g#(+c#h#i#$#hh(==+$h
32 al*@^e*XY^;@gbff@*f@*)b@g@h@#@gg*++)#g
33  k^!%d^WX%"!faee!^e!^(a!f!g!@!ff^))(@f
34 ;j%\$c%VW$~\e dd\%d\%* \e\f\!\ee%((*!e
35 "i$/#b$UV#=/d;cc/$c/$^;/d/e/\/dd$**^\d
36 ~h#?@a#TU@+?c"bb?#b?#%"?c?d?/?cc#^^%/c
37 =g@'! @ST!)'b~aa'@a'@$~'b'c'?'bb@%%$?b
38 +f!-\;!RS\(-a=  -! -!#=-a-b-'-aa!$$#'a
39 )e\,/"\QR/*, +;;,\;,\@+, ,a,-,  \##@-
40 (d/.?~/PQ?^.;)""./"./!).;. .,.;;/@@!,;
41 *c?9'=?OP'%9"(~~9?~9?\(9"9;9.9""?!!\."
42 ^b'8-+'NO-$8~*==8'=8'/*8~8"898~~'\\/9~
43 %a-7,)-MN,#7=^++7-+7-?^7=7~787==-//?8=
44 $ ,6.(,LM.@6+%))6,)6,'%6+6=676++,??'7+
45 #;.59*.KL9!5)$((5.(5.-$5)5+565)).''-6)
46 @"948^9JK8\4(#**49*49,#4(4)454((9--,5(
47 !~837%8IJ7/3*@^^38^38.@3*3(343**8,,.4*
48 \=726$7HI6?2^!%%27%279!2^2*232^^7..93^
49 /+615#6GH5'1%\$$16$168\1%1^121%%69982%
50 ?)504@5FG4-0$/##05#057/0$0%010$$58871$
51 '(4Z3!4EF3,Z#?@@Z4@Z46?Z#Z$Z0Z##47760#
52 -*3Y2\3DE2.Y@'!!Y3!Y35'Y@Y#YZY@@3665Z@
53 ,^2X1/2CD19X!-\\X2\X24-X!X@XYX!!2554Y!
54 .%1W0?1BC08W\,//W1/W13,W\W!WXW\\1443X\
55 9$0VZ'0ABZ7V/.??V0?V02.V/V\VWV//0332W/
56 8#ZUY-ZzAY6U?9''UZ'UZ19U?U/UVU??Z221V?
57 7@YTX,YyzX5T'8--TY-TY08T'T?TUT''Y110U'
58 6!XSW.XxyW4S-7,,SX,SXZ7S-S'STS--X00ZT-
59 5\WRV9WwxV3R,6..RW.RWY6R,R-RSR,,WZZYS,
60 4/VQU8VvwU2Q.599QV9QVX5Q.Q,QRQ..VYYXR.
61 3?UPT7UuvT1P9488PU8PUW4P9P.PQP99UXXWQ9
62 2'TOS6TtuS0O8377OT7OTV3O8O9OPO88TWWVP8
63 1-SNR5SstRZN7266NS6NSU2N7N8NON77SVVUO7
64 0,RMQ4RrsQYM6155MR5MRT1M6M7MNM66RUUTN6
65 Z.QLP3QqrPXL5044LQ4LQS0L5L6LML55QTTSM5
66 Y9PKO2PpqOWK4Z33KP3KPRZK4K5KLK44PSSRL4
67 X8OJN1OopNVJ3Y22JO2JOQYJ3J4JKJ33ORRQK3
68 W7NIM0NnoMUI2X11IN1INPXI2I3IJI22NQQPJ2
69 V6MHLZMmnLTH1W00HM0HMOWH1H2HIH11MPPOI1
70 U5LGKYLlmKSG0VZZGLZGLNVG0G1GHG00LOONH0
71 T4KFJXKklJRFZUYYFKYFKMUFZF0FGFZZKNNMGZ
72 S3JEIWJjkIQEYTXXEJXEJLTEYEZEFEYYJMMLFY
73 R2IDHVIijHPDXSWWDIWDIKSDXDYDEDXXILLKEX
74 Q1HCGUHhiGOCWRVVCHVCHJRCWCXCDCWWHKKJDW
75 P0GBFTGghFNBVQUUBGUBGIQBVBWBCBVVGJJICV
76 OZFAESFfgEMAUPTTAFTAFHPAUAVABAUUFIIHBU
77 NYEzDREefDLzTOSSzESzEGOzTzUzAzTTEHHGAT
78 MXDyCQDdeCKySNRRyDRyDFNySyTyzySSDGGFzS
79 LWCxBPCcdBJxRMQQxCQxCEMxRxSxyxRRCFFEyR
80 KVBwAOBbcAIwQLPPwBPwBDLwQwRwxwQQBEEDxQ
81 JUAvzNAabzHvPKOOvAOvACKvPvQvwvPPADDCwP
82 ITzuyMz ayGuOJNNuzNuzBJuOuPuvuOOzCCBvO
83 HSytxLy; xFtNIMMtyMtyAItNtOtutNNyBBAuN
84 GRxswKx";wEsMHLLsxLsxzHsMsNstsMMxAAztM
85 FQwrvJw~"vDrLGKKrwKrwyGrLrMrsrLLwzzysL
Looking this over, it's pretty obvious that offset 19 yields normal English letters, punctuation, and spacing, and everything else doesn't. (Of course, since we're only looking at every sixth letter, the result isn't readable.) We can repeat this for the other positions in the string
(testall s 0 6)
...
59 SepdaVirouumw eelche e ne?i  iibri ier
...
(testall s 1 6)
...
59 ilr,leckwl e y syki,l ye  oImttidtcn i
...
By scanning the six sets of outputs, the obvious key set is (59 59 19 9 24 50). Now we can write a simple function to decrypt the entire string from a list of offsets:
(def decode (s offsets)
   (for i 0 (- (len s) 1)
     (pr (decodechar (s i) (offsets (mod i (len offsets))))))
   (prn))

(decode s '(59 59 19 9 24 50))
Sincerely impressed, cheald.  Very nice work.  Now, could you let me know that you've successfully cracked this one, and let me give you one more test?  Obviously I can make it a little bit harder without changing the algorithm.
The original algorithm uses a string for the key, not integers, so we can map the offsets back to a string key:
(each x '(59 59 19 9 24 50) (pr (m x)))
66sixXnil
Ignoring nil, the key is "66sixX". Thus, a few minutes with Arc is sufficient to break the cipher and determine the original key.

The reddit thread includes multiple other challenges. One is:

(= s2 "sBC^myM3f5MC6nBF~d?OM*mqt5y@wB3Bvt'm6vy%vIB3x9RNJmJB@y,Bt6rHN4m@FS3nrF8d(It5rpL8e8:t^BpN,e(tw,AuO?m^wB8ApG4C4Ny^GpI(x4NB-Fpy!g^SJ*vEH3q9NB@q+t3M.tB8mIO6g9yx^mpQ8p\\tx@Auf3dhtM-Asy%i\\St6BDA%e(OF4Gut,m!;t3Vvt,i4xI8FpH@x4t23nIE3x-uN3nJt/i5MN3uut!s(tw@AJC!y9tN@mCu?i4zO!mEz3x-CM3pyJ,i^tQ,vsB3m*tM@muu^C4ut1$pS8e^tI/qnt6s)Fx3pHu6o4CNJmEL3b'Nt6nDt5i4wL4pAy7d6St,nDx1d5ww@EtC!k4NI3BJB8v*;t3YuN3q9tL8Cuu*d#Hw8mCI%i4t23xDI+d'NtaN3t5i4wL4pAy7/4vO*mYt5i(t6m59t#i#JF8miL8h8CN3vIt!s(tr\\BIN3t9IJ/rnn3A#Hb*mtI3m(;")
The same technique finds the key length is 9, the offsets are (57 20 20 56 13 16 20 56 4), and the plaintext and key are:
(decode s2 '(57 20 20 56 13 16 20 56 4))
This is basically just a bunch of jibberish text, though certainly able to be read, so that chneukirchen may test out this encryption method.  If he succeeds  well done!  I sincerely congratulate him.  If he does not  I ask that at least he not continue to make fun of this cipher which is so easy a "7 year old" could crack it, or "it can be cracked by hand" according to others.  Let me repeat once more  I know it CAN be cracked, but I bet MOST people (reddit is not "most people") won't do it.

(each x '(57 20 20 56 13 16 20 56 4) (pr (m x)))
4tt3mpt3dnil
This shows how Arc can be used as a tool in simple cryptanalysis, and also shows that homegrown cryptography can be very weak. If you're interested in cryptography, I highly recommend that you read Applied Cryptography. If it's the history of cryptography that interests you, I recommend The Codebreakers.

QR codes in Lego

I've been experimenting with QR codes (2-d bar codes) with my mobile phone. While helping my daughter assemble a Lego® spaceship, I had the random thought that maybe I could build a QR code out of Lego. Would such a QR code be readable, or would the bumps and gaps mess it up?

To find out, I generated a QR code and started assembling it out of my daughter's spare Lego pieces. This would have been much easier if she had a lot of small white pieces, so I needed to really dig to find the necessary 1x1 blocks. (I'm pretty sure that determining if a code can be tiled with a specific set of blocks is a NP-complete problem, but my manual heuristics were sufficient to get it assembled.)

The moment of truth... would it scan? No, not at all. The problem was I needed a border around the code block and I'd put the pieces too close to the edge. So I laboriously shifted all the pieces over and added a white border.

Lego QR code

With the border, the QR code scanned amazingly well. Here's a camera-eye view. If you have a barcode-enabled phone, you can probably scan this off the screen:

Lego QR code

I used the ZXing barcode scanning program on my phone. I expected the scan would be sensitive to the orientation, lighting, and distance, since Lego isn't very even. However, the phone scanned the blocks surprisingly rapidly and effectively. Note the orientation-independence:

Phone scanning Lego QR code

My conclusion is that Lego is a viable medium for implementing QR codes.

Some tips if you try this at home... QR codes come in multiple sizes with various levels of error correction. The minimum size is 21x21 but many QR websites will generate larger ones. Avoid the larger sizes unless you want to do more assembly. I used the Raco Industries online QR code generator since that site provides a lot of control over the generated QR code. If you print out the code with .3125" pixels, the output will conveniently match the size of the Lego blocks.

(Apologies if you were expecting some Arc code in this article.)

The rise of scripting languages and the fall of Java

Java is very much in full retreat.
-- R. Loui
Professor Ronald Loui has an interesting article on the rise of scripting languages (In Praise of Scripting: Real Programming Pragmatism) in the July 2008 issue of IEEE Computer. It claims scripting languages such as Perl, Python, and Javascript have dramatically fulfilled their early promise, provide many benefits, and are poised to take over the lead from Java. However, the academic programming language community is stuck in theory and hasn't recognized the ascendence of scripting languages.

I agree that scripting languages are on the rise. Most people would agree that they provide rapid development, higher levels of abstraction, and brevity that helps the programmer. The article also describes how scripting languages can be a performance win, since they can allow experimentation and implementation of efficient algorithms that would be too painful in Java or C++. So even if C++ is faster on the micro-benchmark level, a programmer using a scripting language may end up with faster algorithms overall. I've argued somewhat controversally that Arc is too slow for my programming problems, so I remain unconvinced that basic performance can be ignored entirely.

As for the claim that Java is in full retreat, it strikes me as wishful thinking. (I'd believe "slow decline" though.) It will be interesting to check back on this claim in 5 years.

I personally believe that CS1 [freshman computer science] Java is the greatest single mistake in the history of computing curricula.
-- R. Loui
The article suggests good languages for teaching introductory computer science are gawk, Javscript, PHP, and ASP, but Python is emerging as a consensus for the best freshman programming language. This is the hardest part of the article for me to swallow. The idea of writing real programs in Awk never occurred to me, and I remain skeptical even though the author claims it works well. For those who would suggest Scheme as an introductory programming language, it was displaced as a dominant freshman language by Java a decade ago, and is apparently no longer considered an option.

I can't argue with the author's claim that student learning is enhanced by experimenting, writing code, and getting hands-on experience, and that scripting languages make this faster and easier.

Python and Ruby have the enviable properties that almost no one dislikes them, and almost everyone respects them.
-- R. Loui
In Why your favorite language is unpopular I discussed how the Change Function model can explain the success of programming languages based on maximizing the crisis solved and minimizing the perceived pain of adoption. I can apply this model applies to scripting languages as well:

Magnitude of crisis solved by Tcl/Tk: High - How to add a scripting language to a C program. How to add a GUI to a C program without painful X11 and Motif code.
Total Perceived Pain of Adoption: Low - Link Tcl in with your C program and add a few hooks. Create the GUI with trivial scripts.

Magnitude of crisis solved by Perl: High - How to quickly write CGI scripts. How to solve problems too complex for shell scripts. How to process files. How to develop quickly and iteratively.
Total Perceived Pain of Adoption: Low - Apart from looking like line noise, Perl is easy to get started with, is well integrated with Unix, has the definitive regex implementation, and has libraries for almost everything.

My point is that these languages solved specific painful problems and had low pain of adoption. As a result, they were much more successful than beautiful, powerful languages that were less able to directly solve painful problems or were more painful to adopt.

The real reason why academics were blindsided by scripting is their lack of practicality.
-- R. Loui
A major thrust of the article is that academics are too concerned with theoretical issues of syntax and semantics, rather than pragmatic issues of what a language can achive quickly, inexpensively, and practically. Academics are said to be too tied to theoretical concepts such as object-oriented programming and strong typing, and are missing the real-world benefits of scripting languages.

(Interestingly, Rob Pike made a similar argument against academics in the context of operating systems software (Systems Software Research is Irrelevant), stating that academic research is irrelevant and the real innovation is in industry. Since I have friends doing academic OS research, I should add a disclaimer here that I don't necessarily agree.)

One measure of pragmatics raised by the paper is how well does a language work with other Unix tools. I think the importance of this is underappreciated. In particular, I view this as a significant barrier to adoption of Arc. Running Arc as a shell script instead of a REPL is nontrivial (as is the case with many Lisp and Scheme implementations). Running an external program from Arc is clunky, even though it is often necessary to actually get things done (Kens' law), and real pipes are missing from Arc entirely.

Java's integration with Unix also has painful gaps - where's getpid() for instance? Why is JNI so difficult compared to calling native code from C#? I blame Sun's pure-Java platform independence ideology, and I'm surprised it hasn't hurt Java more.

On the other hand, Python and Perl provide a remarkable degree of integration, which I view as a key factor in their success. Likewise, Visual Basic is highly integrated with the Windows environment and highly successful there.

In conclusion, Loui's paper raises numerous interesting points about the success of scripting languages. I expect that the reasons for the rise of scripting languages will only get stronger, and languages that don't support the scripting model will have an increasingly harder time gaining adoption.

Note: quotes above are from the preprint and may not match the published article.

Why your favorite language is unpopular

The total world's population of Haskell programmers fits in a 747. And if that goes down, nobody would even notice.
-- Erik Meijer

I recently saw an interesting talk on functional programming by Erik Meijer (of Bananas, Lenses, Envelopes, and Barbed Wire fame). Among other things, he discussed why many superior technologies such as Haskell don't catch on.

Geek formula for success

He claims the "Geek formula" for success of a technology is that if a technology is 10 times better, it should catch on and become popular. Even if it is slower, Moore's law will soon make it 10 times faster.

So if Haskell is 10 times better than C and Haskell programs are 10 times shorter, everybody should be using Haskell.

Real-life formula for success

However, as Erik points out, "That's not how it is in real life." In real life, success is based on the perceived crisis divided by the perceived pain of adoption. Users want something that will get the job done and solve their crisis, without a lot of pain to switch.


This argument applies to many languages that remain unpopular despite their technical merits, such as Lisp, Arc, and Erlang, as well as technologies such as the Semantic Web and LaTex.

The Change Function

The above argument is based on the book The Change Function: Why Some Technologies Take Off and Others Crash and Burn by Pip Coburn. To summarize the book, new technologies aren't adopted because they are great, new, and disruptive; they are adopted only if the user's crisis solved by the technology is greater than the perceived pain of adoption. As a result, most new technologies fail.

The first half of the book is a bit fluffy, but gets more interesting when it discusses specific technologies that failed or succeeded. The book also goes out on a limb and predicts future winners (mobile enterprise Email, satellite radio, business intelligence software) and losers (RFID, entertainment PC, WiMax).

Languages and The Change Function

The Change Function argument has a lot of merit for explaining what languages become popular and what languages don't. If Lisp is so great, why are there 8 million Visual Basic programmers worldwide and few Lisp programmers? The answer isn't pointy-haired bosses (since Lisp isn't popular on SourceForge either). The crisis vs. pain of adoption model provides a powerful explanation:

Magnitude of crisis solved by Visual Basic: High (e.g. how to easily write Windows applications)
Total Perceived Pain of Adoption for Visual Basic: Very Low (hit Alt-F11 in Excel and you're done)

Magnitude of crisis solved by Lisp: Low (metaprogramming, powerful macros, and higher-order functions are solutions in search of problems)
Total Perceived Pain of Adoption for Lisp: High (this shouldn't require explanation)

The same model explains the success of, for instance, Java:
Magnitude of crisis solved by Java: High (originally how to run code in a browser and write portable code, now how to avoid crashes due to memory allocation errors and bad pointers)
Total Perceived Pain of Adoption: Low (syntax similar to C++, easy to deploy)

Applying this model to other languages is left as an exercise for the reader.

Erik points out that Erlang and Haskell are now being marketed according to the second formula: there is a multicore crisis and functional languages are the solution. It will be interesting to see how much additional traction these languages get without addressing the "pain of adoption" part.

The Change Function and startups

The Change Function ends with ten sets of questions and a set of techniques for designing technologies that will be adopted; this part of the book has many ideas that would be beneficial for startups. Many of these are fairly obvious, such as "Fail fast and iterate", and have a customer-centered culture instead of a sales-centered culture, while others are more thought-provoking: "What is the user crisis you intend to solve? What are the top five reasons a user with this crisis would not use your product?" The ultimate conclusion of the book is "Figure out what people really want!", which brings to mind the advice to make something people want.

"Maxwell's equations of software" examined

A recent post quotes Alan Kay's statement that expressing Lisp in itself is the "Maxwell's Equations of Software":

Yes, that was the big revelation to me when I was in graduate school—when I finally understood that the half page of code on the bottom of page 13 of the Lisp 1.5 manual was Lisp in itself. These were “Maxwell’s Equations of Software!”

This quote appears many places on the web, but the code itself is harder to find. What is this amazing half page of code?

The Lisp 1.5 Manual, which was written by John McCarthy et al in 1961, is available at softwarepreservation.org. In it, the "Maxwell's equations" define a universal Lisp function evalquote that can evaluate any given function:

evalquote[fn;x] = apply[fn;x;NIL]
where
apply[fn;x;a] =
     [atom[fn] -> [eq[fn;CAR] -> caar[x];
                  eq[fn;CDR] -> cdar[x];
                  eq[fn;CONS] -> cons[car[x];cadr[x]];
                  eq[fn;ATOM] -> atom[car[x]];
                  eq[fn;EQ] -> eq[car[x];cadr[x]];
                  T -> apply[eval[fn;a];x;a]];
     eq[car[fn];LAMBDA] -> eval[caddr[fn];pairlis[cadr[fn];x;a]];
     eq[car[fn];LABEL] -> apply[caddr[fn];x;cons[cons[cadr[fn];
                               caddr[fn]];a]]]

eval[e;a] = [atom[e] -> cdr[assoc[e;a]];
     atom[car[e]] ->
      [eq[car[e],QUOTE] -> cadr[e];
      eq[car[e];COND] -> evcon[cdr[e];a];
      T -> apply[car[e];evlis[cdr[e];a];a]];
      T -> apply[car[e];evlis[cdr[e];a];a]]

evcon[c;a] = [eval[caar[c];a] -> eval[cadar[c];a];
      T -> evcon[cdr[c];a]]

evlis[m;a] = [null[m] -> NIL;
      T -> cons[eval[car[m];a];evlis[cdr[m];a]]]

The above code is defined in a meta-language (M-expressions), which can be straighforwardly translated into S-expressions. Functions in M-expressions use square brackets and have arguments separated by semicolons. M-expressions conditionals are of the form [predicate -> value; predicate -> value; ...]. M-expression label is analogous to defun or define.

The point of all this is that M-expressions are the code that operates on the S-expression data, but the M-expression meta-language and S-expression data actually coincide. Thus, code and data are the same thing in Lisp, and a half-page of code is sufficient to define a basic Lisp interpreter in Lisp given a few primitives (car, cdr, cons, eq, atom). The code presents a Meta-circular evaluator for Lisp; see (SICP chapter 4.1 for more details on metacircular evaluators. (Unfortunately, this won't give you a working Lisp interpreter for free; things such as the garbage collector, the list primitives, and parsing need to be implemented somewhere. Also note that this metacircular evaluator doesn't give you niceties such as arithmetic.))

To understand the above code, apply takes a function and argument, while eval acts on a form. The last argument to these is an association list for the environment, which stores the values of bound values and function names. In brief, apply implements CAR, CDR, CONS, ATOM, and EQ in terms of the primitives. It implements LAMBDA by pairing up the variables and arguments and passing them to eval. It implements LABEL (which defines a function) by adding the function name and definition to the association list.

The code for eval processes a form in a straightforward manner. It handles the QUOTE form by returning the quoted value. It handles COND by evaluating the predicates with the help of evcon. Otherwise, it interprets an atom as a variable and returns the value. If given a list, it interprets this as a function application; the arguments are evaluated with evalis and the function is evaluated by apply.

The above code is not quite complete; it relies on some other simple functions that were previously defined in the manual, such as equals and cadr Less obvious functions are pairlis[x;y;a] pairs up lists x and y and adds them to association list a. assoc[x;a] looks up x in association list a. sublis[a;y] treats association list a as a mapping of variables to values, and replaces variables in S-expression y with the associated variables. These functions can be straightforwardly built from the primitive functions.

(By the way, I'm pretty sure the comma in eq[car[e],QUOTE] is a typo, but that's how it is in the original.)

Why Arc is good for video games

Can Arc be used to create a Tetris-like game? In my previous posting on using OpenGL with Arc, I described how the effort to set up OpenGL had stalled my game-writing adventure. I finally got the game written, only to discover that Conan Dalton had beat me by writing Arc Tetris running on Rainbow, his version of Arc running on Java. Nonetheless, here's my implementation in Arc using OpenGL.

screenshot of game

Playing the game

See using OpenGL with Arc for details on setting up OpenGL. To summarize, the files are in the opengl directory of the Anarki git. Run gl-arc.scm in DrScheme. Then load "tet.arc" into the Arc REPL, and the game will start.

To control the game, use the arrow keys. Left and right arrows move the tile. Up-arrow rotates the tile, and down-arrow drops it faster. When the game ends, "s" will start a new game.

Implementing games in Arc

Note that Arc itself doesn't have any graphics support, or any way to get at Scheme libraries. I hacked on Arc's implementation in Scheme to get access to the OpenGL libraries in PLT Scheme.

My first concern with using Arc was speed. It turns out that Arc on a newish laptop is plenty fast to run the game, apart from occasional garbage collection glitches. The game itself doesn't require a fast refresh rate, and the amount of processing per frame is pretty small, so Arc manages just fine.

The biggest problem with implementing the game in Arc was the lack of useful error reporting in Arc; I should have emphasized this in Why Arc is bad for exploratory programming. Every time I wrote more than a few lines at a time, I'd encounter a mystery error such as "ac.scm::18267: car: expects argument of type ; given 1" or "ac.scm::17324: Function call on inappropriate object 2 ()". I often had to stick print statements throughout the code, just to figure out where the error was happening. Reporting what is the error and where an error occurs is basic functionality that a programming language implementation should provide. An actual debugger would be a bonus.

I found that game programming doesn't really let you take advantage of the REPL for development. OpenGL is very stateful, so I can't just call a drawing routine from the REPL to see how it's working. I usually had to restart my code, and often reload the Arc interpreter itself to clear up the various Scheme objects. When I worked incrementally, I often ended up with startup issues when I ran from the start (i.e. variables getting initialized in the wrong order).

One annoyance is Arc's lack of arrays. I used tables in some places, and lists of lists in other places, but vectors and arrays seem like really basic datatypes.

Another annoyance is OpenGL's lack of any text support. If you're wondering about the lack of text in the screenshot, that's why. I implemented a simple 7-segment number-drawing function to display the score.

Some (pretty obvious in retrospect) advice to other game writers: Figure out in advance if the Y axis points up or down. I wrote some of the code assuming Y increases going down before realizing that other parts of the code expected the Y axis to point up. Also, figuring out the origin and scale in advance is a good thing.

Another tricky part of game implementation is state transitions. Both startup and end-of-game caused me more difficulty than I expected.

Implementing the game became much more rewarding once I got a minimal interactive display working. I should have done that sooner, rather than implementing the algorithms up front. On the other hand, I found it useful to disable animation much of the time, using keypresses to single-step movement.

The game has no audio, since PLT Scheme doesn't have audio support.

It took me a long time to realize that the pieces in Tetris don't actually rotate around a fixed point. Once I realized this and hard-coded the rotations instead of implementing a rotation function, things became easier.

I display the piece in motion separately from the playing field array that holds the stationary pieces. When a line is completed, instead of updating the playing field array in place, I decided it would be interesting to write remove-rows in a functional way so it returns an entirely new playing field array, rather than updating the existing array (as I did in saveshape when a piece hits bottom). Writing the whole game in a functional style would be very difficult, as others have described.

I could make the code more "Arcish" with various macros (e.g. w/matrix to wrap code in gl-push-matrix and gl-pop-matrix). But I decided to "ship early". Likewise I didn't bother implementing game levels, although the game does speed up as it progreses. And I've only made minimal efforts to clean up the code, so I'm not presenting it as a model of good style.

Conclusion

It's possible to write a playable game in Arc (with enough hacking on the language to get OpenGL support). I think it would have been considerably easier to write it in PLT Scheme directly, but the Arc implementation builds character. I was pleasantly surprised that Arc had enough speed to run the game. The Arc code works out to 389 lines for those keeping score; OpenGL is responsible for much of the length (50 lines just to display the score). I expect writing in Arc will be considerably easier if the language gets decent error reporting.

Using OpenGL with Arc

I saw a posting on news.ycombinator entitled "Take the Tetris test (an Arc version, anyone?)", which suggested writing a simple Tetris-like game. One obvious problem with doing this in Arc is the lack of graphics support. But how hard could it be to use OpenGL graphics from Arc?

It turns out that using OpenGL from Arc is harder than I expected, but possible, given enough hacking on Arc's underlying Scheme implementation. In this article, I discuss how I got OpenGL to work.

Challenge 1: How to access libraries from Arc

The first challenge is that the official Arc release does not let you access Scheme libraries or functions. Not at all. Even though Arc is implemented on top of MzScheme, there is no mechanism to access the underlying MzScheme implementation. If you want to access Scheme, you must actually modify the Arc language implementation by hacking on the Scheme code that implements Arc.

The unofficial Anarki implementation is a modified version of Arc that provides access to Scheme (as well as many other useful improvements). However, I decided to base my OpenGL project on the offical Arc implementation rather than Anarki.

I replaced the Arc implementation file ac.scm with a modified version called arc-gl.scm that gives me access to the necessary Scheme functions. The relevant Scheme code is:

(require (lib "mred.ss" "mred")
       (lib "class.ss")
       (lib "math.ss")
       (prefix gl- (lib "sgl.ss" "sgl"))
       (lib "gl-vectors.ss" "sgl"))

; List of functions to export from Scheme to Arc
(map (lambda (s) (xdef s (eval s)))
   '(gl-shade-model gl-normal gl-begin gl-end gl-vertex gl-clear-color gl-clear
     gl-push-matrix gl-pop-matrix gl-rotate gl-translate gl-call-list gl-flush
     gl-light-v gl-enable gl-new-list gl-gen-lists gl-material-v gl-viewport
     gl-matrix-mode gl-load-identity gl-frustum gl-light-v gl-enable gl-end-list
     gl-scale sin cos))

; Arc doesn't provide access to vector, so make gl-float-vector
; take individual arguments instead of a vector
(xdef 'gl-float-vector (lambda (a b c d) (vector->gl-float-vector (vector a b c d))))
First, the code imports the necessary libraries. Next, it uses xdef to allow access to the list of specified Scheme functions. I included the OpenGL functions I used; you will need to extend this list if you want additional OpenGL functionality. I also include sin and cos; they are missing from Arc, which is almost pervesely inconvenient.

To get my code:

This contains arc-gl.scm, the updated Scheme code; arch.arc, the arch demo in Arc; and gears.arc, the gears demo in Arc. It is also available from the Anarki git.

Challenge 2: Where is OpenGL?

OpenGL isn't part of the plain vanilla MzScheme that is recommended for Arc, but it is part of DrScheme. Both DrScheme and MzScheme are versions of Scheme under the PLT Scheme umbrella; MzScheme is the lightweight version, while DrScheme is the graphical version that includes MrEd graphics toolbox and the OpenGL bindings for Scheme. Thus:
  • Download and install DrScheme version 352.

An alternative to OpenGL would be using MrEd's 2-d graphics; I've described before how to add simple graphics to Arc. However, I wanted to use the opportunity to learn more about OpenGL.

Challenge 3: Running an Arc REPL in PLT Scheme

It's straightforward to run as.scm inside PLT Scheme and get an Arc REPL. However, a problem immediately turns up with OpenGL

An OpenGL animation will lock up as soon as the Arc REPL is waiting for input. The problem is that MrEd is built around an event loop, which needs to keep running (similar to the Windows message loop). When the REPL blocks on a read call, the entire system blocks.

The solution is to implement a new GUI-based Arc REPL instead of the read-based REPL. MrEd provides text fields that can be used to provide non-blocking input. When the input is submitted, the event loop executes a callback, which can then run the Arc code. Of course, the Arc code needs to return reasonably promptly, or else things will be locked up again. However, the Arc code can start a new thread if it needs to do a long-running computation.

The following MzScheme code creates a frame, text-field for output, text-field for input, and a submit button, and executes the submitted code through arc-eval. It is analogous to the REPL code in ac.scm. I put this code in arc-gl.scm.

(define frame (instantiate frame% ("Arc REPL")))
(send frame show #t)

(define (cb-submit a b) (on-err (lambda (c)
 (append-tf (exn-message c)))
      (lambda ()
 (append-tf (send in-field get-value))
 (append-tf (format "~a" (ac-denil (arc-eval (read (open-input-string
                                  (send in-field get-value)))))))
 (send in-field set-value ""))))


(define tf (instantiate text-field% ("" frame)
    (style '(multiple)) (min-width 600) (min-height 150) (enabled #f)))

(define (append-tf str)
    (send tf set-value (string-append (send tf get-value) str "\n")))

(define in-field (instantiate text-field% ("" frame) (style '(single)) ))

(define bb  (instantiate button% ("submit" frame) (style '(border)) (callback cb-submit)))
To start up Arc with the new REPL:
  • Load the new REPL code into the same directory as the original Arc files.
  • Run drscheme.
  • Go to Language -> Choose Language -> PLT -> Graphical. Click Ok.
  • Go to File -> Open -> arc-gl.scm
  • Click Run
As you can see from the screenshot, the REPL doesn't get any points for style, but it gets the job done.

Challenge 4: Using Scheme's object model

The mechanism above can't be used to access PLT Scheme's windowing operations, because they are heavily based on the Scheme object implementation, which is implemented through complex Scheme macros. Thus, I can't simply map the windowing operations into Arc, as I did with sin. If the operations are called directly, they will try to apply Scheme macros to Arc code, which won't work. If they're called after Arc evaluation, the Arc implementation will have already tried to evaluate the Scheme macros as Arc code, which won't work either.

I tried several methods of welding the Scheme objects into Arc, none of which are entirely satisfactory. The first approach was to encapsulate everything in Scheme and provide simple non-object-based methods that can be called from Arc. For example, an Arc function make-window could be implemented by executing the necessary Scheme code. This works for simple operations, and is the approach I used for simple 2-d graphics, but the encapsulation breaks when the code gets more complex, for example with callbacks and method invocations. It is also unsatisfying because most of the interesting code is written in Scheme, not Arc.

Another approach would be to fully implement Scheme's object model in Arc, so everything could be written in Arc. That was way more difficulty and work than I wanted to do, especially since the object system is implemented in very complex Scheme macros.

My next approach was to implement an exec-scheme function that allows chunks of Scheme code to be called directly from inside Arc. This worked, but was pretty hacky.

Minor semantic differences between Arc and Scheme add more ugliness. Arc converts #f to nil, which doesn't work for Scheme code that is expecting #f. I hacked around this by adding a symbol false that gets converted to #f. Another problem is Arc lists get nil added at the end; so the lists must be converted going to and from Scheme.

Finally, I ended up with a somewhat hybrid approach. On top of eval-scheme, I implemented Arc macros to wrap the object operations of send and invoke. These macros try to do Arc compilation on the Arc things, and leave the Scheme things untouched. Even so, it's still kind of an ugly mix of Scheme and Arc with lots of quoting. I found writing these macros surprisingly difficult, mixing evaluated and unevaluated stuff.

As an aside, Scheme's object framework uses send foo bar baz to invoke bar on object foo with argument baz. I.e. foo.bar(baz) in Java. I found it interesting that this semantic difference made me think about object-oriented programming differently: Scheme objects are getting sent a message, doing something with the message, and providing a reply. Of course, this is the same as invoking a method, but it feels more "distant" somehow.

At the end of this, I ended up with a moderately ugly way of creating Scheme objects from Arc, providing callbacks to Arc functions, and implementing instantiate and send in Arc. This isn't a full object implementation, but it was enough to get the job done. For example, to instantiate the MrEd frame% class and assign it to an Arc variable:

(= frame (instantiate 'frame% '("OpenGL Demo" 'false)))
A function to send a refresh message to the canvas:
(def refresh () (send arc-canvas refresh))
Defining a subclass of canvas% to display the image is uglier as it is Arc code that's mostly in Scheme, but using the Arc callbacks:
(eval-scheme `(define arc-canvas%
(class* canvas% ()
  (inherit refresh with-gl-context swap-gl-buffers get-parent)
  (define/public (run) (,ex-run))
  (define/override (on-size width height)
     (with-gl-context (lambda () (,ex-on-size width height))))
  (define/override (on-paint) (with-gl-context ,ex-on-paint))
  (super-instantiate () (style '(gl no-autoclear))))))

Finally doing something with OpenGL in Arc

Here's a simple example of an animated arch displayed in OpenGL. To run this example, download the code and start up DrScheme as described above. Then:
  • From the Arc REPL (not the Scheme REPL) execute:
    (load "arch.arc")
    (motion)
    

The code, in arch.arc, has several parts. The arch function does the OpenGL work to create the Arch out of triangles and quadrilaterals. It uses archfan to generate the triangle fan for half of the front of the arch. The full code is too long to include here, but the following code, which generates the inside of the arch, should give a taste:

(for i 0 (+ n n -1)
(withs (angle (* (/ 3.1415926 n 2) i)
  angle2 (* (/ 3.1415926 n 2) (+ i 1)))
  (gl-normal (* r (cos angle)) (* r -1 (sin angle)) 0)
  (gl-vertex (* r -1 (cos angle)) (* r (sin angle)) z)
  (gl-vertex (* r -1 (cos angle)) (* r (sin angle)) negz)
  (gl-normal (* r (cos angle2)) (* r -1 (sin angle2)) 0)
  (gl-vertex (* r -1 (cos angle2)) (* r (sin angle2)) negz)
  (gl-vertex (* r -1 (cos angle2)) (* r (sin angle2)) z)))
The next section of the code contains the graphics callback functions:
  • ex-run: the animation entry point called by the timer. Updates the rotation and refreshes the image.
  • ex-on-paint: uses OpenGL commands to draw the arch
  • ex-on-size: handles window resize and initial size. The OpenGL model (arch, lighting, projection) is set up here.
The last part of the code calls into Scheme to create the canvas and tie it to the appropriate callbacks, create the frame holding the canvas, and display the frame.

I find the arch-generation code somewhat unsatisfying stylistically, as there is a lot of duplicated code to generate the vertices and normal vectors for the front, back, and sides. I couldn't come up with a nice way to fold everything together. I suppose the Arc-y solution would be to write a DSL to express graphical objects, but that's beyond the scope of this project.

Let me mention that low-level OpenGL is not particuarly friendly for exploratory programming. It's tricky to generate complex shapes correctly: it's really easy to end up with the wrong normals, vertices that aren't clockwise, non-convex polygons, and many other random problems. I find it works much better to sketch out what I'm doing on paper first; if I just start coding, I end up with a mess of bad polygons. In addition, I've found that doing the wrong thing in OpenGL will lock up DrScheme and/or crash my machine if I'm unlucky.

The gears example

I also ported the classic OpenGL "gears" demo from Scheme to Arc. This demo includes GUI buttons to rotate the gears. (The animated GIF at the top of the page shows the program in operation.) This is a fairly straightforward port of gears.scm that comes with DrScheme. To run it:
  • Enter into the Arc REPL: (load "gears.arc")

The interesting thing to note in gear.arc is the horizontal-panel%, vertical-panel%, and button% objects that provide the UI controls. They are linked to Arc functions to update the viewing parameters. For example, in the following, note that instantiate is passing Arc code (using fn) to the Scheme constructor for button%. The tricky part is making sure the right things get evaluated in the right language:

 (instantiate 'button% (list "Right" h (fn x (ex-move-right)))
                       '(stretchable-width #t))
How did I generate the animated gifs? Just brute force: I took some screenshots and joined them into an animated gif using gimp. The real animation is smoother. I found the animated gifs are a bit annoying, so I added JavaScript to start and stop them. The animation is stopped by substituting a static gif.

Conclusion

So what about the Tetris challenge that inspired my work on OpenGL? After all the effort to get OpenGL to work in Arc, I lost momentum on the original project of implementing the game. (This is an example of why Arc is bad for exploratory programming. If I wanted to get something done, I would have been much better off using DrScheme directly.) Maybe I'll complete the game for a future posting.