The book is an excellent example of exploratory programming, showing how to incrementally build up these applications and experiment with different algorithms from the Python interactive prompt. For instance, topic clustering is illustrated by first implementing code to fetch blog pages from RSS feeds, breaking the pages into words, applying first a hierarchical clustering algorithm and then a K-means clustering algorithm to the contents, and then graphically displaying a dendrogram showing related blogs. At each step, the book shows how to try out the code and perform different experiments from the interactive prompt.
By using Python libraries, each step of implementation is pretty easy; the book can focus on the core algorithms, and leave the routine stuff to libraries: urllib2 to fetch web pages, Universal Feed Parser to access RSS feeds, Beautiful Soup to parse HTML, Python Imaging Library to generate images, pydelicious to access links on del.icio.us, and so forth.
If you want more details than the book provides (it is surprisingly lacking in references), I recommend Andrew Moore's online Statistical Data Mining Tutorials, which covers many of the same topics.
What does this have to do with Arc?While reading this book, I was struck by the contradiction that this book is a perfect example of exploratory programming, Arc is "tuned for exploratory programming", and yet using Arc to work through the Collective Intelligence algorithms in Arc is an exercise in frustration.
The problem, of course, is that Arc lacks libraries. Arc lacks basic functionality such as fetching a web page, parsing an XML document, or accessing a database. Arc lacks utility libraries to parse HTML pages or perform numerical analysis. Arc lacks specialized API libraries to access sites such as del.icio.us or Akismet. Arc lacks specialized numerical libraries such as a support-vector machine implementation. (In fact, Arc doesn't even have all the functionality of TRS-80 BASIC, which is a pretty low bar. Arc is inexplicably lacking trig, exp, and log, not to mention arrays and decent error reporting.)
To be sure, one could implement these libraries in Arc. The point is that implementing libraries detours you from the exploratory programming you're trying to do.
Paul Graham has commented that libraries are becoming an increasingly important component of programming languages, that huge libraries are now an expected part of a new programming language, and that libraries are an increasing important feature of programming languages. Given this understanding of the importance of libraries, it's surprising that Arc is so lacking in libraries. (It's also surprising that it lacks a module system or some other way to package libraries.) It's a commonplace complaint about Lisp that it lacks libraries compared to other languages, and Arc makes this even worse.
I think there are two different kinds of exploratory programming. The first I'll call the "Lisp model", where you are building a system from scratch, without external dependencies. The second, which I believe is much more common, is the "Perl/Python model", where you are interacting with existing systems and building on previous work. In the first case, libraries don't really matter, but in the second case, libraries are critical. The recently-popular article Programming in a Vacuum makes this point well, that picking the "best" language is fine in a vacuum, but in the real world what libraries are available is usually the key.
Besides the lack of libraries. Arc's slow performance rules it out for many of the algorithms from Programming Collective Intelligence. Many of the algorithms run uncomfortably slow in Python, and running Arc is that much worse. It's just not true that speed is unimportant in exploratory programming.
On the positive side for Arc, chapter 11 of Programming Collective Intelligence implements genetic programming algorithms by representing programs as trees, which are then evolved and executed. To support this, the book provides Python classes to represent code as a parse tree, execute the code tree, and prettyprint the tree. As the book points out, Lisp and its variants let you represent programs as trees directly. Thus, using Arc gives you the ability to represent code as a tree and dynamically modify the code tree for free. (However, it only takes 50 lines of Python to implement the tree interpreter, so the cost of Greenspunning is not particularly severe.)
To summarize, a language for exploratory programming should be concise, interactive, reasonably fast, and have sufficient libraries. Arc currently fails on the last two factors. Time will tell if these issues get resolved or not.