Sunday, June 28, 2009

Arduino + SheevaPlug = Cool Hardware Platform

I recently started using an Arduino in combination with a SheevaPlug as a convenient platform for hardware hacking. The two work together well, with complementary strengths. This describes my experience.

The Arduino

The Arduino is a popular microcontroller platform consisting of a small, inexpensive board and an easy-to-use C-based development environment. The board has multiple digital I/O pins and analog inputs, and a USB serial port connection for programming and communication. The Arduino can be interfaced with a wide variety of hardware. (The Arduino is kind of like a Microchip PIC development board, but uses the ATmega328 AVR microcontroller.) The Arduino is about the size of a credit card (but thick). It has connectors on top which will receive a daughter board (inexplicably called a "shield"), or jumper wires.
The Arduino board
The Arduino is very easy to program and use. The Arduino is programmed using the Wiring language, which is an extension of C with many libraries to interface with various types of hardware. The Arduino IDE provides an editor and compiler for a "sketch" (which is the name for an Arduino program), and downloads the code to the Arduino board over USB. After seeing the Arduino hype at the Maker Faire, I got a Adafruit Arduino Starter Pack, and I've been happy with it. More information about the Arduino is at www.arduino.cc; the playground is a good place to start.

You can do a lot with an Arduino, but it has some limitations. There's only 30KB of memory for programming (yes, kilobytes). Many things you might want to connect (Ethernet, SD card, USB hard drive or flash) require additional hardware, and even then are pretty limited.

The SheevaPlug

The Marvell SheevaPlug is a very small Linux server with a 1.2GHz 88F6281 ARM processor and 512MB of RAM and 512MB of flash storage. The SheevaPlug has a Gigabit Ethernet connection, a SD card slot, and a serial USB console connection. This all fits into a box the size of three cassette tapes that sells for $99 and uses about 5 watts of power.
The SheevaPlug
The SheevaPlug is designed to be plugged directly into the wall as a wall-wart, but the plug snaps out and can be replaced by a power coard. I'm currently using the "desktop" configuration with a power cord, since it's easier to connect things. Marvell recently gave me a SheevaPlug, and I've been enjoying it a lot.

Because the SheevaPlug is a full Linux system, the standard Linux software packages are available and easily installed with apt-get (full list). You can run X-windows (of course you need to VPN in). You can run Apache and PHP on the SheevaPlug and use it as a web server. The GCC toolchain runs on it. You can program it in C++, Python, Scheme, or whatever.

The main information source on the SheevaPlug is plugcomputer.org. Another useful site is computingplugs.com, a wiki about the SheevaPlug which is actually served by a SheevaPlug.

There are a few limitations of the SheevaPlug. The SheevaPlug is headless, so you need to access it via ssh or VNC. You can't run x86 binaries. The SheevaPlug's performance is roughly at the Pentium III level (benchmark), so it won't replace a state-of the-art server. Also, floating point performance is poor since there's no hardware FPU. The Linux install has some annoying minor issues with apt-get, DNS, NTP, and passwd (fixes). The SheevaPlug isn't really suited for low-level electronics projects because it doesn't have any accessible general-purpose I/O pins. (The CPU has 50 digital GPIO pins, but the SheevaPlug doesn't expose them.)

Arduino + SheevaPlug

The SheevaPlug and Arduino make a great pair together for hardware hacking projects since the SheevaPlug can easily provide the web server, computing, and storage functionality that the Arduino lacks, while the Arduino makes it easy to interface to hardware circuits. Moreover, the two are easily connected with a USB cable, and then they can communicate through simple serial communication (modulo kernel details below).
SheevaPlug with Arduino
The above picture shows the SheevaPlug and Arduino connected together by a USB cable. The SheevaPlug also has an Ethernet connection (green) and power (on the right). The Arduino has some jumpers to a breadboard circuit (described below).

A sample project: illumination recording

For an initial project, I decided to collect data on the illumnination in my room. (Temperature would have been more useful, but I didn't have a temperature sensor handy.) I implemented a simple Arduino sketch (i.e. program) to read illumination from a photocell and write it to the serial port. The SheevaPlug collects the data, generates a graph using gnuplot, and provides the data as a web page on the Internet. (Sorry, no URL to access it since I'm not currently running this.)

The screenshot below shows the web page that the SheevaPlug serves showing illumination across a few days. There are a lot of spikes due to turning lights on and off, but you can see where the sun rises, light increases through the day, peaking shortly before sunset. This web page illustrates the ability of the Arduino to measure analog data combined with the SheevaPlug's ability to collect data, plot it, and serve web pages.
SheevaPlug screenshot of browser

The Arduino code

Hooking up the Arduino to measure illumination was trivial; I used a 10K resistor and a photocell connected to an analog input pin. (I made the following schematic with Eagle.)
schematic of photocell connection

The Arduino sketch below (download) simply reads the analog input once a second and writes the value to the serial port as a decimal number. (One annoyance is the Arduino IDE won't run on the ARM processor, so I have to connect the Arduino to a separate computer to download the sketch.)

int ainPin = 2;     // select the analog input pin

void setup() {
  Serial.begin(9600);
}

void loop() {
  Serial.println(analogRead(ainPin), DEC);    // read the value from the sensor and write to serial
  delay(1000);
}

The Python code

The Python code that runs on the SheevaPlug (download) is straightforward. I used a simple Python web server to provide the graph page and some static files. It uses gnuplot to generate the graph. Some highlights of the code:

The pyserial library (apt-get install python-serial) provides simple access to the serial port. One thread reads lines from the serial port and dumps them to the data file along with a timestamp. Only one sample per minute is kept, and the others discarded. The Arduino may appear as a different device, e.g. /dev/ttyUSB0. If no such device appears when you plug the Arduino into the SheevaPlug, you probably need the kernel update below.

class Arduino(threading.Thread):
  def run(self):
    f = open('/tmp/data', 'a')
    # Port may vary from /dev/ttyUSB1
    self.ser = serial.Serial('/dev/ttyUSB1', 9600, timeout=10)
    self.ser.flushInput()
    old_timestamp = None
    while 1:
      data = self.ser.readline().strip()
      if data:
        timestamp = time.strftime("%m/%d/%Y %H:%M", time.localtime())
        if timestamp != old_timestamp:
          # Only log once per minute
          print >>f, timestamp, data.strip()
          old_timestamp = timestamp
        f.flush()

The other part of the Python code is an HTTP server using SimpleHTTPServer; a /graph URL runs gnuplot on the data file to generate a png image file. The returned web page loads the image, which is served by the Python server as a static file. (Since the SheevaPlug supports Apache, PHP, and mysql, I could have used them, but it seems like overkill for a simple demo.)

 def graph(self):
    g = os.popen('gnuplot', 'w')
    print >>g, GNUPLOT_CMD
    self.send_response(200)
    self.send_header('Content-type', 'text/html')
    self.end_headers()
    self.wfile.write(GRAPH_HTML)

Kernel with FTDI Serial support

The one stumbling block I encountered when using the SheevaPlug and Arduino together is the SheevaPlug kernel doesn't include FTDI support, which is necessary to access the Arduino via USB. An upgraded kernel is available, and much to my surprise it installed with no problems.

Where from here?

So far, I've found the Arduino and SheevaPlug to work together well. I have various plans to use the Arduino + SheevaPlug for home monitoring and control, solar panel monitoring, IR remote control, and other random projects. If there's interest, maybe I'll write about them in the future.

Monday, June 8, 2009

Inside the news.yc ranking formula

The recent arc3 release of Arc includes news.arc, the Arc source code for the Hacker News forum. Examining this code can give some insight into the ranking algorithm that selects the top articles on the Hacker News forum.

The ranking formula

In outline, each item is given a ranking, and the articles are sorted according to the ranking. The simplistic way to think about ranking is the number of votes is divided by time, so more votes results in a higher ranking, but the ranking also drops over time. The votes are raised to a power less than one, while the time is raised to a power greater than one, so time has more effect than votes. Some additional penalties also may be applied to the ranking.

To be specific, the following is the ranking code from news.arc:

(= gravity* 1.8 timebase* 120 front-threshold* 1
   nourl-factor* .4 lightweight-factor* .3 )

(def frontpage-rank (s (o scorefn realscore) (o gravity gravity*))
  (* (/ (let base (- (scorefn s) 1)
          (if (> base 0) (expt base .8) base))
        (expt (/ (+ (item-age s) timebase*) 60) gravity))
     (if (no (in s!type 'story 'poll))  1
         (blank s!url)                  nourl-factor*
         (lightweight s)                (min lightweight-factor*
                                             (contro-factor s))
                                        (contro-factor s))))

This algorithm can be expressed as a (slightly simplified) equation:

rank=\frac{ \left( score-1 \right) ^{.8}}{ \left( age _{hours} + 2 \right) ^ {1.8}} * penalties

This ranking algorithm is used for items on the front page as well as for ordering the comments on an item.

The key factor in the ranking formula is the exponent on time (i.e. the optional parameter gravity) is higher than the exponent on the votes. As a result, even if an item keeps getting a large number of votes, eventually the denominator will overwhelm it and its ranking will drop. For example, if a popular item gets 100 votes an hour, eventually it won't be able to keep pace with a new item getting 10 votes an hour. This ensures that even the most popular items won't stay on the list forever. In other words, gravity controls how fast articles get pulled down in ranking.

By default, the scoring function scorefn is the realscore, which is the net votes for an item ignoring any upvotes from pontential sockpuppet accounts. Sockpuppet accounts are accounts created to manipulate voting, and an account is considered potentially a sockpuppet based on several simple factors.

Because the score is given an exponent of .8 (for positive scores), the benefit of large numbers of votes is reduced. Mathematically, though, this exponent could be eliminated. Since only relative rankings matter, raise everything to the 1/.8 power, and the numerator's exponent disappears, along with the need for the negative check. This would be a win from the "code golf" perspective.

A few other constants are of interest. Since 1 is subtracted from the score, an item starts off with a rank of 0. In the denominator, 120 minutes are added to the time. Thus, an article can be considered two hours old at time of posting, limiting the ranking boost for very recent articles. The front-threshold* specifies a minimum ranking score for articles to appear.

Penalties

A story (i.e. a normal posting) or poll can receive additional ranking penalties. The penalties are not applied to comments or poll options. The penalties are as follows:
  • An item with a blank url is penalized by nourl-factor*. This lowers the ranking of an "Ask HN" item, for instance.
  • A "lightweight" item is penalized.
  • A "controversial" item is penalized.
An item is considered lightweight if the title is a "rallying cry", the post is an image, or the URL's site is in a list of lightweight sites. A lightweight item is penalized by lightweight-factor*. (The code doesn't show how a "rallying cry" is determined.)

The contro-factor penalty is applied to controversial articles. The contro-factor is computed as:

(def contro-factor (s)
  (aif (check (visible-family nil s) [> _ 20])
       (min 1 (expt (/ (realscore s) it) 2))
       1))
An item with too many comments (at least 20 and more comments than upvotes) gets the contro-factor penalty. An item's visible-family is the number of visible comments (plus one for the item itself). If the visible-family is more than 20, the penalty is:

min\left[ 1, \left( \frac{realscore}{visible\-family}\right) ^{2} \right]

A lightweight item gets the worse of the "lightweight" and "contro-factor" penalties. The contro-factor penalty is only applied if less than 1; i.e. an item can't get a boost from it. Because the ratio is squared, the effect of the penalty is more substantial. I would have expected contro-factor to penalize controversial comment subthreads too, but apparently it doesn't.

How the ranking is performed

When the server starts up, it loads the top stories from a topstories file. If the file is not present, it uses the ranking algorithm to select the top 180 stories out of the most recent 1000 stories. Stories normally are re-ranked by adjust-rank only when they receive a vote. Interestingly, only the updated item is re-ranked, rather than doing a global re-rank, probably for efficiency reasons. The adjust-rank function also saves the top 180 items to thetopstories file to disk.

There's also a background reranking thread to rerank a random story from the top 50 every 30 seconds (again using adjust-rank). This ensures that stories that are no longer receiving any votes at all get reranked occasionally.

As far as I can tell, the list of ranked stories never gets shrunk (except on restarts), but the number of displayed stories is capped at 210; you will hit this limit if you keep paging through the top stories.

Comparison with Arc2

The old code in arc2 is much simpler:
(= gravity* 1.4 timebase* 120 front-threshold* 1)

(def frontpage-rank (s (o gravity gravity*))
  (/ (- (realscore s) 1)
     (expt (/ (+ (item-age s) timebase*) 60) gravity)))
The main differences are that arc3 has higher gravity, so articles will drop off faster; arc3 has an exponent on the numerator, and arc3 adds the various penalties. This illustrates that the ranking formula is changing and gaining complexity over time.

Conclusions

There may be differences there are between the live Hacker News server and news.arc, so my analysis may not exactly describe what's really happening. I'd assume the code is constantly being modified; comparison with arc2 shows that the ranking formula has become considerably more complex. In addition, news.arc has some YC-specific stuff removed.. Finally, since this is Arc, the ranking constants and even the algorithms can be changed at the REPL while the server is running. Thus, the live ranking algorithm might never actually exist in a source file.

So, what's the secret to getting a high-ranked article? I don't see any magic bullets in the code, just the obvious: get lots of upvotes in a short period of time, avoid penalties (don't post fluff), and don't use sockpuppets.

Friday, June 5, 2009

What's new in arc3?

The long-awaited new update of Arc (arc3) has been made available. I've examined the release to determine what's new in arc3 beyond the mysterious new logo: arc logo.

arc3 is a major change from arc2, with about 3000 lines and most files changed. The following gratuitous pie chart illustrates that the largest number of changes are in the news server (news.arc), with many changes to the fundamental language (ac.scm and arc.arc) as well as the web and application servers (app.arc and srv.arc).
Summary of changes
The news server has been heavily modified with new ranking, support for polls, and improved spam rejection, among other features. A new "how-to-run-news" file explains how to set up your own Hacker News-like server. I will discuss the news server, web server, app server, and html library in more detail in a future article. I will focus this article on the language and main libraries.

If you're looking to try arc3, it can be downloaded here. Arc3 has significant incompatibilities with previous versions that will cause problems if you're not prepared.

Incompatibilities

  • Arc now requires mzscheme 372. Arc will not work with the version 352 required by arc2, or with the newer versions of mzscheme.
  • flush-output is disabled by default on write, disp, and functions that use these. See declare below.
  • The result of accum is no longer in reverse order.
  • news.arc is no longer loaded as part of arc.
  • a.b.c is now ((a b) c), not (a b c)
  • a!b!c is now ((a 'b) 'c), not (a 'b 'c)
  • The set function has been renamed assign, and the assert function has been renamed set.
  • writefile1 removed.
  • flat no longer supports stringstoo option.
  • to-nearest renamed to nearest.
  • random-elt renamed to rand-elt.
  • firstn-that renamed to retrieve.
  • plural renamed to pluralize.

New functions

  • int function coerces to an integer.
  • force-close discards buffered output and closes the descriptor.
  • mvfile moves a file.
  • memory returns the current memory use.
  • sin, cos, tan, and log are finally available.
  • declare is used to set system variables: explicit-flush to control automatic output flushing, and direct-calls for the function call optimization.
  • timedate converts seconds to date.
  • copylist copies a list.
  • defs applies def to name, args, body triples to define multiple functions at once.
  • get provides simpler table / array lookup. get!foo creates a function that takes a table and looks up 'foo. (Actually, it's a generic mechanism to create a function to apply an argument to something.)
  • writefile writes to a temporary file first.
  • letter tests if a character is a letter.
  • sum sums up function results.
  • med returns median of list.
  • minutes-since returns minutes since a time.
  • defcache wrapper around cache to reduce boilerplate.
  • datestring creates "y - m - d" string.
  • or= modifies a variable if the variable is false.
  • out evaluates and prints an expression.
  • fromdisk, diskvar, disktable, and todisk provide a simplified mechanism for loading and storing data.
  • halve splits a string in two on the first separator.
  • positions returns a list of positions where a test is true.
  • lines splits a string into lines.
  • urlencode urlencodes a string.
  • nonblank returns a string if it's nonblank.
  • plural returns a count and pluralized word.

Bug fixes

  • atomic-invoke fixed.
  • socket-accept uses custodians so web servers can forcibly close connections if the clients aren't reading data.
  • coerce of string into int rounds the value as does number to char.
  • list modified to copy the list.
  • safeset now writes warnings to stderr.
  • each fixed to avoid conflict with afn.
  • cut fixed.
  • trues fixed.
  • whiler fixed.
  • rand-string now more random to help fix huge security hole.
  • split fixed for negative positions..
  • memo now memoizes nil results too.
  • readline will terminate on nil.
  • copy fixed.
  • hours-since fixed.
  • date fixed to avoid system operation.
  • intersperse now works with nil.
  • load fixed.
  • posmatch fixed.
  • num fixed to handle negative numbers.

Optimizations

  • complement in function position
  • The <, >, and + operations when applied to two arguments.
  • (foo bar) is optimized to call foo directly if foo is a global bound to a function. This is controlled by declare.

Other changes

  • Debugging has been improved by giving the Arc names of things to Scheme, and turning on line counting.
  • A default value can be provided when accessing a hash table.
  • Currying was largely implemented but not enabled.
In conclusion, arc3 has some nice improvements, bug fixes, and optimizations, but nothing too dramatic in the language itself. If you're using the Arc news server, you'll definitely want arc3 for the security fixes. On the other hand, the unofficial Anarki release of Arc has a lot of interesting features too.

Disclaimers: I have provided my interpretations of the changes in arc3, and I'm sure there are some errors; please provide feedback if you find mistakes. Note that I am not connected with the arc team in any way, and this is not an "official" list of changes. In addition, arc3 may change without warning, so any of the above can be obsoleted. (There are some good reasons for real version numbers, bug tracking, and source code control, but I won't belabor that point.)