Monday, February 28, 2011

IPv6 web serving with Arc or Python: adventures in IPv6

Lego Minifig on a Sheevaplug running IPv6 I've been experimenting with IPv6 on my home network. In part 1, I described how I set up an IPv6 tunnel on Windows 7 and how IPv6 killed my computer. Now, I will describe how I set up a simple (i.e. trivial) IPv6 web server using Arc or Python, on Windows or Linux, which can be accessed at http://ipv6.nlanguages.com.

My IPv6 explorations are roughly driven by Hurricane Electric's IPv6 Certification levels. To get from "Newbie" to "Enthusiast" you have to have an IPv6 web server. I expected I could just use the web server you're viewing right now (provided through Pair), but Pair (like most web providers) doesn't support IPv6. So I figured I'd just set up my own simple web server.

To make things more interesting, I decided to set up the server on my Sheevaplug, a very small plug-based computer running Linux. (See my previous articles about Arc on the Sheevaplug, and Arduino with the Sheevaplug.) The things I describe will work just as well on a standard Linux or Windows box, though.

Setting up the SixXS IPv6 tunnel on the Sheevaplug was much easier than setting it up on Windows; I just followed the directions. One gotcha: if your clock is wrong, aiccu will silently fail - check /var/log/syslog.

IPv6 with the Python web server

Python comes with a simple web server. It is easy to configure the web server for IPv6 once you know how, but it took some effort to figure out how to do it.
import socket
from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler

class MyHandler(SimpleHTTPRequestHandler):
  def do_GET(self):
    if self.path == '/ip':
      self.send_response(200)
      self.send_header('Content-type', 'text/html')
      self.end_headers()
      self.wfile.write('Your IP address is %s' % self.client_address[0])
      return
    else:
      return SimpleHTTPRequestHandler.do_GET(self)

class HTTPServerV6(HTTPServer):
  address_family = socket.AF_INET6

def main():
  server = HTTPServerV6(('::', 80), MyHandler)
  server.serve_forever()

if __name__ == '__main__':
  main()
Most of this code is standard Python SimpleHTTPServer code. It implements a handler for the path /ip, which returns the client's IP address. For other paths, SimpleHTTPRequestHandler returns a file from the current directory; this is not particulary secure, but works for a demonstration.

The key change to support IPv6 is subclassing HTTPServer and setting the address_family to IPv6. The other key change is starting the server with the name '::', which is the IPv6 equivalent of '', and binds to no specific address. Similar changes work with related Python classes such as SocketServer.

IPv6 with Arc's web server

My next adventure was to run Arc's web server (which I've documented here) on IPv6. Much to my surprise, Arc's web server worked on Windows with IPv6 without any trouble. You don't have to do anything different to serve on IPv6 and the code and logs handle IPv6 addresses as you'd expect. (Most of the credit for this should go to MzScheme/Racket, which provides the underlying socket implementation.)

I simply start a server thread on port 80, and then define a simple web operation on the home page (represented as || for obscure reasons).

arc> (thread (serve 80))
arc> (defop || req (pr "Welcome to Arc!  Your IP is " (req 'ip)))
Accessing this home page displays a message and the IP address of the client (IPv4 or IPv6 as appropriate).

Unfortunately, when I tried running the same Arc code on the Sheevaplug or another Linux box, it refused to work at all with IPv6. The problem turned out to be misconfiguration in the compilation of Racket (the new name for mzscheme). To get IPv6 working, you can recompile Racket following the instructions here. After recompiling, I was able to get Arc working on my Sheevaplug with IPv6. Eventually this fix will get into the official builds.

Note that implementing web pages in Arc requires much less boilerplate than in Python. This isn't too surprising, given that the primary application of Arc is web serving.

Putting the server's IPv6 address into DNS

Once you have an IPv6 web server, how do you access it? You can access a server with a raw IPv6 address: http://[2001:1938:81:1f8::2]/mypage. (Note that the IPv6 address is enclosed in square brackets, unlike a regular IP address.) However, you'll almost certainly want to access your IPv6 server through a domain name in DNS. To do this, you need to create an AAAA record in your domain zone file.

I had the domain name nlanguages.com available, and wanted to point it at my IPv6 web server. To do this, I went to the DNS Zone File Editor for my nameserver (godaddy.com in my case). I created one AAAA entry for host @ to point to my IPv6 address, and a second AAAA entry for host ipv6 to point to my IPv6 address. The @ provides an entry for my top-level domain (nlanguages.com), and the second provides an entry for ipv6.nlanguages.com. I then tested the DNS entries with a nslookup query, asking for the AAAA record: nslookup -q=aaaa nlanguages.com

The result is that http://ipv6.nlanguages.com will access my IPv6 server (if you have IPv6 access.)

Conclusion

Running a simple IPv6 web server in Python or Arc is straightforward, at least if you don't run into a problem with the underlying language implementation. Python requires just a couple additional lines to support IPv6, while Arc supports IPv6 automatically. Thus, you can easily set up an IPv6 web server, just in time for World IPv6 Day.

Sunday, February 27, 2011

Solving a math problem of Schrödinger with Arc and Python

I recently received an interesting math problem: how many 12-digit numbers can you make from the digits 1 through 5 if each digit must appear at least once. The problem seemed trivial at first, but it turned out to be more interesting than I expected, so I'm writing it up here. I was told the problem is in a book of Schrodinger's, but I don't know any more details of its origin.

(The problem came to me via Aardvark (vark.com), which is a service where you can send questions to random people, and random people send you questions. Aardvark tries to match up questions with people who may know the answer. I find it interesting to see the questions I receive.)

The brute force solution

I didn't see any obvious solution to the problem, so I figured I'd first use brute force. I didn't use the totally brute force solution - enumerating all 12 digit sequences and counting the valid ones - since it would be pretty slow, so I took a slightly less brute force approach.

If 1 appears a times, 2 appears b times, and 3, 4, and 5 appear c, d, and e times, then the total number of possibilities is 12!/(a!b!c!d!e!) - i.e. the multinomial coeffient.

Then we just need to sum across all the different combinations of a through e. a has to be at least 1, and can be at most 8 (in order to leave room for the other 4 digits). Then b has to be at least 1 and at most 9-a, and so on. Finally, e is whatever is left over.

The Python code is straightforward, noting that range(1, 9) counts from 1 to 8:

from math import factorial
total = 0
for a in range(1, 8+1):
  for b in range(1, 9-a+1):
    for c in range(1, 10-a-b+1):
      for d in range(1, 11-a-b-c+1):
        e = 12-a-b-c-d
        total += (factorial(12) / factorial(a) / factorial(b)
          / factorial(c) / factorial(d) / factorial(e))
print total
Or, in Arc:
(def factorial (n)
  (if (<= n 1) 1
     (* n (factorial (- n 1)))))

(let total 0
  (for a 1 8
    (for b 1 (- 9 a)
      (for c 1 (- 10 a b)
        (for d 1 (- 11 a b c)
          (let e (- 12 a b c d)
            (++ total (/ (factorial 12) (factorial a) (factorial b)
              (factorial c) (factorial d) (factorial e))))))))
  total)
Either way, we quickly get the answer 165528000.

The generalized brute force solution

The above solution is somewhat unsatisfying, since if we want to know how many 11 digits numbers can be made from at least one of 1 through 6, for instance, there's no easy way to generalize the code.

We can improve the solution by writing code to generate the vectors of arbitrary digits and length. In Python, we can use yield, which magically gives us the vectors one at a time. The vecs function recursively generates the vectors that sum to <= t. The solve routine adds the last entry in the vector to make the sum exactly equal the length. My multinomial function is perhaps overly fancy, using reduce to compute the factorials of all the denominator terms and multiply them together in one go.

from math import factorial

# Generate vectors of digits of length n with sum <= t, and minimum digit 1.
def vecs(n, t):
  if n <= 0:
    yield []
  else:
    for vec in vecs(n-1, t-1):
      for last in range(1, t - sum(vec) + 1):
        yield vec + [last]

def multinomial(n, ks):
  return factorial(n) / reduce(lambda prod, k: prod * factorial(k), ks, 1)
  
# Find number of sequences of digits 1 to n, of length len,
# with each digit appearing at least once
def solve(n, len):
  total = 0
  for vec0 in vecs(n-1, len-1):
    # Make the vector of length n summing exactly to len
    vec = vec0 + [len - sum(vec0)]
    total += multinomial(len, vec)
  return total
Now, solve(5, 12) solves the original problem, and we can easily solve related problems.

In Arc, doing yield would need some crazy continuation code, so I just accumulate the list of vectors with accum and then process it. The code is otherwise similar to the Python code:

(def vecs (n t)
  (if (<= n 0) '(())
    (accum accfn
      (each vec (vecs (- n 1) (- t 1))
       (for last 1 (- t (reduce + vec))
          (accfn (cons last vec)))))))

(def multinomial (n ks)
  (/ (factorial n) (reduce * (map factorial ks))))

(def solve (n len)
  (let total 0
    (each vec0 (vecs (- n 1) (- len 1))
        (++ total (multinomial len
                    (cons (- len (reduce + vec0)) vec0))))
    total))
This solution will solve our original problem quickly, but in general it's exponential, so solving something like n = 10 and length = 30 gets pretty slow.

The dynamic programming solution

By thinking about the problem a bit more, we can come up with a simpler solution. Let's take all the sequences of 1 through 5 of length 12 with each number appearing at least once, and call this value S(5, 12). How can we recursively find this value? Well, take a sequence and look at the last digit, say 5. Either 5 appears in the first 11 digits, or it doesn't. In the first case, the first 11 digits form a sequence of 1 through 5 of length 11 with each number appearing at least once. In the second case, the first 11 digits form a sequence of 1 through 4 of length 11 with each number appearing at least once. So we have S(5, 11) sequences of the first case, and S(4, 11) of the second case. We assumed the last digit was a 5, but everything works the same if it was a 1, 2, 3, or 4; there are 5 possibilities for the last digit.

The above shows that S(5, 12) = 5 * (S(5, 11) + S(4, 11)). In general, we have the nice recurrence S(n, len) = n * (S(n, len-1) + S(n-1, len-1)). Also, if you only have a single digit, there's only one solution, so S(1, len) = 1. And if you have more digits than length to put them in, there are clearly no solutions, so S(n, len) = 0 if n > len. We can code this up in a nice recursive solution with memoization:

memo = {}
def solve1(digits, length):
  if memo.has_key((digits, length)):
    return memo[(digits, length)]
  if digits > length:
    result = 0
  elif digits == 1:
    result = 1
  else:
    result = digits * (solve1(digits-1, length-1) + solve1(digits, length-1))
  memo[(digits, length)] = result
  return result
An interesting thing about this solution is it breaks down each function call into two simpler function calls. Each of these turns into two simpler function calls, and so forth, until it reaches the base case. This results in an exponential number of function calls. This isn't too bad for solving the original problem of length 12, but the run time rapidly gets way too slow.

Note that the actual number of different function evaluations is pretty small, but the same ones keep getting evaluated over and over exponentially often. The traditional way of fixing this is to use dynamic programming. In this approach, the structure of the algorithm is examined and each value is calculated just one, using previously-calculated values. In our case, we can evaluate S(1, 0), S(2, 0), S(3, 0), and so on. Then evaluate S(2, 1), S(2, 2), S(2, 3) using those values. Then evaluate S(3, 2), S(3, 3), and so on. Note that each evaluation only uses values that have already been calculated. For an introduction to dynamic programming, see Dynamic Programming Zoo Tour.

Rather than carefully looping over the sub-problems in the right order to implement a dynamic programming solution, it is much easier to just use memoization. The trick here is once a subproblem is solved, store the answer. If we need the answer again, just use the stored answer rather than recomputing it. Memoization is a huge performance win. Without memoization solve1(10, 30) takes about 15 seconds, and with it, the solution is almost immediate. (solve1 10 35) takes about 80 seconds.

The Arc implementation is nice, because defmemo creates a function that is automatically memoized.

(defmemo solve1 (digits length)
  (if (> digits length) 0
      (is digits 1) 1
      (* digits (+ (solve1 (- digits 1) (- length 1))
                   (solve1 digits (- length 1))))))

A faster dynamic programming solution

When trying to count the number of things satisfying a condition, it's often easier to figure out how many don't satisfy the condition. We can apply that strategy to this problem. The total number of 12-digit sequences made from 1 through 5 is obviously 5^12. We can just subtract the number of sequences that are missing one or more digits, and we get the answer we want.

How many sequences are missing 1 digit? S(4, 12) is the number of sequences containing at least one of 1 through 4, and by definition missing the digit 5. Of course, we need to consider sequences missing 4, 3, 2, or 1 as well. Likewise, there are S(4, 12) of each of these, so 5 * S(4, 12) sequences missing exactly one digit.

Next, sequences missing exactly 2 of the digits 1 through 5. S(3, 12) is the number of sequences containing at least one of 1 through 3, so they are missing 4 and 5. But we must also consider sequences missing 1 and 2, 1 and 3, and so on. There are 5 choose 2 pairs of digits that can be missing. Thus in total, (5 choose 2) * S(3, 12) sequences are missing exactly 2 of the digits.

Likewise, the number of sequences missing exactly 3 digits is (5 choose 3) * S(2, 12). The number of sequences missing exactly 4 digits is (5 choose 4) * S(1, 12). And obviously there are no sequences missing 5 digits.

Thus, we get the recurrence S(5, 12) = 5^12 - 5*S(4, 12) - 10*S(3, 12) - 10*S(2, 12) - 5*S(1, 12)

In general, we get the recurrence:

The same dynamic programming techniques and memoization can be applied to this. Note that the subproblems all have the same length, so there are many fewer subproblems than in the previous case. That is, instead of a 2-dimensional grid of subproblems, the subproblems are all in a single row.

In Python, the solution is:

memo = {}
def solve2(digits, length):
  if memo.has_key((digits, length)):
    return memo[(digits, length)]
  if digits > length:
    result = 0
  elif digits == 1:
    result = 1
  else:
    result = digits**length
    for i in range(1, digits):
      result -= sol2(i, length)*choose(digits, i)
  memo[(digits, length)] = result
  return result
And in Arc, the solution is similar. Instead of looping, I'm using a map and reduce just for variety. That is, the square brackets define a function to compute the choose and solve term. This is mapped over the range 1 to digits-1. Then the results are summed with reduce. As before, defmemo is used to memoize the results.
(def choose (m n)
  (/ (factorial m) (factorial n) (factorial (- m n))))

(defmemo solve2 (digits length)
  (if (> digits length) 0
      (is digits 1) 1
      (- (expt digits length)
         (reduce +
           (map [* (choose digits _) (solve2 _ length)]
             (range 1 (- digits 1)))))))

The mathematical exponential generating function solution

At this point, I dug out my old combinatorics textbook to figure out how to solve this problem. By using exponential generating functions and a bunch of algebra, we obtain a fairly tidy answer to the original problem:

Since the mathematics required to obtain this answer is somewhat complicated, I've written it up separately in Part II.

Discussion

This problem turned out to be a lot more involved than I thought. It gave me the opportunity to try out some new things in Python and Arc, and make use of my fading knowledge of combinatorics and generating functions.

The Python and Arc code was pretty similar. Python's yield construct was convenient. Arc's automatic memoization was also handy. On the whole, both languages were about equally powerful in solving these problems.

Acknowledgement: equations created through CodeCogs' equation editor

Friday, February 25, 2011

IPv6 killed my computer: Adventures in IPv6

Lego Minifig examining IPv6 configuration You may have heard that the Internet is running out of IP addresses and disaster will strike unless everyone switches over to IPv6. This may be overhyped, since civilization continues even though the last block of addresses was given out on February 3, 2011.

However, with World IPv6 Day fast approaching (June 8), I figured this was a good time to try out IPv6 on my home network. I was also motivated by Hurricane Electric's IPv6 Certification (which is a "fun and educational" certification, not a "study thick books of trivia" certification).

This article describes my efforts to run IPv6 on my Windows 7 64-bit machine, and future articles will describe other IPv6 adventures.

IPv6 tunnels

If your ISP provides IPv6, then using IPv6 is easy. Comcast, for instance, is rolling out some IPv6 support. Unfortunately, AT&T (which I have) doesn't provide IPv6 and is lagging other providers in their future IPv6 plans.

The alternative is an IPv6 tunnel, which lets your computer connect to IPv6 sites over a normal IP connection. The number of different IPv6 tunnel mechanisms is very confusing: 6to4, Teredo, AYIYA, and 6in4, to name a few.

A Hacker News thread recommended Hurricane Electric's free tunnel, so I tried that. Unfortunately that tunnel requires IP protocol 41, and I quickly found that my 2Wire AT&T router inconveniently blocks protocol 41.

SixXS IPv6 tunnel on Windows 7 64-bit

Next, I tried SixXS's free IPv6 tunnel service, which doesn't require protocol 41 and promises IPv6 in 10 easy steps. This went well until Step 5, and then things got complicated.

This section gets into the gory details of setting up the tunnel; it's only to help people who are actually setting up a tunnel, and can be ignored by most readers :-)

Eventually I figured that for Windows 7 64-bit, you should ignore most of the 10 easy steps. Instead, get the Tap driver by installing OpenVPN according to the Vista64 instructions. Then go to the Vista instructions and follow all the mysterious steps including the obscure Vista parameter setup commands. Finally, use the AICCU command line, not the GUI.

If you encounter these errors, you're probably using the wrong Tap driver:

tapinstall.exe failed.
[warning] Error opening registry key: SYSTEM\CurrentControlSet\Control\Class\{4D
36E972-E325-11CE-BFC1-08002BE10318}\Properties (t1)
[warning] Found = 0, Count = 0
[error] [tun-start] TAP-Win32 Adapter not configured properly...
Finally, I was able to get my tunnel running with aiccu-2008-03-15-windows-console.exe start.

Once I had the tunnel running, my computer had an IPv6 address, it could talk to other IPv6 addresses, and my computer could be accessed via IPv6.

So now that I have IPv6 running on my computer, what can I do with it? Well, that's one of the problems. There isn't a whole lot you can do with IPv6 that you couldn't do before. You can connect to ipv6.google.com. You can see the animated kame turtle instead of the static turtle. You can see your IPv6 address at whatismyipv6address.com. But there's no IPv6 killer app.

Overall, the immediate benefit of running IPv6 is pretty minimal, which I think is a big barrier to adoption. As I discussed in a surprisingly popular blog post, the chance of adoption of a new technology depends on the ratio of perceived crisis to perceived pain of adoption. Given a high level of pain to run IPv6 and a very low benefit - any potential crisis is really someone else's crisis - I expect the rate of adoption to be very slow until people are forced to switch.

Getting IPv6 running reminds me a lot of TCP/IP circa 1992, when using TCP/IP required a lot of configuration, tunneling (over a serial line) with SLIP, and using immature software stacks such as Winsock. And most of what you wanted to do back then didn't even need TCP/IP. It will be interesting to see how long it takes IPv6 to get to the point where it just works and nobody needs to think about it. (None of this is meant as a criticism of SixXS or Hurricane Electric, by the way, who are providing useful free services. AT&T on the other hand...)

Running IPv6 did let me gain another level in the Hurricane Electric certification, though.

IPv6 kills my computer

The day after setting up IPv6, I turned on my computer. The power light came on, and that was all, no Windows, no BIOS screen, and not even a startup beep. I tried a different monitor with no luck. I opened up the computer, re-seated the memory modules, and poked at anything else that might be loose. I powered up my system again, and everything worked, fortunately enough.

The pedant might insist that IPv6 didn't kill my computer since any IPv6 issues wouldn't show up until after the OS boots, and IPv6 couldn't possibly have stopped my computer from even booting to the BIOS. However, I remain suspicious. Just a coincidence that I set up IPv6 and my computer dies? I don't think so. What about the rest of you? Anyone's computer get killed by IPv6?

Sheevaplug as IPv6 router

A few weeks later, I decided to try using my Sheevaplug as an IPv6 router for my home network, so all my machines could access IPv6 through my Sixxs tunnel. Although I used a Sheevaplug, these steps should work for other Linux systems too. The following are the gory details in the hope that they may be helpful to someone.

I started with the page Sixxs Sheevaplug, which describes how to set up the Sheevaplug with IPv6. Unfortunately, it didn't quite work. It suggests using a program called ufw, which stands for Uncomplicated FireWall. Unfortunately, I found it to be anything but uncomplicated. The first problem was that the default ufw version 0.29 didn't work for me - it failed to create the necessary chains with the inconvenient side-effect of blocking all ports; version 0.30 avoids this problem. I ended up using ip6tables directly, which was much easier - see instructions here.

The steps I took to setting up the Sheevaplug:

  • Request a subnet from Sixxs. In the examples below, I use: 2001:1938:2a9::/48. Important: use your subnet, not mine.
  • aiccu start to start the tunnel
  • sysctl -w net.ipv6.conf.all.forwarding=1 to enable IPv6 routing.
  • Configure /etc/radvd.conf as follows:
    interface eth0
    {
     AdvSendAdvert on;
     AdvLinkMTU 1280;
     prefix 2001:1938:2a9::/64
     {
                   AdvOnLink on;
                   AdvAutonomous on;
     };
    };
    
    Note that your subnet is /48, but you need to use /64 in the file.
  • ip -6 ro add 2001:1938:2a9::/48 dev lo
  • ip -6 add add 2001:1938:2a9::1/64 dev eth0
    I'm not sure why adding that address is necessary, but routing just doesn't work without it.
  • service radvd start
    radvd is the Router Advertisement Daemon, which lets other computers on your network know about your IPv6 tunnel.
At this point, you should be able to start up a Windows box, and it will automatically get an IPv6 address on the subnet. You should then be able to access ipv6.google.com or ip6.me.

Debugging: if the IPv6 subnet doesn't work, many things can be wrong. First, make sure your host machine can access IPv6. Assuming your client machine runs Windows, you can do "ping -6 ipv6.google.com" or "tracert -6 ipv6.google.com". With "ipconfig /all" you should see a "Temporary IPv6 Address" on the subnet, and a "Default Gateway" pointing at your Sheevaplug. If tracert gets one step and then hangs, try disabling your firewall on the Sheevaplug. If these tips don't work, then I guess you'll end up spending hours with tcpdump like I did :-(

Also read Part 2: IPv6 Web Serving with Arc or Python.