
Understanding and repairing the power supply from a 1969 analog computer

We recently started restoring a vintage1 analog computer. Unlike a digital computer that represents numbers with discrete binary values, an analog computer performs computations using physical, continuously changeable values such as voltages. Since the accuracy of the results depends on the accuracy of these voltages, a precision power supply is critical in an analog computer. This blog post discusses how this computer's power supply works, and how we fixed a problem with it. This is the second post in the series; the first post discussed the precision op amps in the computer.

The Model 240 analog computer from Simulators Inc. was a "precision general purpose analog computer" for the desk top, with up to 24 op amps. (This one has 20 op amps.)

Analog computers used to be popular for fast scientific computation, especially differential equations, but pretty much died out in the 1970s as digital computers became more powerful. They were typically programmed by plugging cables into a patch panel, yielding a spaghetti-like tangle of wires. In the photo above, the colorful patch panel is in the middle. Above the patch panel, 18 potentiometers set voltage levels to input different parameters. A smaller patch panel for the digital logic is in the upper right.

The power supply

The computer uses two reference voltages: +10 V and -10 V, which the power supply must generate with high accuracy. (Older, tube-based analog computers typically used +/- 100 V references.) The power supply also provides regulated +/- 15 V to power the op amps, power for the various relays in the computer, and power for the lamps.

The power supply in the bottom section of the analog computer. The transformer/rectifier section is on the left and the regulator card cage is on the right. Wiring harnesses on top of the power supply connect it to the rest of the computer.

The photo above shows the power supply in the lower back section of the analog computer. The power supply is more complex than I expected. The section on the left converts line-voltage AC into low-voltage AC and DC. These outputs go to the card cage on the right, which has 8 circuit boards that regulate the voltages. The complex wiring harnesses on top of the power supply provide power to the five analog computation modules above the power supply as well as the rest of the computer.

With a vintage computer, it's important to make sure the power supply is working properly, since if it is generating the wrong voltages, the results could be catastrophic. So we proceed methodically, first checking the components in the power supply, then testing the power supply outputs while disconnected from the rest of the computer, and finally powering up the whole computer.

The transformer / rectifier section

We started by removing the power supply from the computer, and disconnecting the two halves. The left half of the power supply (below) produces four unregulated DC outputs and a low-voltage AC output. It contains two large power transformers, four large filter capacitors, stud rectifiers (upper back), smaller diodes (front right), and fuses. This is a large and very heavy module because of the transformers.2 The smaller transformer powers the lamps and relays, while the larger transformer powers the +15 and -15 volt supplies as well as the oscillator. Presumably, using separate transformers prevents noise and fluctuations from the lamps and relays from affecting the precision reference supplies.

This section of the power supply reduces the line-voltage AC to low-voltage DC and AC.

One concern with old power supplies is that the electrolytic capacitors can dry out and fail over time. (These capacitors are the large cylinders above.) We measured the capacitance and resistance of the large capacitors (using Marc's vintage HP LCR meter) and they tested okay. We also checked the input resistance of the power supply to make sure there weren't any obvious shorts; everything seemed fine.

We removed all the cards from the card cage, cautiously plugged in the power supply, and... nothing at all happened. For some reason, no AC voltage was getting to the power supply. The fuse was an obvious suspect, but it was fine. Carl asked about the power switch on the control panel, and we figured out that the switch was connected to the power supply via the socket labeled "CP" (below). We added a jumper, powered up the supply, and this time found the expected DC voltages from the module.

The side of the power supply has three twist-lock AC sockets labeled "FAN", "DVM-LOGIC", and "CP" (control panel). The "DVM-LOGIC" socket powers a 5-volt supply for the digital logic, which we still need to repair.

The regulator cards

Next, we tested the power supply's various cards individually. The power supply has four regulator cards generating "lamp voltage", "+15", "-15", and "relay voltage". The purpose of a regulator card is to take an unregulated DC voltage from the transformer module and reduce it to the desired output voltage.

We hooked up the regulator cards using a bench power supply as input to make sure they were working properly. We tweaked the potentiometer on the +15 V regulator to get exactly 15 V output. The -15 V regulator seemed temperamental and the voltage jumped around when we adjusted it. I suspected a dirty potentiometer, but it settled down to a stable output (narrator: this is foreshadowing). We don't know what the lamp and relay voltages are supposed to be, and they're not critical, so we left those boards unadjusted.

One of the voltage regulator cards. A large power transistor is attached to the heat sink.

The photo above shows one of the regulator cards; you might think it has a lot of components just to regulate a voltage. The first voltage regulator chip was created in 1966, so this computer uses a linear regulator built from individual components instead. The large metal transistor on the heat sink is the heart of the voltage regulator; it acts kind of like a variable resistor to control the output. The rest of the components provide the control signal to this transistor to produce the desired output. A Zener diode (yellow and green stripes on the right) acts as the voltage reference, and the output is compared to this reference. A smaller transistor generates the control signal for the power transistor. In the lower right, a multi-turn potentiometer is used to adjust the voltage output. The larger capacitors (metal cylinders) filter the voltage, while the smaller capacitors ensure stability. Most power supplies of just a few years later would replace all of these components (except the filter capacitors) with a voltage regulator IC.
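
The regulator's arithmetic is straightforward: it compares a divided-down copy of the output against the Zener reference, so the output settles at the Zener voltage multiplied up by the divider ratio. A quick Python sketch of the relationship; the Zener voltage and resistor values below are hypothetical, chosen only to illustrate:

```python
# Output of a series regulator with a Zener reference and a feedback
# divider: Vout = Vzener * (R1 + R2) / R2.  The multi-turn pot trims the
# R1/R2 ratio to hit exactly 15 V.  All component values are made up.
def regulator_vout(v_zener, r1, r2):
    return v_zener * (r1 + r2) / r2

print(f"{regulator_vout(6.2, 14.2e3, 10e3):.2f} V")  # about 15 V
```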

The chopper oscillator

The precision op amps in the analog computer use a chopper circuit for better DC performance, and the chopper requires 400 Hertz pulses. These pulses are generated by the oscillator board in the power supply (called the gate for some reason). We powered up the board separately to test it, and found it produced 370 Hz, which seemed close enough.

The gate card provides 400 Hertz oscillations to control the op amp choppers.

The circuitry of this card is somewhat bizarre, and not what I was expecting on an oscillator card. The left side has three large capacitors and three diodes, powered by low-voltage AC from the transformer. After puzzling over this for a bit, I determined it was a full-wave voltage doubler, producing DC at twice the voltage of the AC input. I assume that the chopper pulses needed to be higher voltage than the computer's +15 volt supply, so they used this voltage doubler to get enough voltage swing.
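
The doubler's arithmetic works out as follows; the winding voltage below is a guess, since we didn't measure it:

```python
import math

# Unloaded output of a full-wave voltage doubler: each half-cycle charges
# one capacitor to the AC peak, and the two capacitors are stacked in
# series.  The 15 V RMS winding voltage is hypothetical.
v_ac_rms = 15.0
v_peak = v_ac_rms * math.sqrt(2)
v_out_unloaded = 2 * v_peak
print(f"{v_out_unloaded:.1f} V unloaded")  # well above the +15 V rail
```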

The oscillator itself (right side of the card), uses one NPN transistor as an oscillator, and another NPN transistor as a buffer. It took me a while to figure out how a single-transistor oscillator works. It turns out to be a phase-shift oscillator; the three white capacitors in the middle of the board shift the signal 180°; inverting it causes oscillation.
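
For a three-section RC phase-shift oscillator with equal resistors and capacitors, the oscillation frequency is f = 1/(2πRC√6). A sketch of the math in Python; the 0.1 µF value is an assumption, not a measurement of the board:

```python
import math

# Oscillation frequency of a classic three-section RC phase-shift
# oscillator: f = 1 / (2*pi*R*C*sqrt(6)).
def phase_shift_freq(r_ohms, c_farads):
    return 1 / (2 * math.pi * r_ohms * c_farads * math.sqrt(6))

C = 0.1e-6  # hypothetical value for the three white capacitors
# Solve for the R that gives a nominal 400 Hz:
R = 1 / (2 * math.pi * 400 * C * math.sqrt(6))
print(f"R = {R:.0f} ohms -> {phase_shift_freq(R, C):.0f} Hz")
```

With ordinary-tolerance parts, a few percent of drift in R or C easily explains the 370 Hz we measured.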

The op amps

Calculations in the analog computer are referenced to +10 volt and -10 volt reference voltages, so these voltages need to be very accurate. The regulator cards produce fairly stable voltages, but not good enough. (While testing the regulator cards, I noticed that the output voltage shifted noticeably as I changed the input voltage.) To achieve this accuracy, the reference voltages are generated by op amp circuits, built from two op amp boards and a feedback network card.

An op amp card. This card has a single input on the right. It uses a round metal-can op amp IC, but the chopper circuitry improves performance.

Somewhat surprisingly, the op amp cards used in the power supply are exactly the same as the precision op amps used in the analog computer itself. Back in 1969, op amp integrated circuits weren't accurate enough for the analog computer, so the designers of this analog computer combined an op amp chip with a chopper circuit and many other parts to create a high-performance op amp card. I described the op amp cards in detail in the first post, so I won't go into more detail here.

The network card

The network card has two jobs. First, it has precision resistors to create the feedback networks for the power supply op amps. Second, it has two power transistors (circular metal components below) that buffer the reference voltages from the op amp for use by the rest of the computer.

The network card. The two connectors on the left are attached to the op amp inputs.

One of the problems with an analog computer is that the results are only as accurate as the components. In other words, if the 10 volt reference is off by 1%, your answers will be off by 1%. The result is that analog computers need expensive, high-precision resistors. (In contrast, the voltages in a digital computer can drift a lot, as long as a 0 and a 1 can be distinguished. This is one reason why digital computers replaced analog computers.) Typical resistors have a tolerance of 20%, which means the resistance can be up to 20% different from the indicated value. More expensive resistors have tolerances of 10%, 5%, or even 1%. But the resistors on this board have a tolerance of 0.01%! (These resistors are the pink cylinders.) The two large resistors on the left are 15Ω "Brown Devil" power resistors. They protect the voltage outputs in case someone plugs the wrong wire into the patch panel and shorts an output, which would be easy to do.
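
The effect of tolerance is easy to quantify with a voltage divider. In the worst case, one resistor reads high by the full tolerance while the other reads low; for a divider built from nominally equal resistors, the ratio error works out to the tolerance itself:

```python
# Worst-case error in the ratio of a two-resistor divider, with one
# resistor high by the tolerance and the other low.  Nominal values are
# arbitrary; only the ratio matters.
def worst_case_ratio_error(tol):
    r1 = 10e3 * (1 + tol)
    r2 = 10e3 * (1 - tol)
    nominal = 0.5
    return abs(r2 / (r1 + r2) - nominal) / nominal

for tol in (0.20, 0.05, 0.01, 0.0001):
    print(f"{tol:>7.2%} parts -> up to {worst_case_ratio_error(tol):.3%} ratio error")
```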

The network card receives an adjustment voltage from the control panel, and also has multi-turn potentiometers on the right for adjustment (like the regulator cards). The green connectors are used to connect the network card to the op amp cards. (The op amps have a separate connector for the input, to reduce electrical noise.)

Powering it up and fixing a problem

Finally, we put all the power supply boards back in the cabinet, put the power supply back in the computer, and powered up the chassis (but not the analog computer modules). Some of the indicator lights on the control panel lit up and the +15 V supply showed up on the meter. However, the -15 V supply wasn't producing any voltage, the op amp overload lights on the front panel were illuminated, and the reference voltages from the op amps were missing. The bad -15 V supply looked like the first thing to investigate, since without it, the op amp boards wouldn't work.

I removed the working +15 regulator and failing -15 regulator from the card cage and tested them on the bench. Conveniently, both boards are identical, so I could easily compare signals on the two boards. (Modern circuits typically use special regulators for negative voltage outputs, but this power supply used the same regulator for both.) The output transistor on the bad board wasn't getting any control signal on its base, so it wasn't producing any output. Tracing the signals back, I found the transistor generating this signal wasn't getting any voltage. This transistor was powered directly from the connector, so why wasn't any voltage getting to the transistor?

A regulator board was failing due to loose screws (red arrows). The circuit was powered through the thick bottom PCB trace and then current passed through the heat sink from the lower screw to the upper screw.

I studied the printed circuit board and noticed that there wasn't a PCB trace between the transistor and the connector! Instead, part of the current path was through the heat sink. The heat sink was screwed down to the PCB, making a connection between the two red arrows above. After I tightened all the screws, the board worked fine.

The analog computer with the plugboard and sides removed to show the internal circuitry. The power supply is in the lower back section. One module has been removed and placed in front of the computer.

We put the boards back in, powered up the chassis, and this time the voltages all seemed to be correct. The op amp overload warning lights remained off; the lights had come on before because the op amps couldn't operate with one voltage missing. The next step is to power up the analog circuitry modules and test them. We also need to repair the separate 5-volt power supply used by the digital logic, since we found some bad capacitors that will need to be replaced. So those are tasks for the next sessions.

Follow me on Twitter @kenshirriff to stay informed of future articles. I also have an RSS feed.

Notes and references

  1. The computer's integrated circuits have 1968 and 1969 date codes on them, so I think the computer was manufactured in 1969. 

  2. Most modern power supplies are switching power supplies, so they are much smaller and lighter than linear power supplies like the one in the analog computer. (Your laptop charger, for instance, is a switching power supply.) Back in this era, switching power supplies were fairly exotic. However, linear power supplies are still sometimes used since they have less noise than switching power supplies. 

Risky line printer music on a vintage IBM mainframe

At the Computer History Museum, we recently obtained card decks for a 50-year-old computer music program. Back then, most computers didn't have sound cards but creative programmers found a way to generate music by using the line printer.2 We were a bit concerned that the program might destroy the printer, but we took the risk of running it on the vintage IBM 1401 mainframe. As you might expect, music from a line printer sounds pretty bad, but the tunes are recognizable and the printer survived unscathed.1

The IBM 1401 business computer was announced in 1959 and went on to become the best-selling computer of the mid-1960s, with more than 10,000 systems in use. A key selling point of the IBM 1401 was its high-speed line printer, the IBM 1403. By rapidly rotating a chain of characters (below), the printer produced output at high speed (10 lines per second) with excellent print quality, said to be the best printing until laser printers were introduced in the 1970s.

The type chain from the IBM 1401's printer. The chain has 48 different characters, repeated five times.

Line printers produced a lot of noise, but programmers soon discovered that by printing specific lines of characters, the noise had specific frequencies. It was possible to play a tune by printing the right lines for each note. Around 1970, computer scientist Ron Mak coded up some songs on punch cards using an earlier music program. He recently came across his old programs and gave us the opportunity to try them out.

How the line printer works

To print characters, the printer uses a chain of type slugs that rotates at high speed in front of the paper, with an inked ribbon between the paper and the chain. The printer produces 132-column output, so each of the 132 print columns has its own hammer and electromagnet. At the right moment, when the desired character passes the hammer, the electromagnet drives the hammer against the back of the paper, causing the paper and ribbon to hit the type slug, printing the character.

Printing mechanism of the IBM 1401 line printer. From 1401 Reference Manual, p11.

The printer required careful timing to make this process work. The chain spins around rapidly at 7.5 feet per second and every 11.1 µs, a print slug lines up with a hammer. The control circuitry has just enough time to read that position's character from core memory, compare it to the character under the hammer, and fire the hammer if there is a match. After 132 time intervals, each hammer has had an opportunity to print one character; this is called a "scan".3 Since there are 48 characters in the character set (no lower case), this process must be repeated 48 times so all the characters can be printed in any column.54 During each scan, the chain moves by just a single character's width6.
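
The timing numbers fit together neatly. A scan takes about 150 slot times (the 132 print columns plus chain-sync time, per the manual), and 48 scans cover the whole character set:

```python
SLOT_US = 11.1        # microseconds between successive slug/hammer alignments
SLOTS_PER_SCAN = 150  # 132 columns plus chain-sync time
CHARSET = 48          # characters on the chain (no lower case)

scan_us = SLOT_US * SLOTS_PER_SCAN
cycle_ms = scan_us * CHARSET / 1000
print(f"one scan: {scan_us:.0f} us, full 48-scan cycle: {cycle_ms:.1f} ms")
```

That's roughly 80 ms of printing per line, leaving time for paper advance within the 10-lines-per-second rate mentioned above.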

A hammer bank in the IBM 1403 printer. At the bottom, the impact points for the 132 hammers (one for each column) are visible. The coils and wiring for 1/4 (33) of the 132 hammers are visible at the top.

The photo below is a closeup of a hammer. The electromagnet coil and wires are on the upper left. We had to replace this hammer after the coil overheated and smoked; you can see a blackened region on the coil. (This problem happened a while ago due to a bad circuit board, and is unrelated to the printer music.)

An individual hammer from the IBM 1403 printer.

Generating music

Now that you see how the printer works, with a hammer potentially firing every 11.1 µs, the strategy to make music should be clearer. By printing carefully-selected text, you can control the times at which hammers fire. By firing hammers at specific intervals, you can create a desired frequency. An A note (440 Hz), for instance, can be produced by printing a line of text that fires the hammers every 1/440th of a second. This can be done by printing a 1 in column 1 (the first hammer to be aligned), followed by a # in column 14 on the next scan, a comma in column 30 the scan after that, and so forth. (There's no real pattern to this; it's just how things line up.3) The full line printed to generate this note is below.7 (It may be a bit surprising that with a character set of just 48 characters, the printer includes unusual characters such as ⌑ and ‡.)

1    ⌑Y     C#    0   Q     3,    ‡F      R T   4 -   ,   I     U     $7        M   V .   *        9N     ⌑        ZE     @     P3

The diagram below shows the timing of the hammers, illustrating the uniform 440 Hz frequency produced by the above print line. The diagram has time on the X-axis, with a red bar when each character is printed. The red bars are spaced evenly with a spacing of 1/440th of a second, generating a 440 Hz note. Each bar is labeled with the associated character and column on the page. Note that characters are printed in a different order from how they appear on the line. There's no simple relationship between the arrangement of characters on the line and their time sequence. There are a few gray lines where you'd expect a hammer to fire, but no character is printed. These correspond to times when the chain is syncing up and can't print.
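
In other words, building a frequency card is a quantization problem: hammers can only fire on 11.1 µs slot boundaries, so you pick the slots closest to multiples of the note's period. A hypothetical sketch of the idea (the real frequency cards also depend on which character the chain happens to present at each slot):

```python
SLOT_US = 11.1  # time between successive hammer opportunities

def firing_slots(freq_hz, n_pulses=8):
    """Slot numbers closest to multiples of the note's period."""
    period_us = 1e6 / freq_hz
    return [round(i * period_us / SLOT_US) for i in range(n_pulses)]

print(firing_slots(440))  # pulses every ~205 slots, i.e. every ~2273 us
```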

Timing diagram for the note A4. Each red line indicates a printed character.

By printing a different line, a different note can be produced. Below is the note B5, which is 987 Hz (over an octave higher). As you'd expect, the higher-frequency note has more characters.

1 @EQ4S J   8. N D ‡  S H 7 AM  Y#2   G-  KV . 0 D  Q S J 7&   N D ‡/4  H   AMX0  2 Q G J   W. 0 DP‡  S   7&AM     ‡/4G   *  MX0 D 3

Timing diagram for the note B5. Each red line indicates a printed character.

The printed line for the low note C♯3 (138 Hz) is below. I was puzzled at first why this line (and the other C♯ notes) had all the characters clustered together, rather than scattered across the line like other notes. It turns out that 138 Hz just happens to correspond to hammers that are consecutive on the line. Even though the characters are clumped together on the line, they are spread out uniformly in time.

16#UZKP*E&38                                                                                                                      

Timing diagram for the note C♯3.

Why chain music might be risky

We were concerned that the print chain music program might damage the printer. There are plenty of stories of people destroying line printers by printing a line that fires all the hammers at once. I think these are mostly urban legends (among other things, the hammers on the 1403 fire one at a time, not all at once). Nonetheless, we were somewhat concerned about chain music overstressing the print chain and breaking it. The photo below shows a print chain that broke during normal use; you can see the broken wires and the individual type slugs.

A broken 1403 print chain. It broke during normal use, not from line printer music. (Photo from TechWorks.)

Print chains were manufactured by winding a thin wire into a band, with type blocks attached. Up until recently, print chains were rare and irreplaceable; if the wire broke, there was no way to fix it. However, the TechWorks! museum in Binghamton, NY recently developed a technique to rebuild print chains. Because of this, Frank King (our IBM 1401 guru) approved the use of a rebuilt chain for line printer music, with some trepidation. Fortunately, the chain survived the music generation just fine. (After studying the music program carefully, I think it puts less stress on the chain than the average program, unless there's some really unfortunate resonance.)

Closeup of the type chain (upside down) for an IBM 1403 line printer.

The program

Card decks to play a variety of songs, courtesy of Ron Mak.

The source code to the program is long gone, so I disassembled the machine code on the cards to determine how the program works (listing here). First, it reads "frequency cards" that define what line to print for each note. It builds up an array of print lines in memory, along with a table of note names and addresses of the print lines. Next, the program reads the notes of the song, one note per card. (As you can see above, some songs require many cards.) For each note, it looks up the appropriate print line in the note table. Based on the note's duration, it prints the line the appropriate number of times (using a jump table, not a loop). A rest is implemented by looping 200 to 2000 times to provide silence for the appropriate delay.
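
In modern pseudocode, the program's structure looks something like the Python sketch below. (The original is 1401 machine code full of self-modifying jump tables; all the names here are mine.)

```python
# Sketch of the music player's main loop: frequency cards build a table
# of print lines, then each note card looks up its line and prints it
# repeatedly for the note's duration.
def play_song(frequency_cards, note_cards, print_line):
    note_table = dict(frequency_cards)      # note name -> 132-column line
    for note, duration in note_cards:
        if note == "REST":
            for _ in range(200 * duration): # spin to produce silence
                pass
        else:
            for _ in range(duration):       # re-print the line to sustain
                print_line(note_table[note])

# Toy run, collecting "printed" lines in a list instead of driving a 1403:
out = []
play_song([("A4", "1    #Y   ...")], [("A4", 3), ("REST", 1), ("A4", 2)], out.append)
print(len(out))  # 5 lines printed
```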

A closeup of cards with the machine code for the music program. For some reason, the contents of each card are printed twice on the card.

Machine code for the 1401 is very different from modern machines. One difference is that self-modifying code was very common, while nowadays it is usually frowned upon. For instance, the table of print lines is created by actually modifying load instructions, replacing the address field in the instruction. Even subroutine returns use self-modifying code, putting the return address into a jump instruction at the end of the subroutine. To handle a note, the program generates a sequence of three instructions on the fly: load the print line, jump to the print code, and then jump back to the main loop. Self-modifying code made it more challenging for me to understand the program, since the disassembled code isn't what actually gets run.

The program cards are followed by frequency cards, defining the print line for each note. The code supported up to 20 different notes, so the frequency cards were selected according to the song's need. Each 132-column line is split across two cards, with the first card defining the right half of the line. Each card is punched at the right with the note name and frequency.

Frequency cards. Each pair of cards defines the 132-character print line that generates the specified note. At the right, the card is punched with the note name (e.g. E4) and frequency (e.g. 329 Hz). The notation F/C labels the first card in the deck.

The final set of cards creates the tune, with one card per note (or rest). Each card is punched with a note and duration. A long song may use hundreds of cards. Creating a new song is straightforward: it's just a matter of punching the tune onto cards. The notes are specified in scientific pitch notation, with the note name followed by an octave number; for example, C4 is middle C. Since only some print chains had the # symbol, sharps were indicated with an "S", e.g. CS for C♯.
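
Decoding a note card into a frequency is a small exercise in equal temperament (A4 = 440 Hz). A sketch, using the card format described above ("S" for sharp, octave digit at the end):

```python
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def card_to_freq(card):
    """Convert a note like 'A4' or 'CS3' to its frequency in Hz."""
    semitone = NOTE_OFFSETS[card[0]]
    rest = card[1:]
    if rest.startswith("S"):  # 'S' marks a sharp, e.g. CS4 = C#4
        semitone += 1
        rest = rest[1:]
    octave = int(rest)
    midi = semitone + 12 * (octave + 1)   # MIDI numbering: A4 = 69
    return 440.0 * 2 ** ((midi - 69) / 12)

print(f"A4 = {card_to_freq('A4'):.0f} Hz, CS3 = {card_to_freq('CS3'):.1f} Hz")
```

This gives about 138.6 Hz for C♯3, consistent with the 138 Hz mentioned earlier.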

Closeup of the cards for the song Silver Bells. Each card has the note and octave, followed by its duration. The first card is (confusingly) "END", indicating the end of the frequency cards.

Conclusion

We succeeded in generating music on the IBM 1403 printer, running programs that hadn't been run in almost 50 years. Although the music quality isn't very good, we were happy that the printer didn't self-destruct. Ron Mak last ran these programs in 1970; this link has some songs from then, such as Raindrops keep fallin' on my head. The video below shows an excerpt of La Marseillaise; in this video you can see each line being printed.

I announce my latest blog posts on Twitter, so follow me at @kenshirriff for future articles. I also have an RSS feed. The Computer History Museum in Mountain View runs demonstrations of the IBM 1401 on Wednesdays and Saturdays so if you're in the area you should definitely check it out (schedule). Thanks to Ron Mak for supplying the vintage programs, Carl Claunch for reading the cards, and the 1401 restoration team for running the program, in particular, Robert Garner and Frank King.

Notes and references

  1. In case you're wondering why nothing shows up on the printer in the video, the printer's line feed was disabled to save paper. You can see the lines being printed in the video at the end of the article. 

  2. Programmers also used the 1401 to generate music on an AM radio via RF interference. Running the right instruction sequence generated a particular tone. We hope to try this in the future. 

  3. I've created an animation of the print chain here that shows exactly how it works; it's more complex than you'd expect. 

  4. The print chain and hammer alignment scheme may seem excessively complicated. But what makes it clever is that the 11.1 µs between hammer times is just enough time to read a character from core memory to see if it matches the chain slug under the hammer, and thus should be printed. In other words, the system is designed to match the mechanical speed of the chain to the electronic speed of core memory. 

  5. The printer's operation is explained in detail in the Field Engineering Manual of Instruction. The section starting on page 37 discusses the chain timing in detail. Each scan is broken down into 3 subscans, but I won't get into that here. Note that while a line is 132 characters, printing a line takes about 150 time intervals (1665 µs); the extra time is used to sync the chain position. (This explains why some notes have "missing" characters in the timing plots.) 

  6. The chain only moves 1/1000 of an inch during the 11.1 µs interval, but that is enough to line up the next character and hammer. The trick that makes this work is that the hammer spacing and the chain spacing are very slightly different (a vernier mechanism), so a tiny chain movement causes a much larger change in the alignment position. 

  7. I've archived the code and full set of frequency cards here for future reference. 

Apollo Guidance Computer: Dipstiks and reverse engineering the core rope simulator

Onboard the Apollo spacecraft, the revolutionary Apollo Guidance Computer helped navigate to the Moon and land on its surface. The AGC's software was physically woven into permanent storage called core rope memory. We1 are restoring an Apollo Guidance Computer (below), which is missing the core ropes, but instead has core rope simulator boxes. These boxes were used during development and ground testing to avoid constantly manufacturing ropes. The core rope simulator is undocumented, so I reverse-engineered it, built an interface, and we used the simulator to run programs on our Apollo Guidance Computer. But we ran into some complications along the way.

The Apollo Guidance Computer with the cover removed, showing the wire-wrapped backplane. At the back, rope simulator boxes are visible in the core rope slots. The interface boards at the front are modern.

The AGC's core ropes

The Apollo Guidance Computer held six core rope modules, each storing just 6 kilowords of program information (about 12 kilobytes).2 Core rope modules were a bit like a video game ROM cartridge, holding software in a permanent yet removable format. Programs were hard-wired into core rope by weaving wires through magnetic cores. A wire passed through a core for a 1 bit, while a wire going around a core was a 0 bit. By weaving 192 wires through or around each core, each core stored 192 bits, achieving much higher density than read/write core memory that held 1 bit per core.
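
The density arithmetic: an AGC word is 15 bits plus parity, so a 6-kiloword module works out as follows (taking "kiloword" as 1024 words):

```python
WORD_BITS = 16                # 15 data bits + 1 parity bit per AGC word
words_per_module = 6 * 1024
bits_per_module = words_per_module * WORD_BITS
cores_per_module = bits_per_module // 192   # 192 bits woven through each core
print(f"{bits_per_module} bits stored in {cores_per_module} cores per module")
```

Equivalent read/write core memory would need one core per bit, nearly 200 times as many cores.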

Detail of core rope memory wiring from an early (Block I) Apollo Guidance Computer. Photo from Raytheon.


Manufacturing a core rope was a tedious process that took about 8 weeks and cost $15,000 per module. Skilled women wove the rope by hand, threading a hollow wire-filled needle back and forth through the cores, as shown below3. They had the assistance of an automated system that read the program from a punched tape and positioned an aperture over the matrix of cores. The weaver threaded the needle through the aperture to install the wire in the right location. Once completed, the core rope was mounted in a module along with hundreds of resistors and diodes and encased in epoxy to make it solid for flight. (See my earlier article on core rope for details.)

A woman weaving a core rope memory, wiring software into read-only memory.


The core rope simulator

Since weaving a core rope was a time-consuming and expensive process, an alternative was required during development and ground testing. In place of the core ropes, NASA used rope simulators4 that allowed the AGC to load data from an external system. Our Apollo Guidance Computer was used for ground testing so it didn't have core ropes but instead had a core rope simulator. The simulator consists of two boxes that plugged into the AGC's core rope slots, each box filling three rope slots. These boxes are visible in the upper-left side of the AGC below, with round military-style connectors for connection to the external computer.

The core rope simulators are installed in the left side of the AGC in place of the real core ropes. Two round connectors on the left allowed the simulators to be connected to an external computer that provided the data.


Although we have extensive documentation for the Apollo Guidance Computer, I couldn't find any documentation on the simulator boxes. Thus, I had to reverse engineer the boxes by tracing out all the circuitry and then figuring out what the boxes were doing. From the outside, the boxes didn't reveal much. One end of each box has a round MIL-Spec plug for connection to an external system. The other end has three groups of 96 pins that plugged into the AGC. Each group of pins took the place of one core rope module.


Each core rope box communicated with the external system via a round 39-pin connector. Each box had three sets of 96 pins that plugged into the AGC, replacing three rope modules.

Opening up the boxes showed their unusual construction techniques. Part of the circuitry used high-density cordwood construction which mounted components vertically through holes in a metal block. On either side of the block, the component leads were welded to point-to-point wiring. Other circuitry in the boxes used standard integrated circuits (7400-series TTL). But unlike modern printed circuit boards, the chips were mounted inside plastic units called Dipstiks and wire-wrapped together.

A rope simulator box, partially disassembled. The round external connector is visible at the right, and the pins to connect to the AGC at the left. Analog circuitry with cordwood construction is center-left. To the right, several Dipstik modules are visible, white with rows of pins.


Cordwood construction

Cordwood construction was extensively used in the Apollo Guidance Computer for analog circuitry, and the cordwood construction in the rope simulators is similar (below). The white circles in the center are the ends of resistors and diodes mounted vertically through the module, with connections welded on either side. These components are stacked together densely, like wood logs, giving cordwood construction its name. Pulse transformers are under the large gray circles. Similar pulse transformers are on the other side of the module with their orange, yellow, red, and brown wires emerging from the holes. The black wires connect the cordwood circuitry to the digital logic. At the top of the photo, the posts have diodes and resistors mounted behind them, along with connections to the pins that plug into the AGC.

A closeup of the cordwood circuitry in a rope simulator box.


The main purpose of the cordwood circuitry was to provide electrical isolation between the Apollo Guidance Computer's circuitry and the rope simulator boxes. In modern circuitry, this function would be implemented with optoisolators but the rope simulator used small pulse transformers instead. Because each box receives signals directed to three different rope modules, numerous diodes merge the three signals into one. Resistors control the current through the pulse transformers.

Reverse-engineering the analog cordwood circuitry was a pain. First, none of the components were visible since they are embedded in the module. I had to use a multimeter to try to figure out what the components were. Second, since cordwood construction has connections on both sides, I spent a lot of time flipping the box back and forth to find the connection I wanted. Finally, I couldn't come up with a good way of drawing a diagram of cordwood construction without ending up in a maze of lines.

Digital logic and the Dipstiks

The Dipstik was a plug-in module introduced in 1968 to simplify prototyping with integrated circuits. It replaced printed circuit boards with a packaging system that provided twice the density. (See this vintage Dipstik ad.) The Dipstik consisted of a plastic connector block with wire-wrap pins on the bottom for wiring up the circuit. The integrated circuits were clipped into a carrier that fit into the connector block. The carrier had solder lugs on top for additional components, such as decoupling capacitors. (The photo below shows Dipstik modules with one IC carrier removed. Each carrier held 5 integrated circuits.) The pins of the integrated circuits were sandwiched between contacts on the carrier and contacts on the connector block. It seemed like a great idea, but as the plastic flexed and bowed outward, the contacts against the pins became unreliable. (This was a problem both for contemporary Dipstik users and for us decades later.) The Dipstik was a failure in the marketplace.

A Dipstik package opened up. The carrier (left) holds the ICs, and is inserted into the connector block (right). Photo courtesy of Marc Verdiell.


The photo below shows the wire-wrapped connections on the underside of the Dipstiks. Tracing this was extremely tedious since I couldn't follow a wire through the sea of identical blue wires. Instead, I had to beep everything out with a multimeter to find what was connected to what. Then I could construct a schematic diagram of the logic circuitry and ponder what it was doing.5 In total, the rope simulator used about 50 ICs.

Each rope simulator box contains complex circuitry. On the left, wire-wrapped wiring connects the TTL integrated circuits. On the right, the analog components are mounted using cordwood construction. At the back, two voltage regulators in large metal TO-3 packages are mounted on a heat sink.


Based on the dates on the components, the simulator boxes were built in 1971. Even though this is just a few years after the design of the AGC, the technology in the simulator boxes is much more advanced, illustrating the rapid changes in IC design between the mid-1960s and the early 1970s. The AGC was built with simple integrated circuits, each containing two NOR gates and built with primitive resistor-transistor logic (RTL). The simulator boxes, on the other hand, were built from more complex 7400-series chips6 containing up to a dozen TTL (transistor-transistor logic) gates. Unlike the obsolete flat-pack integrated circuits in the AGC, the simulator boxes used DIP (dual in-line package) ICs, a packaging style that is still in use.

Results of reverse engineering

After tracing out all the circuitry, I figured out how the rope simulator worked and created a schematic.

Essentially, one box decodes the address being accessed, while the second box sends the desired data to the AGC. (I'll call these the "address box" and "data box".)7

The address box takes the rope signals and converts them into a binary address. This task is not straightforward because the signals it receives are 14-volt, high-current pulses designed to flip the rope cores. These pulses are also separated in time, since some flip the cores and others flip them back. Finally, the pulses sent to the ropes do not encode a simple address; they also include signals to select one of the 6 rope modules and signals to select one of 12 strands within a module.

The address box uses pulse transformers to convert the 14-volt pulses into TTL signals. It has a bunch of AND-OR logic to convert the signals into a binary address. (This is not trivial because each module holds 6 kilowords, not a power of 2, so a lot of bit manipulation is required.) A flip flop latches the address when it is available. Finally, resistor-capacitor one-shots control the timing, determining from the various signals when the address is ready and when the result should be sent to the AGC.
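To see why the conversion isn't trivial, here is a hedged Python sketch of the arithmetic that the AND-OR logic must implement. The 12-strands-of-512-words layout is inferred from the numbers above; the simulator's actual signal names and bit assignments are undocumented.

```python
# Hypothetical sketch: combine module-select (1 of 6), strand-select (1 of 12),
# and a word offset into one flat binary address. Since 6 kilowords is not a
# power of two, this takes real arithmetic, not simple bit concatenation.
WORDS_PER_MODULE = 6 * 1024
WORDS_PER_STRAND = WORDS_PER_MODULE // 12   # 512 words per strand (inferred)

def flat_address(module, strand, offset):
    """module: 0-5, strand: 0-11, offset: 0-511 -> flat word address."""
    assert 0 <= module < 6 and 0 <= strand < 12 and 0 <= offset < WORDS_PER_STRAND
    return module * WORDS_PER_MODULE + strand * WORDS_PER_STRAND + offset

assert flat_address(1, 0, 0) == 6144   # second module starts 6 kilowords in
```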

The data box is simpler. It receives 16 bits of data from the external system and sends signals to the AGC's sense amplifiers simulating the millivolt output from a core. These signals are generated via pulse transformers. The address box and data box communicate with each other via wires on the AGC backplane.

The boxes communicate with the external system via differential signals, to avoid picking up noise on long cables. The boxes contain LM109 5-volt regulators to power their TTL circuits. One box receives unregulated DC through the external connector, and sends unregulated DC to the other box through the AGC's backplane wiring. (This seems strange to me.)

With the address box opened, the wire-wrapped circuitry is visible.


The BeagleBone interface

Once I had reverse-engineered the core rope simulator, the next step was to build an interface that could provide program data to the simulator. I used a BeagleBone, a tiny single-board Linux system. The advantage of the BeagleBone is that it includes fast microcontrollers that could respond to the AGC's memory requests quickly, in real time. (I've written about the BeagleBone's PRU microcontrollers before, and used them to make a Xerox Alto Ethernet interface.)

The interface to the rope simulator consists of a board plugged into a BeagleBone. The two cables from the interface board go to the two simulator boxes.


I designed an interface board that plugged into the BeagleBone. The board is pretty straightforward: some AM26C32 differential line receivers to convert the differential signals from the simulator into 3.3V logic signals for the BeagleBone, and some AM26C31 differential line drivers to send signals to the simulator.8 I designed the board in KiCad and PCBWay manufactured it. They are a sponsor of our AGC restoration, so send them some business :-)

I wrote some software that runs on the PRU, the BeagleBone's microcontroller. This software is basically a state machine that waits for an address from the simulator box, waits for the timing signal, reads the word from BeagleBone RAM, and sends the word to the simulator box. The software is on Github.
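In outline, the loop looks something like this simplified Python sketch; it is not the actual PRU code (which is on Github and also handles the timing signals and error cases in real time):

```python
# Simplified model of the PRU loop: for each address latched by the address
# box, look up the word in RAM and hand it to the data box. The real PRU code
# waits on the AGC's timing signals between these steps.
def serve_requests(program_rom, addresses):
    sent_words = []
    for addr in addresses:
        # (in the real code: wait here for the timing signal from the AGC)
        sent_words.append(program_rom[addr])
    return sent_words

rom = {0o4000: 0o30000, 0o4001: 0o00012}   # made-up program words (octal)
assert serve_requests(rom, [0o4000, 0o4001, 0o4000]) == [0o30000, 0o12, 0o30000]
```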

Problems with the core rope simulator

The core rope simulator boxes were not built to the standard of the Apollo Guidance Computer, and I ended up spending a lot of time debugging them.10 Many welds in the cordwood circuitry were broken and needed to be soldered. (I don't know if the welds went bad over time or if we broke connections while reverse engineering.) We also found a short circuit on a Dipstik and a bad IC.

Debugging the simulator boxes. We used Marc's vintage Tektronix 7854 scope (1980) to examine the pulse transformer's differential signals. The problem turned out to be a broken connection in the cordwood circuitry. Behind the AGC are the power supplies for the AGC and the rope simulator.


The Dipstiks were the worst problem, as many of the contacts between a Dipstik and the IC were intermittent. The problem was that the IC pins are sandwiched between contacts in the Dipstik carrier and contacts in the Dipstik connector block. The plastic Dipstiks tended to bow outwards, resulting in intermittent bad contacts. By bending the IC pins into S curves, Marc was able to keep the pins in contact with both sides, at least for a little while. But after a few hours, the soft IC pins would bend back and connections became unreliable again, so we don't have a good long-term fix.

The most interesting problem was a race condition between two signals from the AGC that should have dropped simultaneously. They fed the two ends of a pulse transformer coil, so the transformer should have produced no signal. However, one signal dropped slightly slower than the other, causing a glitch pulse from the pulse transformer.9 Unfortunately, the digital logic in the simulator box was asynchronous, so the glitch latched a bad address bit into the box's flip flops, causing an access to the wrong memory location. Eventually, we tracked the problem down and put capacitors across the offending signals to filter out the glitch. These capacitors turned out to be a bit too large, delaying the address signal too much in other circumstances and causing different errors. With smaller capacitors in place, we were finally able to successfully run programs on the AGC, using the vintage core rope simulator.

Conclusion

The Apollo Guidance Computer used core ropes for program storage. Since it wasn't practical to constantly manufacture core ropes during development, rope simulators were used in place of the core rope modules. I reverse-engineered the rope simulator and built a BeagleBone-based interface to drive it. We successfully ran programs on the AGC through the rope simulator. The rope simulator, however, had many problems and wasn't very reliable.

We will be demonstrating the Apollo Guidance Computer next week to celebrate the 50th anniversary of the Moon landing. Come see these historic demos at the Cradle Of Aviation Museum (Long Island) on July 18 and the MIT Museum on July 20.

@CuriousMarc made a video (below) showing our work with the core rope simulator. I announce my latest blog posts on Twitter, so follow me @kenshirriff for future articles. I also have an RSS feed. My rope simulator files are on Github. Thanks to PCBWay for sponsoring this board.

Notes and references

  1. The AGC restoration team consists of Mike Stewart (creator of FPGA AGC), Carl Claunch, Marc Verdiell (CuriousMarc on YouTube) and myself. The AGC that we're restoring belongs to a private owner who picked it up at a scrapyard in the 1970s after NASA scrapped it. For simplicity, I refer to the AGC we're restoring as "our AGC". 

  2. The AGC was a 15-bit machine: each word consisted of 15 data bits and a parity bit. While a word that isn't a power of two may seem bizarre now, computers in the 1960s were designed with whatever word size fit the problem. 

  3. The original caption on the photo was: "Space age needleworker 'weaves' core rope memory for guidance computers used in Apollo missions. Memory modules will permanently store mission profile data on which critical maneuvers in space are based. Core rope memories are fabricated by passing needle-like, hollow rod containing a length of fine wire through cores in the module frame. Module frame is moved automatically by computer-controlled machinery to position proper cores for weaving operation. Apollo guidance computer and associated display keyboard are produced at Raytheon Company plant in Waltham, Massachusetts." Caption and photo are from a Raytheon document, courtesy of Transistor Museum. 

  4. Several different core rope simulators were built for the AGC. The AGC monitor provided a debugging console with lights and switches along with the rope simulator. The Portafam was a rope simulator that could load programs from magnetic tape. While these rope simulators had some documentation, unfortunately, I couldn't find any documentation on the Raytheon rope simulator that we had, so I had to reverse engineer everything. 

  5. I figured out all the circuitry except for two mysteries. The first mystery is the parity circuit, which uses two uncommon 74180 parity-generator chips. These are not used for the memory parity bit, which is supplied externally, and they do not check the address parity supplied by the AGC. It appears that, depending on an external switch, they optionally replace address bit 7 with the parity of the other address bits. Since losing an address bit would make the system unusable, we left the parity feature switched off and everything worked fine.

    The second mystery is a transistor that looks like it would amplify the strand select signal. The problem is that the transistor's collector isn't connected to anything, so the transistor is unpowered and doesn't function. When we encountered timing issues with the strand select signal, we powered up this transistor to see if it helped, but it made things worse. In addition, the transistor appears to have been added to the cordwood circuitry at a later date. We ended up ignoring the transistor. 

  6. The TTL chips in the interface boxes were 5400-series chips. These are the military version of the well-known 7400-series, identical except operating over a wider temperature range. 

  7. The split into "address box" and "data box" isn't exact. Due to the way the rope slots are wired, the data box determines which of the 6 modules are being addressed and provides this information to the address box. 

  8. I had some difficulty getting the round MIL-spec 20-39S connectors to attach to the simulator boxes, since they have unusual keying. There are a zillion slightly-different variants of these connectors, many costing hundreds of dollars. I ended up getting connectors off eBay and Marc milled the keying off so the connectors worked. I used 25-pair Amphenol telco cables between the BeagleBone and the simulator. Soldering the wires to the connectors was more of a pain than I expected. 

  9. Viewing the pulse transformer signals on a scope was difficult because you need to see the small differential signal across the transformer. Since we didn't have a differential probe for the modern scopes, we used Marc's vintage scope that did have a differential probe. (You might think you could subtract the two signals with a modern scope, but the problem was that the difference was much smaller than the common-mode voltage, so you essentially get zero when you subtract.) 

  10. We built some infrastructure to help debug the simulator boxes. I tested the boxes extensively outside the AGC, using an Arduino and a power transistor to generate test signals. I added debugging code to the BeagleBone to detect when something went wrong and illuminate a status LED on my interface board. I also used the BeagleBone to generate an oscilloscope trigger signal at a known-bad address, letting us see the analog signals at that specific point. Mike wrote some FPGA code to check the data from the simulator box against the data the AGC should have read, detecting whenever something went wrong. Finally, I logged the addresses that the BeagleBone saw, while Mike logged the addresses that the AGC was sending. By comparing the logs, we could see which addresses were bad.

    I learned a few lessons from my interface board that I'll apply to future boards. Putting an RGB status LED on the board was my best idea, since it made it much easier to tell what was happening. I should have exposed the BeagleBone's serial port connector on my interface board. As it was, if I ran into any problems booting the BeagleBone, I had to pull off the interface board, attach the FTDI serial adapter, fix the problem via the serial console, remove the serial adapter, and then reinstall the interface board. Finally, I should have exposed a couple generic I/O pins for functions such as oscilloscope triggering. Instead, I had to solder temporary wires onto my interface board. 

Two bits per transistor: high-density ROM in Intel's 8087 floating point chip

The 8087 chip provided fast floating point arithmetic for the original IBM PC and became part of the x86 architecture used today. One unusual feature of the 8087 is that it contained a multi-level ROM (Read-Only Memory) that stored two bits per transistor, twice as dense as a normal ROM. Instead of storing binary data, each cell in the 8087's ROM stored one of four different values, which were then decoded into two bits. Because the 8087 required a large ROM for microcode1 and the chip was pushing the limits of how many transistors could fit on a chip, Intel used this special technique to make the ROM fit. In this article, I explain how Intel implemented this multi-level ROM.

Intel introduced the 8087 chip in 1980 to improve floating-point performance on the 8086 and 8088 processors. Since early microprocessors operated only on integers, arithmetic with floating point numbers was slow and transcendental operations such as trig or logarithms were even worse. Adding the 8087 co-processor chip to a system made floating point operations up to 100 times faster. The 8087's architecture became part of later Intel processors, and the 8087's instructions (although now obsolete) are still a part of today's x86 desktop computers.

I opened up an 8087 chip and took die photos with a microscope, yielding the composite photo below. The labels show the main functional blocks, based on my reverse engineering. (Click here for a larger image.) The die of the 8087 is complex, with 40,000 transistors.2 Internally, the 8087 uses 80-bit floating point numbers with a 64-bit fraction (also called significand or mantissa), a 15-bit exponent, and a sign bit. (For a base-10 analogy, in the number 6.02×10²³, 6.02 is the fraction and 23 is the exponent.) At the bottom of the die, "fraction processing" indicates the circuitry for the fraction: from left to right, this includes storage of constants, a 64-bit shifter, the 64-bit adder/subtracter, and the register stack. Above this is the circuitry to process the exponent.

Die of the Intel 8087 floating point unit chip, with main functional blocks labeled.


An 8087 instruction required multiple steps, over 1000 in some cases. The 8087 used microcode to specify the low-level operations at each step: the shifts, adds, memory fetches, reads of constants, and so forth. You can think of microcode as a simple program, written in micro-instructions, where each micro-instruction generated control signals for the different components of the chip. In the die photo above, you can see the ROM that holds the 8087's microcode program. The ROM takes up a large fraction of the chip, showing why the compact multi-level ROM was necessary. To the left of the ROM is the "engine" that ran the microcode program, essentially a simple CPU.

The 8087 operated as a co-processor with the 8086 processor. When the 8086 encountered a special floating point instruction, the processor ignored it and let the 8087 execute the instruction in parallel.3 I won't explain in detail how the 8087 works internally, but as an overview, floating point operations were implemented using integer adds/subtracts and shifts. To add or subtract two floating point numbers, the 8087 shifted the numbers until the binary points (i.e. the decimal points but in binary) lined up, and then added or subtracted the fraction. Multiplication, division, and square root were performed through repeated shifts and adds or subtracts. Transcendental operations (tan, arctan, log, power) used CORDIC algorithms, which use shifts and adds of special constants, processing one bit at a time. The 8087 also dealt with many special cases: infinities, overflows, NaN (not a number), denormalized numbers, and several rounding modes. The microcode stored in ROM controlled all these operations.
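The align-then-add step for floating point addition can be sketched in Python, with small integers standing in for the 8087's 64-bit fractions. This is only an illustration of the principle; the 8087's actual microcode also handles rounding, normalization, guard bits, and the special cases listed above.

```python
def fp_add(frac_a, exp_a, frac_b, exp_b):
    """Add two numbers of the form frac * 2**exp by aligning binary points."""
    if exp_a < exp_b:
        frac_a >>= (exp_b - exp_a)   # shift the smaller number's fraction right
        exp_a = exp_b
    else:
        frac_b >>= (exp_a - exp_b)
        exp_b = exp_a
    return frac_a + frac_b, exp_a    # a plain integer add once exponents match

# 6 = 48 * 2**-3 and 1 = 8 * 2**-3 share an exponent, so they add exactly:
assert fp_add(48, -3, 8, -3) == (56, -3)   # 56 * 2**-3 = 7
# With mismatched exponents, the shift can discard low bits (which is why a
# real FPU keeps guard bits and rounds): 5 * 2**2 = 20 becomes 1 * 2**4 = 16.
assert fp_add(3, 4, 5, 2) == (4, 4)
```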

Implementation of a ROM

The 8087 chip consists of a tiny silicon die, with regions of the silicon doped with impurities to give them the desired semiconductor properties. On top of the silicon, polysilicon (a special type of silicon) formed wires and transistors. Finally, a metal layer on top wired the circuitry together. In the photo below, the left side shows a small part of the chip as it appears under a microscope, magnifying the yellowish metal wiring. On the right, the metal has been removed with acid, revealing the polysilicon and silicon. When polysilicon crosses silicon, a transistor is formed. The pink regions are doped silicon, and the thin vertical lines are the polysilicon. The small circles are contacts between the silicon and metal layers, connecting them together.

Structure of the ROM in the Intel 8087 FPU. The metal layer is on the left and the polysilicon and silicon layers are on the right.


While there are many ways of building a ROM, a typical way is to have a grid of "cells," with each cell holding a bit. Each cell can have a transistor for a 0 bit, or lack a transistor for a 1 bit. In the diagram above, you can see the grid of cells with transistors (where silicon is present under the polysilicon) and missing transistors (where there are gaps in the silicon). To read from the ROM, one column select line is energized (based on the address) to select the bits stored in that column, yielding one output bit from each row. You can see the vertical polysilicon column select lines and the horizontal metal row outputs in the diagram. The vertical doped silicon lines are connected to ground.

The schematic below (corresponding to a 4×4 ROM segment) shows how the ROM functions. Each cell either has a transistor (black) or no transistor (grayed-out). When a polysilicon column select line is energized, the transistors in that column turn on and pull the corresponding metal row outputs to ground. (For our purposes, an NMOS transistor is like a switch that is open if the input (gate) is 0 and closed if the input is 1.) The row lines output the data stored in the selected column.

Schematic of a 4×4 segment of a ROM.

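The same behavior can be captured in a small Python model of a 4×4 grid, with True marking a cell whose transistor pulls its row line to ground (the grid contents here are made up for illustration):

```python
def read_column(grid, column):
    """Return the row outputs for the selected column: a present transistor
    (True) pulls its row line to 0; a missing one leaves it pulled up to 1."""
    return [0 if grid[row][column] else 1 for row in range(len(grid))]

grid = [            # grid[row][col]: True = transistor present (stores a 0)
    [True,  False, True,  False],
    [False, True,  True,  False],
    [True,  True,  False, False],
    [False, False, True,  True],
]
assert read_column(grid, 0) == [0, 1, 0, 1]
assert read_column(grid, 3) == [1, 1, 1, 0]
```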

The column select signals are generated by a decoder circuit. Since this circuit is built from NOR gates, I'll first explain the construction of a NOR gate. The schematic below shows a four-input NOR gate built from four transistors and a pull-up resistor (actually a special transistor). On the left, all inputs are 0 so all the transistors are off and the pull-up resistor pulls the output high. On the right, an input is 1, turning on a transistor. The transistor is connected to ground, so it pulls the output low. In summary, if any inputs are high, the output is low so this circuit implements a NOR gate.

4-input NOR gate constructed from NMOS transistors.


The column select decoder circuit takes the incoming address bits and activates the appropriate select line. The decoder contains an 8-input NOR gate for each column, with one NOR gate selected for the desired address. The photo shows two of the NOR gates generating two of the column select signals. (For simplicity, I only show four of the 8 inputs). Each column uses a different combination of address lines and complemented address lines as inputs, selecting a different address. The address lines are in the metal layer, which was removed for the photo below; the address lines are drawn in green. To determine the address associated with a column, look at the square contacts associated with each transistor and note which address lines are connected. If all the address lines connected to a column's transistors are low, the NOR gate will select the column.

Part of the address decoder. The address decoder selects odd columns in the ROM, counting right to left. The numbers at the top show the address associated with each output.


The photo below shows a small part of the ROM's decoder with all 8 inputs to the NOR gates. You can read out the binary addresses by carefully examining the address line connections. Note the binary pattern: a1 connections alternate every column, a2 connections alternate every two columns, a3 connections every four columns, and so forth. The a0 connection is fixed because this decoder circuit selects the odd columns; a similar circuit above the ROM selects the even addresses. (This split was necessary to make the decoder fit on the chip because each decoder column is twice as wide as a ROM cell.)

Part of the address decoder for the 8087's microcode ROM. The decoder converts an 8-bit address into column select signals.

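The wiring scheme can be modeled in Python. This is a toy 2-bit, 4-column version (the real decoder uses 8-input NOR gates); the idea is that each column connects the true address line where its bit should be 0 and the complemented line where it should be 1, so exactly one NOR gate sees all-zero inputs:

```python
def decode(address_bits, patterns):
    """Each column's NOR gate outputs 1 only if all connected lines are 0.
    patterns[c] lists (bit_index, use_complement) wire connections."""
    outputs = []
    for pattern in patterns:
        inputs = [(1 - address_bits[i]) if comp else address_bits[i]
                  for i, comp in pattern]
        outputs.append(0 if any(inputs) else 1)
    return outputs

def pattern_for(address, nbits):
    # Connect the complemented line where the address bit should be 1, so
    # only the matching address drives all of this column's inputs to 0.
    return [(i, (address >> i) & 1 == 1) for i in range(nbits)]

patterns = [pattern_for(a, 2) for a in range(4)]
assert decode([0, 1], patterns) == [0, 0, 1, 0]   # a0=0, a1=1 selects column 2
assert decode([1, 1], patterns) == [0, 0, 0, 1]   # a0=1, a1=1 selects column 3
```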

The last component of the ROM is the set of multiplexers that reduces the 64 output rows down to 8 rows.4 Each 8-to-1 multiplexer selects one of its 8 inputs, based on the address. The diagram below shows one of these row multiplexers in the 8087, built from eight large pass transistors, each one connected to one of the row lines. All the transistors are connected to the output so when the selected transistor is turned on, it passes its input to the output. The multiplexer transistors are much, much larger than the transistors in the ROM to reduce distortion of the ROM signal. A decoder (similar to the one discussed earlier, but smaller) generates the eight multiplexer control lines from three address lines.

One of eight row multiplexers in the ROM. This shows the poly/silicon layers, with metal wiring drawn in orange.


To summarize, the ROM stores bits in a grid. It uses eight address bits to select a column in the grid. Then three address bits select the desired eight outputs from the row lines.
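This summary can be put together as a Python model of the whole read path. The 64×256 dimensions follow from the description, but the split of the address bits between column select and multiplexer select, and the grouping of rows into multiplexers, are illustrative guesses rather than details traced from the die:

```python
ROWS, COLS = 64, 256

def read_rom(grid, address):
    """11-bit address -> 8 output bits: 8 bits pick one of 256 columns,
    3 bits drive each 8-to-1 row multiplexer (bit assignment is a guess)."""
    column = address >> 3          # high 8 bits: column select
    mux_sel = address & 0b111      # low 3 bits: row multiplexer select
    # each of the 8 multiplexers picks one row from its group of eight
    return [grid[m * 8 + mux_sel][column] for m in range(ROWS // 8)]

# A test grid where every cell in column c holds c's low bit:
grid = [[c & 1 for c in range(COLS)] for r in range(ROWS)]
assert read_rom(grid, (5 << 3) | 2) == [1] * 8   # column 5 is all ones
assert read_rom(grid, (4 << 3) | 7) == [0] * 8   # column 4 is all zeros
```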

The multi-level ROM

The discussion so far described a typical ROM that stores one bit per cell. So how did the 8087 store two bits per cell? If you look closely, the 8087's microcode ROM has four different transistor sizes (if you count "no transistor" as a size).6 With four possibilities for each transistor, a cell can encode two bits, approximately doubling the density.7 This section explains how the four transistor sizes generate four different currents, and how the chip's analog and digital circuitry converts these currents into two bits.

A closeup of the 8087's microcode ROM shows four different transistor sizes. This allows the ROM to store two bits per cell.

The size of the transistor controls the current through the transistor.8 The important geometric factor is the varying width of the silicon (pink) where it is crossed by the polysilicon (vertical lines), creating transistors with different gate widths. Since the gate width controls the current through the transistor, the four transistor sizes generate four different currents: the largest transistor passes the most current and no current will flow if there is no transistor at all.
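As a rough illustration of the size-to-current relationship, here is the textbook square-law MOSFET model in code. The constants and the four widths are made up for the sketch; only the proportionality to gate width matters.

```python
# Illustrative square-law model: saturation current I = (k/2)(W/L)Vov^2.
# Wider gate -> more current; no transistor -> no current. The numeric
# values are invented for the sketch, not measured from the 8087.

def drain_current(width, length=1.0, k=1.0, v_ov=1.0):
    if width == 0:
        return 0.0  # no transistor in this cell
    return 0.5 * k * (width / length) * v_ov ** 2

widths = {"none": 0.0, "small": 1.0, "medium": 2.0, "large": 3.0}
currents = {name: drain_current(w) for name, w in widths.items()}
```

The four distinct currents are what the read-out circuitry, described next, converts back into two bits.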

The ROM current is converted to bits in several steps. First, a pull-up resistor converts the current to a voltage. Next, three comparators compare the voltage with reference voltages to generate digital signals indicating if the ROM voltage is lower or higher. Finally, logic gates convert the comparator output signals to the two output bits. This circuitry is repeated eight times, generating 16 output bits in total.

The circuit to read two bits from a ROM cell.

The circuit above performs these conversion steps. At the bottom, one of the ROM transistors is selected by the column select line and the multiplexer (discussed earlier), generating one of four currents. Next, a pull-up resistor12 converts the transistor's current to a voltage, resulting in a voltage depending on the size of the selected transistor. The comparators compare this voltage to three reference voltages, outputting a 1 if the ROM voltage is higher than the reference voltage. The comparators and reference voltages require careful design because the ROM voltages could differ by as little as 200 mV.

The reference voltages are mid-way between the expected ROM voltages, allowing some fluctuation in the voltages. The lowest ROM voltage is lower than all the reference voltages so all comparators will output 0. The second ROM voltage is higher than Reference 0, so the bottom comparator outputs 1. For the third ROM voltage, the bottom two comparators output 1, and for the highest ROM voltage all comparators output 1. Thus, the three comparators yield four different output patterns depending on the ROM transistor. The logic gates then convert the comparator outputs into the two output bits.10
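Putting the thresholding and decoding together, here is a small Python model of the read-out path. The voltage values are illustrative, the transistor-to-bits mapping (00/01/11/10) comes from the footnotes, and the ordering assumes a larger transistor sinks more current and thus pulls the sensed line lower.

```python
# Sketch of the comparator read-out: three comparators threshold the ROM
# voltage, and the resulting thermometer code is decoded into two bits.
# Reference voltages are illustrative, not measured from the chip.

REFERENCES = (1.0, 2.0, 3.0)

def decode_cell(rom_voltage):
    # Count the comparators whose reference the ROM voltage exceeds (0..3).
    level = sum(rom_voltage > ref for ref in REFERENCES)
    # Assumed ordering: larger transistor -> lower voltage -> lower level.
    # level 3 = no transistor (00), 2 = small (01), 1 = medium (11),
    # level 0 = large (10), per the footnotes' mapping.
    return {3: (0, 0), 2: (0, 1), 1: (1, 1), 0: (1, 0)}[level]
```

Note how the deliberately out-of-order mapping (01 then 11 then 10) is what the logic gates after the comparators implement.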

The design of the comparator is interesting because it is the bridge between the analog and digital worlds, producing a 1 or a 0 depending on whether the ROM voltage is higher or lower than the reference voltage. Each comparator contains a differential amplifier that amplifies the difference between the ROM voltage and the reference voltage. The output from the differential amplifier drives a latch that stabilizes the output and converts it to a logic-level signal. The differential amplifier (below) is a standard analog circuit. A current sink (symbol at the bottom) provides a constant current. If one of the transistors has a higher input voltage than the other, most of the current passes through that transistor. The voltage drop across the resistors will cause the corresponding output to go lower and the other output to go higher.

Diagram showing the operation of a differential pair. Most of the current will flow through the transistor with the higher input voltage, pulling the corresponding output lower. The double-circle symbol at the bottom is a current sink, providing a constant current I.
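The current-steering behavior of the pair can be sketched numerically. The logistic split below is a textbook idealization (closer to a bipolar pair's exponential characteristic) rather than the 8087's exact MOS behavior, but it shows how a small input difference steers nearly all of the tail current to one side.

```python
import math

# Idealized differential pair: the tail current splits between the two
# sides according to a logistic function of the input difference. The
# thermal-voltage scale v_t is illustrative, not the 8087's device value.

def diff_pair_currents(v1, v2, i_tail=1.0, v_t=0.026):
    f = 1.0 / (1.0 + math.exp(-(v1 - v2) / v_t))
    return i_tail * f, i_tail * (1.0 - f)

# A 200 mV difference steers essentially all the current to one side:
i1, i2 = diff_pair_currents(1.2, 1.0)
```

This steep transfer curve is what lets the comparator resolve ROM voltages only a couple hundred millivolts apart.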

The photo below shows one of the comparators on the chip; the metal layer is on top, with the transistors underneath. I'll just discuss the highlights of this complex circuit; see the footnote12 for details. The signal from the ROM and multiplexer enters on the left. The pull-up circuit12 converts the current into a voltage. The two large transistors of the differential amplifier compare the ROM's voltage with the reference voltage (entering at top). The outputs from the differential amplifier go to the latch circuitry (spread across the photo); the latch's output is in the lower right. The differential amplifier's current source and pull-up resistors are implemented with depletion-mode transistors. Each output circuit uses three comparators, yielding 24 comparators in total.

One of the comparators in the 8087. The chip contains 24 comparators to convert the voltage levels from the multi-level ROM into binary data.

Each reference voltage is generated by a carefully-sized transistor and a pull-up circuit. The reference voltage circuit is designed to be as similar as possible to the ROM's signal circuitry, so any manufacturing variations in the chip will affect both equally. The reference voltage and ROM signal both use the same pull-up circuit. In addition, each reference voltage circuit includes a very large transistor identical to the multiplexer transistor, even though there is no multiplexing in the reference circuit, just to make the circuits match. The three reference voltage circuits are identical except for the size of the reference transistor.9

Circuit generating the three reference voltages. The reference transistors are sized between the ROM's transistor sizes. The oxide layer wasn't fully removed from this part of the die, causing the color swirls in the photo.

Putting all the pieces together, the photo below shows the layout of the microcode ROM components on the chip.12 The bulk of the ROM circuitry is the transistors holding the data. The column decoder circuitry is above and below this. (Half the column select decoders are at the top and half are at the bottom so they fit better.) The output circuitry is on the right. The eight multiplexers reduce the 64 row lines down to eight. The eight rows then go into the comparators, generating the 16 output bits from the ROM at the right. The reference circuit above the comparators generates the three reference voltages. At the bottom right, the small row decoder controls the multiplexers.

Microcode ROM from the Intel 8087 FPU with main components labeled.

While you'd hope for the multi-level ROM to be half the size of a regular ROM, it isn't quite that efficient because of the extra circuitry for the comparators and because the transistors were slightly larger to accommodate the multiple sizes. Even so, the multi-level ROM saved about 40% of the space a regular ROM would have taken.

Now that I have determined the structure of the ROM, I could read out its contents simply (but tediously) by looking at the size of each transistor under a microscope. But without knowing the microcode instruction set, the ROM contents aren't useful.

Conclusions

The 8087 floating point chip used an interesting two-bit-per-cell structure to fit the microcode onto the chip. Intel re-used the multi-level ROM structure in 1981 in the doomed iAPX 432 system.11 As far as I can tell, interest in ROMs with multiple-level cells peaked in the 1980s and then died out, probably because Moore's law made it easier to gain ROM capacity by shrinking a standard ROM cell rather than designing non-standard ROMs requiring special analog circuits built to high tolerances.14

Surprisingly, the multi-level concept has recently returned, but this time in flash memory. Many flash memories store two or more bits per cell.13 Flash has even achieved a remarkable 4 bits per cell (requiring 16 different voltage levels) with "quad-level cell" consumer products announced recently. Thus, an obscure technology from the 1980s can show up again decades later.

I announce my latest blog posts on Twitter, so follow me at @kenshirriff for future 8087 articles. I also have an RSS feed. Thanks to Jeff Epler for suggesting that I investigate the 8087's ROM.

Notes and references

  1. The 8087 has 1648 words of microcode (if I counted correctly), with 16 bits in each word, for a total of 26368 bits. The ROM size didn't need to be a power of two since Intel could build it to the exact size required. 

  2. Sources provide inconsistent values for the number of transistors in the 8087: Intel claims 40,000 transistors while Wikipedia claims 45,000. The discrepancy could be due to different ways of counting transistors. In particular, since the number of transistors in a ROM, PLA or similar structure depends on the data stored in it, sources often count "potential" transistors rather than the number of physical transistors. Other discrepancies can be due to whether or not pull-up transistors are counted and if high-current drivers are counted as multiple transistors in parallel or one large transistor. 

  3. The interaction between the 8086 processor and the 8087 floating point unit is somewhat tricky; I'll discuss some highlights. The simplified view is that the 8087 watches the 8086's instruction stream, and executes any instructions that are 8087 instructions. The complication is that the 8086 has an instruction prefetch buffer, so the instruction being fetched isn't the one being executed. Thus, the 8087 duplicates the 8086's prefetch buffer (or the 8088's smaller prefetch buffer), so it knows what the 8086 is doing. Another complication is the complex addressing modes used by the 8086, which use registers inside the 8086. The 8087 can't perform these addressing modes since it doesn't have access to the 8086 registers. Instead, when the 8086 sees an 8087 instruction, it does a memory fetch from the addressed location and ignores the result. Meanwhile, the 8087 grabs the address off the bus so it can use the address if it needs it. If there is no 8087 present, you might expect a trap, but that's not what happens. Instead, for a system without an 8087, the linker rewrites the 8087 instructions, replacing them with subroutine calls to the emulation library. 

  4. The reason ROMs typically use multiplexers on the row outputs is that it is inefficient to make a ROM with many columns and just a few output bits, because the decoder circuitry will be bigger than the ROM's data. The solution is to reshape the ROM, to hold the same bits but with more rows and fewer columns. For instance, the ROM can have 8 times as many rows and 1/8 the columns, making the decoder 1/8 the size.

    In addition, a long, skinny ROM (e.g. 1K×16) is inconvenient to lay out on a chip, since it won't fit as a simple block. However, a serpentine layout could be used. For example, Intel's early memories were shift registers; the 1405 held 512 bits in a single long shift register. To fit this onto a chip, the shift register wound back and forth about 20 times (details). 

  5. Some IBM computers used an unusual storage technique to hold microcode: Mylar cards had holes punched in them (just like regular punch cards), and the computer sensed the holes capacitively (link). Some computers, such as the Xerox Alto, had some microcode in RAM. This allowed programs to modify the microcode, creating a new instruction set for their specific purposes. Many modern processors have writeable microcode so patches can fix bugs in the microcode. 

  6. I didn't notice the four transistor sizes in the microcode ROM until a comment on Hacker News mentioned that the 8087 used two-bit-per-cell technology. I was skeptical, but after looking at the chip more closely I realized the comment was correct. 

  7. Several other approaches were used in the 1980s to store multiple bits per cell. One of the most common was used by Mostek and other companies: transistors in the ROM were doped to have different threshold voltages. By using four different threshold voltages, two bits could be stored per cell. Compared to Intel's geometric approach, the threshold approach was denser (since all the transistors could be as small as possible), but required more mask layers and processing steps to produce the multiple implantation levels. This approach used the new (at the time) technology of ion implantation to carefully tune the doping levels of each transistor.

    Ion implantation's biggest impact on integrated circuits was its use to create depletion transistors (transistors with a negative threshold voltage), which worked much better as pull-up resistors in logic gates. Ion implantation was also used in the Z-80 microprocessor to create some transistor "traps", circuits that looked like regular transistors under a microscope but received doping implants that made them non-functional. This served as copy protection since a manufacturer that tried to produce clones on the Z-80 by copying the chip with a microscope would end up with a chip that failed in multiple ways, some of them very subtle. 

  8. The current through the transistor is proportional to the ratio between the width and length of the gate. (The length is the distance between the source and drain.) The ROM transistors (and all but the smallest reference transistor) keep the length constant and modify the width, so shrinking the width reduces the current flow. For MOSFET equations, see Wikipedia. 

  9. The gate of the smallest reference transistor is made longer rather than narrower, due to the properties of MOS transistors. The problem is that the reference transistors need to have sizes between the sizes of the ROM transistors. In particular, Reference 0 needs a transistor smaller than the smallest ROM transistor. But the smallest ROM transistor is already as small as possible using the manufacturing techniques. To solve this, note that the polysilicon crossing the middle reference transistor is much thicker horizontally. Since a MOS transistor's current is set by the ratio of its gate's width to its length, lengthening the gate (expanding the polysilicon) is as good as narrowing the silicon for making the transistor act smaller (i.e. pass less current). 

  10. The ROM logic decodes the transistor size to bits as follows: No transistor = 00, small transistor = 01, medium transistor = 11, large transistor = 10. This bit ordering saves a few gates in the decoding logic; since the mapping from transistor to bits is arbitrary, it doesn't matter that the sequence is not in order. (See "Two Bits Per Cell ROM", Stark for details.)  

  11. Intel's iAPX 43203 interface processor (1981) used a multiple-level ROM very similar to the one in the 8087 chip. For details, see "The interface processor for the Intel VLSI 432 32 bit computer," J. Bayliss et al., IEEE J. Solid-State Circuits, vol. SC-16, pp. 522-530, Oct. 1981.
    The 43203 interface processor provided I/O support for the iAPX 432 processor. Intel started the iAPX 432 project in 1975 to produce a "micromainframe" that would be Intel's revolutionary processor for the 1980s. When the iAPX 432 project encountered delays, Intel produced the 8086 processor as a stopgap, releasing it in 1978. While the Intel 8086 was a huge success, leading to the desktop PC and the current x86 architecture, the iAPX 432 project ended up a failure and ended in 1986. 

  12. The schematic below (from "Multiple-Valued ROM Output Circuits") provides details of the circuitry to read the ROM. Conceptually the ROM uses a pull-up resistor to convert the transistor's current to a voltage. The circuit actually uses a three-transistor circuit (T3, T4, T5) as the pull-up. T4 and T5 are essentially an inverter providing negative feedback via T3, making the circuit less sensitive to perturbations (such as manufacturing variations). The comparator consists of a simple differential amplifier (yellow) with T6 acting as the current source. The differential amplifier output is converted into a stable logic-level signal by the latch (green).

    Diagram of 8087 ROM output circuit.

  13. Flash memories are categorized as SLC (single level cell—one bit per cell), MLC (multi level cell—two bits per cell), TLC (triple level cell—three bits per cell) and QLC (quad level cell—four bits per cell). In general, flash with more bits per cell is cheaper but less reliable, slower, and wears out faster due to the smaller signal margins. 

  14. The journal Electronics published a short article "Four-State Cell Doubles ROM Bit Capacity" (p39, Oct 9, 1980), describing Intel's technique, but the article is vague to the point of being misleading. Intel published a detailed article "Two bits per cell ROM" in COMPCON (pp209-212, Feb 1981). An external group attempted to reverse engineer more detailed specifications of the Intel circuits in "Multiple-valued ROM output circuits" (Proc. 14th Int. Symp. Multivalue Logic, 1984). Two papers describing multiple-value memories are A Survey of Multivalued Memories (IEEE Transactions on Computers, Feb 1986, pp 99-106) and A review of multiple-valued memory technology (IEEE Symposium on Multiple-Valued Logic, 1998).