Ken Shirriff's blog

Energizing a vacuum-tube flip-flop module from a 1948 IBM system

In 1948, IBM introduced the 604 Electronic Calculating Punch. This machine was a programmable calculator, about the size of a double refrigerator. It was not quite a computer, but was programmed by plugging wires into a plugboard. This machine read numbers from a punch card, performed up to 60 calculations on these numbers, and then recorded the results by punching holes in the card.1 It processed 100 cards per minute—over one card per second—and IBM advertised it as the equivalent of 150 engineers. The machine rented for $550 a month, making it very popular, with over 5600 units produced.2

The IBM 604 Electronic Calculating Punch. Photo from Ed Thelen's IBM 604 page.

The IBM 604 came out just after the transistor was invented, too early to use transistors. At the time, calculators and computers were moving from slow electromechanical components to fast vacuum tubes. One of the innovations of the 604 was to combine a vacuum tube and its associated circuitry into a pluggable module. Along the left side of the photo above, you can see rows of these modules with the handles sticking out, making it easy to replace a faulty module. More modules are behind the silver metal covers. In total, the IBM 604 used about 1300 vacuum tubes.

The photo below shows a pluggable tube module, with a vacuum tube underneath the insulated handle. The nine pins at the bottom of the module plugged into a socket in the 604, with the sockets connected by backplane wiring. The vacuum tube was also socketed, so a bad tube could be quickly replaced. At the left, the resistors and capacitors are mounted on insulating wafers. Modules provided a dense way to implement circuits, packing components into three dimensions.

The TR-3 trigger module from the IBM 604 Electronic Calculating Punch.

Each pluggable tube module implemented a specific function, such as an inverter, amplifier, or power driver. The module above is a "trigger" module, type TR-3. A trigger is a circuit with two states—on and off—and can be switched from one state to the other, providing one bit of temporary storage. (In modern terminology, this is called a flip-flop.) Triggers were important building blocks in the 604, generating timing signals and storing pulses. Arithmetic in the IBM 604 was implemented with decimal counters, built from TR-3 triggers.

In this article, I describe the circuitry of the TR-3 trigger module. (I recently wrote about a thyratron module in the 604; this is a different module.) After reverse-engineering the module, I powered it up. The video above shows the module in operation. By pressing buttons, I switch the trigger from one state to the other. Glowing orange neon bulbs show the state of the trigger module. The fundamental feature of the trigger is that it stays in a state until I push the other button. This might appear trivial, but the ability to store information is vitally important for computation.

How a vacuum tube works

The trigger module uses a common type of vacuum tube called a triode, which amplifies a weak signal to control a stronger signal. The diagram below shows the construction of a triode vacuum tube. The heater is a filament, similar to an incandescent light bulb, that heats the cathode to roughly 750 °C. At this high temperature, the cathode emits electrons. If a large positive voltage (say, 150 volts) is put on the plate, the negatively charged electrons are attracted to the plate. The stream of electrons from the cathode to the plate causes a current to flow through the tube. Since air would block the electrons, the fragile glass envelope holds a vacuum, giving the vacuum tube its name. The current is controlled by the grid: if a small negative voltage is placed on the grid, it repels the negative electrons, preventing them from reaching the plate and blocking the current through the tube.3 Thus, a small signal on the grid controls the large current through the tube.

The components of a triode vacuum tube. From IBM 604 Customer Engineering manual.

The advantage of vacuum tubes was that they could switch on and off millions of times per second, phenomenally faster than electromechanical devices such as relays. The clock speed of the IBM 604 was 50 kilohertz, far below what a tube could handle, but three orders of magnitude faster than the 50 hertz pulses in an electromechanical accounting machine like the contemporaneous IBM 407.

The tube that I used in the module is called a 2033.6 This tube is a dual triode, combining two triodes into one physical glass tube. Dual triodes were very popular because they doubled the density of the circuitry. In the photo below, the two vertical black structures are the plates of the two triodes; the other structures are not visible as they are inside the plates.

The 2033 dual-triode vacuum tube.

This tube is a "miniature" vacuum tube, about 5 cm long including the seven pins at the bottom of the glass envelope.4 Since a single triode has five connections, you might wonder how a dual triode manages with seven pins instead of 10. The trick is that both triodes share the cathode and heater connections, which limits the tube to applications that don't require separate cathodes.

One disadvantage of vacuum tubes is that the heater uses considerable power. This tube's heater requires 6.3 volts at 300 milliamps—almost 2 watts per tube. Using 6.3 volts may seem a bit random, but many vacuum tubes used this voltage for historical reasons: this was the typical voltage provided by a 6-volt automobile battery.5 In the photo below, you can see the orange glow from the two heaters, mostly hidden by the plates but visible at the top and bottom.

The tube powered up, showing the glowing filaments.

Inverters and the trigger circuit

The trigger circuit is based on two inverters, so I'll start by explaining the tube inverter circuit.7 The idea of an inverter is to amplify and invert the input signal: a "low" input results in a "high" output and vice versa. First, consider a low input: if a negative voltage is applied to the grid, the flow of electrons is blocked, turning off the tube. In this case, the resistor pulls the output up to 150 volts. However, if a positive voltage is applied to the grid, the tube turns on and conducts current. This current pulls the output down, due to the voltage drop across the resistor, producing a low output of 50 volts. Thus, a low input causes a high (150 V) output, while a high input causes a low (50 V) output, providing the desired inverter action. Note that the input signal has a swing of over 50 volts, very large compared to a transistor circuit. Moreover, the output voltages are much higher than the input voltages, which is somewhat inconvenient when connecting circuits.
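As a rough illustration, the inverter's behavior can be reduced to a two-level function. This is a minimal sketch, assuming the tube acts as an ideal switch that is cut off whenever the grid is negative and fully conducting otherwise; the 0 V threshold and the name `inverter_output` are illustrative assumptions, and a real triode has a gradual analog region in between.

```python
# Idealized two-level model of the tube inverter (a sketch, not IBM's design data).
# Assumption: the triode is fully cut off for a negative grid and fully
# conducting otherwise; a real tube conducts progressively around cutoff.

TUBE_OFF_OUTPUT_V = 150.0  # no plate current: the resistor pulls the plate to +150 V
TUBE_ON_OUTPUT_V = 50.0    # plate current drops voltage across the resistor: ~50 V


def inverter_output(grid_voltage: float) -> float:
    """Return the plate (output) voltage for a given grid (input) voltage."""
    tube_conducting = grid_voltage >= 0.0  # hypothetical 0 V threshold
    return TUBE_ON_OUTPUT_V if tube_conducting else TUBE_OFF_OUTPUT_V


if __name__ == "__main__":
    for grid in (-30.0, 10.0):
        print(f"grid {grid:+.0f} V -> plate {inverter_output(grid):.0f} V")
```

Running it shows the inversion: the negative grid gives the high output and the positive grid gives the low output.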

An inverter circuit. Adapted from IBM 604 CE Manual.

A trigger is constructed from two inverters connected in a loop. The output of the first inverter is fed into the second inverter, and the output of the second inverter is looped back to the first inverter. If the first inverter has a high output, the second inverter has a low output, which is fed back to the first, maintaining the high output from the first inverter. The situation is similar but opposite if the first inverter has a low output. Thus, this circuit has two stable states, with one inverter on and the other off.8 Once the circuit is placed into a state, it will remain in that state until forced into the other state.
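The loop argument can be checked mechanically by treating each inverter as a logical NOT and ignoring voltages entirely. This sketch (the function names are hypothetical) enumerates the output pairs the loop can hold and shows that a loop started in either state stays there.

```python
# Boolean sketch of a trigger: two inverters in a loop, each modeled as NOT.

def stable_states() -> list[tuple[bool, bool]]:
    """Return every (first, second) output pair that the loop sustains."""
    return [
        (first, second)
        for first in (False, True)
        for second in (False, True)
        # The second inverter is driven by the first; the first by the second.
        if second == (not first) and first == (not second)
    ]


def settle(first_output: bool, passes: int = 10) -> tuple[bool, bool]:
    """Start the loop with a given first output and propagate around it."""
    first = first_output
    second = not first
    for _ in range(passes):
        first = not second  # feedback from the second inverter
        second = not first  # forward path from the first inverter
    return first, second


if __name__ == "__main__":
    print(stable_states())  # the two complementary states
    print(settle(True), settle(False))
```

Only the two complementary pairs survive, which is exactly the one bit of storage the trigger provides.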

Two inverters in a loop can store a 0 or a 1.

I reverse-engineered the TR-3 module, creating the schematic below.9 It's a bit tricky to see the loop of inverters because the two inverters share one tube and are wired in a cross-coupled arrangement. In brief, one inverter uses the left half of the tube and the other uses the right half. The plate output from one side is wired to the grid input on the other side, through 200K and 1K resistors. The two module outputs (pins 7 and 8) are taken from the plates, but output 8 has a resistor between it and the plate. As a result, the two outputs provide different voltage levels, making the module more flexible to use.10 The two inputs force the trigger into one state or the other. The inputs are connected through 40 pF capacitors, providing AC coupling so the inputs can use different voltage levels from the outputs.

Reverse-engineered schematic of the TR-3 trigger module. Note that the pin numbers for the module are different from the pin numbers for the tube.

One tricky part is the connection between one inverter's output and the other inverter's input. The problem is that the output voltage is 50 to 150 volts, but the input grid voltage must be close to zero (a bit positive or a bit negative). The solution is to use a large negative voltage (-100 volts) and a resistor divider as a level shifter. With a large positive voltage from the plate and a large negative bias voltage, the resulting grid voltage ends up being moderately positive or moderately negative. As a result, the circuit requires both a high positive voltage (for the plate) and a high negative voltage (for the bias), complicating the power supply requirements.
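The level shifting is ordinary voltage-divider arithmetic. In the sketch below, the 201 kΩ upper leg combines the 200K and 1K resistors from the schematic, while the 200 kΩ lower leg to the -100 V bias is a hypothetical value chosen for illustration, not read from the module. It also assumes an unloaded divider with the cathode at 0 V; once the grid goes positive it draws current, so the real grid voltage would sit lower than this estimate.

```python
# Unloaded resistor-divider estimate of the grid voltage (illustrative sketch).

PLATE_HIGH_V = 150.0
PLATE_LOW_V = 50.0
BIAS_V = -100.0
R_PLATE_TO_GRID = 200e3 + 1e3  # the 200K and 1K resistors in series
R_GRID_TO_BIAS = 200e3         # hypothetical value, not taken from the module


def grid_voltage(plate_v: float, bias_v: float = BIAS_V,
                 r_top: float = R_PLATE_TO_GRID,
                 r_bottom: float = R_GRID_TO_BIAS) -> float:
    """Voltage at the junction of a divider between the plate and the bias supply."""
    return bias_v + (plate_v - bias_v) * r_bottom / (r_top + r_bottom)


if __name__ == "__main__":
    for plate in (PLATE_HIGH_V, PLATE_LOW_V):
        print(f"plate {plate:.0f} V -> grid {grid_voltage(plate):+.1f} V")
```

With these assumed values, a high plate maps to roughly +25 V at the divider and a low plate to roughly -25 V, showing how a 50 to 150 V swing becomes a swing straddling zero.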

The inputs are fed into the grid through capacitors, allowing a pulse to pass through the capacitor to the grid. You might expect that a positive pulse would turn on the triode, but the module was used in the opposite way, with a negative pulse to turn off the triode. (This direction is more sensitive, because a tube has more gain when it is on.) Thus, a negative pulse on the left input will turn off the left side. The plate output of the left side goes high, pulling the grid of the right side high, turning the right side on. The plate output from the right side goes low, pulling the grid of the left side low, keeping the left tube off. Similarly, a negative pulse on the right input turns off the right side, causing the left side to turn on. Multiple pulses have no effect; that side remains off. Positive pulses also have no effect; the circuit is designed so a positive pulse is not sufficient to turn a triode on.11
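The pulse rules in this paragraph amount to a small state machine. The sketch below is a behavioral model only, with hypothetical class and method names: it tracks which triode is conducting, lets a negative pulse act only on the conducting side, and ignores positive pulses. It uses IBM's convention (from the notes) that the trigger is "on" when the left triode conducts.

```python
from enum import Enum


class Side(Enum):
    LEFT = "left"
    RIGHT = "right"


class Trigger:
    """Behavioral sketch of the TR-3 trigger: state is which triode conducts."""

    def __init__(self, conducting: Side = Side.LEFT) -> None:
        self.conducting = conducting

    def pulse(self, side: Side, polarity: int) -> None:
        """Apply a pulse to one input; polarity < 0 means a negative pulse."""
        # A negative pulse cuts off the triode it reaches. If that triode was
        # conducting, the cross-coupling turns the other triode on.
        if polarity < 0 and side is self.conducting:
            self.conducting = Side.RIGHT if side is Side.LEFT else Side.LEFT
        # Positive pulses, and negative pulses on a side that is already
        # off, leave the state unchanged.

    @property
    def is_on(self) -> bool:
        """IBM's convention: the trigger is on when the left triode conducts."""
        return self.conducting is Side.LEFT


if __name__ == "__main__":
    t = Trigger()
    t.pulse(Side.LEFT, -1)   # left cut off, so the right side conducts
    t.pulse(Side.LEFT, -1)   # repeated pulse: no effect
    t.pulse(Side.RIGHT, +1)  # positive pulse: no effect
    print(t.conducting, t.is_on)
```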

I found the trigger circuit to be somewhat temperamental: the trigger needs to be stable enough to stay in one state or the other, while also unstable enough that an input pulse will reliably flip it to the other state. The circuit depends on carefully balancing the grid voltages and the input voltages. I experimented with different supply voltages and found that in some cases the trigger would oscillate, while in other cases, the trigger would get stuck in one state. Interestingly, the later IBM 650 computer abandoned this type of trigger circuit, instead using diode logic (AND and OR gates) to set and reset a loop of two inverters. With this type of trigger, the state is determined by reliable Boolean logic, rather than analog interactions of changing voltages.
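For contrast, a trigger that is set and reset by gates follows a plain Boolean equation rather than an analog interplay of voltages. This is a generic set/reset latch written as a sketch; it is not a transcription of the IBM 650's actual diode circuits, and treating simultaneous set and reset as an error is an assumption.

```python
def next_state(state: bool, set_input: bool, reset_input: bool) -> bool:
    """Next latch state: set forces 1, reset forces 0, otherwise hold."""
    if set_input and reset_input:
        raise ValueError("set and reset asserted at the same time")
    # OR the set signal with the held state, gated off by reset.
    return set_input or (state and not reset_input)


if __name__ == "__main__":
    state = False
    for s, r in [(True, False), (False, False), (False, True), (False, False)]:
        state = next_state(state, s, r)
        print(f"set={s!s:5} reset={r!s:5} -> state={state}")
```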

Conclusion

The development of the trigger is an under-appreciated step in the history of digital computers. Because the trigger holds information—state—it can be used to create a state machine. This allows a computer to perform operations step by step, rather than a jumble of actions all happening at the same time.

The trigger circuit dates back to 1918, when two British physicists, William Eccles and Frank Jordan, invented a circuit that used two cross-coupled triodes to create a circuit with two stable states. They viewed this circuit as a type of relay, triggered by a small signal and retaining its state until it was reset. They patented the circuit (Improvements in ionic relays) and wrote about it: A Trigger Relay Utilising Three-Electrode Thermionic Vacuum Tubes. (The Eccles-Jordan trigger circuit below is conceptually similar to the TR-3 trigger module, using cross-coupled triodes. One difference is that the input is coupled with a transformer.)

A diagram of the Eccles-Jordan trigger relay, from their 1919 paper.

The Eccles-Jordan trigger eventually led to digital counters. In 1939, the journal Electronics published an article Trigger Circuits, describing how trigger circuits could be combined to construct high-speed counters. One problem was that triggers can be easily combined to count in binary, but in the 1940s, calculating and accounting machines generally used decimal numbers, not binary. In the groundbreaking ENIAC computer (1945), bulky counters were constructed by putting ten triggers in a ring to count each decimal digit. IBM engineers invented a more efficient decimal counter that used four triggers instead of ten, coming up with binary-coded decimal (BCD) and obtaining a 1946 patent: Electronic Counting Circuit. IBM used this counting circuit in the 603 Electronic Multiplier (1946), followed by the 604 Electronic Calculating Punch (1948).
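To see why four triggers suffice for a decimal digit, here is a sketch of a decade counter built from four binary triggers that skips from 9 back to 0 and signals a carry. It assumes the standard 8-4-2-1 BCD weighting; the 603 and 604 counters may have used a different weighting or reset scheme, and the class name is hypothetical.

```python
class BcdDigitCounter:
    """One decimal digit held in four binary triggers (8-4-2-1 BCD sketch)."""

    WEIGHTS = (1, 2, 4, 8)  # assumed weighting of the four triggers

    def __init__(self) -> None:
        self.triggers = [False, False, False, False]  # lowest weight first

    @property
    def value(self) -> int:
        return sum(w for w, on in zip(self.WEIGHTS, self.triggers) if on)

    def count(self) -> bool:
        """Add one; return True when the digit wraps from 9 to 0 (a carry)."""
        if self.value == 9:
            self.triggers = [False] * 4  # skip the six unused codes 10-15
            return True
        # Binary increment: flip triggers upward until one turns on.
        for i in range(len(self.triggers)):
            self.triggers[i] = not self.triggers[i]
            if self.triggers[i]:
                break
        return False


if __name__ == "__main__":
    digit = BcdDigitCounter()
    for pulse in range(1, 12):
        carry = digit.count()
        print(pulse, digit.value, "carry" if carry else "")
```

A ring counter needs ten triggers per digit, one per value, whereas this arrangement needs four; the cost is the extra logic that forces the skip from 9 back to 0.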

Modern computers use triggers—albeit under the modern name "flip-flops"—by the millions, but now they are microscopic transistor circuits instead of vacuum-tube modules.

For updates, follow me on Bluesky (@righto.com), Mastodon, or RSS. Thanks to Robert Garner for providing the module and to CuriousMarc for hardware support. AI statement: Despite the presence of the em dash, no AI was used in the writing of this article (details).

A 1951 advertisement for the IBM 604, describing how the system was like having 150 extra engineers. Slide rules were the common calculating tool at the time. Notice that diversity amongst engineers was limited to hairstyle. From Fortune, December 1951 via Wikimedia, scanned by Michael Holley.

Notes and references

1. The punch cards were read and punched by a separate unit, the IBM 521 Card Reader/Punch, which was connected to the IBM 604 through a thick cable. The 521 had a card magazine on the upper left to hold cards to be read. After cards were processed, they were collected in the hopper in the middle of the 521. Note the plugboard control panels in both the 604 and the 521.

The IBM 521 Card Reader/Punch to the right of the IBM 604 Electronic Calculating Punch. Photo from Customer Engineering Manual of Instruction.

The punch cards were standard IBM 80-column cards, introduced back in 1928. The position of a hole in a column indicated the digit value for that column. For a particular task, the 80 columns would be divided into fields to hold various numbers. The 604 only supported numbers, not other alphanumeric symbols. A negative number was indicated by punching an additional hole over the units digit, using the second row from the top. (This was called an "X-punch", unrelated to the letter X.)

Punch card code, from IBM 29 Card Punch Reference Manual. This code is somewhat later, with a variety of special characters.

2. An interesting video showing the manufacturing and operation of the IBM 604 is here. For information on the IBM 604, see the Operating Manual. The Customer Engineering Manual of Instruction explains the 604 in detail, showing a TR-3 tube module on page 20. See IBM's Early Computers for information on the development of the 604.
3. You can think of a triode as analogous to an NPN transistor, with the grid as the base, the plate as the collector, and the cathode as the emitter.
4. The IBM 604 also used dual-triode vacuum tubes with nine pins, rather than seven, such as the type 5965. (Seven and nine pins were standard sizes for tubes.) The nine-pin tubes had separate connections for each cathode; this allowed the tubes to be used in circuits such as "cathode followers". The two filaments were in series, sharing a common "middle" pin, which is why the tube used nine pins instead of 10.
5. For a discussion of filament voltages, see Valves, 1939. This article discusses how car radios motivated the use of 6.3 volt filaments in the United States, where 6-volt car batteries were common.
6. The 2033 tube is very similar to the popular 6J6 tube, but optimized for computer circuits. The IBM 1684 tube is also very similar. The tube module that I examined was missing its tube, so I can't guarantee that the 2033 is the correct tube.
7. The standardized tube modules weren't as standardized as one might expect. For instance, the 604 used 27 different types of inverter modules in total. For a detailed discussion of the tube inverter, see IBM 604 CE Manual, pages 26-41.
8. A trigger circuit is symmetrical, so how do you define whether a trigger circuit is on or off? IBM's convention was that if the left triode was conducting, the trigger was on, while if the right triode was conducting, the trigger was off. See IBM 604 CE Manual, page 54.
9. The 604 manual includes a schematic of the TR-3 module (and other modules). Inconveniently, I didn't find this schematic until I had reverse-engineered the module; I made minor adjustments to my schematic based on this. This schematic is a bit tricky to interpret. All the resistances are in thousands of ohms (e.g. 1 is 1 kΩ), and capacitances are in "micromicrofarads" (i.e. pF). The circled numbers indicate pins of the module, while numbers in square boxes indicate voltages according to an obscure standard: 2 is +150V, and 5 is -100 V. "2-110" and "300638" are IBM part numbers. "6J" indicates that the tube is in the 6J family, where 6 indicates the heater voltage and J indicates a triode. The 604 documentation used cryptic boxes as symbols for the modules, with the arrows indicating the inputs and outputs; note that output 7 is at a lower position than output 8, indicating a lower voltage level.

Schematic from the CE Manual of Instruction, page 260.

10. The two outputs from the trigger module are at different voltage levels. The idea is that a circuit could use either output, depending on which voltage level was more convenient. The asymmetrical outputs caused me great trouble, however, since I wanted to attach neon bulb indicators to show the state of both outputs. I had to carefully adjust the voltages so that the bulbs had enough voltage in the "on" state to turn on, but also a sufficiently low voltage in the "off" state to turn off. (Once a neon bulb turns on, the ionized gas keeps conducting well below the voltage needed to light it, so the voltage must drop significantly lower to turn the bulb off.)

The IBM 604 used neon bulbs to show the state of various circuits, both in the front panel and internally. However, unlike me, IBM used a single bulb for each trigger, either on or off, so the inconsistent voltage levels didn't cause problems.
11. The 604 used 12 different types of trigger modules, from TR-1 through TR-42. The different trigger circuits were similar, but had different component values to tune the characteristics, as well as different resistors for the output levels. A few types used resistive inputs instead of capacitively coupled inputs.

The trigger modules were used in a variety of ways. First, briefly removing the negative bias from one side would turn that side on; this was used for reset circuits. Second, a plate could be pulled low, turning off the tube on the other side. Third, the input could be connected directly to the grid, rather than going through a capacitor, with a negative voltage turning the triode off and a positive voltage turning the triode on. Other triggers used a capacitor between the plate and grid on each side to filter out noise and contact bounce. Some triggers used capacitor inputs (as in the module I described), but fed the same negative input pulse to both sides. The pulse has no effect on the triode that is already off, but cuts off the triode that is on, flipping the trigger. The result is that the trigger switches state on each pulse—analogous to a toggle flip-flop—and divides the input pulses by two.
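Toggle mode can be sketched in a few lines: each pulse flips the trigger, so the output completes one off-on-off cycle for every two input pulses. The function names are hypothetical, and the model ignores timing and pulse shape entirely.

```python
def toggle_states(num_pulses: int, start_on: bool = False) -> list[bool]:
    """Trigger state after each pulse when the same pulse reaches both sides."""
    state = start_on
    states = []
    for _ in range(num_pulses):
        # The side that is already off ignores the pulse; the conducting side
        # is cut off, so the trigger changes state on every pulse.
        state = not state
        states.append(state)
    return states


def output_cycles(num_pulses: int) -> int:
    """Count off-to-on transitions: one per two input pulses (divide by two)."""
    previous = False
    cycles = 0
    for state in toggle_states(num_pulses):
        if state and not previous:
            cycles += 1
        previous = state
    return cycles


if __name__ == "__main__":
    print(toggle_states(4), output_cycles(8))
```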

Examining circuit boards from the Space Shuttle's I/O Processor

The Space Shuttle's five1 general-purpose computers played a critical role in each flight: controlling the engines, monitoring thousands of sensors, displaying data to the astronauts, and navigating the Shuttle. Each computer consisted of two 60-pound aluminum-alloy boxes: the box on the right is the CPU, a 32-bit processor that executed 420,000 instructions per second. These computers were designed before microprocessors became popular, so the processor was built from multiple boards crammed with simple chips, and the computers used magnetic core memory rather than DRAM chips.

The Space Shuttle IOP and CPU (AP-101B). Photo courtesy of RR Auction.

The box on the left is the I/O Processor (IOP): the link between the CPU and the rest of the Shuttle. It implemented the input/output capabilities for the computer, primarily 24 high-speed networks that connected the computer to the Shuttle's systems and sensors. But the IOP wasn't just a peripheral; it was a separate programmable computer, more complicated than the main CPU. The IOP had an unusual architecture: it was one of the first multi-threaded computers, implementing 25 virtual processors (with two completely different instruction sets) that ran on one physical processor.

I obtained two circuit cards from the I/O Processor,2 each a 9"×3" rectangle packed with tiny chips and other components. In IBM lingo, each card is called a "page" (remember this term). The top page is a network interface, providing four network connections, each handling 1 million bits per second. (The IOP contained six of these cards for its 24 network connections.) The bottom page held the microcode for the IOP's processors, the low-level code that defined each instruction. The rows of white-and-gold chips stored the microcode's bits in tiny metal fuses, programmed by blowing a fuse for each 1 bit. In this article, I'll explain how the I/O Processor worked, and the roles of these two pages.

Two pages from the Space Shuttle I/O Processor: the "MIA" interface page and the PROM page.

The MIA interface page

The Space Shuttle had 28 data bus networks that linked the computers to the rest of the Shuttle, with each computer attached to 24 of the networks.3 The large number of networks provided both high performance and reliability, with at least two networks between a computer and any Shuttle system. Eight networks were assigned to flight-critical systems, with each CRT display and engine controller connected to four networks for redundancy.

The page below is one of the six network interface pages in the I/O Processor. Space Shuttle engineers loved acronyms, so this page has the cryptic name MIA for "Multiplexer Interface Adapter". (Many of the networks were connected to boxes called Multiplexer/Demultiplexers, which provided the link between the network and the diverse analog and digital components of the Space Shuttle.5) The MIA interface page is tightly packed with integrated circuits and other components. The page holds two printed-circuit boards, one on each side of the page. The boards on both sides are almost identical,4 as you can see by comparing the photo above and the photo below. (Main difference: the connector switches sides.)

The network interface page, called the MIA (Multiplexer Interface Adapter). The page has extensive rework; thin brown "bodge" wires snake around the page to repair errors or implement updates.

Each board implements two network interfaces, so the page supports four networks. Each network transmits data across a pair of wires, twisted together and shielded, rather than a coaxial cable. Although the network transmits digital data, the signals transmitted across the network are physical voltages that will weaken with distance and will have distortion and noise. Thus, the interface page must convert these analog signals back to 0's and 1's.

The right half of the board holds the analog circuitry. It is dominated by a large golden module labeled "IBM", with 46 pins. This is a hybrid module, consisting of tiny components such as transistor dies, resistors, capacitors, and potentially IC dies, connected by bond wires thinner than a hair. It's not quite an integrated circuit, but a collection of individual components mounted on a ceramic wafer. Hybrid modules were popular for aerospace applications, since a board of analog components could be shrunk down to a single (expensive) module. This module contains the analog circuitry for two I/O ports: the drivers to transmit network signals along with the amplifiers and comparators to receive signals.

Various discrete components are mounted next to the hybrid module: resistors, glass capacitors6, inductors, and small square transformers. The transformers provide the coupling between the interface board and the network. As with Ethernet, transformers provide isolation between the computer and the network, filter electromagnetic interference, and match impedances, all important for reliability.7

The Manchester Mark 1; Prof. Williams is second from the left. Photo from the University of Manchester.

A key part of the Shuttle's networking dates back to the 1940s. In 1946, Frederic Williams became head of the Electrical Engineering department at the University of Manchester. By 1949, his team had created the groundbreaking Manchester Mark 1 computer. Along the way, they invented the stored-program computer, the Williams tube—the best form of computer memory before magnetic core—and the Manchester Carry Chain, still used for addition in modern processors.

But the relevant invention is the patented Manchester encoding, a way of encoding a sequence of 0's and 1's for storage or transmission. In the Manchester encoding, each 0 bit is replaced by a "low-high" sequence and each 1 bit is replaced by a "high-low" sequence, as shown below. This idea may seem trivial, but it is used in everything from floppy disks and remote controls to Ethernet and RFID tags, earning it recognition as an IEEE Milestone.

A diagram illustrating Manchester encoding. From Prototype IOP Functional Description, p82.

The obvious approach—sending binary data unencoded—has two problems. First, in a long string of 0's or 1's, it is hard to tell how many bits were sent: "Was that six bits or only five?" Second, such a sequence is unbalanced, so it has a "DC component". This DC component causes problems if the signal is stored on a magnetic medium or transmitted through a transformer. The Manchester encoding solves both these problems. Since every encoded bit has a transition in the middle, it is straightforward to separate the bits. Moreover, the encoding ensures that 0's and 1's occur in equal numbers, so there is no DC component.
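Here is a short sketch of the encoding as described above, with 0 sent as low-high and 1 as high-low (the opposite convention is also common, for example in 10 Mb/s Ethernet). Each bit becomes two half-bit levels, and the decoder rejects any pair without a mid-bit transition. The function names are illustrative, and the sketch ignores the sync signals and parity that the Shuttle's interface chips added.

```python
def manchester_encode(bits: list[int]) -> list[int]:
    """Replace each bit with two half-bit levels: 0 -> low-high, 1 -> high-low."""
    levels: list[int] = []
    for bit in bits:
        levels.extend((1, 0) if bit else (0, 1))
    return levels


def manchester_decode(levels: list[int]) -> list[int]:
    """Recover bits from half-bit level pairs, checking the mid-bit transition."""
    if len(levels) % 2:
        raise ValueError("odd number of half-bit levels")
    bits = []
    for first, second in zip(levels[0::2], levels[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(1 if first > second else 0)  # high-low decodes as 1
    return bits


if __name__ == "__main__":
    data = [0, 0, 0, 0, 0, 1]  # a long run of 0's is still easy to count
    encoded = manchester_encode(data)
    print(encoded)
    print("highs:", sum(encoded), "of", len(encoded))  # always exactly half
    assert manchester_decode(encoded) == data
```

Both advantages show up directly: every bit contributes a transition, and every bit contributes one high and one low half, so the signal has no DC component.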

Because of these advantages, the Manchester encoding was selected for the data bus networks in the Space Shuttle.8 One of the key functions9 of the IOP's network interfaces is to convert between serial bits and the Manchester encoding. The digital circuitry for the interface is fairly complicated, but most of the logic is in the four large golden integrated circuits. These are custom Motorola integrated circuits: a transmit chip and a receive chip for each network port. On the transmit side, the chip converts binary data into the Manchester-encoded signals for the network. The circuitry also inserts a sync signal at the beginning of each word and adds parity. The receive chip reverses this process: detecting sync, decoding the Manchester signals, verifying the parity, and reporting any errors.
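Parity generation and checking, one of the steps these chips performed, is easy to sketch for a 24-bit word. The text does not say whether the Shuttle's buses used odd or even parity; this sketch assumes odd parity, uses hypothetical function names, and omits the sync signal entirely.

```python
WORD_BITS = 24


def odd_parity_bit(word: int) -> int:
    """Parity bit that makes the count of 1's in word + parity odd (assumed odd parity)."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 24 bits")
    ones = bin(word).count("1")
    return 0 if ones % 2 else 1


def parity_ok(word: int, parity_bit: int) -> bool:
    """Receiver-side check: a single flipped bit makes the total count even."""
    return (bin(word).count("1") + parity_bit) % 2 == 1


if __name__ == "__main__":
    word = 0x123456
    p = odd_parity_bit(word)
    print(p, parity_ok(word, p), parity_ok(word ^ 0x000001, p))
```

A single parity bit catches any single-bit error but not a pair of flipped bits, which cancel out.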

The smaller black chips are simple TTL chips, mostly shift registers. (Transistor-Transistor Logic was very popular in the 1970s, providing fast, reliable circuits.) There are twelve 4-bit shift register chips and sixteen 8-bit shift registers.10 The Shuttle's networks sent 24-bit words across the network: combining six 4-bit shift register chips produces a 24-bit shift register, which converted these 24-bit words to serial data and vice versa. The remaining chips are simple logic gates, flip-flops, buffers, and four-bit counters.
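Here is a sketch of how six 4-bit shift registers chained end to end behave like one 24-bit register, converting a parallel word to a serial bit stream and back. The chip model, the most-significant-bit-first ordering, and the class names are assumptions for illustration; the actual chip types and bit order on the MIA page are not given in the text.

```python
WIDTH = 24
CHIP_BITS = 4


class ShiftRegister4:
    """One hypothetical 4-bit shift-register chip; bits[0] shifts out next."""

    def __init__(self) -> None:
        self.bits = [0] * CHIP_BITS

    def shift(self, serial_in: int) -> int:
        """Shift one position toward bits[0]; return the bit that falls out."""
        out = self.bits[0]
        self.bits = self.bits[1:] + [serial_in]
        return out


class ShiftRegister24:
    """Six 4-bit chips chained so each chip's output feeds the next chip's input."""

    def __init__(self) -> None:
        self.chips = [ShiftRegister4() for _ in range(WIDTH // CHIP_BITS)]

    def load(self, word: int) -> None:
        """Parallel load, most significant bit at the serial-output end."""
        for i in range(WIDTH):
            chip, pos = divmod(i, CHIP_BITS)
            self.chips[chip].bits[pos] = (word >> (WIDTH - 1 - i)) & 1

    def read(self) -> int:
        """Parallel read of all 24 bits."""
        word = 0
        for chip in self.chips:
            for bit in chip.bits:
                word = (word << 1) | bit
        return word

    def shift(self, serial_in: int) -> int:
        """Clock every chip once; chip 0 drives the serial output."""
        carry = serial_in
        for chip in reversed(self.chips):
            carry = chip.shift(carry)
        return carry


if __name__ == "__main__":
    tx, rx = ShiftRegister24(), ShiftRegister24()
    tx.load(0xABCDEF)
    for _ in range(WIDTH):
        rx.shift(tx.shift(0))  # serial link: one bit per clock
    print(hex(rx.read()))
```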

The physical structure of a page

Around 1967, IBM introduced a line of computers for avionics, called System/4 Pi.11 These systems were constructed from pages:12 two circuit boards sandwiching a metal layer that provided conduction cooling. Flat-pack integrated circuits, smaller than a fingernail, were mounted in rows13 on each circuit board, about 78 ICs on a board. The printed-circuit boards were advanced for the time, with six layers of wiring. Two jack screws at the top tightly secured the page into the system. Two 98-pin connectors connected the page to the backplane. The photo below shows a typical 4 Pi page (top), with its rows of chips.

A comparison of a standard IBM 4 Pi page with the IOP page. 4 Pi page courtesy of Eric Schlaepfer. The 4 Pi page was in a bag labeled "FSD AWACS tester?" suggesting that it was a tester from IBM's Federal Systems Division for the E-3C Airborne Warning and Control System aircraft, which used an IBM 4 Pi computer.

An I/O processor page (above, bottom) is almost identical to a standard 4 Pi page except that it is one inch wider (9" instead of 8") and has one or two 120-pin connectors instead of 98-pin connectors.14 One inch may not seem like much, but a 9-inch page fits 100 ICs rather than 78, a significant increase. I'm surprised that IBM changed from the standard size, but I suspect that the designers couldn't fit the IOP into the available space with standard pages, forcing the change. Likewise, the multiple I/O ports may have required more connections than the smaller connectors could support.

A page has circuit boards on either side, separated by a metal plate. To allow signals to flow between the boards, a special connector is attached to the top of the page to link the two boards. This connector not only provides feed-through connections between the boards, but also provides test points, so signals can be probed while the boards are mounted in the case. The photo below shows a close-up of the feed-through connector. It has three rows of test points. The first row (red) is connected to the top board. The middle row (orange) is connected to both boards and provides the feed-throughs. The bottom row (blue) is connected to the bottom board. The upper arrows show where the connector is soldered to the board.

The test point connector on the MIA page.

The diagram below shows the construction of the I/O Processor, with rows of pages plugged into the backplane.15 Note the 128-pin MIA I/O connector on the front of the IOP; this connects the 24 data buses (along with other signals) to other parts of the Shuttle. The arrows show how cooling air flowed through the sides of the IOP. The air did not flow over the pages. Instead, heat was transmitted by conduction through the metal plate inside each page, flowing to heat exchangers in the sides of the case. The CPU and the IOP both contained magnetic core memory (labeled "Storage Page" below); even though the memory is split between the boxes, it is treated as a unified shared memory, so programs for the CPU and the IOP can reside in memory in either physical box.

Exploded view of the IOP. From Prototype IOP Functional Description.

The IOP's architecture and the PROM page

The high-performance design of the I/O Processor was developed by Peter Kogge, an expert in parallel processing architectures. At the time, he was working at IBM's Federal Systems Division, where the Space Shuttle computer was developed.24 Kogge, now a professor at the University of Notre Dame, is also known for the Kogge-Stone adder, a fast circuit used in processors such as the Pentium. The I/O Processor has a very unusual architecture: although it had one physical processor, it ran 25 virtual processors with two completely different instruction sets. The virtual processors took turns, running for just one clock cycle and then letting the next processor run. The motivation behind this was to ensure that each network port got a predictable and guaranteed portion of the processor, so even if one network port was overloaded, it wouldn't affect the others. This approach, called a barrel processor16, was first used in the CDC 6600 supercomputer, the world's fastest computer from 1964 to 1969.
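The fairness guarantee comes from the fixed rotation. Below is a sketch, assuming a strict one-cycle-per-turn rotation over 25 virtual processors (24 BCEs plus one MSC, as the next paragraph describes) and assuming that an idle processor's turn is simply wasted rather than given to another processor; the text does not say how the IOP handled idle slots, and the function name is hypothetical.

```python
NUM_VIRTUAL_PROCESSORS = 25  # 24 BCEs plus one MSC


def run_barrel(cycles: int, pending: list[int]) -> list[int]:
    """Strict round-robin: each clock cycle belongs to exactly one virtual processor.

    pending[i] is how many cycles of work virtual processor i wants.
    Returns how many cycles each virtual processor actually received.
    """
    if len(pending) != NUM_VIRTUAL_PROCESSORS:
        raise ValueError("need one entry per virtual processor")
    remaining = list(pending)
    received = [0] * NUM_VIRTUAL_PROCESSORS
    for clock in range(cycles):
        vp = clock % NUM_VIRTUAL_PROCESSORS  # fixed rotation, no priorities
        if remaining[vp] > 0:
            remaining[vp] -= 1
            received[vp] += 1
        # In this sketch an idle processor's slot simply goes unused.
    return received


if __name__ == "__main__":
    demand = [10_000] + [50] * 24  # processor 0 is badly overloaded
    print(run_barrel(2_500, demand))
```

However much work processor 0 has queued, it never receives more than one cycle in 25, so the other ports still finish their work on schedule.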

The I/O Processor has two types of (virtual) processors, which of course have cryptic acronyms: BCE and MSC. Each of the 24 network ports has a BCE, a Bus Control Element, which runs a small program to move data words between the network port and memory. An MSC (Master Sequence Controller) is the executive, running programs to manage the BCEs. The BCE and MSC processors run code that is stored in the computer's core memory. The instruction sets of the MSC and the BCE are completely different from each other and from the instruction set of the main CPU (which is derived from IBM's System/360 mainframes). The (executive) MSC is a 32-bit processor with the standard instructions of a normal processor—addition, logic, branches, and so forth—as well as specialized operations to configure and start BCEs.17 The instruction set of a low-level BCE is much smaller and much stranger, lacking basic instructions such as arithmetic and conditional branches that you'd expect from a processor. Instead, a BCE has I/O instructions such as Transmit Data, Receive Data, Load Timeout Register, Store Status, and Wait. In typical use, the CPU directs the MSC to run a program, the MSC configures the BCEs to execute a program, and the BCEs send and receive data as specified. When the BCE's operation is done, the MSC interrupts the CPU, which processes the data. Thus, the CPU can focus on the high-level algorithms without wasting cycles on network operations.

How do the MSC and BCE processors all run on one physical processor, when they have completely different instruction sets? The trick is microcode: each MSC and BCE instruction was implemented in microcode, through a sequence of 72-bit micro-instructions.18 A simple instruction might take five micro-instructions, while a complex instruction might require 60 micro-instructions. Each micro-instruction directed the action of the IOP's physical processor for one step of the MSC or BCE instruction. After each micro-instruction, the physical processor switched to the micro-instruction for the next virtual processor. The architecture of the physical processor was completely different from the MSC or the BCE: three 16-bit data paths and two ALUs (Arithmetic/Logic Units) that can operate in parallel. The physical processor had a separate register set, including a micro-instruction address register, for each virtual processor, to keep track of the state of each virtual processor.
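The interleaving of micro-instructions can be modeled in a few lines. In the sketch below, each virtual processor keeps its own micro-instruction address register. The physical processor executes one micro-instruction for one virtual processor and then moves to the next. The micro-instruction counts are hypothetical round numbers that echo the "five" and "60" figures above, and all names are illustrative.

```python
from dataclasses import dataclass

# Hypothetical micro-instruction counts. The text only says a simple
# instruction might take about 5 micro-instructions and a complex one about 60.
MICROCODE_LENGTH = {"simple": 5, "complex": 60}


@dataclass
class VirtualProcessorState:
    name: str
    program: list         # machine instructions, by hypothetical kind
    pc: int = 0           # which machine instruction this VP is executing
    micro_pc: int = 0     # this VP's own micro-instruction address register
    retired: int = 0      # machine instructions completed so far

    def done(self):
        return self.pc >= len(self.program)

    def step(self):
        """Execute exactly one micro-instruction of the current instruction."""
        if self.done():
            return
        self.micro_pc += 1
        if self.micro_pc == MICROCODE_LENGTH[self.program[self.pc]]:
            self.micro_pc = 0   # instruction finished: reset for the next one
            self.pc += 1
            self.retired += 1


def run(vps, steps):
    """After every micro-instruction, switch to the next virtual processor."""
    trace = []
    for t in range(steps):
        vp = vps[t % len(vps)]
        vp.step()
        trace.append(vp.name)
    return trace


msc = VirtualProcessorState("MSC", ["complex"])
bce = VirtualProcessorState("BCE", ["simple", "simple"])
trace = run([msc, bce], steps=20)
print(trace[:4], msc.retired, msc.micro_pc, bce.retired)
```

After 20 micro-steps, each virtual processor has run 10 micro-instructions. The BCE has finished both of its short instructions, while the MSC is still 10 micro-instructions into its long one. Because every virtual processor has a separate micro-PC, neither processor disturbs the other's place in its microcode.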

The PROM page holds the majority of the microcode for the I/O Processor. Although three chips are mounted sideways to avoid wasting space, there is even more wasted space at the left.

The IOP's micro-instructions were stored in the PROM page shown above, where the white chips with gold lids are fusible-link PROM (Programmable Read-Only Memory) chips.19 These unusual chips contain a tiny fuse for each bit. If the fuse is intact, the corresponding bit is a 0, while a burnt-out fuse represents a 1 bit. The chip is programmed by applying 17-volt pulses to destroy fuses one by one, literally burning the PROM. (I discussed fusible PROM chips earlier.)
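Because a blown fuse cannot be restored, programming can only change bits in one direction. The toy model below captures this using the polarity given above (intact = 0, blown = 1) and the 512 × 4 organization described next. The class is a hypothetical illustration, not a model of the chip's electrical programming procedure.

```python
class FusiblePROM:
    """Toy fusible-link PROM: 512 words x 4 bits.

    Following the text, an intact fuse reads as 0 and a blown fuse as 1.
    Programming can only blow fuses, so bits can go 0 -> 1 but never back.
    """
    WORDS = 512
    WIDTH = 4

    def __init__(self):
        self.words = [0] * self.WORDS   # all fuses intact: every bit reads 0

    def program(self, address, value):
        """Blow the fuses needed to store `value`; fail if a 1 must become 0."""
        if not 0 <= value < (1 << self.WIDTH):
            raise ValueError("value does not fit in a 4-bit word")
        current = self.words[address]
        stuck = current & ~value         # bits that are blown but should read 0
        if stuck:
            raise ValueError(f"cannot clear bits {stuck:04b}: fuses already blown")
        self.words[address] = current | value

    def read(self, address):
        return self.words[address]


prom = FusiblePROM()
prom.program(0, 0b0101)
prom.program(0, 0b0111)     # allowed: only blows one more fuse
try:
    prom.program(0, 0b0011)  # would need to restore a blown fuse
except ValueError as err:
    print(err)
```

As a result, a change that turns any 1 back into a 0 cannot be applied to a chip that is already programmed. A fresh chip must be burned instead.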

Each PROM chip holds 512 words of 4 bits, so in total, this page held 1024 72-bit micro-instructions; the remaining 512 micro-instructions were in another page.20 The chips are hand-labeled with numbers, since each chip has unique programming and must be installed in the correct location. With 36 chips, you'd expect the chips to be numbered from 1 to 36. Curiously, although many of the chips are sequentially numbered, others have numbers ranging from 55 to 74 in no obvious pattern.21
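The chip count can be checked with a short calculation. The sketch assumes the straightforward arrangement: 18 chips side by side supply one 72-bit word, and each such bank of 18 chips holds 512 words. The text gives the totals but does not describe the physical layout, so the banking is an assumption.

```python
import math

WORDS_PER_CHIP = 512   # each PROM chip: 512 words...
BITS_PER_WORD = 4      # ...of 4 bits
MICRO_WORD_BITS = 72   # width of one IOP micro-instruction


def proms_needed(micro_instructions):
    """Number of 512x4 PROM chips needed to hold the given 72-bit micro-instructions."""
    chips_side_by_side = MICRO_WORD_BITS // BITS_PER_WORD   # 18 chips supply 72 bits in parallel
    banks = math.ceil(micro_instructions / WORDS_PER_CHIP)  # each bank of 18 chips holds 512 words
    return chips_side_by_side * banks


print(proms_needed(1024))                                          # 36
print(1024 * MICRO_WORD_BITS, 36 * WORDS_PER_CHIP * BITS_PER_WORD)  # 73728 73728
```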

Physically, the PROM page is unusual in several ways. Instead of flat-pack integrated circuits, it uses DIP (Dual In-line Package) ICs, larger integrated circuits with two rows of vertical pins that go through the circuit board. Since this page only has one circuit board, it doesn't have the test-point feed-throughs at the top. It still has the central metal plate, but the integrated circuits sit on top of the metal plate, while the circuit board is underneath—the plate has gaps for the pins. Between the rows of chips, the central plate is the full thickness of the board.

A close-up of the PROM page, showing how the chips are mounted. The black chips are much thicker than the white chips.

Presumably, the fusible-link PROM chips were only available in DIP packages, rather than flat-packs. These DIP packages take up much more space than the regular flat-pack integrated circuits; this page has about a quarter the density of a regular page.22

Conclusions

The Space Shuttle's CPU and IOP were advanced when they were designed, but they rapidly became obsolete. IBM redesigned the computer, combining both the CPU and IOP into a single box called the AP-101S, which first flew in 1991 (details). The improved computer was much faster and had more memory. Moreover, combining two boxes into one saved about 300 pounds in total. The photo below shows three of the updated AP-101S computers mounted in the Shuttle's avionics bays. (The wall hides the fourth computer, and the fifth is behind the camera.) These same positions are where the I/O Processors were mounted previously, with the CPUs installed in the empty spaces to the left.

Avionics bays 1 and 2 are located in the crew cabin middeck, below the flight deck, and looking forward into the nose. The red arrows indicate the AP-101S computers. The remaining computer is in avionics bay 3A, on the aft right side of the middeck. This photo is from 2011, showing Discovery being prepared for display at the Smithsonian. Original photo courtesy of collectSpace; I've adjusted the lighting.

Despite the critical role of the I/O Processor in the Space Shuttle, it doesn't get the attention given to the CPU. For instance, although NASA documents describe the architecture of the IOP in detail, I couldn't find any photos of its pages.23 I hope that this article has convinced you that the architecture and the physical construction of the IOP make it an interesting system.

For updates, follow me on Bluesky (@righto.com), Mastodon, or RSS. Thanks to Richard for supplying the boards. Thanks to Mike Stewart for documents on the IOP. Thanks to Robert Pearlman of collectSPACE, and RR Auction for photos.

AI statement: I didn't use AI to write this article; the em-dashes are natural (details).

Notes and references

On some flights, a sixth computer was carried in a locker as a spare, providing an additional degree of reliability. If one of the five computers failed, the astronauts could connect the cables to the spare computer and it could take over for the failed one. The spare was put into use on flight STS-30 (1989) after computer #4 encountered a "data parity external storage error", indicating a hardware problem. ↩
I suspected that these pages were from the I/O Processor, but it was difficult to prove this. Fortunately, Mike Stewart found a document, the Prototype Input/Output Processor Function Description, that lists the pages in each IOP slot. The MIA page has a part number on it: 6246523-3, and the PROM page has 6104848-3; these match "MIA" 6246523-1 and "Micro Store (ROM)" 6104848-1 in the document. ↩
The diagram below shows how the 28 data bus networks connect the five computers at the top and various parts of the Shuttle. The networks are categorized as ground interface, mission critical, flight instrumentation, display system, mass memory, intercomputer, and flight critical.

Data bus architecture. Click for a larger version. Adapted from Space Shuttle Avionics Systems.

Why was each computer connected to 24 networks and not all 28? Each Space Shuttle computer was connected to almost all the networks, so they could run in lockstep for reliability. The exception was that each computer sent its own monitoring data to the ground station. Since this data was of no importance to the other computers, it was sent over a private network called Flight Instrumentation to the PCM (Pulse Code Modulation) box, which encoded the data for transmission to the ground. There were 23 shared networks and 5 private networks (one for each computer), so there were 28 networks in total, with 24 networks connected to a particular computer. ↩
Both sides of the interface page are almost identical. However, the connector is on the left or the right side, depending on which side of the page you examine. This forced the decoupling capacitors at the very bottom to move to accommodate the connector. I also found a single integrated circuit that was different between the two sides, for some reason. ↩
While many of the data bus networks are connected to a Multiplexer/Demultiplexer (MDM), this is not always the case. Networks were also connected directly to systems such as an Engine Interface Unit or a Display Electronics Unit. Moreover, the MDM was not necessarily the final step between the network and the Shuttle's sensors. The MDM held cards to support over a dozen types of input and output signals: digital, analog, on/off (discrete), and serial. However, the thousands of signals in the Shuttle were much more diverse; sensors can provide AC signals, pulses, thermocouple values, resistances, and so forth. Other boxes converted the raw sensor signals into forms that the MDM could handle; these boxes were called Dedicated Signal Conditioners (DSC). A DSC had 15 or 30 slots to hold cards to perform the necessary signal conversion. Thus, the MDMs and DSCs combined a fixed architecture with the ability to be customized for each role. ↩
The glass capacitor is an interesting component, with an extremely thin layer of glass as the dielectric. Glass capacitors became popular in the 1960s for aerospace applications because of their stability and reliability (more). These capacitors were manufactured by Corning Glass Works, as indicated by the "CGW" label on the package.

Two glass capacitors on the MIA page.

The capacitor is labeled with a military code. "J" indicates the Joint Army/Navy specification. "CY" indicates a glass capacitor, "4" apparently indicates axial leads, "G" indicates the temperature/voltage, "510" is the value (51×10⁰ = 51 pF), and "G" indicates ±2% tolerance. (I don't know why one capacitor has "0F" and the other has "4G".) ↩
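The "510" value code and "G" tolerance letter on the glass capacitor follow a simple pattern. The first two digits are significant, the third is a power of ten, and the value is in picofarads. The decoder below illustrates this. Only "G" = ±2% comes from the text; the other tolerance letters are the common EIA/MIL letter codes and are included as an assumption.

```python
# "G" = ±2% is from the text; F, J, K, M are assumed common letter codes.
TOLERANCE_PERCENT = {"F": 1, "G": 2, "J": 5, "K": 10, "M": 20}


def decode_capacitance(code: str) -> int:
    """Three-digit value code: two significant digits, then a power of ten, in pF."""
    if len(code) != 3 or not code.isdigit():
        raise ValueError("expected three digits, e.g. '510'")
    significand = int(code[:2])
    exponent = int(code[2])
    return significand * 10 ** exponent


def describe(value_code: str, tolerance_letter: str) -> str:
    pf = decode_capacitance(value_code)
    return f"{pf} pF ±{TOLERANCE_PERCENT[tolerance_letter]}%"


print(describe("510", "G"))   # 51 pF ±2%
```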
The Space Shuttle had a second layer of transformers between the computer and the network, ensuring a faulty device didn't bring down the network. Each device (such as the IOP) was connected to the network through a tiny device called the Data Bus Coupler. This one-inch cube contains a transformer and a few resistors to match impedance. The coupler acts as a network tap, providing a short stub from the network to a device. The coupler also provides line termination if the device is removed, ensuring signal integrity. ↩
The Space Shuttle's network is very similar to the U.S. military's serial network standard MIL-STD-1553. The 1553B standard is used in numerous military aircraft, missiles, tanks, and navy systems, as well as the Airbus A350 commercial plane and the James Webb Space Telescope. However, since the Space Shuttle's network and the 1553 standard were both under development in the early 1970s, the two networks are not the same. The main differences are that the Shuttle uses 24-bit words instead of 16-bit words, and has a 5.5 µs gap between words (details). ↩
The functions of the MIA are described as:
- Transmit and receive data
- DC isolation
- Parallel/serial conversion
- Serial/parallel conversion
- Sync generation and detection
- Manchester encode and decode
- Parity generation and detection
- Bit count detection
- Provide status to BCE.
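Two of these functions, Manchester encoding and parity generation, are easy to illustrate. This sketch makes several assumptions that the text doesn't confirm for the Shuttle. It uses the MIL-STD-1553 polarity convention (a 1 is sent as high then low, a 0 as low then high), odd parity, and most-significant-bit-first order. It omits the sync pattern. Only the 24-bit word size comes from the text.

```python
def manchester_encode(bits):
    """Encode each bit as two half-bit levels (assumed 1 -> (1, 0), 0 -> (0, 1)).

    Every bit cell has a mid-bit transition, which the receiver can use
    for clock recovery.
    """
    out = []
    for b in bits:
        out.extend((1, 0) if b else (0, 1))
    return out


def manchester_decode(halves):
    """Decode half-bit pairs; a pair without a mid-bit transition is an error."""
    if len(halves) % 2:
        raise ValueError("odd number of half-bits")
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(first)
    return bits


def odd_parity_bit(bits):
    """Parity bit that makes the total number of 1s odd (assumed odd parity)."""
    return 1 - (sum(bits) % 2)


def word_to_bits(value, width=24):
    """Most-significant bit first; the 24-bit width matches the Shuttle's words."""
    return [(value >> i) & 1 for i in reversed(range(width))]


data = word_to_bits(0xABCDEF)
frame = data + [odd_parity_bit(data)]
line = manchester_encode(frame)
assert manchester_decode(line) == frame
print(len(line), sum(frame) % 2)   # 50 1
```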
The functional block diagram below shows the circuitry for one port of the network interface. This circuitry appears twice on each board; with a board on each side of the page, the page supports four networks. The dashed Transmitting and Receiving boxes correspond, I think, to the large Motorola chips, except that the "TX" and "RX" amplifiers are in the IBM hybrid module and the transformers are discrete components.

Functional block diagram of the MIA. From Prototype IOP Functional Description, p82. Click for a larger image.

↩
The 4-bit shift register chips are 54LS395 chips. These chips have "tri-state" outputs, allowing them to be connected to a bus. These chips probably provide the interface between the board and the rest of the IOP; the twelve chips on a board would support a 24-bit register for each port, as expected. The 8-bit shift register chips are 54LS1964 shift registers.

I can't figure out why there are so many 8-bit shift register chips; perhaps they act as buffers. My speculation... The Prototype IOP Functional Description states that the IOP has six 28-bit 4-word registers between the 24-bit MIA shift registers and the rest of the IOP. Could the 8-bit shift register chips form these registers, even though shifting is not necessary? The document doesn't make it clear if these registers are on the MIA page or a different page. The shift-register chips provide 256 bits of storage per page, while the register file needs 112 bits, so there are way more bits than required. Moreover, the document says that the registers are structured as 7-4&4 register files for each set of four MIAs, which sounds more like 54LS170 register file chips (for instance) than shift-register chips. Possibly, the design was modified from the Prototype Functional Description, and the 8-bit shift registers provide additional buffering. ↩
The 4 Pi name is a geometry joke based on IBM's wildly popular series of mainframes, the System/360. System/360 revolutionized the computer industry with the concept of one family of computers for all applications: business and scientific. The name symbolized that System/360 covered the full 360° of applications. The 4 Pi name extended the idea of a circle to the 3-dimensional world: 4π is the number of steradians making up a full sphere. As IBM put it, "System/4 Pi also fills a sphere—the full spectrum of military computer needs—for airborne, space, or shipboard use." ↩
The earliest 4 Pi systems (the TC line) used a different style of page, but the following computers used the standard 4 Pi pages, including the Space Shuttle's AP-101B computer. However, IBM moved to much larger pages, starting with the next computer, the AP-101C in the B-1 bomber. The Space Shuttle's upgraded computer, the AP-101S, used these larger pages. For details, see my article on 4 Pi computer history. ↩
The photo below shows how the flat-pack integrated circuits are mounted on the circuit board. 16 pads are allocated to each integrated circuit; 14-pin integrated circuits "waste" two pads, while larger integrated circuits break the regular pattern. Each pad is connected to a via, a plated hole through the circuit board. These vias provide connections to wiring traces on a different layer of the circuit board; some of these traces are visible in the photo. Vias also hold the leads of through-hole components. The circuit cards in IBM System/360 mainframes used a very similar style of printed-circuit board, with a regular grid of vias. This style of board is very different from the circuit boards used in most other systems, which only had holes where necessary and routed traces less regularly. IBM's style presumably made hole drilling more efficient and was easier for automatic routing, but required thin, precise traces and multi-layer circuit boards, which were not common at the time.

IBM's technology was highly advanced compared to consumer electronics. IBM was using six-layer printed-circuit boards and surface-mount components in the 1960s, but Apple, for instance, didn't switch to surface-mount components until two decades later. Specifically, the Apple IIGS (1986) extensively used surface-mount components, but the Macintosh SE (1987) still used entirely through-hole components a year later.

A close-up of the IOP's PROM board.

The photo also illustrates how some integrated circuits are labeled with Specification Control Drawing (SCD) numbers (6088731-1) while others are labeled with standard part numbers (SN54LS151). This SCD number corresponds to a standard 54S10 NAND gate. The chips both have 1974 date codes (74xx), not to be confused with 7400-series part numbers.

The photo below shows three different types of flat-pack ICs. The first type is most common, with leads extending from the top and bottom sides, similar to a modern surface-mount integrated circuit. The second package has a golden case. It is much smaller and thinner, with leads extending from all four sides. The third package also has leads from four sides, but is somewhat larger.

Three types of surface-mount packages.

↩
The change in page size for the IOP is documented in Prototype IOP Functional Description, which says: "Standard 4 Pi Page Extended by Width Change from 8 to 9 inches, New Standard 120 Pin Connector".

The photo below compares the 98-pin connector on a standard IBM 4 Pi page (top) with the 120-pin connector on the IOP page (bottom). The 120-pin connector has a narrower pin spacing (0.05") than the 98-pin connector (0.06"), allowing more pins in the same width. However, the 120-pin connector has more spacing between the rows of pins (0.150" vs. 0.100").

The connectors on a standard IBM 4 Pi page (top) and the IOP page (bottom). The 4 Pi page is courtesy of Eric Schlaepfer. The slight waviness is just due to bent pins.

Also note that both connectors have a peg on one side and a hollow cylinder on the other. These are used for keying, to make sure that a page cannot be plugged into the wrong slot. Each page type has a different combination; with a double connector, there are 16 possible combinations. ↩
The exploded view shows seven MIA (interface) pages. This doesn't make sense since there are six MIA pages for the 24 network connections, as the same document lists (in Table 4-1). That table also shows one more page in total than on the exploded view. My guess is that the system was still being changed when the document was written (some entries in the table are marked TBD), resulting in inconsistencies. ↩
The virtual MSC and BCE processors take turns executing on the IOP's physical processor. A 16.5 µs time interval is split into 33 slices: each BCE gets one time slice, the MSC gets 8 time slices, and one slice is used for BCE self-tests. Thus, the MSC gets much more execution time than a low-level BCE.
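The timing of the slot wheel follows directly from these counts. The sketch below derives the slice length and each processor's share using exact fractions. Only the slice counts and the 16.5 µs period come from the text. The order of the slots around the wheel and the pipelining are not modeled, and the names are illustrative.

```python
from fractions import Fraction

WHEEL_PERIOD_US = Fraction(33, 2)   # 16.5 µs per turn of the wheel

# Slice allocation from the text: 24 BCEs x 1 slice, MSC x 8, self-test x 1.
SLICES = {f"BCE{i}": 1 for i in range(1, 25)}
SLICES["MSC"] = 8
SLICES["BCE self-test"] = 1


def wheel_summary(slices, period_us):
    total = sum(slices.values())
    slice_us = period_us / total
    return {
        "total_slices": total,
        "slice_us": slice_us,
        "msc_share": Fraction(slices["MSC"], total),
        "bce_share": Fraction(slices["BCE1"], total),
        "msc_us_per_turn": slices["MSC"] * slice_us,
    }


s = wheel_summary(SLICES, WHEEL_PERIOD_US)
print(s["slice_us"], s["msc_share"], s["msc_us_per_turn"])   # 1/2 8/33 4
```

Each slice lasts 0.5 µs. The MSC gets 4 µs of every 16.5 µs turn, eight times the share of any single BCE.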

The I/O Processor's slot timer or "wheel". Adapted from Space Shuttle Systems Handbook, 8.3.

Each BCE and the MSC has its own register set (called local store), so the right registers are available for each slot. The physical processor is pipelined, so there are actually four slots active at any time. ↩
For details on the instruction sets of the MSC and BCE processors, see Prototype IOP Functional Description, chapter 2. ↩
The IOP used a micro-instruction that was 72 bits wide. A micro-instruction controlled the physical processor by specifying the data sources, data destinations, the ALU operations, and conditional branch actions. The table below shows the structure of the micro-instruction in detail. Note that a micro-instruction controls each component of the processor separately at a low level, so it is very different from a machine instruction. A micro-instruction also provides a degree of parallelism, since it specifies three operations for each step (ALU 1 operation, ALU 2 operation, and a conditional action).

Format of a 72-bit IOP micro-instruction. From Prototype IOP Functional Description.

↩
The PROM chips are Intersil IM5624C parts. These are similar to the Signetics 82S131 and Intel 3622 parts. The front side of the page also contains nine chips labeled "D1-6605-2", probably manufactured by Harris; perhaps these are buffers. ↩
The Prototype Input/Output Processor Function Description lists two pages associated with microcode: "Micro Store (ROM)" (the page that I examined), and "Micro Store Page". I assume that the second page held the 512 words that didn't fit on the first page, along with the circuitry for the microcode control logic and registers. ↩
Why are the numbers on the PROM chips semi-ordered but also somewhat random? My hypothesis is that the original chips were numbered 1 through 36 in sequence, but when chips needed to be replaced for software patches, each new chip received the next number in sequence, up to 74. ↩
With flat-pack ICs, an IOP board can hold up to 20 ICs per row, so 100 ICs on a board and 200 ICs on a double-sided page. With the larger DIP packages, the PROM page holds just 45 ICs. Since DIPs are taller (thicker), the page has only a single board. This shows the large density advantage of flat-pack ICs over DIP ICs.

The density of this page is slightly better because there are a few (15) flat-pack ICs mounted on the back of the PROM board (below). The flat-pack ICs had to be mounted between the rows of DIPs to avoid the pins of the DIP ICs. Because DIPs use through-hole mounting, their pins exit the back side of the board. The large two-pin packages above and below are decoupling capacitors, filtering the power to the ICs.

Back of the PROM page.

The back side of the board also shows that the printed-circuit board is an inch smaller than the space available; note the gap on the right. Perhaps the circuit board was designed for a standard 8-inch 4 Pi page, but then mounted on the IOP's special 9-inch page. ↩
The NASA Office of Logic Design web page has a photo of a Space Shuttle board that might be from the IOP, but its source is unknown (I asked). This board is puzzling because it has the same unusual 9" form factor as the IOP pages, but it also has many differences, so it probably came from a different Shuttle system.

A Space Shuttle board. Note the broken connector; the plastic on these vintage Burndy connectors is very often broken. From Space Shuttle Computers and Avionics.

The board is a dual MIA interface; it is labeled "ADPTR. INTFC. DUAL MUX", part number "A538A762-02". This part number does not appear in the IOP documentation, and has a different format from IOP part numbers. The circuitry on the board is very similar to the IOP's interface board, with hybrid modules, transformers, and analog components. Physically, the board has the same dimensions, mounting hardware, and 120-pin connector as the IOP boards. However, the board doesn't have the test point connector at the top and the ICs are arranged haphazardly, instead of in uniform rows, so it doesn't look like it was manufactured by IBM. Moreover, the number of ICs is much smaller. On the other hand, it uses the same 54LS395 4-bit shift register chips (labeled 6088913). I would think that this was a prototype board for the IOP's board, except both boards are from 1976, based on the component dates.

My current hypothesis is that this board was the MIA network interface in a different Space Shuttle component, probably the MDM (Multiplexer/Demultiplexer); the MDM contained a "Serial MIA" board built by Singer-Kearfott. Note that the board has Singer hybrid modules; since Singer-Kearfott invented the MIA network, it makes sense that their modules would be on an interface board. Another possibility is that this board was part of the Shuttle's IMU (Inertial Measurement Unit), which was built by Singer-Kearfott. The IMU communicated with the MDM via a serial I/O line that was very similar to the MIA protocol, but had some differences.

Singer, by the way, is the same Singer that builds sewing machines. How did they end up making advanced components for the Space Shuttle? (Not to mention nuclear missile guidance systems.) In the 1960s, Singer diversified into defense and computers; in 1968, Singer acquired Kearfott, a defense company that built inertial navigation systems. The Singer-Kearfott SKC-2000 computer was considered for the Space Shuttle, but IBM's AP-101 was selected instead. Singer-Kearfott built the Inertial Measurement Units (IMUs) for the Space Shuttle. In 1987, Singer sold its Kearfott Guidance & Navigation division to the Astronautics Corporation. Kearfott still produces guidance and navigation systems, such as the inertial navigation system for the Global Hawk UAV and the Trident II submarine-launched ballistic missile. After a 1987 takeover and two bankruptcies, Singer is back to just sewing machines, now part of the SVP Worldwide sewing machine company. ↩
Bonus photo of Peter Kogge working on the I/O Processor:

↩

The adder at the heart of Intel's 8087 floating-point chip

In 1980, Intel released the Intel 8087 floating-point coprocessor, a chip that could make math up to 100 times faster. As well as arithmetic and square roots, the 8087 computed transcendental functions including tangent, exponentiation, and logarithms. But it all depended on a 69-bit adder: "The arithmetic heart of the floating-point execution unit is centered about a nanomachine comprised of the adder and its related registers, shifters and control circuitry," as the patent describes it. In this article, I explain the circuitry of this adder.

The photo below shows the 8087 die under a microscope. Around the edges of the die, hair-thin bond wires connect the chip to its 40 external pins. The complex patterns on the die are formed by its metal wiring, as well as the polysilicon and silicon underneath. At the top of the chip, the Bus Interface Unit connects to the rest of the system: coordinating with the main 8086 processor and memory. The chip's instructions are defined by the large microcode ROM in the middle.

Die of the Intel 8087 floating-point unit chip, with relevant functional blocks labeled. The die is 5mm×6mm. Click for a larger image.

The bottom half of the die is the "datapath", the circuitry that performs calculations; it is split into the exponent datapath, which handles the exponent of a floating-point number, and the fraction datapath, which handles the fractional part (or significand). The adder (red) sits in the middle of the fraction datapath; to perform addition on the exponent, the exponent must be copied over to the fraction datapath.

Structure of the adder

Building a binary adder is easy; the hard part is making it fast. The key problem is how to handle the carries from a bit position to the next. Each carry potentially depends on all the lower carries, but you don't want to wait as a carry ripples through the logic for all 69 bits. (It's similar to doing 999999+1 with long addition: you need to carry the one, carry the one, ...)
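The worst case for a ripple-carry adder can be demonstrated directly. This bit-serial sketch adds two unsigned integers the way long addition does. It also reports the longest carry chain, defined here as a position that generates a carry followed by the positions that pass it along. The function name and the chain metric are illustrative choices.

```python
def ripple_add(a, b, width):
    """Add two unsigned integers one bit at a time, like long addition.

    Returns (sum, carry_out, longest_chain), where longest_chain is the
    longest run of a generating position plus the positions that passed
    its carry along.
    """
    carry = 0
    result = 0
    chain = longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry_out = (x & y) | (carry & (x ^ y))
        if carry_out and carry and (x ^ y):
            chain += 1          # an incoming carry was passed along
        elif carry_out:
            chain = 1           # a fresh carry was generated here
        else:
            chain = 0
        longest = max(longest, chain)
        carry = carry_out
    return result, carry, longest


# The binary analog of 999999+1: all ones plus one.
print(ripple_add((1 << 69) - 1, 1, 69))   # (0, 1, 69)
```

Adding 1 to a 69-bit value of all ones forces the carry through all 69 positions. That is the delay a fast adder needs to avoid.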

The 8087's adder gains speed by breaking addition into 4-bit blocks and using two techniques to make computation inside each block fast. The carry still needs to ripple from block to block, but this reduces the number of carry steps by a factor of four.

Simplified diagram of a four-bit block in the 8087's adder.

The diagram above shows the structure of one 4-bit block, with the carry generation circuits abstracted out for now. The adder takes two inputs: one (F) is from the chip's fraction bus, a bus that connects the components of the fraction datapath. The second input (B) comes from a register called the B register. Each bit of the sum is produced by XORing an F input, a B input, and the carry into that bit position.1 For reasons that will be explained below, the intermediate value (F XOR B) is called "propagate". The carry-out from each block is tied to the carry-in of the next block. But what happens inside the carry circuits?

In 1959, researchers at the University of Manchester developed a fast carry technique for a computer called Atlas. This technique, named the Manchester carry chain, computes the carry values by setting up switches in parallel and then letting the carry quickly propagate through the wires, controlled by the switches. Although the carry still needs to travel from bit to bit, it travels at the speed of a signal in a wire, not slowed by logic gates.2

The Manchester carry chain is built around the concepts of Generate, Propagate, and Delete (also known as Kill), which arise when adding two bits and a carry. If you add 1+1, a carry-out is generated, whether there is a carry-in or not. In contrast, if you add 0+0, there is no carry-out, regardless of the carry-in; any carry-in is deleted. The interesting case is if you add 0+1: a carry-out results only if there is a carry-in; that is, the carry-in is propagated to the carry-out. In logic terms, the generate signal is the AND of the two input bits, the delete signal is the NOR, and the propagate signal is the XOR. The important thing is that these signals can be computed for all bit positions in parallel, in constant time.
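Because Generate, Propagate, and Delete depend only on the two input bits at each position, they can be computed for every position at once with word-wide bitwise operations. That mirrors the constant-time, all-positions-in-parallel property described above. The function name below is illustrative.

```python
def gpd_masks(a, b, width):
    """Generate, Propagate, and Delete masks for every bit position at once."""
    mask = (1 << width) - 1
    generate = a & b             # 1 + 1: carry-out regardless of carry-in (AND)
    propagate = a ^ b            # 0 + 1 or 1 + 0: carry-out equals carry-in (XOR)
    delete = ~(a | b) & mask     # 0 + 0: carry-out is 0 regardless of carry-in (NOR)
    return generate, propagate, delete


g, p, d = gpd_masks(0b1100, 0b1010, 4)
print(f"{g:04b} {p:04b} {d:04b}")   # 1000 0110 0001
```

Every bit position falls into exactly one of the three cases. The three masks are therefore disjoint and together cover all the bits.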

The idea behind the Manchester carry chain. Note that the low bit is on the left, so the carry flows left to right.

The Manchester carry chain is constructed as above, with the switches at each bit set according to the Generate/Propagate/Delete values. Once the switches are set, the carry status quickly flows through the circuit, producing the carry value at each position without any logic delays. If the propagate switch is closed, the previous carry passes through. But if the generate or delete switch is closed, the carry is set or cleared, respectively. Once the carry values are available, the final sum can be computed in parallel with XORs.

The 8087 uses an optimized circuit for the Manchester carry chain, combining the Generate and Delete cases. One stage of the adder's carry chain is shown below. For the propagate case, the carry-in Cin passes through the top switch, propagated to the carry out Cout. For the generate and delete cases, the bottom switch is closed, passing the input bit F. The trick is that the generate case corresponds to 1+1, so F is 1, resulting in Cout getting set. The delete case corresponds to 0+0, so F is 0, and Cout is cleared. (Note that both inputs, F and B, are the same in these cases, so using F instead of B is arbitrary.)
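The combined Generate/Delete trick reduces each stage to a two-input multiplexer. The behavioral sketch below chains these stages and checks exhaustively that the result matches ordinary addition. In hardware, all the carries settle through the switches together; the Python loop walks the chain sequentially only for modeling. The function names are illustrative.

```python
def carry_stage(cin, f, b):
    """One carry-chain stage as a mux: Cin if prop, else F.

    When prop = F XOR B is 0, F equals B, so F is 1 for generate (1+1)
    and 0 for delete (0+0).
    """
    prop = f ^ b
    return cin if prop else f


def chain_add(f_word, b_word, width, cin=0):
    """Sum bits are F XOR B XOR carry-in; carries come from the mux chain."""
    carry = cin
    total = 0
    for i in range(width):
        f = (f_word >> i) & 1
        b = (b_word >> i) & 1
        total |= (f ^ b ^ carry) << i
        carry = carry_stage(carry, f, b)
    return total, carry


# Exhaustive check for 4-bit inputs with both carry-in values.
for f in range(16):
    for b in range(16):
        for cin in (0, 1):
            s, c = chain_add(f, b, 4, cin)
            assert s + (c << 4) == f + b + cin
```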

One stage of the Manchester carry chain.

The middle of the diagram shows how the switches correspond to a multiplexer (mux) selecting the top signal Cin if prop is set, or the bottom signal F if prop is clear. The right side of the diagram shows the physical implementation with two NMOS transistors. These transistors function as switches (pass transistors), controlled by the prop signals on the gate.

The problem is that pass transistors aren't perfect switches, but lose a bit of voltage at each step. To fix this, the carry chain is broken into blocks of four bits (as shown earlier) and each block produces a "fresh" carry. This refresh is done by a "carry-skip" circuit, which can skip the carry processing inside the block. Specifically, the carry-skip mechanism checks if all positions inside the block are Propagate. In this case, the carry-out will have the same value as the carry-in (since the carry-in propagates through all the bit positions of the block). The carry-skip circuit detects this case and produces a carry-out signal matching the carry-in.
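The block structure with carry-skip can be sketched as follows. Each 4-bit block runs the multiplexer chain internally. When all four positions are propagate, the skip path supplies carry-out = carry-in directly. Logically, that is the value the chain would produce anyway; in silicon, the benefit is a fresh, full-strength signal. This model uses unsigned integers and zero-pads the top block, so 69 input bits use 18 blocks: 18 block-to-block carry hops instead of 69 bit-to-bit hops. The names are illustrative.

```python
BLOCK = 4


def add_block(f4, b4, cin):
    """Add one 4-bit block; return (sum4, carry_out, skipped)."""
    prop = f4 ^ b4
    carry = cin
    sum4 = 0
    for i in range(BLOCK):
        f = (f4 >> i) & 1
        p = (prop >> i) & 1
        sum4 |= (p ^ carry) << i      # sum bit = F XOR B XOR carry-in
        carry = carry if p else f     # mux stage: pass the carry, or use F
    skipped = prop == 0b1111          # every position in the block is propagate
    # In the skip case the chain result already equals cin; the skip path
    # just provides that value directly.
    carry_out = cin if skipped else carry
    return sum4, carry_out, skipped


def blocked_add(f_word, b_word, width=69):
    """Add two width-bit unsigned values with ceil(width/4) chained blocks."""
    n_blocks = -(-width // BLOCK)     # ceiling division: 69 bits -> 18 blocks
    carry = 0
    total = 0
    for k in range(n_blocks):
        shift = k * BLOCK
        s4, carry, _ = add_block((f_word >> shift) & 0xF, (b_word >> shift) & 0xF, carry)
        total |= s4 << shift
    # The top block is zero-padded, so a carry out of bit 68 appears as sum bit 69.
    return total | (carry << (n_blocks * BLOCK))


print(blocked_add((1 << 69) - 1, 1) == 1 << 69)   # True
```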

Putting this all together, the schematic below shows the adder circuitry for a typical block of four bits. The four multiplexers form the Manchester carry chain, while the NOR gate detects the carry-skip case.

Reverse-engineered schematic for a 4-bit block of the adder.

Optimizing performance adds a complication, for electrical reasons.3 The 8087 uses NMOS transistors, which can pull a signal low much faster than they can pull it high. To take advantage of this, the carry lines are precharged to 5V at the start of an addition, and then the circuitry pulls the lines low where needed. So that the lines start in the no-carry state, the carry signals are all negated (active-low): the initial 5V state corresponds to no carry, and the ground state corresponds to a carry.

The last multiplexer in the block has four inputs instead of two4. The third input pulls the (inverted) carry line low for the carry skip case.5 The fourth input is the precharge signal; it puts 5V on the carry line to precharge it. (A control circuit activates the precharge signal at the start of an addition cycle.) Note that this only precharges one of the carry lines; to precharge the rest, the propagate signal is forced high during precharge.

Reverse-engineered schematic for the propagate circuit. This shows an arbitrary bit n.

The circuit to generate the propagate signal (above) is conceptually the XOR of the two inputs, but there are (of course) complications. When the precharge signal is high, propagate is forced high, tying all the carry lines together so the precharge can propagate to all of them. The second feature is that the B inputs can be blocked by the forceZero signal, so the value 0 is added instead of the B value.
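The precharge, forceZero, and active-low behavior described above can be captured in a behavioral sketch. It does not model transistor-level timing. The two-phase structure, the low-order-first bit lists, and the function names are assumptions made for illustration. Line level 1 stands for the precharged 5V "no carry" state, and 0 stands for a carry.

```python
def propagate(f, b, precharge, force_zero):
    """Propagate for one bit: F XOR B, with the two complications from the text."""
    b_eff = 0 if force_zero else b        # forceZero blocks B, so 0 is added instead
    return 1 if precharge else f ^ b_eff  # precharge forces every propagate switch closed


def evaluate_carry_lines(f_bits, b_bits, force_zero=0, cin=0):
    """Return active-low carry-out line levels (1 = no carry, 0 = carry).

    Bits are listed low-order first. Phase 1 models precharge: with every
    propagate forced high, the lines form one connected node, so driving one
    of them high leaves all of them at 1. Phase 2 models evaluation, where a
    line ends at 0 only if there is a carry.
    """
    n = len(f_bits)
    # Phase 1: precharge.
    assert all(propagate(f, b, 1, force_zero) for f, b in zip(f_bits, b_bits))
    lines = [1] * n
    # Phase 2: evaluate, using the inverted form of "Cin if prop else F".
    n_carry = 0 if cin else 1
    for i, (f, b) in enumerate(zip(f_bits, b_bits)):
        if not propagate(f, b, 0, force_zero):
            n_carry = 1 - f   # generate (F=1) pulls low; delete (F=0) stays high
        lines[i] = n_carry
    return lines


print(evaluate_carry_lines([1, 1, 0, 0], [1, 0, 0, 0]))                 # [0, 0, 1, 1]
print(evaluate_carry_lines([1, 1, 0, 0], [1, 0, 0, 0], force_zero=1))   # [1, 1, 1, 1]
```

In the first call, the low bit generates a carry that propagates through the second bit and is then deleted. In the second call, forceZero blocks B, so the adder computes F + 0 and no line is pulled low.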

To summarize, the adder is divided into blocks of four bits. Each block uses a Manchester carry chain and a carry-skip circuit to optimize the performance. Even with these optimizations, though, the large number of blocks requires the 8087 to take two clock cycles to complete an addition.

The adder in silicon

The image below shows how the circuitry for a block of four bits appears on the die. These blocks are stacked vertically to create the complete adder as seen in the earlier die photo. In this image, the metal layer is visible as white lines, mostly obscuring the circuitry underneath. The 8087 has a single metal layer, which constrains the layout. Note that metal wiring is tightly packed, occupying almost the complete area. The thick vertical metal trace at the left is ground, while the thick metal trace at the right is power, supplying the adder circuitry. The horizontal traces provide wiring inside the adder block, as well as allowing the fraction bus to pass across the adder. The vertical lines on either side are control signals for the adder (precharge and forceZero) as well as connections to circuitry at the bottom of the chip.

A block of four bits in the adder.

The photo below shows the silicon and polysilicon circuitry underneath the metal layer. (To take this photo, I dissolved the metal layer with acid.) The thin lines are polysilicon wiring, while the pinkish areas that appear raised are doped silicon. A transistor is formed when polysilicon crosses doped silicon. The circuitry is complex and irregular, connected by the horizontal metal wires above. The white circles are contacts between the silicon and the metal wiring, while the white squares are contacts between the polysilicon and metal. Roughly speaking, if you divide the circuitry above into quarters, each quarter adds one bit. The carry-skip circuitry is in the middle.

A block of four bits in the adder with the metal layer removed.

The left and the right sides of the image don't have any transistors, just polysilicon lines that pass under the vertical metal wiring. Many of these polysilicon lines are widened to reduce their resistance and thus tune performance. The silicon in these regions is "wasted", just providing a channel for the vertical wiring.

The size of the adder

Although the 8087 nominally has 64-bit values for the fraction (significand), the adder is slightly larger: it takes 69 bits as input and generates 70 output bits. One reason is that the 8087 uses three extra low-order bits for rounding, called Guard, Round, and Sticky. These bits ensure that a value is always rounded in the right direction. Handling of the rounding bits is fairly complicated, with multiple modes, but from the adder's perspective they are just three input bits.6
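To illustrate what three extra bits make possible, here is a sketch of one standard rounding mode, IEEE 754 round-to-nearest with ties to even, applied to a value whose three lowest bits are Guard, Round, and Sticky. The function name and bit layout are assumptions for illustration; this models the rounding decision, not the 8087's rounding circuitry or its other modes.

```python
def round_nearest_even(value: int) -> int:
    """Drop the three low-order bits (G, R, S), rounding to nearest, ties to even."""
    guard = (value >> 2) & 1
    round_bit = (value >> 1) & 1
    sticky = value & 1        # stands for the OR of all bits shifted out past R
    kept = value >> 3
    if guard and (round_bit or sticky or (kept & 1)):
        kept += 1             # above halfway, or exactly halfway with an odd LSB
    return kept               # renormalizing after a carry out is not modeled


print(round_nearest_even(0b1011_100))  # 12: tie, odd 0b1011 rounds up to 0b1100
print(round_nearest_even(0b1010_100))  # 10: tie, even 0b1010 stays
print(round_nearest_even(0b1010_101))  # 11: just above halfway, rounds up
```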

As will be explained below, the value from the B register can be doubled, requiring one more bit. Finally, the fraction bus and the B value can be negated. (This is used for subtraction, among other things.) A negative value is represented in two's complement, requiring one more bit. In total, the inputs to the adder are 69 bits wide.

When adding two large numbers, the result can require one additional bit. Thus, the output of the adder is 70 bits wide. The Sum Shifter (explained below) can shift the output two bits to the right, cutting the result down to 68 bits. This is still one bit larger than the 67 bits needed for a 64-bit fraction plus three rounding bits; the "extra" bit is supported by a few special-purpose registers, such as the tmpC register7 and the Skip Shifter.
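The width bookkeeping can be checked with a few lines of arithmetic. This sketch uses hypothetical constant names and treats the adder inputs as 69-bit two's-complement values, following the description above; it confirms that the sum of two extreme inputs overflows 69 bits but fits in 70.

```python
FRACTION_BITS = 64   # nominal fraction (significand)
ROUNDING_BITS = 3    # Guard, Round, Sticky
DOUBLING_BITS = 1    # room for 2B
SIGN_BITS = 1        # two's complement after negation

INPUT_WIDTH = FRACTION_BITS + ROUNDING_BITS + DOUBLING_BITS + SIGN_BITS
OUTPUT_WIDTH = INPUT_WIDTH + 1   # one more bit for the sum
SHIFTED_WIDTH = OUTPUT_WIDTH - 2  # after the Sum Shifter's two-bit right shift


def fits_signed(value: int, width: int) -> bool:
    """True if value is representable as a width-bit two's-complement number."""
    return -(1 << (width - 1)) <= value < (1 << (width - 1))


largest = (1 << (INPUT_WIDTH - 1)) - 1   # most positive input
smallest = -(1 << (INPUT_WIDTH - 1))     # most negative input
assert not fits_signed(largest + largest, INPUT_WIDTH)
assert fits_signed(largest + largest, OUTPUT_WIDTH)
assert fits_signed(smallest + smallest, OUTPUT_WIDTH)

print(INPUT_WIDTH, OUTPUT_WIDTH, SHIFTED_WIDTH)  # 69 70 68
```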

The surrounding circuitry

The inputs and outputs of the adder are tied to some special registers and circuits. I'll leave a detailed explanation of this circuitry to another post, but I'll provide a brief description here.8 The adder has two inputs: one input is from the fraction bus and the other input is from the B register. The adder's output is stored in the Sum Register. To make multiplication faster, the 8087 uses radix-4 Booth multiplication, which multiplies by two bits at a time. The multiplier is stored in the Skip Shifter, a register that allows two bits to be shifted out at a time. Based on these bits, one of the values 2B, B, 0, or -B is added. (The -B path is also used for subtraction.) The adder's output is shifted right two bits by the Sum Shifter (not to be confused with the Skip Shifter) and stored in the Sum Register.
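The text says each step adds one of 2B, B, 0, or -B. One way such a digit set can cover all four two-bit multiplier values is to treat the digit 3 as 4 - 1: add -B now and carry 1 into the next two-bit digit. The sketch below assumes exactly that recoding, with a hypothetical function name; the 8087's actual recoding logic may differ. The hardware shifts the running sum right two bits per step, while this sketch instead scales each addend by a growing `weight`, which gives the same product.

```python
def radix4_multiply(b: int, multiplier: int) -> int:
    """Multiply b by a non-negative multiplier, two multiplier bits per step."""
    assert multiplier >= 0
    product = 0
    carry = 0       # pending +1 digit left over from a "-B" step
    weight = 1      # 4**step; hardware shifts the sum right instead
    while multiplier or carry:
        digit = (multiplier & 0b11) + carry   # 0..4
        multiplier >>= 2
        if digit == 0:
            addend, carry = 0, 0
        elif digit == 1:
            addend, carry = b, 0
        elif digit == 2:
            addend, carry = 2 * b, 0
        elif digit == 3:
            addend, carry = -b, 1   # 3B = 4B - B: add -B, owe one B at next weight
        else:
            addend, carry = 0, 1    # 4B = one B at the next weight
        product += addend * weight
        weight *= 4
    return product


print(radix4_multiply(5, 15))  # 75
```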

The adder and associated registers. Based on the patent.

Division is implemented by repeated subtraction, addition, and shifting. The bits of the result are accumulated in the quotient register. The implementation of square root is similar to the pencil-and-paper long square root, except in binary. The Skip Shifter provides two bits from the left, which are appended to the right side of the adder input. A subtract or add takes place, similar to division, and the square root is formed in the B register.
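Both loops can be sketched on integers. The Python below is an illustration under stated assumptions, not the 8087's hardware sequence, and both function names are hypothetical. `nonrestoring_divide` shifts in one dividend bit per step, subtracts the divisor after a non-negative remainder or adds it after a negative one, and records each quotient bit from the sign of the result. `long_sqrt` brings in two bits from the left per step, as the Skip Shifter is described doing, but for brevity uses the simpler restoring form (subtract only when the result stays non-negative) rather than an add-or-subtract loop.

```python
def nonrestoring_divide(dividend: int, divisor: int, n_bits: int) -> tuple[int, int]:
    """Unsigned division by non-restoring add/subtract; returns (quotient, remainder)."""
    assert divisor > 0 and 0 <= dividend < (1 << n_bits)
    remainder, quotient = 0, 0
    for i in range(n_bits - 1, -1, -1):
        bit = (dividend >> i) & 1              # next dividend bit, from the left
        shifted = (remainder << 1) + bit
        if remainder >= 0:
            remainder = shifted - divisor      # subtract after a non-negative step
        else:
            remainder = shifted + divisor      # add after a negative step
        quotient = (quotient << 1) | int(remainder >= 0)
    if remainder < 0:
        remainder += divisor                   # one final correction
    return quotient, remainder


def long_sqrt(n: int, n_bits: int) -> tuple[int, int]:
    """Binary digit-by-digit square root (restoring form); returns (root, n - root**2)."""
    assert n_bits % 2 == 0 and 0 <= n < (1 << n_bits)
    root, remainder = 0, 0
    for i in range(n_bits - 2, -1, -2):
        pair = (n >> i) & 0b11                 # next two bits, from the left
        remainder = (remainder << 2) | pair
        trial = (root << 2) | 1                # (2r + 1)**2 - (2r)**2 = 4r + 1
        if remainder >= trial:
            remainder -= trial
            root = (root << 1) | 1
        else:
            root <<= 1
    return root, remainder


print(nonrestoring_divide(100, 7, 8))  # (14, 2)
print(long_sqrt(10, 4))                # (3, 1)
```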

Multiplication, division, and square root require multiple steps to process all the bits. For performance, this looping is implemented in hardware, not in microcode. These instructions require a lot of microcode to prepare the arguments, handle exponents, handle special cases, and store the results, but the inner loop is hardware.

Conclusions

The 8087 patent expresses the importance of the adder: "Ultimately, all arithmetical operations are reduced at one point to a binary addition." Thus, the performance of the adder is vital to the performance of the 8087. There are faster ways to add, such as the Kogge-Stone adder in the Pentium, but these approaches require much more hardware, too much for the constrained transistor count of the 8087. The 8087 balanced complexity against performance, using the Manchester carry chain with a carry-skip adder.

I plan to write more about the 8087; for updates, follow me on Bluesky (@righto.com), Mastodon, or RSS. Thanks to the members of the "Opcode Collective" for their hard work, especially Smartest Blob and Gloriouscow.

AI statement: I didn't use AI to write this article; the em-dashes are natural (details).

Notes and references

I hope it's clear how the XOR of the two input bits and the carry in each position produces the corresponding sum bit. It's similar to long addition with pencil-and-paper: in each column, you have the two digits that you're adding, along with the carry (0 or 1) from the column to the right. XOR—exclusive or—functions like one-bit addition but discarding the carry out. ↩
The Intel 386 processor also uses a Manchester carry chain, which I described here. ↩
The 8087 uses NMOS transistors, unlike modern CMOS processors that use both NMOS and PMOS transistors. An NMOS transistor is much better at pulling a signal low than pulling a signal high. Thus, a frequent NMOS trick is to precharge a line high and then pull it low with a transistor; this is considerably faster than precharging a line low and pulling it high. This often requires a signal to be inverted, if 0 is the desired default value. ↩
Strictly speaking, the 4-input carry-skip multiplexer isn't exactly a multiplexer since it is possible to have two inputs selected at the same time, such as propagate and skip. You might worry about a conflict if one selected input is 0 and the other selected input is 1. If the carry-skip input is selected, the carry from the carry chain will have the same value, since carry-skip is just an optimization. In the precharge case, both the Propagate and the +5V inputs are active; the Propagate inputs are rapidly pulled high, so again there is no conflict. ↩
The carry-skip circuit uses a 5-input NOR gate. Since the inputs are all inverted, this is logically equivalent to a 5-input AND gate, testing if the four propagate signals are high and the carry-in is high. It's faster, however, to use a NOR gate in NMOS logic because the transistors are in parallel. This is another example of how the low level (using NMOS transistors) affects the higher-level circuitry. ↩
Carry-skip is not used for the bottom three bits. The carry-in to the adder is controlled by bits in the microcode instruction; it can either be explicitly set or be set based on the B register sign to handle subtraction properly. ↩
The fraction datapath has three temporary registers that are almost identical but have different sizes. tmpA and tmpB hold 64 bits, but tmpC holds 68 bits (including three rounding bits and one high-order bit).

The tmpC register has circuitry for bit 63, but tmpA and tmpB do not.

You can see the extra tmpC bits on the die. The photo above shows the high-order bits for the three registers. For the most part, the registers are mirror images of each other. But looking at the yellow box, tmpC has a NAND gate for bit 68, which is missing from tmpB and tmpA. At the low end (not shown), tmpC has three bits for rounding that are missing from the other registers. ↩
The patent describes the arithmetic operations in some detail. See Section III (page 13). ↩
