Showing posts with label electronics. Show all posts
Showing posts with label electronics. Show all posts

How flip-flops are implemented in the Intel 8086 processor

A key concept for a processor is the management of "state", information that persists over time. Much of a computer is built from logic gates, such as NAND or NOR gates, but logic gates have no notion of time. Processors also need a way to hold values, along with a mechanism to move from step to step in a controlled fashion. This is the role of "sequential logic", where the output depends on what happened before. Sequential logic usually operates off a clock signal,1 a sequence of regular pulses that controls the timing of the computer. (If you have a 3.2 GHz processor, for instance, that number is the clock frequency.)

A circuit called the flip-flop is a fundamental building block for sequential logic. A flip-flop can hold one bit of state, a "0" or a "1", changing its value when the clock changes. Flip-flops are a key part of processors, with multiple roles. Several flip-flops can be combined to form a register, holding a value. Flip-flops are also used to build "state machines", circuits that move from step to step in a controlled sequence. A flip-flops can also delay a signal, holding it from from one clock cycle to the next.

Intel introduced the groundbreaking 8086 microprocessor in 1978, starting the x86 architecture that is widely used today. In this blog post, I take a close look at the flip-flops in the 8086: what they do and how they are implemented. In particular, I will focus on the dynamic flip-flop, which holds its value using capacitance, much like DRAM.2 Many of these flip-flops use a somewhat unusual "enable" input, which allows the flip-flop to hold its value for multiple clock cycles.

The 8086 die under the microscope, with the main functional blocks.
I count 184 flip-flops with enable and 53 without enable.
Click this image (or any other) for a larger version.

The 8086 die under the microscope, with the main functional blocks. I count 184 flip-flops with enable and 53 without enable. Click this image (or any other) for a larger version.

The die photo above shows the silicon die of the 8086. In this image, I have removed the metal and polysilicon layers to show the silicon transistors underneath. The colored squares indicate the flip-flops: blue flip-flops have an enable input, while red lack enable. Flip-flops are used throughout the processor for a variety of roles. Around the edges, they hold the state for output pins. The control circuitry makes heavy use of flip-flops for various state machines, such as moving through the "T states" that control the bus cycle. The "loader" uses a state machine to start each instruction. The instruction register, along with some special-purpose registers (N, M, and X) are built with flip-flops. Other flip-flops track the instructions in the prefetch queue. The microcode engine uses flip-flops to hold the current microcode address as well as to latch the 21-bit output from the microcode ROM. The ALU (Arithmetic/Logic Unit) uses flip-flops to hold the status flags, temporary input values, and information on the operation.

The flip-flop circuit

In this section, I'll explain how the flip-flop circuits work, starting with a basic D flip-flop. The D flip-flop (below) takes a data input (D) and stores that value, 0 or 1. The output is labeled Q, while the inverted output is called Q (Q-bar). This flip-flop is "edge triggered", so the storage happens on the edge when the clock changes from low to high.4 Except at this transition, the input can change without affecting the output.

The symbol for a D flip-flop.

The symbol for a D flip-flop.

The 8086 implements most of its flip-flops dynamically, using pass transistor logic. That is, the capacitance of the wiring (in particular the transistor gate) holds the 0 or 1 state. The dynamic implementation is more compact than the typical static flip-flop implementation, so it is often used in processors. However, the charge on the capacitance will eventually leak away, just like DRAM (dynamic RAM). Thus, the clock must keep going or the values will be lost.3 This behavior is different from a typical flip-flop chip, which will hold its value until the next clock, whether that is a microsecond later or a day later.

The D flip-flop is built from two latch5 stages, each consisting of a pass transistor and an inverter.6 The first pass transistor passes the input value through while the clock is low. When the clock switches high, the first pass transistor turns off and isolates the inverter from the input, but the value persists due to the capacitance (blue arrow). Meanwhile, the second pass transistor switches on, passing the value from the first inverter through the second inverter to the output. Similarly, when the clock switches low, the second transistor switches off but the value is held by capacitance at the green arrow. (The circuit does not need an explicit capacitor; the wiring has enough capacitance to hold the value.) Thus, the output holds the value of the D input that was present at the moment when the clock switched from low to high. Any other changes to the D input do not affect the output.

Schematic of a D flip-flop built from pass transistor logic.

Schematic of a D flip-flop built from pass transistor logic.

The basic flip-flop can be modified by adding an "enable" input that enables or blocks the clock.7 When the enable input is high, the flip-flop records the D input on the clock edge as before, but when the enable input is low, the flip-flop holds its previous value. The enable input allows the flip-flop to hold its value for an arbitrarily long period of time.

The symbol for the D flip-flop with enable.

The symbol for the D flip-flop with enable.

The enable flip-flop is constructed from a D flip-flop by feeding the flip-flop's output back to the input as shown below. When the enable input is 0, the multiplexer selects the current Q output as the new flip-flop D input, so the flip-flop retains its previous value. But when the enable input is 1, the multiplexer selects the new D value. (You can think of the enable input as selecting "hold" versus "load".)

Block diagram of a flip-flop with an enable input.

Block diagram of a flip-flop with an enable input.

The multiplexer is implemented with two more pass transistors, as shown on the left below.8 When enable is low, the upper pass transistor switches on, passing the current Q output back to the input. When enable is high, the lower pass transistor switches on, passing the D input through to the flip-flop. The schematic below also shows how the inverted Q' output is provided by the first inverter. The circuit "cheats" a bit; since the inverted output bypasses the second transistor, this output can change before the clock edge.

Schematic of a flip-flop with an enable input.

Schematic of a flip-flop with an enable input.

The flip-flops often have a set or clear input, setting the flip-flop high or low. This input is typically connected to the processor's "reset" line, ensuring that the flip-flops are initialized to the proper state when the processor is started. The symbol below shows a flip-flop with a clear input.

The symbol for the D flip-flop with enable and clear inputs.

The symbol for the D flip-flop with enable and clear inputs.

To support the clear function, a NOR gate replaces the inverter as shown below (red). When the clear input is high, it forces the output from the NOR gate to be low. Note that the clear input is asynchronous, changing the Q output immediately. The inverted Q output, however, doesn't change until clk is high and the output cycles around. A similar modification implements a set input that forces the flip-flop high: a NOR gate replaces the first inverter.

This schematic shows the circuitry for the clear flip-flop.

This schematic shows the circuitry for the clear flip-flop.

Implementing a flip-flop in silicon

The diagram below shows two flip-flops as they appear on the die. The bright gray regions are doped silicon, the bottom layer of the chip The brown lines are polysilicon, a layer on top of the silicon. When polysilicon crosses doped silicon, a transistor is formed with a polysilicon gate. The black circles are vias (connections) to the metal layer. The metal layer on top provides wiring between the transistors. I removed the metal layer with acid to make the underlying circuitry visible. Faint purple lines remain on the die, showing where the metal wiring was.

Two flip-flops on the 8086 die.

Two flip-flops on the 8086 die.

Although the two flip-flops have the same circuitry, their layouts on the die are completely different. In the 8086, each transistor was carefully shaped and positioned to make the layout compact, so the layout depends on the surrounding logic and the connections. This is in contrast to modern standard-cell layout, which uses a standard layout for each block (logic gate, flip-flop, etc.) and puts the cells in orderly rows. (Intel moved to standard-cell wiring for much of the logic in the the 386 processor since it is much faster to create a standard-cell design than to perform manual layout.)

Conclusions

The flip-flop with enable input is a key part of the 8086, appearing throughout the processor. However, the enable input is a fairly obscure feature for a flip-flop component; most flip-flop chips have a clock input, but not an enable.9 Many FPGA and ASIC synthesis libraries, though, provide it, under the name "D flip-flop with enable" or "D flip-flop with clock enable".

I plan to write more on the 8086, so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @[email protected] so you can follow me there too.

Notes and references

  1. Some early computers were asynchronous, such as von Neumann's IAS machine (1952) and its numerous descendants. In this machine, there was no centralized clock. Instead, a circuit such as an adder would send a pulse to the next circuit when it was done, triggering the next circuit in sequence. Thus, instruction execution would ripple through the computer. Although almost all later computers are synchronous, there is active research into asynchronous computing which is potentially faster and lower power. 

  2. I'm focusing on the dynamic flip-flops in this article, but I'll mention that the 8086 has a few latches built from cross-coupled NOR gates. Most 8086 registers use cross-coupled inverters (static memory cells) rather than flip-flops to hold bits. I explained the 8086 processor's registers in this article

  3. Dynamic circuitry is why the 8086 and many other processors have minimum clock speeds: if the clock is too slow, signals will fade away. For the 8086, the datasheet specifies a maximum clock period of 500 ns, corresponding to a minimum clock speed of 2 megahertz. The CMOS version of the Z80 processor, however, was designed so the clock could be slowed or even stopped. 

  4. Some flip-flops in the 8086 use the inverted clock, so they transition when the clock switches from high to low. Thus, there are two sets of transitions in the 8068 for each clock cycle. 

  5. The terminology gets confusing between flip-flops and latches, which sometimes refer to the same thing and sometimes different things. The term "latch" is often used for a flip-flop that operates on the clock level, not the clock edge. That is, when the clock input is high, the input passes through, and when the clock input is low, the value is retained. Confusingly, the clock for a latch is often called "enable". This is different from the enable input that I'm discussing, which is separate from the clock. 

  6. I asked an Intel chip engineer if they designed the circuitry in the 8086 era in terms of flip-flops. He said that they typically designed the circuitry in terms of the underlying pass transistors and gates, rather than using the flip-flop as a fundamental building block. 

  7. You might wonder why the clock and enable are separate inputs. Why couldn't you just AND them together so when enable is low, it will block the clock and the flip-flop won't transition? That mostly works, but three factors make it a bad idea. First, the idea of using a clock is so everything changes state at the same time. If you start putting gates in the clock path, the clock gets a bit delayed and shifts the timing. If the delay is too large, the input value might change before the flip-flop can latch it. Thus, putting gates in the clock path is frowned upon. The second factor is that combining the clock and enable signals risks race conditions. For instance, suppose that the enable input goes low and high while the clock remains high. If you AND the two signals together, this will yield a spurious clock edge, causing the flip-flop to latch its input a second time. Finally, if you block the clock for too long, a dynamic flip-flop will lose its value. (Note that the flip-flop circuit used in the 8086 will refresh its value on each clock even if the enable input is held low for a long period of time.) 

  8. A multiplexer can be implemented with logic gates. However, it is more compact to implement it with pass transistors. The pass transistor implementation takes four transistors (two fewer if the inverted enable signal is already available). A logic gate implementation would take about nine transistors: an AND-OR-INVERT gate, an inverter on the output, and an inverter for the enable signal. 

  9. The common 7474 is a typical TTL flip-flop that does not have an enable input. Chips with an enable are rarer, such as the 74F377. Strangely, one manufacturer of the 74HC377 shows the enable as affecting the output; I think they simply messed up the schematic in the datasheet since it contradicts the function table.

    Some examples of standard-cell libraries with enable flip-flops: Cypress SoC, Faraday standard cell library, Xilinx Unified Libraries, Infineon PSoC 4 Components, Intel's CHMOS-III cell library (probably used for the 386 processor), and Intel Quartus FPGA

Reverse-engineering the 8086 processor's address and data pin circuits

The Intel 8086 microprocessor (1978) started the x86 architecture that continues to this day. In this blog post, I'm focusing on a small part of the chip: the address and data pins that connect the chip to external memory and I/O devices. In many processors, this circuitry is straightforward, but it is complicated in the 8086 for two reasons. First, Intel decided to package the 8086 as a 40-pin DIP, which didn't provide enough pins for all the functionality. Instead, the 8086 multiplexes address, data, and status. In other words, a pin can have multiple roles, providing an address bit at one time and a data bit at another time.

The second complication is that the 8086 has a 20-bit address space (due to its infamous segment registers), while the data bus is 16 bits wide. As will be seen, the "extra" four address bits have more impact than you might expect. To summarize, 16 pins, called AD0-AD15, provide 16 bits of address and data. The four remaining address pins (A16-A19) are multiplexed for use as status pins, providing information about what the processor is doing for use by other parts of the system. You might expect that the 8086 would thus have two types of pin circuits, but it turns out that there are four distinct circuits, which I will discuss below.

The 8086 die under the microscope, with the main functional blocks and address pins labeled. Click this image (or any other) for a larger version.

The 8086 die under the microscope, with the main functional blocks and address pins labeled. Click this image (or any other) for a larger version.

The microscope image above shows the silicon die of the 8086. In this image, the metal layer on top of the chip is visible, while the silicon and polysilicon underneath are obscured. The square pads around the edge of the die are connected by tiny bond wires to the chip's 40 external pins. The 20 address pins are labeled: Pins AD0 through AD15 function as address and data pins. Pins A16 through A19 function as address pins and status pins.1 The circuitry that controls the pins is highlighted in red. Two internal busses are important for this discussion: the 20-bit AD bus (green) connects the AD pins to the rest of the CPU, while the 16-bit C bus (blue) communicates with the registers. These buses are connected through a circuit that can swap the byte order or shift the value. (The lines on the diagram are simplified; the real wiring twists and turns to fit the layout. Moreover, the C bus (blue) has its bits spread across the width of the register file.)

Segment addressing in the 8086

One goal of the 8086 design was to maintain backward compatibility with the earlier 8080 processor.2 This had a major impact on the 8086's memory design, resulting in the much-hated segment registers. The 8080 (like most of the 8-bit processors of the early 1970s) had a 16-bit address space, able to access 64K (65,536 bytes) of memory, which was plenty at the time. But due to the exponential growth in memory capacity described by Moore's Law, it was clear that the 8086 needed to support much more. Intel decided on a 1-megabyte address space, requiring 20 address bits. But Intel wanted to keep the 16-bit memory addresses used by the 8080.

The solution was to break memory into segments. Each segment was 64K long, so a 16-bit offset was sufficient to access memory in a segment. The segments were allocated in a 1-megabyte address space, with the result that you could access a megabyte of memory, but only in 64K chunks.3 Segment addresses were also 16 bits, but were shifted left by 4 bits (multiplied by 16) to support the 20-bit address space.

Thus, every memory access in the 8086 required a computation of the physical address. The diagram below illustrates this process: the logical address consists of the segment base address and the offset within the segment. The 16-bit segment register was shifted 4 bits and added to the 16-bit offset to yield the 20-bit physical memory address.

The segment register and the offset are added to create a 20-bit physical address.  From iAPX 86,88 User's Manual, page 2-13.

The segment register and the offset are added to create a 20-bit physical address. From iAPX 86,88 User's Manual, page 2-13.

This address computation was not performed by the regular ALU (Arithmetic/Logic Unit), but by a separate adder that was devoted to address computation. The address adder is visible in the upper-left corner of the die photo. I will discuss the address adder in more detail below.

The AD bus and the C Bus

The 8086 has multiple internal buses to move bits internally, but the relevant ones are the AD bus and the C bus. The AD bus is a 20-bit bus that connects the 20 address/data pins to the internal circuitry.4 A 16-bit bus called the C bus provides the connection between the AD bus, the address adder and some of the registers.5 The diagram below shows the connections. The AD bus can be connected to the 20 address pins through latches. The low 16 pins can also be used for data input, while the upper 4 pins can also be used for status output. The address adder performs the 16-bit addition necessary for segment arithmetic. Its output is shifted left by four bits (i.e. it has four 0 bits appended), producing the 20-bit result. The inputs to the adder are provided by registers, a constant ROM that holds small constants such as +1 or -2, or the C bus.

My reverse-engineered diagram showing how the AD bus and the C bus interact with the address pins.

My reverse-engineered diagram showing how the AD bus and the C bus interact with the address pins.

The shift/crossover circuit provides the interface between these two buses, handling the 20-bit to 16-bit conversion. The busses can be connected in three ways: direct, crossover, or shifted.6 The direct mode connects the 16 bits of the C bus to the lower 16 bits of the address/data pins. This is the standard mode for transferring data between the 8086's internal circuitry and the data pins. The crossover mode performs the same connection but swaps the bytes. This is typically used for unaligned memory accesses, where the low memory byte corresponds to the high register byte, or vice versa. The shifted mode shifts the 20-bit AD bus value four positions to the right. In this mode, the 16-bit output from the address adder goes to the 16-bit C bus. (The shift is necessary to counteract the 4-bit shift applied to the address adder's output.) Control circuitry selects the right operation for the shift/crossover circuit at the right time.7

Two of the registers are invisible to the programmer but play an important role in memory accesses. The IND (Indirect) register specifies the memory address; it holds the 16-bit memory offset in a segment. The OPR (Operand) register holds the data value.9 The IND and OPR registers are not accessed directly by the programmer; the microcode for a machine instruction moves the appropriate values to these registers prior to the write.

Overview of a write cycle

I hesitate to present a timing diagram, since I may scare off my readers, but the 8086's communication is designed around a four-step bus cycle. The diagram below shows simplified timing for a write cycle, when the 8086 writes to memory or an I/O device.8 The external bus activity is organized as four states, each one clock cycle long: T1, T2, T3, T4. These T states are very important since they control what happens on the bus. During T1, the 8086 outputs the address on the pins. During the T2, T3, and T4 states, the 8086 outputs the data word on the pins. The important part for this discussion is that the pins are multiplexed depending on the T-state: the pins provide the address during T1 and data during T2 through T4.

A typical write bus cycle consists of four T states. Based on The 8086 Family Users Manual, B-16.

A typical write bus cycle consists of four T states. Based on The 8086 Family Users Manual, B-16.

There two undocumented T states that are important to the bus cycle. The physical address is computed in the two clock cycles before T1 so the address will be available in T1. I give these "invisible" T states the names TS (start) and T0.

The address adder

The operation of the address adder is a bit tricky since the 16-bit adder must generate a 20-bit physical address. The adder has two 16-bit inputs: the B input is connected to the upper registers via the B bus, while the C input is connected to the C bus. The segment register value is transferred over the B bus to the adder during the second half of the TS state (that is, two clock cycles before the bus cycle becomes externally visible during T1). Meanwhile, the address offset is transferred over the C bus to the adder, but the adder's C input shifts the value four bits to the right, discarding the four low bits. (As will be explained later, the pin driver circuits latch these bits.) The adder's output is shifted left four bits and transferred to the AD bus during the second half of T0. This produces the upper 16 bits of the 20-bit physical memory address. This value is latched into the address output flip-flops at the start of T1, putting the computed address on the pins. To summarize, the 20-bit address is generated by storing the 4 low-order bits during T0 and then the 16 high-order sum bits during T1.

The address adder is not needed for segment arithmetic during T1 and T2. To improve performance, the 8086 uses the adder during this idle time to increment or decrement memory addresses. For instance, after popping a word from the stack, the stack pointer needs to be incremented by 2. The address adder can do this increment "for free" during T1 and T2, leaving the ALU available for other operations.10 Specifically, the adder updates the memory address in IND, incrementing it or decrementing it as appropriate. First, the IND value is transferred over the B bus to the adder during the second half of T1. Meanwhile, a constant (-3 to +2) is loaded from the Constant ROM and transferred to the adder's C input. The output from the adder is transferred to the AD bus during the second half of T2. As before, the output is shifted four bits to the left. However, the shift/crossover circuit between the AD bus and the C bus is configured to shift four bits to the right, canceling the adder's shift. The result is that the C bus gets the 16-bit sum from the adder, and this value is stored in the IND register.11 For more information on the implemenation of the address adder, see my previous blog post.

The pin driver circuit

Now I'll dive down to the hardware implementation of an output pin. When the 8086 chip communicates with the outside world, it needs to provide relatively high currents. The tiny logic transistors can't provide enough current, so the chip needs to use large output transistors. To fit the large output transistors on the die, they are constructed of multiple wide transistors in parallel.12 Moreover, the drivers use a somewhat unusual "superbuffer" circuit with two transistors: one to pull the output high, and one to pull the output low.13

The diagram below shows the transistor structure for one of the output pins (AD10), consisting of three parallel transistors between the output and +5V, and five parallel transistors between the output and ground. The die photo on the left shows the metal layer on top of the die. This shows the power and ground wiring and the connections to the transistors. The photo on the right shows the die with the metal layer removed, showing the underlying silicon and the polysilicon wiring on top. A transistor gate is formed where a polysilicon wire crosses the doped silicon region. Combined, the +5V transistors are equivalent to about 60 typical transistors, while the ground transistors are equivalent to about 100 typical transistors. Thus, these transistors provide substantially more current to the output pin.

Two views of the output transistors for a pin. The first shows the metal layer, while the second shows the polysilicon and silicon.

Two views of the output transistors for a pin. The first shows the metal layer, while the second shows the polysilicon and silicon.

Tri-state output driver

The output circuit for an address pin uses a tri-state buffer, which allows the output to be disabled by putting it into a high-impedance "tri-state" configuration. In this state, the output is not pulled high or low but is left floating. This capability allows the pin to be used for data input. It also allows external devices to device can take control of the bus, for instance, to perform DMA (direct memory access).

The pin is driven by two large MOSFETs, one to pull the output high and one to pull it low. (As described earlier, each large MOSFET is physically multiple transistors in parallel, but I'll ignore that for now.) If both MOSFETs are off, the output floats, neither on nor off.

Schematic diagram of a "typical" address output pin.

Schematic diagram of a "typical" address output pin.

The tri-state output is implemented by driving the MOSFETs with two "superbuffer"15 AND gates. If the enable input is low, both AND gates produce a low output and both output transistors are off. On the other hand, if enable is high, one AND gate will be on and one will be off. The desired output value is loaded into a flip-flop to hold it,14 and the flip-flop turns one of the output transistors on, driving the output pin high or low as appropriate. (Conveniently, the flip-flop provides the data output Q and the inverted data output Q.) Generally, the address pin outputs are enabled for T1-T4 of a write but only during T1 for a read.16

In the remainder of the discussion, I'll use the tri-state buffer symbol below, rather than showing the implementation of the buffer.

The output circuit, expressed with a tri-state buffer symbol.

The output circuit, expressed with a tri-state buffer symbol.

AD4-AD15

Pins AD4-AD15 are "typical" pins, avoiding the special behavior of the top and bottom pins, so I'll discuss them first. The behavior of these pins is that the value on the AD bus is latched by the circuit and then put on the output pin under the control of the enable signal. The circuit has three parts: a multiplexer to select the output value, a flip-flop to hold the output value, and a tri-state driver to provide the high-current output to the pin. In more detail, the multiplexer selects either the value on the AD bus or the current output from the flip-flop. That is, the multiplexer can either load a new value into the flip-flop or hold the existing value.17 The flip-flop latches the input value on the falling edge of the clock, passing it to the output driver. If the enable line is high, the output driver puts this value on the corresponding address pin.

The output circuit for AD4-AD15 has a latch to hold the desired output value, an address or data bit.

The output circuit for AD4-AD15 has a latch to hold the desired output value, an address or data bit.

For a write, the circuit latches the address value on the bus during the second half of T0 and puts it on the pins during T1. During the second half of the T1 state, the data word is transferred from the OPR register over the C bus to the AD bus and loaded into the AD pin latches. The word is transferred from the latches to the pins during T2 and held for the remainder of the bus cycle.

AD0-AD3

The four low address bits have a more complex circuit because these address bits are latched from the bus before the address adder computes its sum, as described earlier. The memory offset (before the segment addition) will be on the C bus during the second half of TS and is loaded into the lower flip-flop. This flip-flop delays these bits for one clock cycle and then they are loaded into the upper flip-flop. Thus, these four pins pick up the offset prior to the addition, while the other pins get the result of the segment addition.

The output circuit for AD0-AD3 has a second latch to hold the low address bits before the address adder computes the sum.

The output circuit for AD0-AD3 has a second latch to hold the low address bits before the address adder computes the sum.

For data, the AD0-AD3 pins transfer data directly from the AD bus to the pin latch, bypassing the delay that was used to get the address bits. That is, the AD0-AD3 pins have two paths: the delayed path used for addresses during T0 and the direct path otherwise used for data. Thus, the multiplexer has three inputs: two for these two paths and a third loop-back input to hold the flip-flop value.

A16-A19: status outputs

The top four pins (A16-A19) are treated specially, since they are not used for data. Instead, they provide processor status during T2-T4.18 The pin latches for these pins are loaded with the address during T0 like the other pins, but loaded with status instead of data during T1. The multiplexer at the input to the latch selects the address bit during T0 and the status bit during T1, and holds the value otherwise. The schematic below shows how this is implemented for A16, A17, and A19.

The output circuit for AD16, AD17, and AD19 selects either an address output or a status output.

The output circuit for AD16, AD17, and AD19 selects either an address output or a status output.

Address pin A18 is different because it indicates the current status of the interrupt enable flag bit. This status is updated every clock cycle, unlike the other pins. To implement this, the pin has a different circuit that isn't latched, so the status can be updated continuously. The clocked transistors act as "pass transistors", passing the signal through when active. When a pass transistor is turned off, the following logic gate holds the previous value due to the capacitance of the wiring. Thus, the pass transistors provide a way of holding the value through the clock cycle. The flip-flops are implemented with pass transistors internally, so in a sense the circuit below is a flip-flop that has been "exploded" to provide a second path for the interrupt status.

The output circuit for AD18 is different from the rest so the I flag status can be updated every clock cycle.

The output circuit for AD18 is different from the rest so the I flag status can be updated every clock cycle.

Reads

A memory or I/O read also uses a 4-state bus cycle, slightly different from the write cycle. During T1, the address is provided on the pins, the same as for a write. After that, however, the output circuits are tri-stated so they float, allowing the external memory to put data on the bus. The read data on the pin is put on the AD bus at the start of the T4 state. From there, the data passes through the crossover circuit to the C bus. Normally the 16 data bits pass straight through to the C bus, but the bytes will be swapped if the memory access is unaligned. From the C bus, the data is written to the OPR register, a byte or a word as appropriate. (For an instruction prefetch, the word is written to a prefetch queue register instead.)

A typical read bus cycle consists of four T states. Based on The 8086 Family Users Manual, B-16.

A typical read bus cycle consists of four T states. Based on The 8086 Family Users Manual, B-16.

To support data input on the AD0-AD15 pins, they have a circuit to buffer the input data and transfer it to the AD bus. The incoming data bit is buffered by the two inverters and sampled when the clock is high. If the enable' signal is low, the data bit is transferred to the AD bus when the clock is low.19 The two MOSFETs act as a "superbuffer", providing enough current for the fairly long AD bus. I'm not sure what the capacitor accomplishes, maybe avoiding a race condition if the data pin changes just as the clock goes low.20

Schematic of the input circuit for the data pins.

Schematic of the input circuit for the data pins.

This circuit has a second role, precharging the AD bus high when the clock is low, if there's no data. Precharging a bus is fairly common in the 8086 (and other NMOS processors) because NMOS transistors are better at pulling a line low than pulling it high. Thus, it's often faster to precharge a line high before it's needed and then pull it low for a 0.21

Since pins A16-A19 are not used for data, they operate the same for reads as for writes: providing address bits and then status.

The pin circuit on the die

The diagram below shows how the pin circuitry appears on the die. The metal wiring has been removed to show the silicon and polysilicon. The top half of the image is the input circuitry, reading a data bit from the pin and feeding it to the AD bus. The lower half of the image is the output circuitry, reading an address or data bit from the AD bus and amplifying it for output via the pad. The light gray regions are doped, conductive silicon. The thin tan lines are polysilicon, which forms transistor gates where it crosses doped silicon.

The input/output circuitry for an address/data pin. The metal layer has been removed to show the underlying silicon and polysilicon. Some crystals have formed where the bond pad was.

The input/output circuitry for an address/data pin. The metal layer has been removed to show the underlying silicon and polysilicon. Some crystals have formed where the bond pad was.

A historical look at pins and timing

The number of pins on Intel chips has grown exponentially, more than a factor of 100 in 50 years. In the early days, Intel management was convinced that a 16-pin package was large enough for any integrated circuit. As a result, the Intel 4004 processor (1971) was crammed into a 16-pin package. Intel chip designer Federico Faggin22 describes 16-pin packages as a completely silly requirement that was throwing away performance, but the "God-given 16 pins" was like a religion at Intel. When Intel was forced to use 18 pins by the 1103 memory chip, it "was like the sky had dropped from heaven" and he had "never seen so many long faces at Intel." Although the 8008 processor (1972) was able to use 18 pins, this low pin count still harmed performance by forcing pins to be used for multiple purposes.

The Intel 8080 (1974) had a larger, 40-pin package that allowed it to have 16 address pins and 8 data pins. Intel stuck with this size for the 8086, even though competitors used larger packages with more pins.23 As processors became more complex, the 40-pin package became infeasible and the pin count rapidly expanded; The 80286 processor (1982) had a 68-pin package, while the i386 (1985) had 132 pins; the i386 needed many more pins because it had a 32-bit data bus and a 24- or 32-bit address bus. The i486 (1989) went to 196 pins while the original Pentium had 273 pins. Nowadays, a modern Core I9 processor uses the FCLGA1700 socket with a whopping 1700 contacts.

Looking at the history of Intel's bus timing, the 8086's complicated memory timing goes back to the Intel 8008 processor (1972). Instruction execution in the 8008 went through a specific sequence of timing states; each clock cycle was assigned a particular state number. Memory accesses took three cycles: the address was sent to memory during states T1 and T2, half of the address at a time since there were only 8 address pins. During state T3, a data byte was either transmitted to memory or read from memory. Instruction execution took place during T4 and T5. State signals from the 8008 chip indicated which state it was in.

The 8080 used an even more complicated timing system. An instruction consisted of one to five "machine cycles", numbered M1 through M5, where each machine cycle corresponded to a memory or I/O access. Each machine cycle consisted of three to five states, T1 through T5, similar to the 8008 states. The 8080 had 10 different types of machine cycle such as instruction fetch, memory read, memory write, stack read or write, or I/O read or write. The status bits indicated the type of machine cycle. The 8086 kept the T1 through T4 memory cycle. Because the 8086 decoupled instruction prefetching from execution, it no longer had explicit M machine cycles. Instead, it used status bits to indicate 8 types of bus activity such as instruction fetch, read data, or write I/O.

Conclusions

Well, address pins is another subject that I thought would be straightforward to explain but turned out to be surprisingly complicated. Many of the 8086's design decisions combine in the address pins: segmented addressing, backward compatibility, and the small 40-pin package. Moreover, because memory accesses are critical to performance, Intel put a lot of effort into this circuitry. Thus, the pin circuitry is tuned for particular purposes, especially pin A18 which is different from all the rest.

There is a lot more to say about memory accesses and how the 8086's Bus Interface Unit performs them. The process is very complicated, with interacting state machines for memory operation and instruction prefetches, as well as handling unaligned memory accesses. I plan to write more, so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @[email protected] and Bluesky as @righto.com so you can follow me there too.

Notes and references

  1. In the discussion, I'll often call all the address pins "AD" pins for simplicity, even though pins 16-19 are not used for data. 

  2. The 8086's compatibility with the 8080 was somewhat limited since the 8086 had a different instruction set. However, Intel provided a conversion program called CONV86 that could convert 8080/8085 assembly code into 8086 assembly code that would usually work after minor editing. The 8086 was designed to make this process straightforward, with a mapping from the 8080's registers onto the 8086's registers, along with a mostly-compatible instruction set. (There were a few 8080 instructions that would be expanded into multiple 8086 instructions.) The conversion worked for straightforward code, but didn't work well with tricky, self-modifying code, for instance. 

  3. To support the 8086's segment architecture, programmers needed to deal with "near" and "far" pointers. A near pointer consisted of a 16-bit offset and could access 64K in a segment. A far pointer consisted of a 16-bit offset along with a 16-bit segment address. By modifying the segment register on each access, the full megabyte of memory could be accessed. The drawbacks were that far pointers were twice as big and were slower. 

  4. The 8086 patent provides a detailed architectural diagram of the 8086. I've extracted part of the diagram below. In most cases the diagram is accurate, but its description of the C bus doesn't match the real chip. There are some curious differences between the patent diagram and the actual implementation of the 8086, suggesting that the data pins were reorganized between the patent and the completion of the 8086. The diagram shows the address adder (called the Upper Adder) connected to the C bus, which is connected to the address/data pins. In particular, the patent shows the data pins multiplexed with the high address pins, while the low address pins A3-A0 are multiplexed with three status signals. The actual implementation of the 8086 is the other way around, with the data pins multiplexed with the low address pins while the high address pins A19-A16 are multiplexed with the status signals. Moreover, the patent doesn't show anything corresponding to what I call the AD bus; I made up that name. The moral is that while patents can be very informative, they can also be misleading.

    A diagram from patent US4449184 showing the connections to the address pins. This diagram does not match the actual chip. The diagram also shows the old segment register names: RC, RD, RS, and RA became CS, DS, SS, and ES.

    A diagram from patent US4449184 showing the connections to the address pins. This diagram does not match the actual chip. The diagram also shows the old segment register names: RC, RD, RS, and RA became CS, DS, SS, and ES.

     

  5. The C bus is connected to the PC, OPR, and IND registers, as well as the prefetch queue, but is not connected to the segment registers. Two other buses (the ALU bus and the B bus) provide access to the segment registers. 

  6. Swapping the bytes on the data pins is required in a few cases. The 8086 has a 16-bit data bus, so transfers are usually a 16-bit word, copied directly between memory and a register. However, the 8086 also allows 8-bit operations, in which case either the top half or bottom half of the word is accessed. Loading an 8-bit value from the top half of a memory word into the bottom half of a register uses the crossover circuit. Another case is performing a 16-bit access to an "unaligned" address, that is, an odd address so the word crosses the normal word boundaries. From the programmer's perspective, an unaligned access is permitted (unlike many RISC processors), but the hardware converts this access into two 8-bit accesses, so the bus itself never handles an unaligned access.

    The 8086 has the ability to access a single memory byte out of a word, either for a byte operation or for an unaligned word operation. This behavior has some important consequences on the address pins. In particular, the low address pin AD0 doesn't behave like the rest of the address pins due to the special handling of odd addresses. Instead, this pin indicates which half of the word to transfer. The AD0 line is low (0) when the lower portion of the bus transfers a byte. Another pin, BHE (Bus High Enable) has a similar role for the upper half of the bus: it is low (0) if a byte is transferred over D15-D8. (Keep in mind that the 8086 is little-endian, so the low byte of the word is first in memory, at the even address.)

    The following table summarizes how BHE and A0 work together to select a byte or word. When accessing a byte at an odd address, A0 is odd as you might expect.

    Access typeBHEA0
    Word00
    Low byte10
    High byte01
     

  7. The cbus-adbus-shift signal is activated during T2, when a memory index is being updated, either the instruction pointer or the IND register. The address adder is used to update the register and the shift undoes the 4-bit left shift applied to the adder's output. The shift is also used for the CORR micro-instruction, which corrects the instruction pointer to account for prefetching. The CORR micro-instruction generates a "fake" short bus cycle in which the constant ROM and the address adder are used during T0. I discuss the CORR micro-instruction in more detail in this post

  8. I've made the timing diagram somewhat idealized so actions line up with the clock. In the real datasheet, all the signals are skewed by various amounts so the timing is more complicated. Moreover, if the memory device is slow, it can insert "wait" states between T3 and T4. (Cheap memory was slower and would need wait states.) Moreover, actions don't exactly line up with the clock. I'm also omitting various control signals. The datasheet has pages of timing constraints on exactly when signals can change. 

  9. Instruction prefetches don't use the IND and OPR registers. Instead, the address is specified by the Instruction Pointer (or Program Counter), and the data is stored directly into one of the instruction prefetch registers. 

  10. A single memory operation takes six clock cycles: two preparatory cycles to compute the address before the four visible cycles. However, if multiple memory operations are performed, the operations are overlapped to achieve a degree of pipelining. Specifically, the address calculation for the next memory operation takes place during the last two clock cycles of the current memory operation, saving two clock cycles. That is, for consecutive bus cycles, T3 and T4 overlap with TS and T0 of the next cycle. In other words, during T3 and T4 of one bus cycle, the memory address gets computed for the next bus cycle. This pipelining improves performance, compared to taking 6 clock cycles for each bus cycle. 

  11. The POP operation is an example of how the address adder updates a memory pointer. In this case, the stack address is moved from the Stack Pointer to the IND register in order to perform the memory read. As part of the read operation, the IND register is incremented by 2. The address is then moved from the IND register to the Stack Pointer. Thus, the address adder not only performs the segment arithmetic, but also computes the new value for the SP register.

    Note that the increment/decrement of the IND register happens after the memory operation. For stack operations, the SP must be decremented before a PUSH and incremented after a POP. The adder cannot perform a predecrement, so the PUSH instruction uses the ALU (Arithmetic/Logic Unit) to perform the decrement. 

  12. The current from an MOS transistor is proportional to the width of the gate divided by the length (the W/L ratio). Since the minimum gate length is set by the manufacturing process, increasing the width of the gate (and thus the overall size of the transistor) is how the transistor's current is increased. 

  13. Using one transistor to pull the output high and one to pull the output low is normal for CMOS gates, but it is unusual for NMOS chips like the 8086. A normal NMOS gate only has active transistor to pull the output low and uses a depletion-mode transistor to provide a weak pull-up current, similar to a pull-up resistor. I discuss superbuffers in more detail here

  14. The flip-flop is controlled by the inverted clock signal, so the output will change when the clock goes low. Meanwhile, the enable signal is dynamically latched by a MOSFET, also controlled by the inverted clock. (When the clock goes high, the previous value will be retained by the gate capacitance of the inverter.) 

  15. The superbuffer AND gates are constructed on the same principle as the regular superbuffer, except with two inputs. Two transistors in series pull the output high if both inputs are high. Two transistors in parallel pull the output low if either input is low. The low-side transistors are driven by inverted signals. I haven't drawn these signals on the schematic to simplify it.

    The superbuffer AND gates use large transistors, but not as large as the output transistors, providing an intermediate amplification stage between the small internal signals and the large external signals. Because of the high capacitance of the large output transistors, they need to be driven with larger signals. There's a lot of theory behind how transistor sizes should be scaled for maximum performance, described in the book Logical Effort. Roughly speaking, for best performance when scaling up a signal, each stage should be about 3 to 4 times as large as the previous one, so a fairly large number of stages are used (page 21). The 8086 simplifies this with two stages, presumably giving up a bit of performance in exchange for keeping the drivers smaller and simpler. 

  16. The enable circuitry has some complications. For instance, I think the address pins will be enabled if a cycle was going to be T1 for a prefetch but then got preempted by a memory operation. The bus control logic is fairly complicated. 

  17. The multiplexer is implemented with pass transistors, rather than gates. One of the pass transistors is turned on to pass that value through to the multiplexer's output. The flip-flop is implemented with two pass transistors and two inverters in alternating order. The first pass transistor is activated by the clock and the second by the complemented clock. When a pass transistor is off, its output is held by the gate capacitance of the inverter, somewhat like dynamic RAM. This is one reason that the 8086 has a minimum clock speed: if the clock is too slow, these capacitively-held values will drain away. 

  18. The status outputs on the address pins are defined as follows:
    A16/S3, A17/S4: these two status lines indicate which relocation register is being used for the memory access, i.e. the stack segment, code segment, data segment, or alternate segment. Theoretically, a system could use a different memory bank for each segment and increase the total memory capacity to 4 megabytes.
    A18/S5: indicates the status of the interrupt enable bit. In order to provide the most up-to-date value, this pin has a different circuit. It is updated at the beginning of each clock cycle, so it can change during a bus cycle. The motivation for this is presumably so peripherals can determine immediately if the interrupt enable status changes.
    A19/S6: the documentation calls this a status output, even though it always outputs a status of 0. 

  19. For a read, the enable signal is activated at the end of T3 and the beginning of T4 to transfer the data value to the AD bus. The signal is gated by the READY pin, so the read doesn't happen until the external device is ready. The 8086 will insert Tw wait states in that case. 

  20. The datasheet says that a data value must be held steady for 10 nanoseconds (TCLDX) after the clock goes low at the start of T4. 

  21. The design of the AD bus is a bit unusual since the adder will put a value on the AD bus when the clock is high, while the data pin will put a value on the AD bus when the clock is low (while otherwise precharging it when the clock is low). Usually the bus is precharged during one clock phase and all users of the bus pull it low (for a 0) during the other phase. 

  22. Federico Faggin's oral history is here. The relevant part is on pages 55 and 56. 

  23. The Texas Instruments TMS9900 (1976) used a 64-pin package for instance, as did the Motorola 68000 (1979). 

Reverse-engineering an electromechanical Central Air Data Computer

Determining the airspeed and altitude of a fighter plane is harder than you'd expect. At slower speeds, pressure measurements can give the altitude, air speed, and other "air data". But as planes approach the speed of sound, complicated equations are needed to accurately compute these values. The Bendix Central Air Data Computer (CADC) solved this problem for military planes such as the F-101 and the F-111 fighters, and the B-58 bomber.1 This electromechanical marvel was crammed full of 1955 technology: gears, cams, synchros, and magnetic amplifiers. In this blog post I look inside the CADC, describe the calculations it performed, and explain how it performed these calculations mechanically.

The Bendix MG-1A Central Air Data Computer with the case removed, showing the complex mechanisms inside. Click this image (or any other) for a larger version.

The Bendix MG-1A Central Air Data Computer with the case removed, showing the complex mechanisms inside. Click this image (or any other) for a larger version.

This analog computer performs calculations using rotating shafts and gears, where the angle of rotation indicates a numeric value. Differential gears perform addition and subtraction, while cams implement functions. The CADC is electromechanical, with magnetic amplifiers providing feedback signals and three-phase synchros providing electrical outputs. It is said to contain 46 synchros, 511 gears, 820 ball bearings, and a total of 2,781 major parts. The photo below shows a closeup of the gears.

A closeup of the complex gears inside the CADC,

A closeup of the complex gears inside the CADC,

What it does

For over a century, aircraft have determined airspeed from air pressure. A port in the side of the plane provides the static air pressure,2 which is the air pressure outside the aircraft. A pitot tube points forward and receives the "total" air pressure, a higher pressure due to the speed of the airplane forcing air into the tube. (In the photo below, you can see the long pitot tube sticking out from the nose of a F-101.) The airspeed can be determined from the ratio of these two pressures, while the altitude can be determined from the static pressure.

The F-101 "Voodoo", USAF photo.

The F-101 "Voodoo", USAF photo.

But as you approach the speed of sound, the fluid dynamics of air change and the calculations become very complicated. With the development of supersonic fighter planes in the 1950s, simple mechanical instruments were no longer sufficient. Instead, an analog computer to calculate the "air data" (airspeed, altitude, and so forth) from the pressure measurements. One option would be for each subsystem (instruments, weapons control, engine control, etc) to compute the air data separately. However, it was more efficient to have one central system perform the computation and provide the data electrically to all the subsystems that need it. This system was called a Central Air Data Computer or CADC.

The Bendix CADC has two pneumatic inputs through tubes: the static pressure3 and the total pressure. It also receives the total temperature from a platinum temperature probe. From these, it computes many outputs: true air speed, Mach number, log static pressure, differential pressure, air density, air density × the speed of sound, total temperature, and log true free air temperature.

The CADC implemented a surprisingly complex set of functions derived from fluid dynamics equations describing the behavior of air at various speeds and conditions. First, the Mach number is computed from the ratio of total pressure to static pressure. Different equations are required for subsonic and supersonic flight. Although this equation looks difficult to solve mathematically, fundamentally M is a function of one variable ($P_t / P_s$), and this function is encoded in the shape of a cam. (You are not expected to understand the equations below. They are just to illustrate the complexity of what the CADC does.)

\[M<1:\] \[~~~\frac{P_t}{P_s} = ( 1+.2M^2)^{3.5}\]

\[M > 1:\]

\[~~~\frac{P_t}{P_s} = \frac{166.9215M^7}{( 7M^2-1)^{2.5}}\]

Next, the temperature is determined from the Mach number and the temperature indicated by a temperature probe.

\[T = \frac{T_{ti}}{1 + .2 M^2} \]

The indicated airspeed and other outputs are computed in turn, but I won't go through all the equations. Although these equations may seem ad hoc, they can be derived from fluid dynamics principles. These equations were standardized in the 1950s by various government organizations including the National Bureau of Standards and NACA (the precursor of NASA). While the equations are complicated, they can be computed with mechanical means.

How it is implemented

The Air Data Computer is an analog computer that determines various functions of the static pressure, total pressure and temperature. An analog computer was selected for this application because the inputs are analog and the outputs are analog, so it seemed simplest to keep the computations analog and avoid conversions. The computer performs its computations mechanically, using the rotation angle of shafts to indicate values. For the most part, values are represented logarithmically, which allows multiplication and division to be implemented by adding and subtracting rotations. A differential gear mechanism provides the underlying implementation of addition and subtraction. Specially-shaped cams provide the logarithmic and exponential conversions as necessary. Other cams implement various arbitrary functions.

The diagram below, from patent 2,969,210, shows some of the operations. At the left, transducers convert the pressure and temperature inputs from physical quantities into shaft rotations, applying a log function in the process. Subtracting the two pressures with a differential gear mechanism (X-in-circle symbol) produces the log of the pressure ratios. Cam "CCD 12" generates the Mach number from this log pressure ratio, still expressed as a shaft rotation. A synchro transmitter converts the shaft rotation into a three-phase electrical output from the CADC. The remainder of the diagram uses more cams and differentials to produce the other outputs. Next, I'll discuss how these steps are implemented.

A diagram showing how values are computed by the CADC. Source: Patent 2969910A">Patent 2969910.

A diagram showing how values are computed by the CADC. Source: Patent 2969910.

The pressure transducer

The CADC receives the static and total pressure through tubes connected to the front of the CADC. (At the lower right, one of these tubes is visible.) Inside the CADC, two pressure transducers convert the pressures into rotational signals. The pressure transducers are the black domed cylinders at the top of the CADC.

The pressure transducers are the two black domes at the top. The circuit boards next to each pressure transducer are the amplifiers. The yellowish transformer-like devices with three windings are the magnetic amplifiers.

The pressure transducers are the two black domes at the top. The circuit boards next to each pressure transducer are the amplifiers. The yellowish transformer-like devices with three windings are the magnetic amplifiers.

Each pressure transducer contains a pair of bellows that expand and contract as the applied pressure changes. They are connected to opposite sides of a shaft so they cause small rotations of the shaft.

Inside the pressure transducer. The two disc-shaped bellows are connected to opposite sides of a shaft so the shaft rotates as the bellows expand or contract.

Inside the pressure transducer. The two disc-shaped bellows are connected to opposite sides of a shaft so the shaft rotates as the bellows expand or contract.

The pressure transducer has a tricky job: it must measure tiny pressure changes, but it must also provide a rotational signal that has enough torque to rotate all the gears in the CADC. To accomplish this, the pressure transducer uses a servo loop. The bellows produce a small shaft motion that is detected by an inductive pickup. This signal is amplified and drives a motor with enough power to move all the gears. The motor is also geared to counteract the movement of the bellows. This creates a feedback loop so the motor's rotation tracks the air pressure, but provides much more force. A cam is used so the output corresponds to the log of the input pressure.

This diagram shows the structure of the transducer. From "Air Data Computer Mechanization."

This diagram shows the structure of the transducer. From "Air Data Computer Mechanization."

Each transducer signal is amplified by three circuit boards centered around a magnetic amplifier, a transformer-like amplifier circuit that was popular before high-power transistors came along. The photo below shows how the amplifier boards are packed next to the transducers. The boards are complex, filled with resistors, capacitors, germanium transistors, diodes, relays, and other components.

This end-on view of the CADC shows the pressure transducers, the black cylinders. Next to each pressure transducer is a complex amplifier consisting of multiple boards with transistors and other components. The magnetic amplifiers are the yellowish transformer-like components.

This end-on view of the CADC shows the pressure transducers, the black cylinders. Next to each pressure transducer is a complex amplifier consisting of multiple boards with transistors and other components. The magnetic amplifiers are the yellowish transformer-like components.

Temperature

The external temperature is an important input to the CADC since it affects the air density. A platinum temperature probe provides a resistance4 that varies with temperature. The resistance is converted to rotation by an electromechanical transducer mechanism. Like the pressure transducer, the temperature transducer uses a servo mechanism with an amplifier and feedback loop. For the temperature transducer, though, the feedback signal is generated by a resistance bridge using a potentiometer driven by the motor. By balancing the potentiometer's resistance with the platinum probe's resistance, a shaft rotation is produced that corresponds to the temperature. The cam is configured to produce the log of the temperature as output.

This diagram shows the structure of the temperature transducer. From "Air Data Computer Mechanization."

This diagram shows the structure of the temperature transducer. From "Air Data Computer Mechanization."

The temperature transducer section of the CADC is shown below. The feedback potentiometer is the red cylinder at the lower right. Above it is a metal-plate adjustment cam, which will be discussed below. The CADC is designed in a somewhat modular way, with the temperature section implemented as a removable wedge-shaped unit, the lower two-thirds of the photo. The temperature transducer, like the pressure transducer, has three boards of electronics to implement the feedback amplifier and drive the motor.

The temperature transducer section of the CADC.

The temperature transducer section of the CADC.

The differential

The differential gear assembly is a key component of the CADC's calculations, as it performs addition or subtraction of rotations: the rotation of the output shaft is the sum or difference of the input shafts, depending on the direction of rotation.5 When rotations are expressed logarithmically, addition and subtraction correspond to multiplication and division. This differential is constructed as a spur-gear differential. It has inputs at the top and bottom, while the body of the differential rotates to produce the sum. The two visible gears in the body mesh with the internal input gears, which are not visible. The output is driven by the body through a concentric shaft.

A closeup of a differential mechanism.

A closeup of a differential mechanism.

The cams

The CADC uses cams to implement various functions. Most importantly, cams perform logarithms and exponentials. Cams also implement more complex functions of one variable such as ${M}/{\sqrt{1 + .2 M^2}}$. The photo below shows a cam (I think exponential) with the follower arm in front. As the cam rotates, the follower moves in and out according to the cam's radius, providing the function value.

A cam inside the CADC implements a function.

A cam inside the CADC implements a function.

The cams are combined with a differential in a clever way to make the cam shape more practical, as shown below.6 The input (23) drives the cam (30) and the differential (37-41). The follower (32) tracks the cam and provides a second input (35) to the differential. The sum from the differential produces the output (26).

This diagram, from Patent 2969910, shows how the cam and follower are connected to a differential.

This diagram, from Patent 2969910, shows how the cam and follower are connected to a differential.

The warped plate cam

Some functions are implemented by warped metal plates acting as cams. This type of cam can be adjusted by turning the 20 setscrews to change the shape of the plate. A follower rides on the surface of the cam and provides an input to a differential underneath the plate. The differential adds the cam position to the input rotation, producing a modified rotation, as with the solid cam. The pressure transducer, for instance, uses a cam to generate the desired output function from the bellows deflection. By using a cam, the bellows can be designed for good performance without worrying about its deflection function.

A closeup of a warped-plate cam.

A closeup of a warped-plate cam.

The synchro outputs

Most of the outputs from the CADC are synchro signals.7 A synchro is an interesting device that can transmit a rotational position electrically over three wires. In appearance, a synchro is similar to an electric motor, but its internal construction is different, as shown below. In use, two synchros have their stator windings connected together, while the rotor windings are driven with AC. Rotating the shaft of one synchro causes the other to rotate to the same position. I have a video showing synchros in action here.

Cross-section diagram of a synchro showing the rotor and stators.

Cross-section diagram of a synchro showing the rotor and stators.

Internally, a synchro has a moving rotor winding and three fixed stator windings. When AC is applied to the rotor, voltages are developed on the stator windings depending on the position of the rotor. These voltages produce a torque that rotates the synchros to the same position. In other words, the rotor receives power (26 V, 400 Hz in this case), while the three stator wires transmit the position. The diagram below shows how a synchro is represented schematically, with rotor and stator coils.

The schematic symbol for a synchro.

The schematic symbol for a synchro.

Before digital systems, synchros were very popular for transmitting signals electrically through an aircraft. For instance, a synchro could transmit an altitude reading to a cockpit display or a targeting system. For the CADC, most of the outputs are synchro signals, which convert the rotational values of the CADC to electrical signals. The three stator windings from the synchro inside the CADC are wired to an external synchro that receives the rotation. For improved resolution, many of these outputs use two synchros: a coarse synchro and a fine synchro. The two synchros are typically geared in an 11:1 ratio, so the fine synchro rotates 11 times as fast as the coarse synchro. Over the output range, the coarse synchro may turn 180°, providing the approximate output, while the fine synchro spins multiple times to provide more accuracy.

The front of the CADC has multiple output synchros with anti-backlash springs.

The front of the CADC has multiple output synchros with anti-backlash springs.

The air data system

The CADC is one of several units in the system, as shown in the block diagram below.8 The outputs of the CADC go to another box called the Air Data Converter, which is the interface between the CADC and the aircraft systems that require the air data values: fire control, engine control, navigation system, cockpit display instruments, and so forth. The motivation for this separation is that different aircraft types have different requirements for signals: the CADC remains the same and only the converter needed to be customized. Some aircraft required "up to 43 outputs including potentiometers, synchros, digitizers, and switches."

This block diagram shows how the Air Data Computer integrates with sensors and other systems. The unlabeled box on the right is the converter. From MIL-C-25653C(USAF).

This block diagram shows how the Air Data Computer integrates with sensors and other systems. The unlabeled box on the right is the converter. From MIL-C-25653C(USAF).

The CADC was also connected to a cylindrical unit called the "Static pressure and angle of attack compensator." This unit compensates for errors in static pressure measurements due to the shape of the aircraft by producing the "position error correction". Since the compensation factor depended on the specific aircraft type, the compensation was computed outside the Central Air Data Computer, again keeping the CADC generic. This correction factor depends on the Mach number and angle of attack, and was implemented as a three-dimensional cam. The cam's shape (and thus the correction function) was determined empirically, rather than from fundamental equations.

The CADC was wired to other components through five electrical connectors as shown in the photo below.9 At the bottom are the pneumatic connections for static pressure and total pressure. At the upper right is a small elapsed time meter.

The front of the CADC has many mil-spec round connectors.

The front of the CADC has many mil-spec round connectors.

Conclusions

The Bendix MG-1A Central Air Data Computer is an amazingly complex piece of electromechanical hardware. It's hard to believe that this system of tiny gears was able to perform reliable computations in the hostile environment of a jet plane, subjected to jolts, accelerations, and vibrations. But it was the best way to solve the problem at the time,10 showing the ingenuity of the engineers who developed it.

The CADC inside its case. From the outside, its mechanical marvels are hidden.

The CADC inside its case. From the outside, its mechanical marvels are hidden.

I plan to continue reverse-engineering the Bendix CADC and hope to get it operational,11 so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff. Until then, you can check out CuriousMarc's video below to see more of the CADC. Thanks to Joe for providing the CADC. Thanks to Nancy Chen for obtaining a hard-to-find document for me.

Notes and references

  1. I haven't found a definitive list of which planes used this CADC. Based on various sources, I believe it was used in the F-86, F-101, F-104, F-105, F-106, and F-111, and the B-58 bomber. 

  2. The static air pressure can also be provided by holes in the side of the pitot tube. I couldn't find information indicating exactly how these planes received static pressure. 

  3. The CADC also has an input for the "position error correction". This provides a correction factor because the measured static pressure may not exactly match the real static pressure. The problem is that the static pressure is measured from a port on the aircraft. Distortions in the airflow may cause errors in this measurement. A separate box, the "compensator", determines the correction factor based on the angle of attack. 

  4. The platinum temperature probe is type MA-1, defined by specification MIL-P-25726. It apparently has a resistance of 50 Ω at 0 °C. 

  5. Strictly speaking, the output of the differential is the sum of the inputs divided by two. I'm ignoring the factor of 2 because the gear ratios can easily cancel it out. 

  6. Cams are extensively used in the CADC to implement functions of one variable, including exponentiation and logarithms. The straightforward way to use a cam is to read the value of the function off the cam directly, with the radius of the cam at each angle representing the value. This approach encounters a problem when the cam wraps around, since the cam's profile will suddenly jump from one value to another. This poses a problem for the cam follower, which may get stuck on this part of the cam unless there is a smooth transition zone. Another problem is that the cam may have a large range between the minimum and maximum outputs. (Consider an exponential output, for instance.) Scaling the cam to a reasonable size will lose accuracy in the small values. The cam will also have a steep slope for the large values, making it harder to track the profile.

    The solution is to record the difference between the input and the output in the cam. A differential then adds the input value to the cam value to produce the desired value. The clever part is that by scaling the input so it matches the output at the start and end of the range, the difference function drops to zero at both ends. Thus, the cam profile matches when the angle wraps around, avoiding the sudden transition. Moreover, the difference between the input and the output is much smaller than the raw output, so the cam values can be more accurate. (This only works because the output functions are increasing functions; this approach wouldn't work for a sine function, for instance.)

    This diagram, from Patent 2969910, shows how a cam implements a complex function.

    This diagram, from Patent 2969910, shows how a cam implements a complex function.

    The diagram above shows how this works in practice. The input is \(log~ dP/P_s\) and the output is \(log~M / \sqrt{1+.2KM^2}\). (This is a function of Mach number used for the temperature computation; K is 1.) The small humped curve at the bottom is the cam correction. Although the input and output functions cover a wide range, the difference that is encoded in the cam is much smaller and drops to zero at both ends. 

  7. The US Navy made heavy use of synchros for transmitting signals throughout ships. The synchro diagrams are from two US Navy publications: US Navy Synchros (1944) and Principles of Synchros, Servos, and Gyros (2012). These are good documents if you want to learn more about synchros. The diagram below shows how synchros could be used on a ship.

    A Navy diagram illustrating synchros controlling a gun on a battleship.

    A Navy diagram illustrating synchros controlling a gun on a battleship.

     

  8. To summarize the symbols, the outputs are: log TFAT: true free air temperature (the ambient temperature without friction and compression); log Ps: static pressure; M: Mach number; Qc: differential pressure; ρ: air density; ρa: air density times the speed of sound; Vt: true airspeed. Tt: total temperature (higher due to compression of the air). Inputs are: TT: total temperature (higher due to compression of the air). Pti: indicated total pressure (higher due to velocity); Psi: indicated static pressure; log Psi/Ps: the position error correction from the compensator. The compensator uses input αi: angle of attack; and produces αT: true angle of attack; aT: speed of sound. 

  9. The electrical connectors on the CADC have the following functions: J614: outputs to the converter, J601: outputs to the converter, J603: AC power (115 V, 400 Hz), J602: to/from the compensator, and J604: input from the temperature probe. 

  10. An interesting manual way to calculate air data was with a circular slide rule, designed for navigation and air data calculation. It gave answers for various combinations of pressure, temperature, Mach number, true airspeed, and so forth. See the MB-2A Air Navigation Computer instructions for details. Also see patent 2528518. I'll also point out that from the late 1800s through the 1940s and on, the term "computer" was used for any sort of device that computed a value, from an adding machine to a slide rule (or even a person). The meaning is very different from the modern usage of "computer". 

  11. It was very difficult to find information about the CADC. The official military specification is MIL-C-25653C(USAF). After searching everywhere, I was finally able to get a copy from the Technical Reports & Standards unit of the Library of Congress. The other useful document was in an obscure conference proceedings from 1958: "Air Data Computer Mechanization" (Hazen), Symposium on the USAF Flight Control Data Integration Program, Wright Air Dev Center US Air Force, Feb 3-4, 1958, pp 171-194.