How the 8086 processor handles power and clock internally

One under-appreciated characteristic of early microprocessors is the difficulty of distributing power inside the integrated circuit. While a modern processor might have 15 layers of metal wiring, chips from the 1970s such as the 8086 had just a single layer of metal, making routing a challenge. Similarly, clock signals must be delivered to all parts of the chip to keep it in synchronization.

The photo below shows the 8086's die under a microscope. The metal layer on top of the chip is visible, with the silicon substrate and polysilicon wiring hidden underneath. Around the outside of the die, tiny bond wires connect pads on the die to the external pins. The 8086 has a power pad at the top and ground pads at the top and bottom. Each power and ground pad has two bond wires connected to support twice the current. You can see the wide metal traces from the power and ground pads; these distribute power throughout the chip.

Die photo of the 8086 showing power connection (top) and ground connections (top, bottom). The clock circuitry is at the bottom.

Die photo of the 8086 showing power connection (top) and ground connections (top, bottom). The clock circuitry is at the bottom.

Timing in the 8086 is controlled by two internal clock signals. An external oscillator provides a clock signal to the 8086 through the clock input pad at the bottom. The on-chip clock driver circuitry generates two high-current clock signals from this external clock. Note that the clock driver takes up a not-insignificant part of the chip.

In this blog post, I'll discuss how the 8086 routes power and clock signals through the chip, and how the clock driver circuit generates the necessary clock pulses.

Power distribution

The 8086 is constructed with three layers that can be used for wiring. The metal layer on top is best for wiring, since metal has low resistance. Underneath the metal is a layer of polysilicon wiring, made from a special type of silicon. Polysilicon has higher resistance than metal, but can still be used to transmit signals across the chip. The silicon substrate is where the transistors are formed. Silicon has relatively high resistance, so it is only used for short-distance connections, such as inside a gate.

Power routing in a chip like the 8086 creates a topological puzzle of sorts: The metal layer is the only practical layer for routing power and ground, due to its low resistance. Power and ground must be provided to nearly every gate in the chip.1 And since the chip has a single metal layer, power and ground can't cross.

The diagram below highlights these metal wiring networks in the 8086. Power, connected to the power pin at the top, is shown in red, traveling throughout the chip. A major branch flows down and to the right from the power pin, then splitting into multiple paths. Power also travels around the border of the entire chip, supplying the I/O pins.

Power (red) and ground (blue, green) on the metal layer.

Power (red) and ground (blue, green) on the metal layer.

There are two ground pins. The wiring in blue is connected to the upper ground pin, while the wiring in green is connected to the lower ground pin. The blue ground wiring has a large branch downwards through the center of the chip, branching in complex directions. The green ground wiring flows along the bottom, left, and right sides of the chip, supporting the I/O pins, as well as connected to the microcode ROM in the lower right.

The power wires get thinner from their source to their final destination as they branch or deliver power along the way and the current diminishes. This is visible in the ground wire to the address / data pins, below. At the left, the ground wire below the pins is very wide, but it tapers off to the right. In other words, at the left, the wire must handle current for all the pins, but at the right the wire is supporting just the remaining pin.

The ground connection to the Address/Data pins gets progressively thinner. (Left side of chip, rotated 90°)

The ground connection to the Address/Data pins gets progressively thinner. (Left side of chip, rotated 90°)

The metal layer is used for many signals besides power and ground; it is the best layer for delivering signals due to its low resistance. However, the extensive power and ground wiring constrains the other uses of the metal layer. To avoid intersections, most of the metal signal lines run parallel to the power lines; the polysilicon layer underneath is used to run perpendicular signals. But what happens if metal wires need to cross a power or ground line? The solution is to use a "crossunder", where the signal goes down to the polysilicon layer and crosses under the power line, popping back up on the other side,3 as shown below.

Signals in the metal layer crossing under the power line by using polysilicon crossunders.

Signals in the metal layer crossing under the power line by using polysilicon crossunders.

While power and ground are almost entirely routed in the metal layer, there are a few places where this breaks down and a crossunder is used for power. This typically happens near the end of the line, where the current is small. One example is shown below, where ground passes through two polysilicon crossunders. To reduce the resistance, these crossunders are much wider than the crossunders for signals and also use the silicon and polysilicon layers together. The small circles are connections (called vias) between the metal layer and the polysilicon layer.

Composite photo showing polysilicon crossunders for ground that pass under signal lines.

Composite photo showing polysilicon crossunders for ground that pass under signal lines.

The silicon layer plays a minor part in routing power. In particular, many gates are stretched out to reach the power and ground on either side. The photo below shows some gates in the 8086. Note the large doped silicon regions (white) that extend to reach the power and ground lines. Only a small part of this silicon is used for transistors, while the rest looks like wasted space. However, these empty silicon regions connect the gate to the metal power and ground wires. Since silicon has relatively high resistance, wide regions are used for these connections, and over short distances.

The doped silicon forming gates can be extended to reach the power and ground lines. The metal layer was removed for this photo so the power and ground lines are illustrated.

The doped silicon forming gates can be extended to reach the power and ground lines. The metal layer was removed for this photo so the power and ground lines are illustrated.

Other power routing issues arose as the 8086 was revised and became physically smaller. As manufacturing technology improved, Intel performed "die shrinks", keeping the same circuitry but scaling it down uniformly to produce a smaller die. Unfortunately, shrinking the power lines reduces the current they can handle. The solution was beef up the power lines around the edge of the chip, while allowing the internal circuitry and wiring to shrink. This can be seen in the photo below; the lower-right corner of the smaller 8086 has much more power wiring, for instance. (I wrote more about the 8086 die shrink here.)

Two versions of the 8086 die, at the same scale. The die on the right is a later version of the 8086, reduced in size.

Two versions of the 8086 die, at the same scale. The die on the right is a later version of the 8086, reduced in size.

The processor clock

Almost all computers use a clock signal to control the timing of the processor.4 Like many microprocessors, the 8086 uses a two-phase clock internally.5 In a two-phase clock, there are two clock signals: when the first clock is high, the second is low, and vice versa, as shown below. One set of circuitry is enabled by the first clock, while a second set of circuitry is enabled by the second clock. The 8086's circuitry requires that the two clock phases are non-overlapping —there is a gap after one goes low before the other goes high—and asymmetrical.6

A two-phase clock consists of two clock signals with opposite polarity.

A two-phase clock consists of two clock signals with opposite polarity.

In modern processors, clock routing is complex because the clock signals must reach all parts of the chip at the same time. Modern processors use a hierarchy of clock paths, balancing the time along each path, and often provide separate buffering for each path. In comparison, the 8086's clock routing is straightforward because its 5 to 10 MHz clock7 is orders of magnitude slower than modern processors. At these comparatively low speeds, the length of the path doesn't make much difference, so the 8086's clock signals can meander around the chip.

Clock routing in the 8086. Green is clock while red is the opposite phase clock.

Clock routing in the 8086. Green is clock while red is the opposite phase clock.

The diagram above shows the 8086's clock routing. Phase 1 is in green and phase 2 is in red. At the bottom of the chip, the circuitry that generates the clocks appears as large blobs. From there, the clock signals branch wind around the chip. For the most part, the two clock phases are routed parallel to each other, unlike power and ground, which form opposing branches.

Because the clock signals go to all parts of the chip, they require much more current than typical signals and are routed in the metal layer for the most part. When the clock signals must cross the power lines, they use large crossunders as shown below. Note that the irregularly-shaped clock crossunders are much larger than the crossunders for other signals, such as the Q bus below.

The clock has large crossunders to cross the power wire. The Q bus (which transfers instructions from the instruction queue to the decoder) has much smaller crossunders.

The clock has large crossunders to cross the power wire. The Q bus (which transfers instructions from the instruction queue to the decoder) has much smaller crossunders.

To provide the high-current clock signals, the clock signals have special driver circuitry built from large transistors. The photo below compares one of these driver transistors to a typical logic transistor. The driver transistor is about 300 times as large, so it can provide about 300 times the current. This transistor is constructed as 10 transistors in parallel; the 10 vertical polysilicon lines form the 10 gates. Each clock signal is driven by a pair of large transistors, one to pull the signal high and one to pull the signal low.

A large transistor in the clock driver compared to a neighboring logic transistor.

A large transistor in the clock driver compared to a neighboring logic transistor.

The photo below shows the clock driver circuitry. This circuit splits the external clock signal into two phases, makes the phases non-overlapping, and amplifies them. At the left, the pink square is the pad for the externally-supplied clock. The signal passes through a series of transistors, ending with the large driver transistors at the right for the clock signal. The brownish wiring is the polysilicon that forms the gates. Many transistors have zig-zagging gates to fit a larger transistor into the available space.

The clock driver circuitry on the die. The metal has been removed, revealing the large transistors in the circuit. The clock input pin is the purple square on the left.

The clock driver circuitry on the die. The metal has been removed, revealing the large transistors in the circuit. The clock input pin is the purple square on the left.

The schematic below shows the driver circuitry, slightly simplified. The triangles indicate high-current drivers, built from two or three transistors; an inverting input (indicated by a bubble) pulls the output low. At the left, the clock input pin has a small resistor and a diode to provide some protection (like the other input pins). Next, the clock is split into an uninverted phase (top) and an inverted phase (bottom).

Simplified schematic of the clock driver circuitry in the 8086.

Simplified schematic of the clock driver circuitry in the 8086.

The additional circuitry keeps the clocks from overlapping: when one clock is high, it forces the other side low, through the inverted inputs. To see how this works, let's start with the clk in pin high, so clk in and clock are high while clk in and clock are low. Now, suppose the clk in pin input goes low, causing clk in to go low and clk in to go high. However, the output clock can't go high until clock goes low, due to the negative inputs on the buffers. Once that happens, clk in proceeds through the lower drivers, pulling clock high after two gate delays.8 The point of this is that clock and clock don't switch at the same time; after one goes low, there is a delay before the other goes high. This generates the desired non-overlapping clock signals.

Conclusions

The 8086 uses some interesting routing for power, but modern processors operate at a whole different level. While the 8086 required 350 milliamps of current, a modern processor might require over a hundred amps. The 8086 used 3 of its 40 pins for power and ground, compared to a modern Intel Core i5 processor with 128 power pins and 377 ground pins (out of 1151 pins). Although the numerous metal layers in modern chips solved the 8086's routing issues, modern chips have new complications such as multiple power domains that allow unused parts of the chip to be powered down.

Clock routing is much harder on modern processors since at multi-gigahertz speeds, even an extra millimeter of path can affect the clock. To deal with this, modern processors use techniques such as H-trees or grids to distribute the clock, rather than the 8086's meandering paths. While the 8086 has a simple circuit to generate the two-phase clock, modern processors often use a phase-locked loop (PLL) to synthesize the clock and use multiple circuits scattered across the chip to generate and control clock signals.

Even though the 8086 is much simpler than modern processors, it contains a lot of interesting circuitry. I plan to reverse-engineer more of the 8086, so follow me on Twitter at @kenshirriff for updates. I also have an RSS feed.

Notes and references

  1. Power and ground must be provided to almost every gate in the chip since a standard NMOS gate requires ground for its pull-down network and power for its pull-up resistor. There are a few exceptions, though. The 8086 uses some dynamic logic gates, especially in the ALU for speed. These gates are pulled high by the clock, so they don't need a direct power connection. The 8086 also uses some pass-transistor XOR gates, which are pulled low by the inputs, so they don't need ground.

    The microcode ROM forms a large region with no power connections, just ground. This is because each row in the ROM is implemented as a very large NOR gate with the power pull-up on the right-hand edge. Thus, the ROM gates all have power and ground, even though it looks like the ROM lacks power connections. 

  2. Integrated circuits often have power and ground on opposite corners or opposite sides of the chip. This placement makes it easier to construct the non-intersecting power and ground networks in the chips. The 8086 is slightly unusual to have power and ground on diagonally-opposite pins, but then a second ground pin close to the power pin. The solution is to have tree-like branching networks for power and ground. These networks are interdigitated, meshed like fingers to reach all parts of the chip.2 

  3. Crossunders are used for many wire crossings, not just power, but power wiring is a key contributor. Typically, metal wiring is used for signals in one direction, while polysilicon wiring is used for signals in the perpendicular direction. (These directions vary in different parts of the chip, depending on the predominant direction for signals.) Thus, signals for the most part can travel unimpeded. Even so, signals often bounce from layer to layer to make the routing work. 

  4. While almost all computers are synchronous and operate with a clock, the IAS machine architecture (popular in the 1950s) was asynchronous, operating without a clock. Instead, each circuit would send a pulse to the next when it was done, triggering the next step. Many early computers of the 1950s were based on the IAS machine architecture, including CYCLONE, ILLIAC, JOHNNIAC, MANIAC, SEAC, and the IBM 701. Research into asynchronous computing continues (link, link), but synchronous designs are dominant. 

  5. Among other things, processors use the clock to prevent unwanted feedback in the circuitry. For instance, consider a program counter with a circuit to increment it and feed the result back to the program counter. You don't want the new value to get repeatedly incremented.

    One approach is to use edge-sensitive circuits (flip flops) that will update that value in the program counter at the moment the clock goes high. Thus, there will be a single update as desired. However, with a two-phase clock, the circuit can be built from level-sensitive latches, which are much simpler than edge-sensitive flip flops. The idea is that when the first clock is high, the first half of the circuit receives input and does its logic calculations When the second clock is high, the second half of the circuit receives input from the first half and does any necessary calculations, while the first half is blocked. The point is that only half of the circuitry can update at any time, preventing uncontrolled feedback. 

  6. The 8086 has strict requirements on its input clock, which must be high for 1/3 of the time. The clock signal into the 8086 was typically produced by an 8284 chip and a quartz crystal. This chip divided its input clock by 3 to generate the 33% duty cycle clock required by the 8086.  

  7. Because the 8086 used dynamic logic, it also had a minimum clock speed of 2 MHz. If the clock ran slower than this, there was a risk of charges leaking away before they were refreshed, causing failures. The minimum clock speed was inconvenient for debugging, since you couldn't slow down or stop the clock. 

  8. This is a somewhat handwaving description of the clock driver circuit. In particular, I'm not sure what happens when one transistor is pulling a signal high and another is pulling the same signal low. An accurate simulation would depend on the relative sizes of the two transistors. 

3 comments:

Ed said...

Another great article! However:

> The idea is that when the first clock is high...

I'm not sure about this bit. We sometimes used to call the two clocks capture and relaease. When the release clock rises, the new value starts to propagate towards the receiving latch. When the capture clock falls, the final value from the logic is frozen. The evaluation time is from the rise of one clock to the fall of the other. The non-overlap time is dead time, and also is the safety margin for clock skew. Notably, it's possible to use the two clocks in either sense, in different places for different purposes, which can, for example, help to provide stable control signals for a datapath. See the 6502.

Julz said...

When it comes to asynchronous processors, it's probably worth mentioning the Amulet research Lab at Manchester University; which as well as implementing a number of asynchronous ARM processors (Amulet 1, Amulet 2); pioneered asynchronous Hardware Design Languages, such as Balsa.

I was privileged enough to be part of the Amulet research group over the period 1996 to 1998.

http://apt.cs.manchester.ac.uk/ftp/pub/apt/papers/FDL00.pdf

Sly said...

Note 2 have a recursive link to itself.