Showing posts with label electronics. Show all posts
Showing posts with label electronics. Show all posts

Inside a counterfeit 8086 processor

Intel introduced the 8086 processor in 1978, leading to the x86 architecture in use today. I'm currently reverse-engineering the circuitry of the 8086 so I've been purchasing vintage 8086 chips off eBay. One chip I received is shown below. From the outside, it looks like a typical Intel 8086.

The package of the fake 8086. It is labeled as an Intel 8086 from 1978.

The package of the fake 8086. It is labeled as an Intel 8086 from 1978.

I opened up the chip and looked at it under the microscope, creating the die photo below. The whitish lines are the metal layer, connecting the chip's circuitry. Underneath, the silicon has a purple hue. Around the outside of the die, bond wires connect the square pads to the 40 external pins on the IC.

Die photo of the fake 8086, showing the metal layer on top. The thick horizontal and vertical strips provide power and ground, while the other wiring connects the components.

Die photo of the fake 8086, showing the metal layer on top. The thick horizontal and vertical strips provide power and ground, while the other wiring connects the components.

I quickly noticed, however, that this wasn't an 8086 processor but something entirely different! For comparison, look at my die photo of a genuine 8086 below. As you can see, the chips are entirely different and the 8086 is much more complex. Someone had taken a random 40-pin chip and relabeled it as an Intel 8086 processor. The genuine 8086 has various functional blocks visible: the 16-bit registers and ALU on the left, the large microcode ROM in the lower right, and various other blocks of circuitry throughout the chip. (The genuine chip also has a tiny Intel copyright and the 8086 part number in the lower right. Click the image to magnify.) The fake chip above, on the other hand, is an irregular grid of horizontal and vertical wiring, with thicker horizontal and vertical lines for power.

Die photo of a genuine 8086 chip.

Die photo of a genuine 8086 chip.

The ULA or Uncommitted Logic Array

If the chip isn't an 8086, what is it? I believe the fake chip is an Uncommitted Logic Array, a type of gate array. A gate array is a way of making semi-custom integrated circuits without the expense of a fully-custom design. The idea behind a gate array is that the silicon die has a standard array of transistors that can be wired up to create the desired logic functions. This wiring is done in the chip's metal layers, which are designed for the customer's requirements.2 Although a gate array doesn't provide the flexibility of a fully-custom design, it was considerably cheaper and faster to design.

Ferranti invented the ULA in 1972, claiming that it was the first "to turn the logic array concept into a practical proposition." A ULA allowed a single LSI chip to replace hundreds or even thousands of gates that otherwise would be implemented in a board full of 7400-series TTL chips. The most well-known users of a ULA are the popular Sinclair ZX 81 and ZX Spectrum home computers.3

A ULA was based on a matrix of identical cells that were wired to form the logic gates. Around the edges of the chip, standardized peripheral cells provided the desired I/O capabilities. The diagram below shows a typical cell in the matrix. The cell contains multiple transistors and resistors, which are mostly unconnected by default. The ULA is customized by creating connections between the components to build a set of logic gates.

Layout and schematic of a ULA matrix cell. From Ferranti Quick Reference Guide.

Layout and schematic of a ULA matrix cell. From Ferranti Quick Reference Guide.

The photo below shows the fake chip with the metal layers removed, revealing the transistor array underneath. Each small green/yellow rectangle is a transistor; there are nearly 1000 of them. Note the repeated pattern of cells in the matrix,1 as well as the different peripheral cells around the outside. The density of transistors is fairly low; the chip has empty columns to provide room to route the metal layer.

Die photo of the fake 8086 showing the underlying silicon. The metal layers were removed for this photo.

Die photo of the fake 8086 showing the underlying silicon. The metal layers were removed for this photo.

The fake chip uses bipolar transistors,4 completely different from the NMOS transistors in the 8086 processor. The closeup below shows transistors (the striped rectangles) and the two layers of metal wiring connecting them. (The genuine 8086 only has one metal layer, so the fake chip is probably more recent, from the 1980s.)

A closeup of the fake chip showing transistors.

A closeup of the fake chip showing transistors.

There is no manufacturer printed on the die of the fake chip. The matrix cells don't look like the Ferranti cells. The photo below shows a ULA built by Plessey, another ULA manufacturer. That die has a smaller transistor matrix than my chip, but the overall structure is roughly similar, so Plessey might be the manufacturer.

A Plessey ULA die. From "Computer Aided Design and New Manufacturing Methods for Electronic Materials", 1985.

A Plessey ULA die. From "Computer Aided Design and New Manufacturing Methods for Electronic Materials", 1985.

The photo below shows another detail of the fake chip. Matrix cells are at the top. The peripheral cell below has much larger transistors for I/O. (There are also resistors in the brownish regions, but they aren't really visible.) The upper metal layer consists of horizontal wiring, while the lower metal layer is mostly vertical. The thick metal line at the right is for power (or perhaps ground) and is connected to a horizontal power distribution trace at the bottom.

Detail of the fake 8086, showing transistors, resistors, and metal wiring.

Detail of the fake 8086, showing transistors, resistors, and metal wiring.

To summarize, the position of the transistors and resistors in the ULA is fixed. This allows the same underlying silicon wafers to be manufactured for all the customers, keeping volume high and costs low. But by customizing the metal wiring layers, the ULA can be completed to fulfill the logic functions each customer needs.

Conclusions

Why would someone go to all the work of relabeling a $3.80 chip? I guess someone had a stack of old custom ICs with no value. By re-labeling them, they could at least get something for them. It hardly seems worth the effort, but I guess they make up for it in volume. The seller has sold over 215 of these 8086's, although I don't know if they were all fake or if I was unlucky. In any case, the seller gave me a prompt refund.

The fake 8086 for sale on eBay.

The fake 8086 for sale on eBay.

The seller's feedback (below) shows a lot of complaints about fake chips. Even so, the seller's feedback is 99.2% positive, so I suspect that there are just a few fake chips mixed in with many types of real chips. It's also possible that most vintage 8086s are purchased by IC collectors who never test the chip.

Feedback on the seller.

Feedback on the seller.

I've been asked if this chip would actually work as an 8086. Sometimes counterfeiters sell a lower-quality chip in place of the real thing, such as the fake expensive op amps found by Zeptobars. But other times the fake chip is unrelated, such as the vintage bipolar RAM chips that I determined was a Touch-Tone dialer. Since an 8086 has 29,000 MOS transistors but the fake chip has under 1000 bipolar transistors, it's clear that this chip won't function as an 8086.

The moral is to always be careful when you're buying chips, since you never know what you might find. Semiconductor counterfeiting is a big business and I've encountered just a tiny piece of it. I plan to write more about reverse-engineering the (real) 8086, so follow me on Twitter at @kenshirriff for updates. I also have an RSS feed.

Notes and references

  1. I think the fake chip has a matrix of 8×12 cells, with each of the large "IXI" patterns composed of four cells. 

  2. At first, a ULA was designed by hand by an engineer drawing the interconnects on paper, but by the 1980s, CAD software automated most of the design and testing. The CAD station below is pretty wild.

    CAD system for designing ULAs at Plessey. From "Computer Aided Design and New Manufacturing Methods for Electronic Materials", 1985.

    CAD system for designing ULAs at Plessey. From "Computer Aided Design and New Manufacturing Methods for Electronic Materials", 1985.

      

  3. The book The ZX Spectrum ULA: How to design a microcomputer discusses Ferranti ULAs in detail along with a complete explanation of the ULA in the ZX Spectrum. 

  4. Early ULAs used bipolar transistors, with CMOS circuitry introduced later. Different logic families were supported, depending on the needs of the application. Ferranti's ULAs had three types of matrix cells: RTL (resistor-transistor logic), CML (current-mode logic), and buffered current-mode logic. Other ULAs supported fast ECL (emitter-coupled logic) or standard TTL (transistor-transistor logic). 

How the 8086 processor handles power and clock internally

One under-appreciated characteristic of early microprocessors is the difficulty of distributing power inside the integrated circuit. While a modern processor might have 15 layers of metal wiring, chips from the 1970s such as the 8086 had just a single layer of metal, making routing a challenge. Similarly, clock signals must be delivered to all parts of the chip to keep it in synchronization.

The photo below shows the 8086's die under a microscope. The metal layer on top of the chip is visible, with the silicon substrate and polysilicon wiring hidden underneath. Around the outside of the die, tiny bond wires connect pads on the die to the external pins. The 8086 has a power pad at the top and ground pads at the top and bottom. Each power and ground pad has two bond wires connected to support twice the current. You can see the wide metal traces from the power and ground pads; these distribute power throughout the chip.

Die photo of the 8086 showing power connection (top) and ground connections (top, bottom). The clock circuitry is at the bottom.

Die photo of the 8086 showing power connection (top) and ground connections (top, bottom). The clock circuitry is at the bottom.

Timing in the 8086 is controlled by two internal clock signals. An external oscillator provides a clock signal to the 8086 through the clock input pad at the bottom. The on-chip clock driver circuitry generates two high-current clock signals from this external clock. Note that the clock driver takes up a not-insignificant part of the chip.

In this blog post, I'll discuss how the 8086 routes power and clock signals through the chip, and how the clock driver circuit generates the necessary clock pulses.

Power distribution

The 8086 is constructed with three layers that can be used for wiring. The metal layer on top is best for wiring, since metal has low resistance. Underneath the metal is a layer of polysilicon wiring, made from a special type of silicon. Polysilicon has higher resistance than metal, but can still be used to transmit signals across the chip. The silicon substrate is where the transistors are formed. Silicon has relatively high resistance, so it is only used for short-distance connections, such as inside a gate.

Power routing in a chip like the 8086 creates a topological puzzle of sorts: The metal layer is the only practical layer for routing power and ground, due to its low resistance. Power and ground must be provided to nearly every gate in the chip.1 And since the chip has a single metal layer, power and ground can't cross.

The diagram below highlights these metal wiring networks in the 8086. Power, connected to the power pin at the top, is shown in red, traveling throughout the chip. A major branch flows down and to the right from the power pin, then splitting into multiple paths. Power also travels around the border of the entire chip, supplying the I/O pins.

Power (red) and ground (blue, green) on the metal layer.

Power (red) and ground (blue, green) on the metal layer.

There are two ground pins. The wiring in blue is connected to the upper ground pin, while the wiring in green is connected to the lower ground pin. The blue ground wiring has a large branch downwards through the center of the chip, branching in complex directions. The green ground wiring flows along the bottom, left, and right sides of the chip, supporting the I/O pins, as well as connected to the microcode ROM in the lower right.

The power wires get thinner from their source to their final destination as they branch or deliver power along the way and the current diminishes. This is visible in the ground wire to the address / data pins, below. At the left, the ground wire below the pins is very wide, but it tapers off to the right. In other words, at the left, the wire must handle current for all the pins, but at the right the wire is supporting just the remaining pin.

The ground connection to the Address/Data pins gets progressively thinner. (Left side of chip, rotated 90°)

The ground connection to the Address/Data pins gets progressively thinner. (Left side of chip, rotated 90°)

The metal layer is used for many signals besides power and ground; it is the best layer for delivering signals due to its low resistance. However, the extensive power and ground wiring constrains the other uses of the metal layer. To avoid intersections, most of the metal signal lines run parallel to the power lines; the polysilicon layer underneath is used to run perpendicular signals. But what happens if metal wires need to cross a power or ground line? The solution is to use a "crossunder", where the signal goes down to the polysilicon layer and crosses under the power line, popping back up on the other side,3 as shown below.

Signals in the metal layer crossing under the power line by using polysilicon crossunders.

Signals in the metal layer crossing under the power line by using polysilicon crossunders.

While power and ground are almost entirely routed in the metal layer, there are a few places where this breaks down and a crossunder is used for power. This typically happens near the end of the line, where the current is small. One example is shown below, where ground passes through two polysilicon crossunders. To reduce the resistance, these crossunders are much wider than the crossunders for signals and also use the silicon and polysilicon layers together. The small circles are connections (called vias) between the metal layer and the polysilicon layer.

Composite photo showing polysilicon crossunders for ground that pass under signal lines.

Composite photo showing polysilicon crossunders for ground that pass under signal lines.

The silicon layer plays a minor part in routing power. In particular, many gates are stretched out to reach the power and ground on either side. The photo below shows some gates in the 8086. Note the large doped silicon regions (white) that extend to reach the power and ground lines. Only a small part of this silicon is used for transistors, while the rest looks like wasted space. However, these empty silicon regions connect the gate to the metal power and ground wires. Since silicon has relatively high resistance, wide regions are used for these connections, and over short distances.

The doped silicon forming gates can be extended to reach the power and ground lines. The metal layer was removed for this photo so the power and ground lines are illustrated.

The doped silicon forming gates can be extended to reach the power and ground lines. The metal layer was removed for this photo so the power and ground lines are illustrated.

Other power routing issues arose as the 8086 was revised and became physically smaller. As manufacturing technology improved, Intel performed "die shrinks", keeping the same circuitry but scaling it down uniformly to produce a smaller die. Unfortunately, shrinking the power lines reduces the current they can handle. The solution was beef up the power lines around the edge of the chip, while allowing the internal circuitry and wiring to shrink. This can be seen in the photo below; the lower-right corner of the smaller 8086 has much more power wiring, for instance. (I wrote more about the 8086 die shrink here.)

Two versions of the 8086 die, at the same scale. The die on the right is a later version of the 8086, reduced in size.

Two versions of the 8086 die, at the same scale. The die on the right is a later version of the 8086, reduced in size.

The processor clock

Almost all computers use a clock signal to control the timing of the processor.4 Like many microprocessors, the 8086 uses a two-phase clock internally.5 In a two-phase clock, there are two clock signals: when the first clock is high, the second is low, and vice versa, as shown below. One set of circuitry is enabled by the first clock, while a second set of circuitry is enabled by the second clock. The 8086's circuitry requires that the two clock phases are non-overlapping —there is a gap after one goes low before the other goes high—and asymmetrical.6

A two-phase clock consists of two clock signals with opposite polarity.

A two-phase clock consists of two clock signals with opposite polarity.

In modern processors, clock routing is complex because the clock signals must reach all parts of the chip at the same time. Modern processors use a hierarchy of clock paths, balancing the time along each path, and often provide separate buffering for each path. In comparison, the 8086's clock routing is straightforward because its 5 to 10 MHz clock7 is orders of magnitude slower than modern processors. At these comparatively low speeds, the length of the path doesn't make much difference, so the 8086's clock signals can meander around the chip.

Clock routing in the 8086. Green is clock while red is the opposite phase clock.

Clock routing in the 8086. Green is clock while red is the opposite phase clock.

The diagram above shows the 8086's clock routing. Phase 1 is in green and phase 2 is in red. At the bottom of the chip, the circuitry that generates the clocks appears as large blobs. From there, the clock signals branch wind around the chip. For the most part, the two clock phases are routed parallel to each other, unlike power and ground, which form opposing branches.

Because the clock signals go to all parts of the chip, they require much more current than typical signals and are routed in the metal layer for the most part. When the clock signals must cross the power lines, they use large crossunders as shown below. Note that the irregularly-shaped clock crossunders are much larger than the crossunders for other signals, such as the Q bus below.

The clock has large crossunders to cross the power wire. The Q bus (which transfers instructions from the instruction queue to the decoder) has much smaller crossunders.

The clock has large crossunders to cross the power wire. The Q bus (which transfers instructions from the instruction queue to the decoder) has much smaller crossunders.

To provide the high-current clock signals, the clock signals have special driver circuitry built from large transistors. The photo below compares one of these driver transistors to a typical logic transistor. The driver transistor is about 300 times as large, so it can provide about 300 times the current. This transistor is constructed as 10 transistors in parallel; the 10 vertical polysilicon lines form the 10 gates. Each clock signal is driven by a pair of large transistors, one to pull the signal high and one to pull the signal low.

A large transistor in the clock driver compared to a neighboring logic transistor.

A large transistor in the clock driver compared to a neighboring logic transistor.

The photo below shows the clock driver circuitry. This circuit splits the external clock signal into two phases, makes the phases non-overlapping, and amplifies them. At the left, the pink square is the pad for the externally-supplied clock. The signal passes through a series of transistors, ending with the large driver transistors at the right for the clock signal. The brownish wiring is the polysilicon that forms the gates. Many transistors have zig-zagging gates to fit a larger transistor into the available space.

The clock driver circuitry on the die. The metal has been removed, revealing the large transistors in the circuit. The clock input pin is the purple square on the left.

The clock driver circuitry on the die. The metal has been removed, revealing the large transistors in the circuit. The clock input pin is the purple square on the left.

The schematic below shows the driver circuitry, slightly simplified. The triangles indicate high-current drivers, built from two or three transistors; an inverting input (indicated by a bubble) pulls the output low. At the left, the clock input pin has a small resistor and a diode to provide some protection (like the other input pins). Next, the clock is split into an uninverted phase (top) and an inverted phase (bottom).

Simplified schematic of the clock driver circuitry in the 8086.

Simplified schematic of the clock driver circuitry in the 8086.

The additional circuitry keeps the clocks from overlapping: when one clock is high, it forces the other side low, through the inverted inputs. To see how this works, let's start with the clk in pin high, so clk in and clock are high while clk in and clock are low. Now, suppose the clk in pin input goes low, causing clk in to go low and clk in to go high. However, the output clock can't go high until clock goes low, due to the negative inputs on the buffers. Once that happens, clk in proceeds through the lower drivers, pulling clock high after two gate delays.8 The point of this is that clock and clock don't switch at the same time; after one goes low, there is a delay before the other goes high. This generates the desired non-overlapping clock signals.

Conclusions

The 8086 uses some interesting routing for power, but modern processors operate at a whole different level. While the 8086 required 350 milliamps of current, a modern processor might require over a hundred amps. The 8086 used 3 of its 40 pins for power and ground, compared to a modern Intel Core i5 processor with 128 power pins and 377 ground pins (out of 1151 pins). Although the numerous metal layers in modern chips solved the 8086's routing issues, modern chips have new complications such as multiple power domains that allow unused parts of the chip to be powered down.

Clock routing is much harder on modern processors since at multi-gigahertz speeds, even an extra millimeter of path can affect the clock. To deal with this, modern processors use techniques such as H-trees or grids to distribute the clock, rather than the 8086's meandering paths. While the 8086 has a simple circuit to generate the two-phase clock, modern processors often use a phase-locked loop (PLL) to synthesize the clock and use multiple circuits scattered across the chip to generate and control clock signals.

Even though the 8086 is much simpler than modern processors, it contains a lot of interesting circuitry. I plan to reverse-engineer more of the 8086, so follow me on Twitter at @kenshirriff for updates. I also have an RSS feed.

Notes and references

  1. Power and ground must be provided to almost every gate in the chip since a standard NMOS gate requires ground for its pull-down network and power for its pull-up resistor. There are a few exceptions, though. The 8086 uses some dynamic logic gates, especially in the ALU for speed. These gates are pulled high by the clock, so they don't need a direct power connection. The 8086 also uses some pass-transistor XOR gates, which are pulled low by the inputs, so they don't need ground.

    The microcode ROM forms a large region with no power connections, just ground. This is because each row in the ROM is implemented as a very large NOR gate with the power pull-up on the right-hand edge. Thus, the ROM gates all have power and ground, even though it looks like the ROM lacks power connections. 

  2. Integrated circuits often have power and ground on opposite corners or opposite sides of the chip. This placement makes it easier to construct the non-intersecting power and ground networks in the chips. The 8086 is slightly unusual to have power and ground on diagonally-opposite pins, but then a second ground pin close to the power pin. The solution is to have tree-like branching networks for power and ground. These networks are interdigitated, meshed like fingers to reach all parts of the chip.2 

  3. Crossunders are used for many wire crossings, not just power, but power wiring is a key contributor. Typically, metal wiring is used for signals in one direction, while polysilicon wiring is used for signals in the perpendicular direction. (These directions vary in different parts of the chip, depending on the predominant direction for signals.) Thus, signals for the most part can travel unimpeded. Even so, signals often bounce from layer to layer to make the routing work. 

  4. While almost all computers are synchronous and operate with a clock, the IAS machine architecture (popular in the 1950s) was asynchronous, operating without a clock. Instead, each circuit would send a pulse to the next when it was done, triggering the next step. Many early computers of the 1950s were based on the IAS machine architecture, including CYCLONE, ILLIAC, JOHNNIAC, MANIAC, SEAC, and the IBM 701. Research into asynchronous computing continues (link, link), but synchronous designs are dominant. 

  5. Among other things, processors use the clock to prevent unwanted feedback in the circuitry. For instance, consider a program counter with a circuit to increment it and feed the result back to the program counter. You don't want the new value to get repeatedly incremented.

    One approach is to use edge-sensitive circuits (flip flops) that will update that value in the program counter at the moment the clock goes high. Thus, there will be a single update as desired. However, with a two-phase clock, the circuit can be built from level-sensitive latches, which are much simpler than edge-sensitive flip flops. The idea is that when the first clock is high, the first half of the circuit receives input and does its logic calculations When the second clock is high, the second half of the circuit receives input from the first half and does any necessary calculations, while the first half is blocked. The point is that only half of the circuitry can update at any time, preventing uncontrolled feedback. 

  6. The 8086 has strict requirements on its input clock, which must be high for 1/3 of the time. The clock signal into the 8086 was typically produced by an 8284 chip and a quartz crystal. This chip divided its input clock by 3 to generate the 33% duty cycle clock required by the 8086.  

  7. Because the 8086 used dynamic logic, it also had a minimum clock speed of 2 MHz. If the clock ran slower than this, there was a risk of charges leaking away before they were refreshed, causing failures. The minimum clock speed was inconvenient for debugging, since you couldn't slow down or stop the clock. 

  8. This is a somewhat handwaving description of the clock driver circuit. In particular, I'm not sure what happens when one transistor is pulling a signal high and another is pulling the same signal low. An accurate simulation would depend on the relative sizes of the two transistors. 

Reverse-engineering the adder inside the Intel 8086

The Intel 8086 processor contains many interesting components that can be understood through reverse engineering. In this article, I'll discuss the adder that is used for address calculations. The photo below shows the tiny silicon die of the 8086 processor under a microscope. The left part of the chip has the 16-bit datapath including the registers and the Arithmetic-Logic Unit (ALU); you can see the pattern of circuitry repeated 16 times. The rectangle in the lower-right is the microcode ROM, defining the execution of each instruction.

Die photo of the 8086 microprocessor, highlighting the 16-bit address adder.  The microcode ROM is in the lower right. The metal layer has been removed for this photo, revealing the silicon and polysilicon underneath. The colors are due to thin-film effects from partially-removed oxide layers.

Die photo of the 8086 microprocessor, highlighting the 16-bit address adder. The microcode ROM is in the lower right. The metal layer has been removed for this photo, revealing the silicon and polysilicon underneath. The colors are due to thin-film effects from partially-removed oxide layers.

The 16-bit adder, the topic of this post, is in the upper left. The magnified view shows how the adder is constructed from 16 stages, one for each bit. The upper row handles the top bits (15-8) and the lower row handles the low bits (7-0).1 Studying the die reveals how this 16-bit adder was optimized through clever circuit design, specialized logic gates, and careful layout techniques.

How the adder is used in the 8086

You might wonder why the 8086 contains both an adder and an ALU (arithmetic-logic unit). The reason is that the adder is used for address calculations, while the ALU is used for data calculations. The 8086 prefetches instructions using a "Bus Interface Unit", which runs semi-independently from the "Execution Unit" that executed instructions. It would have been difficult for the Bus Interface Unit and the Execution Unit to share the ALU without conflicts. By providing both an adder2 and the ALU, the two calculations can take place in parallel.

Microprocessors of the early 1970s typically had 16-bit addresses, capable of accessing 64 kilobytes of memory. At first, 64 kilobytes seemed like more memory than anyone would need (or afford), but as the price of memory chips plunged, the demand for memory grew.4 To support a larger address space, Intel added segment registers to the 8086, a hack that allowed the processor to access a megabyte of memory but led to years of gnashed teeth. The concept is to break memory into 64-kilobyte segments. A segment register specifies the start of the memory segment, and a 16-bit address indicates an address within that segment. These are combined in the adder, as shown below, to obtain the memory address. One downside is that accessing regions of memory larger than 64 kilobytes is difficult; the segment register must be modified to get outside the current segment.3

The segment register and the offset are added to create a 20-bit physical address.
From iAPX 86,88 User's Manual, page 2-13.

The segment register and the offset are added to create a 20-bit physical address. From iAPX 86,88 User's Manual, page 2-13.

How does the 16-bit adder compute a 20-bit address? The trick is that since the segment register is shifted 4 bits, the adder sums the 16 bits of the segment register and the top 12 bits of the offset. The four low bits of the offset bypass the adder since they are unchanged. For other purposes (such as incrementing the instruction counter), the adder operates on unshifted 16-bit addresses. Thus, the register circuitry has logic to feed either shifted or non-shifted values to the adder.

The diagram below, from the 8086 patent, shows how the adder sits between the segment registers and the address pins, computing the address. In the patent, the segment registers were named RC, RD, RS, and RA, not their current names: CS, DS, SS, and ES.

The adder, highlighted in yellow, is a key part of the Bus Interface Unit.
The upper register file (separate from the general-purpose registers) is connected to the adder.
IND and OPR are internal registers, not visible to the programmer.
From the 8086 patent.

The adder, highlighted in yellow, is a key part of the Bus Interface Unit. The upper register file (separate from the general-purpose registers) is connected to the adder. IND and OPR are internal registers, not visible to the programmer. From the 8086 patent.

The adder implementation

If you've studied digital logic, you may be familiar with the full adder, a building-block for adding binary numbers. Specifically, a full adder takes two bits and a carry-in bit. It adds these three bits and outputs the 1-bit sum, as well as a carry-out bit. (For instance 1+0+1 = 10 in binary, so the carry-out is 1 and the sum bit is 0.) A 16-bit adder can be created by joining 16 full-adders, with the carry-out from one feeding into the carry-in of the next. Just as you add two decimal numbers, moving carries to the next column on the left, each full adder adds one column in the binary numbers, and the carry is passed on to the left.

A full adder can be implemented in different ways; the 8086's circuit is shown below. (This circuit is repeated 16 times in the 16-bit adder.) Each adder stage takes two inputs (at the bottom) and the carry-in (inverted, at the right). These are summed to form a 1-bit sum output (bottom) and a carry-out (at the left). The sum bit is formed by the two exclusive-NOR gates that combine the two inputs and the carry-in.5 The output passes through a tri-state buffer (at the top), allowing it to be connected to an internal data bus.6

Schematic of one stage of the 8086's adder. The schematic layout corresponds to the physical layout on the chip.

Schematic of one stage of the 8086's adder. The schematic layout corresponds to the physical layout on the chip.

The carry computation uses an optimization called the Manchester carry chain7, dating back to 1959. The problem with addition is carries are slow; in the straightforward approach, each bit sum can't be computed until the carry to the right has been computed. (Similar to computing 99999999+1 with long addition; each digit requires you to carry the one.) If each bit must wait for the previous carry, addition becomes a slow, serial process.

The idea behind the Manchester carry chain is to decide, in parallel, if each stage will generate a carry, propagate an existing carry, or block any carry. Then, the carry can rapidly flow through the "carry chain" without sequential evaluation. To understand this, consider the cases when adding two bits and a carry-in. For 0+0, there will be no carry, regardless of any carry-in. On the other hand, adding 1+1 will always produce a carry, regardless of any carry-in; this case is called "carry generate". The interesting cases are 0+1 and 1+0; there will be a carry-out if there was a carry-in. This case is called "carry propagate" since the carry-in propagates through the stage unchanged.

The "carry generate" and "carry propagate" signals are used to open or close switches (i.e. transistors) in the carry line. For "carry propagate", carry-in is connected to carry-out, so the carry can flow through. Otherwise, the incoming carry is disconnected. For "carry generate", a carry signal is sent to carry-out. Since these switches can all be set in parallel, carry computation is quick. There is still some propagation delay as the carry-in flows through the switches, potentially from bit 0 all the way to bit 15, but this is much faster than computing the carry through a sequence of logic gates.

Four stages of the adder, with the carry chain indicated. In this photo, the metal layer on top of the chip is visible, mostly obscuring the polysilicon and silicon underneath. The input and output wiring for each stage is at the bottom.

Four stages of the adder, with the carry chain indicated. In this photo, the metal layer on top of the chip is visible, mostly obscuring the polysilicon and silicon underneath. The input and output wiring for each stage is at the bottom.

The carry chain is visible on the die; the photo above shows four stages of the adder. The horizontal lines are the metal wiring: control signals, ground, and power (the thick line near the bottom). The silicon circuitry is barely visible underneath the metal. The carry chain wires are interrupted at each stage, to connect to the transistors underneath, and the new carry continues on to the next stage.

Carry-skip

Careful examination of the adder shows that while the 16 single-bit stages are very similar, they are not all identical. The extra circuitry indicated below turns out to be a performance optimization called the carry-skip adder.

These two stages of the 8086's adder are almost identical, except for the circuitry indicated by the arrow. In this photo, the metal and polysilicon layers were removed, showing the underlying silicon.

These two stages of the 8086's adder are almost identical, except for the circuitry indicated by the arrow. In this photo, the metal and polysilicon layers were removed, showing the underlying silicon.

The idea of carry-skip is to skip over some of the stages in the carry chain if possible, reducing the worst-case delay through the chain. For example, if there is a carry-in to bit 8, and the carry propagate is set for bits 8, 9, 10, and 11, then it can be immediately determined that there is a carry-in to bit 12. Thus, by ANDing together the carry-in and the four carry-propagate values, the carry-in to bit 12 can be calculated immediately for this case. In other words, the carry skips from bit 8 to bit 12. Likewise, similar carry-skip circuits allow the carry to skip from bit 2 to bit 4, and bit 4 to bit 8. These carry-skip circuits reduced the adder's worst-case computation time.8

Regular logic vs dynamic logic

The performance of the adder is critical to the overall speed of the 8086, so it uses some interesting techniques to implement fast logic gates. Some of the adder's gates are built with dynamic logic. A standard logic gate is straightforward: you put signals in and you get the result out. In contrast, a dynamic logic gate uses a periodic clock signal to compute the logic function.9 Since dynamic logic can be faster and more compact, it is used in modern processors, in the form of domino logic.

Dynamic logic depends on a two-phase clock, commonly used for timing in microprocessors of that era. The two-phase clock consists of two clock signals that are active in alternation. First, phase 1 (ɸ1) is high and phase 2 (ɸ2) is low. Then phase 1 is low and phase 2 is high. This cycle repeats at the clock frequency, such as 5 MHz.

The schematic below shows a dynamic NAND gate from the adder. In phase 1, the clock ɸ1 turns on the lower transistor, pulling the input to the inverter low. Phase 2 is the evaluation phase, where the logic function is computed. If both inputs are high, the two input transistors will turn on, allowing clock ɸ2 to pass through to the inverter input, pulling it high and causing the output to be low. On the other hand, if either input is low, the clock ɸ2 cannot pass through the transistors. Instead, the inverter input remains low from the previous phase, due to the stray capacitance of the wire, so the output is high. Thus, in either case, the circuit implements the NAND functionality, with a low output only if the inputs are both high. Note that unlike a standard logic gate, the dynamic logic gate's output is only valid during clock phase 2.

Implementation of a NAND gate using dynamic logic. The gate is controlled by the two-phase clock signals.

Implementation of a NAND gate using dynamic logic. The gate is controlled by the two-phase clock signals.

The diagram below shows how the dynamic NAND gate is physically implemented on the die; the layout of the schematic corresponds to the physical layout. In the photo, the metal layer has been removed, showing the silicon underneath. The yellowish regions are doped, conductive silicon. The brownish, metallic lines are polysilicon, a special type of silicon used as wiring. A transistor is formed when polysilicon crosses doped silicon; the polysilicon is the gate, controlling conduction between the silicon on either side. The transistors have complex, twisted shapes to fit the circuitry in as little space as possible. Each transistor was given a particular size for the best balance between speed and power consumption. For example, the input transistors are small, while the inverter transistor is much larger.

A dynamic NAND gate in the 8086, with corresponding schematic. The metal layer has been removed for this photo, revealing the silicon and polysilicon.
The layout is slightly different between the lower stages (shown) and the upper stages.

A dynamic NAND gate in the 8086, with corresponding schematic. The metal layer has been removed for this photo, revealing the silicon and polysilicon. The layout is slightly different between the lower stages (shown) and the upper stages.

The diagram below shows the location of a NAND gate in the 8086 chip. The first box zooms in on one of the 16 single-bit adder circuits. The second box shows the position of the NAND gate within the adder. The NAND gate is almost visible in the overall die photo showing how large the features are, compared to a modern chip.

Each stage of the adder has a dynamic NAND gate. The NAND gate in one of these stages is highlighted.

Each stage of the adder has a dynamic NAND gate. The NAND gate in one of these stages is highlighted.

Another interesting dynamic logic gate in the adder is exclusive-NOR (XNOR, the complement of XOR), which outputs 1 if both inputs are the same, and 0 otherwise. The schematic below shows the implementation of XNOR.10 As before, during phase 1, the inverter input is pulled to ground. In the evaluation phase, clock ɸ2 can pull the inverter input high through either the upper pair of transistors or the lower pair of transistors. This will happen if the inputs are different (input 2 is high and input 1 is low, or if input 1 is high and input 2 is low), causing the inverter output to be low. Otherwise, the inverter input will remain low from phase 1, and the inverter output will be high. Thus, the output is high if the two inputs are equal, and low otherwise, the desired XNOR behavior.

A dynamic XNOR gate, as implemented on the 8086.

A dynamic XNOR gate, as implemented on the 8086.

Conclusions

The adder in the 8086 has a critical role, computing addresses for every memory access. A 16-bit adder may seem like a straightforward circuit, but the adder in the 8086 was highly optimized so it wouldn't be a performance bottleneck. To speed up carry processing, the adder uses a Manchester carry chain, with carry-skip circuitry on top of that. The adder uses three different designs for logic gates: standard NMOS gates, pass-transistor logic, and dynamic logic. Even at the transistor level, the circuit is highly optimized, with transistors of all shapes and sizes carefully packed together.

The Intel 8086 is an interesting processor with complex circuits but still simple enough that its circuits can be studied under a microscope. The 8086 has 29,000 transistors and features that are a few micrometers large. In comparison, modern processors have billions of transistors and transistors that are measured in nanometers. While the progress of Moore's law has yielded great improvements in modern processors, the processors of the 1970s are much better for reverse engineering.

If you're interested in the 8086, I wrote about the 8086 die, its die shrink process and the 8086 registers earlier. I plan to write more about the 8086 so follow me on Twitter @kenshirriff or RSS for updates.

Notes and references

  1. The adder's layout has bits 15-8 in the top and bits 7-0 below. This layout is a consequence of the bit ordering in the data path: the bits are interleaved 15-7-14-6-...-8-0, instead of linearly 15-14-...-0. The reason behind this interleaving is that it makes it easy to swap the two bytes in the 16-bit word, by swapping pairs of bits. The adder is split into two rows so it fits into the horizontal space available. Even with the tall, narrow layout of an adder stage, a bit of the adder is wider than a bit of the register file. Splitting the adder into two rows keeps the bit spacing approximately the same, avoiding long wires between the register file and the adder. 

  2. Many early microprocessors (such as the 6502 and Z-80) had an incrementer for the program counter, separate from the ALU. (One motivation was the ALU was 8 bits while the program counter was 16 bits.) The 68000 had address adders, separate from the ALU. 

  3. The 8086's segmented architecture led to programming with near pointers and far pointers. A near pointer was a 16-bit pointer that could be held in a register and manipulated easily, but couldn't access more than 64 kilobytes. A far pointer was the combination of an offset and a segment value, resulting in a pointer that could access the full memory but required twice the storage for each pointer. Comparing far pointers was problematic, since they were not unique; multiple offset/segment combinations could address the same physical memory address. 

  4. In contrast to the 8086, the Motorola 68000 microprocessor (1979) had 32-bit registers. Its address bus was 24 bits wide, allowing it to access 16 megabytes of memory directly, without segment registers. The 68020 (1984) extended the address bus to 32 bits, allowing 4 gigabytes of memory to be accessed.

    The 68000 was provided in a 64-pin package, providing plenty of pins for the 24 address lines and 16 data lines. In comparison, Intel didn't like large IC packages and used a 40-pin package for the 8086. As a result, the 8086 used 20 pins for the address lines, and reused (i.e. multiplexed) 16 of these pins for data lines. The 8086 also multiplexed many of the control pins, complicating system design. 

  5. The desired sum output is input1⊕input2⊕carry-in. In the 8086 adder, the carry-in is inverted, there are two exclusive-NOR gates, and an inverter in the path. Thus, the circuit has four inversions in total; since this number is even, they cancel out and the circuit produces the desired exclusive-OR of the three values. 

  6. A tri-state buffer has three different outputs: high (1), low (0), or high-impedance (hi-Z). In the hi-Z state, the buffer is not outputting anything and is electrically disconnected. The motivation for this is that multiple signals can be connected to a bus through tri-state buffers. By enabling one buffer and disabling the rest, the desired signal can be output to the bus. (Regular buffers wouldn't work because electrical problems would arise if one buffer outputs a 1 and another outputs a 0.) Open-collector outputs are an alternative for connecting multiple signals to a bus. 

  7. The Manchester carry chain was developed by the University of Manchester and described in the article Parallel addition in digital computers: a new fast 'carry' circuit, 1959. It was used in the Atlas supercomputer (1962).

    The original diagram showing how the Manchester carry chain is implemented, from 1959.

    The original diagram showing how the Manchester carry chain is implemented, from 1959.

    The diagram above, from the original article, shows the structure of the Manchester carry chain. Although the switches look like relay contacts, the carry chain was implemented with transistors (2N501 micro-alloy diffused-base transistors). The structure of the carry chain in the 8086 is similar to the diagram above, but the top switches are replaced by XNOR gates. 

  8. A few notes on the carry-skip implementation. Conceptually the signals are ANDed together, but the implementation uses a NOR gate since the carry and propagate signal inputs are inverted. For carry-skip to be useful, computing the carry with a gate must be faster than the carry chain, which was achieved by skipping four stages at a time. (I don't know why the first stage was implemented with a smaller skip.) Note that carry-skip helps in specific cases (which include the worst-case), so the regular carry circuitry is still required. 

  9. Processors always have a maximum clock speed, the fastest they can run. (The original 8086 ran at up to 5 MHz, while the later 8086-1 supported 10 MHz.) However, due to the use of dynamic logic, the 8086 also had a minimum clock speed of 2 MHz. If the clock ran slower than that, there was a risk of the charge on a wire leaking away before it was used, causing errors. 

  10. Surprisingly, the adder uses a completely different implementation for the upper XNOR gate; it is implemented with pass-transistor logic rather than dynamic logic. I think the motivation is that the carry-in signal to these XNOR gates is not quite synchronous, due to propagation delay through the carry chain. Dynamic logic has the disadvantage that if an input signal switches low after the clock, the gate can't recover; the circuit has been charged and won't be discharged until the next clock phase. In particular, if a carry comes in after clock phase 2 has started, it can't switch the output high. By using non-dynamic logic, the output will switch correctly when the carry arrives, even if it is not aligned with the clock.

    Pass-transistor logic is different from "regular" NMOS logic gates, but provides a more efficient way of implementing XNOR. The circuit is similar to the XNOR in the Z-80 microprocessor, which I've described earlier, so I won't go into more detail here.

    Pass-transistor logic is also used to implement the input and output latches on the adder. On the patent diagram shown earlier, these latches appear as "TMP B" and "TMP C" on the input side of the adder and "TMP ɸ1" on the output side. These latches are necessary because otherwise the adder's output would be connected directly to the input, causing the adder to repeatedly add. The implementation of these latches is simply clocked pass transistors in the path, holding the value by capacitance. 

Inside the 8086 processor, tiny charge pumps create a negative voltage

Introduced in 1978, the revolutionary Intel 8086 microprocessor led to the x86 processors used in most desktop and server computing today. This chip is built from digital circuits, as you would expect. However, it also has analog circuits: charge pumps that turn the 8086's 5-volt supply into a negative voltage to improve performance.1 I've been reverse-engineering the 8086 from die photos, and in this post I discuss the construction of these charge pumps and how they work.

Die photo of the 8086 microprocessor. The ALU and registers are on the left. The microcode ROM is in the lower right. Click for a high-resolution image.

Die photo of the 8086 microprocessor. The ALU and registers are on the left. The microcode ROM is in the lower right. Click for a high-resolution image.

The photo above shows the tiny silicon die of the 8086 processor under a microscope. The metal layer on top of the chip is visible, with the silicon hidden underneath. Around the outside edge, bond wires connect pads on the die to the chip's 40 external pins. However, careful examination shows that the die has 42 bond pads, not 40. Why are there two extra ones?

An integrated circuit starts with a silicon substrate, and transistors are built on this. For high-performance integrated circuits, it is beneficial to apply a negative "bias" voltage to the substrate. 2 To obtain this substrate bias voltage, many chips in the 1970s had an external pin that was connected to -5V,3 but this additional power supply was inconvenient for the engineers using these chips. By the end of the 1970s, however, on-chip "charge pump" circuits were designed that generated the negative voltage internally. These chips used a single convenient +5V supply, making engineers happier.

A closeup of the 8086 chip showing the silicon die and the bond wires connecting it to the lead frame.

A closeup of the 8086 chip showing the silicon die and the bond wires connecting it to the lead frame.

On the 8086 die, the two extra pads feed this negative bias voltage to the substrate. The photo above shows the silicon die as mounted in the chip, with bond wires connected to the lead frame that forms the pins. Looking carefully, there are two small gray squares above and below the die; each connected to one of the "extra" bond pads. The charge pumps on the 8086 die generate a negative voltage, which passes through the bond wires to these squares, and then through the metal plate underneath to the 8086's substrate.

How the charge pumps work

The photo below highlights the two charge pumps in the 8086. I'll discuss the top one; the bottom one has the same circuitry but a different layout to fit in the available space. Each pump has driver circuitry, a large capacitor, and a pad with the bond wire to the substrate. Each pump is located next to one of the 8086's two ground pads, presumably to minimize electrical noise.

Die photo of the 8086 microprocessor, zooming in on the two substrate bias generators.

Die photo of the 8086 microprocessor, zooming in on the two substrate bias generators.

You might wonder how a charge pump can turn a positive voltage into a negative voltage. The trick is a "flying" capacitor, as shown below. On the left, the capacitor is charged to 5 volts. Now, disconnect the capacitor and connect the positive side to ground. The capacitor still has its 5-volt charge, so now the low side must be at -5 volts. By rapidly switching the capacitor between the two states, the charge pump produces a negative voltage.

On the left, the "flying capacitor' is charged to 5 volts. By switching ground to the upper terminal, the capacitor now outputs -5 volts. (source)

On the left, the "flying capacitor' is charged to 5 volts. By switching ground to the upper terminal, the capacitor now outputs -5 volts. (source)

The 8086's charge pump circuit uses MOSFET transistors and diodes to switch the capacitor between the two states, with an oscillator to control the transistors, as shown in the schematic below. The ring oscillator consists of three inverters connected in a loop (or ring). Because the number of inverters is odd, the system is unstable and will oscillate.5 For instance, if the input to the first inverter is 0, its output will be 1, the second output will be 0, and the third output will be 1. This will flip the first inverter, and the "flip" will travel through the loop causing oscillation. To slow down the oscillation rate, two resistor-capacitor networks are inserted into the ring. Since the capacitors will take some time to charge and discharge, the oscillations will be slowed, giving the charge pump time to operate.4

Schematic of the charge pump used in the Intel 8086 to provide negative substrate bias.

Schematic of the charge pump used in the Intel 8086 to provide negative substrate bias.

The outputs from the ring oscillator are fed to the transistors that drive the capacitor. In the first step, the upper transistor is switched on, causing the capacitor to charge through the first diode to 5 volts with respect to ground. The second step is where the magic happens. The lower transistor turns on, connecting the high side of the capacitor to ground. Since the capacitor is still charged to 5 volts, the low side of the capacitor must now be at -5 volts, producing the desired negative voltage. This goes through the second diode and the bond wire to the substrate. When the oscillator flips again, the upper transistor turns on and the cycle repeats. The charge pump gets its name because it pumps charge from the output to ground.6 The diodes are similar to check valves in a water pump, making sure charge moves in the right direction.

The implementation in silicon

The photo below shows the charge pump as it is implemented on the chip. In this photo, the metal wiring is visible on top, with reddish polysilicon underneath and beige silicon at the bottom. The main capacitor is visible in the center, with H-shaped wiring connecting it to the circuitry on the left. (Part of the capacitor is hidden under the wide metal power trace at the top.) On the right, the substrate bond wire is attached to the pad. A test pattern is below the pad; it has a square for each mask used to produce a layer of the chip.

The charge pump, showing the metal layer.

The charge pump, showing the metal layer.

Removing the metal layer shows the circuitry more clearly, below. The large charge pump capacitor takes up the right half of the photo. Although microscopic, this capacitor is huge by chip standards, about the size of a 16-bit register. The capacitor consists of polysilicon over a silicon region, separated by insulating oxide; the polysilicon and silicon form the plates of the capacitor. On the left side are the smaller capacitors and the resistors that provide the R-C delay for the oscillator. Below them is the oscillator circuitry and the drive transistors.7

An 8086 charge pump, showing the key components. The metal has been removed for this photo, to show the silicon and polysilicon underneath.

An 8086 charge pump, showing the key components. The metal has been removed for this photo, to show the silicon and polysilicon underneath.

One interesting feature of the charge pump is the two diodes, each built from eight transistors in a regular pattern. The diagram below shows the structure of a transistor. Regions of the silicon are doped with impurities to create diffusion regions with desired properties. The transistor can be viewed as a switch, allowing current to flow between two diffusion regions called the source and drain. The transistor is controlled by the gate, made of a special type of silicon called polysilicon. A high voltage on the gate lets current flow between the source and drain, while a low voltage blocks current flow. These tiny transistors can be combined to form logic gates, the components of microprocessors and other digital chips. But in this case, the transistors are used as diodes.

Structure of an NMOS transistor (MOSFET) as implemented in an integrated circuit.

Structure of an NMOS transistor (MOSFET) as implemented in an integrated circuit.

The photo below shows a transistor in the charge pump, viewed from above. As in the diagram, polysilicon forms the gate between the silicon diffusion regions on either side. A diode can be formed from a MOSFET by connecting the gate and drain together (details) through the silicon/polysilicon connection at the bottom of the photo. The silicon can also be connected to the metal layer through a "via". The metal layer was removed for this photo, but faint circles indicate the position of silicon/metal vias.

A transistor in the charge pump circuit. The polysilicon gate separates the transistor's source and drain on either side.

A transistor in the charge pump circuit. The polysilicon gate separates the transistor's source and drain on either side.

The diagram below shows how the two diodes are implemented from 16 transistors. To support the relatively high current of the charge pump, eight transistors are used in parallel for each diode. Note that neighboring transistors share source or drain regions, allowing transistors to be packed densely. The blue lines indicate the metal wires; the metal was removed for this photo. The dark circles indicate connections (vias) between the metal and silicon.

The charge pump has two diodes, each implemented with 8 transistors. The source, gate, and drain are indicated with S, G, and D.

The charge pump has two diodes, each implemented with 8 transistors. The source, gate, and drain are indicated with S, G, and D.

Putting this all together, the upper eight transistors have their sources connected to ground by a metal wire. Their gates and drains connected together by the polysilicon below the transistors, making them into diodes, and they are connected to the capacitor by a metal wire. The lower eight transistors form a second diode; their gates and drains are wired together by the lower metal wire loop. Note how the layout has been optimized; for example, the gates have bent shapes to avoid the vias (black dots).

Conclusions

The substrate bias generator on the 8086 chip9 is an interesting combination of digital circuitry (a ring oscillator formed from inverters) and an analog charge pump. While the bias generator may seem like an obscure part of 1970s computer history, bias generation is still part of modern integrated circuits. It is much more complex in modern chips which have multiple carefully regulated biases in multiple power domains. 8 In a sense it is analogous to the x86 architecture, something that started in the 1970s and is even more popular today, but has become unimaginably more complex in the quest for higher performance.

If you're interested in the 8086, I wrote about the 8086 die, its die shrink process and the 8086 registers earlier. I plan to analyze the 8086 in more detail in future blog posts so follow me on Twitter @kenshirriff or RSS for updates.

Notes and references

  1. Strictly speaking, the entire chip is analog: there's an old saying that "Digital computers are made from analog parts". This saying came from DEC engineer Don Vonada and was published in DEC's Computer Engineering in 1978.

    Vonada's Engineering Maxims (text).

    Vonada's Engineering Maxims (text).

     

  2. Putting a negative bias voltage on the substrate had several benefits. It decreased parasitic capacitance making the chip faster, made the transistor threshold voltage more predictable, and reduced leakage current

  3. Early DRAM memory chips and microprocessor chips often required three supplies: +5V (Vcc), +12V (Vdd) and -5V (Vbb) bias voltage. In the late 1970s, improvements in chip technology allowed a single supply to be used instead. For example, Mostek's MK4116 (a 16 kilobit DRAM from 1977) required three voltages while the improved MK4516 (1981) operated on a single +5V supply, simplifying hardware designs. (Amusingly, some of these chips still kept the Vbb and Vcc pins for backward compatibility but left them unconnected.) Intel's memory chips followed a similar path, with the 2116 DRAM (16K, 1977) using three voltages and the improved 2118 (1979) using a single voltage. Similarly, the famous Intel 8080 microprocessor (1974) used enhancement-mode transistors and required three voltages. An improved version, the 8085 (1976), used depletion-mode transistors and was powered by a single +5V supply. The Motorola 6800 microprocessor (1974) used a different approach for a single supply; although the 6800 was built from the older enhancement-load transistors it avoided the +12 supply by implementing an on-chip voltage doubler, a charge pump that increased the voltage. 

  4. I tried to measure the frequency of the charge pump by looking at the chip's current to see fluctuations due to the charge pump. I measured 90 MHz fluctuations, but I suspect I was measuring noise and not the charge pump's oscillations. 

  5. Because the circuit has an odd number of inverters, it oscillates. If, on the other hand, it had an even number of inverters, it would be stable in two different states. This technique is used in the 8086's registers: a pair of inverters stores each bit (details). 

  6. I've simplified the charge pump discussion slightly. Due to voltage drops in the transistors, the substrate voltage will probably be around -3V, not -5V. (If a chip requires a larger voltage drop, charge pump stages can be cascaded.) For the pump direction, I'm referring to current flow. If you think of it as pumping electrons, the negative electrons are being pumped the opposite direction, into the substrate. 

  7. The oscillator is built from 13 transistors. Seven transistors form the 3 inverters (one inverter has an extra transistor to provide extra output current. The six drive transistors consist of two transistors pulling the output high and four transistors pulling the output low. The layout is strangely different from normal inverter circuitry, probably because the current requirements are different from normal digital logic. 

  8. Bias generators are now available as IP blocks that can be licensed and be plugged into a chip design. For more information on bias in modern chips, see Body bias, Multi bias domain implementation, or this presentation. There is even a standard IEEE 1801 power format that allows IC design tools to generate the necessary circuitry. 

  9. The Intel 8087, the math coprocessor chip that goes along with the 8086, also has a substrate bias generator. It uses the same principles, but unexpectedly has a different circuit, using 5 inverters. I wrote about it in detail here

Reverse-engineering and comparing two Game Boy audio amplifier chips

The Nintendo Game Boy contains an audio amplifier chip for sound through a speaker or headphones. In this post, I reverse-engineer this chip and compare it with the later Game Boy Color chip (reverse-engineered earlier). Unexpectedly the Game Boy Color uses an entirely different amplifier design from the original Game Boy, which may explain why the two systems sound different.

The diagram below shows the Game Boy amplifier's silicon die, with the main functional components labeled.1 The upper-left part of the chip has the two large driver transistors for the speaker output (one to pull the signal low and the other to pull the signal high). The headphone amplifier consists of two nearly-identical blocks: one for the left channel and one for the right. The circuitry for the current sources and current mirrors is shared by both headphone channels. The lower-left of the chip contains digital logic to enable either the speaker amp or the headphone amp, switching when the headphones are plugged in.

The chip with pins and key functional blocks labeled.
The hi-res die photo is courtesy of John McMaster.

The chip with pins and key functional blocks labeled. The hi-res die photo is courtesy of John McMaster.

By examining the die closely, components such as transistors and resistors can be identified. From this, the complete circuit can be determined. In the photo above, the white lines are the chip's metal layer, connecting the components. The silicon itself appears greenish and is underneath the metal. The green squares around the outside are the pads where tiny bond wires connected the silicon die to the chip's 18 pins. Regions of the chip are treated (doped) to change the electrical properties of the silicon. The next sections explain how components are created from these different types of silicon.

NPN transistors

The amplifier chip is built from transistors known as NPN and PNP bipolar transistors, different from the low-power MOS transistors used in processors. These transistors have three connections: the emitter, the base, and the collector. The magnified photo below shows an NPN transistor from above. The slightly different tints in the silicon indicate regions that have been doped to form N and P regions, with dark lines separating the regions. The bubbly silverish areas are the metal layer of the chip on top of the silicon—these form the wires connected to the emitter, base, and collector.

An NPN transistor in the Game Boy Color amplifier chip. The collector (C), emitter (E), and base (B) are labeled, along with N and P doped silicon.

An NPN transistor in the Game Boy Color amplifier chip. The collector (C), emitter (E), and base (B) are labeled, along with N and P doped silicon.

Underneath the photo is a vertical cross-section illustrating how the transistor is constructed. The emitter (E) wire is connected to N+ silicon. Below that is a P layer connected to the base contact (B). And below that is an N+ layer connected (indirectly) to the collector (C). If you look at the vertical cross-section below the 'E', you can find the N-P-N layers that form the transistor.

A different structure (below) is used for the high-current output transistors that drive the speaker. These transistors are much larger and have multiple interlocking "fingers" of the emitter and base, surrounded by the large collector. If you look back at the die photo, you can see two of these transistors filling the upper left part of the die.

A large, high-current NPN output transistor in the Game Boy Color audio amplifier chip. The collector (C), base (B), and emitter (E) are labeled.

A large, high-current NPN output transistor in the Game Boy Color audio amplifier chip. The collector (C), base (B), and emitter (E) are labeled.

PNP transistors

The chip also uses PNP transistors, which have an entirely different construction, as shown in the diagram below. The most obvious difference is that the PNP transistors are round.2 A PNP transistor has a small circular emitter (P-silicon), surrounded by a ring-shaped base region (N-silicon), which in turn is surrounded by the collector (P-silicon). (The emitter metal covers both the emitter and the base, but is only connected to the emitter.) These regions form a P-N-P sandwich horizontally (laterally), unlike the vertical structure of the NPN transistors. Note that although the base region physically surrounds the emitter, the metal connection to the base is further away; the base signal passes through the N region underneath the collector to reach the base region.

A PNP transistor in the Game Boy audio amplifier chip. Connections for the collector (C), emitter (E) and base (B) are labeled, along with N and P doped silicon. The base forms a ring around the emitter, and the collector forms a ring around the base.

A PNP transistor in the Game Boy audio amplifier chip. Connections for the collector (C), emitter (E) and base (B) are labeled, along with N and P doped silicon. The base forms a ring around the emitter, and the collector forms a ring around the base.

Resistors

Resistors are an important component of analog chips. The photo below shows some long, zig-zagging resistors, formed from strips of P silicon, which appears beige in the photo. Its resistance is proportional to the length of the resistor, so large-value resistors have a zig-zag shape to fit in the available space. Because resistors are relatively large and inaccurate, chip designs try to minimize the number of resistors required. Even so, an analog chip like this one requires numerous resistors.

Some resistors in the Game Boy audio amplifier chip. In the center, two resistors in parallel provide a low resistance. The long, meandering resistors provide high resistance.

Some resistors in the Game Boy audio amplifier chip. In the center, two resistors in parallel provide a low resistance. The long, meandering resistors provide high resistance.

The photo below shows seven small resistors, but only the two in the middle are connected (in parallel) to the circuit. These extra resistors allow the resistance to be modified by modifying the metal layer, which is much easier than changing the silicon. (These resistors bias the output transistor, and it appears this is a critical resistance that required adjustment.)

This photo shows seven short resistors but some are not used.

This photo shows seven short resistors but some are not used.

Capacitors

This chip has three large capacitors, one for each amplifier. The photo below shows one of the capacitors. The capacitors are simply a large layer of metal over the underlying silicon, separated by a thin insulating oxide layer. At the top and right of the photo, you can see the connections between the metal wiring and the underlying silicon. In this chip, capacitors are used to ensure the stability of the amplifiers. Because they are large, the three capacitors are easy to spot in the chip die photo.

A capacitor on the Game Boy audio amplifier chip.

A capacitor on the Game Boy audio amplifier chip.

The LM380

The Game Boy amplifier chip has a design very similar to the popular LM380 power audio amplifier chip (1972), so I'll start with an overview of how the LM380 works. (See the footnote5 for details.) The LM380 has positive and negative inputs and an output that amplifies the difference between the inputs by a fixed factor of 50. This may sound like an op-amp, but the LM380 is intended as an audio amplifier and is different from an op-amp in several ways: it has a small, fixed gain, it doesn't have a negative power supply, and its internal implementation is different.

The schematic below shows the main functional blocks of the LM380. The inputs go into a differential pair circuit (blue)3. The output from the differential pair (green) goes into a single-transistor amplification stage that provides more gain. The capacitor across the amplification stage stabilizes the amplifier to prevent oscillation. Finally, the output stage (purple) produces the high-current output: power transistor Q7 pulls the output high, while Q8 and Q94 pull the output low. The feedback network controls the gain of the LM380, fixing the gain at a factor of 50. Note that unlike an op-amp, the LM380's feedback network is connected to internal points of the amplifier, not the inputs.

The LM380 audio amplifier. Diagram based on the application note.

The LM380 audio amplifier. Diagram based on the application note.

Game Boy Audio chip: headphone amplifier

Game Boy circuit board. The audio amplifier chip is midway on the right-hand side. © Raimond Spekking / CC BY-SA 4.0 (via Wikimedia Commons).

Game Boy circuit board. The audio amplifier chip is midway on the right-hand side. © Raimond Spekking / CC BY-SA 4.0 (via Wikimedia Commons).

The Game Boy amplifier chip contains three amplifiers: two identical amplifiers for the left and right headphone channels, and a more powerful mono amplifier for the speaker. The Game Boy headphone amplifiers and the speaker amplifier are somewhat different, but they are both similar to the LM380.

The schematic below shows the Game Boy headphone amplifier. Comparing it with the LM380 schematic above shows the similarities between the LM380 and the headphone amplifier, but also some differences. The input stage and feedback circuit of the LM380 are the most distinctive parts of that chip, and the headphone amplifier's circuit is essentially identical.6 The "Amplification" stage of the headphone amplifier has three transistors compared to one in the LM380, probably to produce more gain. The headphone amplifier's output stage is similar but simplified; the PNP/NPN pair that pulls the LM380 output low is replaced with a single PNP transistor. The biggest difference is the "Control" section of the headphone amplifier, which is not present in the LM380. This control circuitry powers down the headphone amplifier if headphones are not plugged in, conserving battery life.

Schematic of the Game Boy headphone amplifier that I created by reverse-engineering the die.

Schematic of the Game Boy headphone amplifier that I created by reverse-engineering the die.

The photo below shows the left headphone amplifier. The output pin (lower-right, next to the part number SBG14) is driven by seven PNP transistors in parallel (top-left) and seven smaller NPN transistors in parallel (lower center). The capacitor is in the upper left, near the center. Many resistors snake around the die.

The left headphone amplifier circuit on the die. The right amplifier is a mirror image. Photo courtesy of John Mcmaster.

The left headphone amplifier circuit on the die. The right amplifier is a mirror image. Photo courtesy of John Mcmaster.

Game Boy Audio chip: speaker amplifier

The next schematic shows the Game Boy speaker amplifier. Unlike the two channels for headphone amplification, there is a single speaker amplifier, producing a mixture of the left and right channels. Again, the input stage and feedback are almost identical to the LM380. The output stage has only minor differences. However, the amplification stage for the speaker is completely different: it includes a four-transistor differential amplifier stage, which will provide much more amplification.7 Although this amplification stage looks very similar to the input stage at first glance, its is wired differently and uses NPN transistors.8

Schematic of the speaker amplifier in the Game Boy audio amplifier chip.

Schematic of the speaker amplifier in the Game Boy audio amplifier chip.

The chip provides pins for bypass capacitors to reduce the effect of power supply fluctuations.9 The headphone amplifiers have external bypass capacitors, but the speaker bypass capacitor is omitted for some reason (see the Game Boy schematic). Lack of this capacitor may contribute to the background hum that people hear in the Game Boy's sound.

Comparison with the Game Boy Color

I recently reverse-engineered the Game Boy Color's amplifier chip, so it's interesting to compare the two chips. The amplifier chips for the Game Boy and the Game Boy Color provide similar functions. Even at the die level (below), the two chips look similar. They both have power transistors in the upper-left for the speaker, control circuitry in the lower-left, and two headphone channels on the right.

Comparison of the audio amplifier chip from the Game Boy (left) and Game Boy Color (right). Photos courtesy of John McMaster.

Comparison of the audio amplifier chip from the Game Boy (left) and Game Boy Color (right). Photos courtesy of John McMaster.

Surprisingly, the implementations of the two chips are completely different. While the Game Boy uses LM380-style audio amplifiers, the Game Boy Color uses power op-amps with more complicated circuitry. The most important difference is that the Game Boy chip has internal feedback to control the gain, while the Game Boy Color also has an external feedback capacitor, which causes it to act as a high-pass filter. For more information, see my Game Boy Color amplifier article and schematic.

Collectors of Game Boy systems have noticed that the different versions have a very different sound (discussion). The original Game Boy has a "warm, bassy sound", while the Game Boy Color has a "thin sound" with background noise and hum. These aren't just subjective differences, but show up in the waveforms:

Graphs courtesy of Herbert Weixelbaum.

Graphs courtesy of Herbert Weixelbaum.

What's interesting is that we can explain much of the sound difference through the analysis of the amplifier chips. The Game Boy's output is close to a square wave, but the waveform drops somewhat due to the speaker's 100µF DC blocking capacitor (schematic). The amplifier in the Game Boy Color, on the other hand, is configured as a high-pass filter, so it outputs higher-frequency spikes, losing the bass sound.

Conclusion

The Game Boy (1989) and Game Boy Color (1998) use custom amplifier chips. By examining die photos, the circuitry can be reverse engineered. The chips are different from common amplifier chips in two main ways, which probably explains why custom chips were created. First, each chip has three amplifiers: two for headphone channels and one for the speaker. Second, to conserve power the chip has circuitry to power-down the unused amplifiers, based on whether or not headphones are plugged in. Reverse-engineering the chips also explains much of the difference in sound between the Game Boy and the Game Boy Color. The Game Boy Color's chip implements a high-pass filter, so the sound is thin and lacks the bass of the Game Boy.

I announce my latest blog posts on Twitter, so follow me @kenshirriff for future articles. I also have an RSS feed. My KiCad files for the schematic are on Github. Thanks to John McMaster for providing the chip photos; his page is here. Thanks to Herbert Weixelbaum for the sound waveforms.

Notes and references

  1. The audio amplifier chip is labeled DMG-AMP, standing for "Dot Matrix Game amplifier". The part number on this 18-pin chip (made by Sharp) is IR3R40.

    The IR3R40 chip. Photo courtesy of John McMaster.

    The IR3R40 chip. Photo courtesy of John McMaster.

    Internally, the chip is labeled SBG14.

    The die is labeled SBG14.

    The die is labeled SBG14.

  2. Most of the PNP transistors on this chip are round. However, when multiple PNP transistors are combined, a square structure is used instead. The square PNP transistors are larger than the square NPN transistors. The chip also has some PNP transistors with multiple collectors. Other PNP transistors have no explicit collector connection but use the substrate (ground). 

  3. The inputs to the LM380 (or Game Boy amplifier) go into a differential pair (Q3, Q4), but this differential pair is different from the standard one used in op-amps. In particular, the emitters receive different, varying currents, and this is where the feedback happens. 

  4. The output stages of the LM380 and the Game Boy speaker amplifier use two transistors for pull-down configured as a Szilaki pair. The combined PNP and NPN transistors act as a higher-performance PNP transistor, somewhat like a Darlington pair. 

  5. The LM380 is explained in detail in the National Semiconductor application note, and Power Audio Amplifier IC LM380. The similar LM386 is discussed in LM386 lecture and LM386 adventures.

    I'll explain the feedback network since the Game Boy chip operates the same way. The diagram below shows how the feedback network in the LM380 operates with no input. In the upper left, the supply voltage VS across R1 creates a current I. Transistors Q5 and Q6 form a current mirror: this forces the current through Q6 to match the current (I) through Q5. The current from Q4 to the rest of the chip must be approximately 0 (since it is strongly amplified by the rest of the chip). Putting this all together, the current through R2 (generated from the output voltage feedback) must also be I. Since R2 is half the resistance of R1, the output voltage must be half of the supply voltage. The conclusion is that the output voltage at idle will be half of the supply voltage, as desired.

    The LM380 with no signal applied. Schematic of the amplifier feedback network, from the LM380 datasheet.

    The LM380 with no signal applied. Schematic of the amplifier feedback network, from the LM380 datasheet.

    When inputs are applied, the feedback network acts as seen below. Suppose a voltage ΔV is applied to the positive input. Emitter-follower transistors Q3 and Q4 buffer and raise the inputs, so the same ΔV appears across resistor R3. This generates a current ΔI through the resistor. This increases the current through Q5 to I+ΔI, and because of the current mirror, the same current will flow through Q6. Adding up the various currents, the current through R2 must be I+2ΔI. Since R2 has 25 times the resistance of R3, 2ΔI corresponds to an increase in the output voltage of 50ΔV. Therefore, the input voltage is multiplied by a factor of 50. The point of this is that the feedback network fixes the gain at 50.

    The LM380 with a small signal applied.

    The LM380 with a small signal applied.

    It seems to me that the best way to understand the LM380 is to consider it as constructed from an operational transresistance amplifier (OTRA), an obscure relative of the op-amp. An OTRA acts like an op-amp, except the two inputs are currents instead of voltages, and the difference between the currents is amplified to produce the output voltage. The two currents (I) into the OTRA must be approximately equal, but the input voltages can diverge (unlike an op-amp).

    My simplified schematic of the LM380, using an operational transresistance amplifier.

    My simplified schematic of the LM380, using an operational transresistance amplifier.

    The schematic above shows the LM380's circuitry represented as an operational transconductance amplifier and feedback network. Equating the two currents yields Vout = Vs/2 + 51V+ - 50.5V- or approximately Vout = Vs/2 + 50*(V+-V-). In other words, the output is centered at half the supply voltage, and the difference in input voltages is amplified by a factor of 50. (Nobody else describes the LM380 in this way, so it's quite possible that I am looking at it wrong, but this analysis makes sense to me.) 

  6. I don't know the exact values of the resistances on the die, but by comparing lengths on the die I can determine ratios of resistances. Looking at resistors R48, R49, R50, and R51, I calculate that the speaker amplifier has a gain factor of 22. From resistors R2, R3, R4, and R7, I calculate that the speaker amplifier has a gain factor of 30, significantly more than the headphone amplifiers. 

  7. Note that the overall amplification of the chip is limited by the feedback network. The idea of an op-amp is the raw gain will be something like 100,000, but the feedback reduces the gain to something reasonable like a factor of 50. The "extra" gain improves performance and reduces distortion. In other words, the additional amplification stage in the Game Boy compared to the LM380 isn't going to make it 100 times louder. 

  8. I'm a bit puzzled by the second amplification stage for the speaker amplifier. It looks like a differential amplifier, except a differential amplifier normally has the emitters connected and this circuit has the collectors connected. 

  9. The bypass capacitors used by the Game Boy chip (and the LM380) help reduce the impact of power supply fluctuations. It's common for chips to have a bypass capacitor between power and ground, but this bypass capacitor is a bit different. It is connected to a point in the feedback network where it is more effective than a regular bypass capacitor.