Teardown of a logic chip from a vintage IBM ES/9000 mainframe

IBM and its large mainframe computers ruled the computer industry for decades. But during the 1980s, mainframes faced increasing competition from microprocessors, workstations, and super-minicomputers. To meet this challenge, IBM pushed technology to the limit to create the ES/9000 in 1991, a family of powerful mainframes with a price tag to match, from $70,500 up to $22 million. The processor of the ES/9000 wasn't a single chip, but a metal and ceramic package called a Thermal Conduction Module (TCM) that held 121 chips. Recently, Dave Jones of EEVBlog created a popular teardown video of a TCM, showing its complex construction. After disassembling the module, he kindly sent me some of these cutting-edge chips to analyze. In this blog post, I examine the circuitry inside one of these logic chips from the ES/9000.

Detail of a bipolar logic chip from the ES/9000 computer. This closeup of the die shows the four layers of metal and the transistors underneath.

The ES/9000

The ES/9000 family of computers consisted of three lines with performance spanning two orders of magnitude: small entry-level systems for an office, mid-range air-cooled systems (below), and high-end water-cooled systems that could fill a room. The technology of the ES/9000 was very advanced for its time in many ways. Along with the ceramic thermal conduction modules, IBM created new high-speed integrated circuits with state-of-the-art transistors. At the system level, IBM introduced new operating systems as well as ESCON (Enterprise Systems Connection), a high-speed fiber-optic connection between the mainframe and peripherals. An optional cryptographic feature provided high-speed encryption in tamper-resistant hardware. Even the power supplies were innovative; the water-cooled power supplies could be swapped while the computer was running. The innovations of the ES/9000 generated numerous journal articles and patents.1

The ES/9000 type 9121, from Hampage.

In this article, I'm focusing on the mid-range systems, known as the 9121 processors.2 This system (above) was packaged in a drab frame the size of a large refrigerator.3 It used 7.4 kVA of power, occupied 14.7 square feet of floor space, and weighed 2000 pounds. It could hold up to 1 gigabyte of memory, a large capacity at a time when personal computers typically had 1 to 4 megabytes of RAM. A typical 9121 system cost $1.5 million and had about twice the performance of a contemporary Intel 80486 computer that cost $10,000. This is a bit of an apples-and-oranges comparison, since the mainframe gave you high-speed I/O channels, fast memory access, and an advanced operating system, but it shows the dramatic price/performance advantage of microprocessors.

The TCM (Thermal Conduction Module)

One of the most interesting features of the ES/9000 was the Thermal Conduction Module (TCM) that held the integrated circuits. The high-performance bipolar chips generated a lot of heat, so IBM developed new cooling mechanisms that allowed this computer to function without water cooling. The cut-away photo below shows a TCM with its large heat sink attached. At the bottom, some of the integrated circuit dies are visible along with the copper cooling pistons. The computer's main circuitry consists of five different TCMs.4

Diagram of the TCM with the heat sink on top. Photo from Dr. Chu / IBM, diagram from TCM paper.

The TCM is surprisingly small, 5 inches (127.5mm) on a side, yet it holds 121 integrated circuits. Each integrated circuit has a spring-loaded copper piston on it to remove the heat. These pistons transfer the heat into the TCM's metal case, where the heat passes into the heat sink and then the air flow. The pistons are precision-machined to maximize contact and thus heat transfer. The module is filled with oil (visible below), which also increases heat transfer. The design of the TCM allows it to dissipate 600 watts of heat—imagine holding six 100-watt light bulbs in your hand.

Closeup of the TCM showing the copper cooling pistons on top of the silicon dies. Courtesy of Dave Jones.

The integrated circuits in the TCM are not packaged like regular integrated circuits, but consist of a silicon die soldered upside-down to the ceramic substrate, flip-chip style. This ceramic substrate is an incredible feat of engineering. It's essentially a printed-circuit board made out of ceramic, with 63 layers of wiring inside. It has over 80,000 connections on the top to the integrated circuits, 2 million vias, 400 meters of internal wiring, and 2772 pins on the bottom.

The TCM opened up, showing the chips inside. Most of the chips are the bipolar logic chips described in this blog post. The two slightly-smaller dies on the left are probably also logic chips. The 16 reddish rectangular chips are 128-kilobit static RAM chips. Six of the 121 positions are unused. The small reddish components between the chips are decoupling capacitors. From EEVBlog Flickr album, © Dave Jones, used with permission.

The manufacturing process for the ceramic substrate was very complex. Each ceramic sheet, the thickness of two sheets of paper (0.2mm), had tens of thousands of via holes punched in it. Next, the wiring was applied in the form of a molybdenum metal paste, forming wires just 100µm wide. The stack of 63 sheets was then laminated under heat and pressure. Next, the stack was sintered at 600°C to decompose the polymer binder, followed by hydrogen treatment at 1560°C for densification. During this process, the substrate shrank by 17%, yet the millions of vias had to remain aligned. After trimming and polishing, two layers of thin-film wiring were placed on top of the substrate. (The thin-film wiring allowed wiring changes to be made to the module for bug fixes.)5 Finally, the module was protected with a layer of polyimide film, with thousands of openings burned in it with a laser for the chips' connections.

The bipolar logic chip

Most of the chips on the TCM are bipolar logic chips; these are the square black chips in the previous photo. The die photo below shows one of these logic chips, 6.5mm on a side.8 This chip has an unusual appearance because it was connected directly to the substrate instead of the typical approach of putting pads around the perimeter with bond wires attached. The black circles are the 549 solder balls in a 27×27 grid that connect the chip to the substrate. Of these, 228 are used for signals and 321 for power. The chip is covered with metal conductors that connect the solder balls to the circuitry underneath.

Die photo of a bipolar logic chip, showing the solder balls.

The chip is built from a type of transistor called the bipolar transistor, an older type of transistor than the MOS transistors in modern processors. The transistors in this chip used a cutting-edge design with a complex internal structure.6 IBM used bipolar transistors because they provided higher performance at the time, but they had the disadvantages of using higher power and taking up more area on a chip. (This is why the chip needed 321 connections for power and why the ES/9000 required multi-chip modules with a complex cooling system.) The chip contains approximately 85,000 transistors, 40,000 resistors, 10,000 capacitors, and 1000 Schottky diodes. While this may seem like a large number, contemporary CMOS microprocessors (such as the Intel 486) contained over a million transistors, illustrating the much higher density of MOS transistors.7

As shown in the closeup photo below, the chip has four layers of metal wiring on top of the silicon, a lot of layers for the time. The metal layer on top of the chip (called M4) provides power and signal distribution from the solder bumps. Underneath, layer M3 provides horizontal wiring: thick lines to distribute power across the chip and thin lines for signals. Layer M2 provides vertical wiring for both power and signals. The bottom layer (M1) implements the local wiring of the gate circuitry, connecting the transistors and resistors together. The narrowest metal lines are 1.6µm wide. Power distribution uses a hierarchy: the numerous solder balls feed power into the very wide power lines in the top metal layer. These are interconnected with the wide horizontal lines, which connect to the thinner vertical lines, which connect to the circuitry. This hierarchy ensures that voltage drop is minimized across the chip, while providing the multi-amp current it requires.

The chip has four layers of metal. The silicon circuitry is visible underneath, somewhat obscured by the multiple layers of insulating silicon dioxide and silicon nitride on top.

The architecture of the chip is IBM's "master slice" approach, building the chip from a gate array of identical cells. To avoid the expense of creating fully-custom chips, IBM built the various logic chips from a common grid of cells that was customized by the wiring on top. In the photo above, you can see some of these cells underneath the metal. The master slice approach has the disadvantage of being less dense than a custom chip. It turns out that roughly half of the cells in each logic chip went unused because the number of I/O pins on the chip was too small.12 You can see that most of the cells are unused in the photo above; while the transistors and resistors are present, they aren't connected to anything.

The chip contains 5240 cells, capable of implementing 2620 DCS logic gates. The structure of a cell is shown below. The cells are very flexible: each cell can implement one gate in the ECL (Emitter-Coupled Logic) family,9 two gates in the NTL (Non-Threshold Logic) family,10 or half a gate in the DCS (Differential Current Switch) family (which this chip uses). The key components are the transistors, which I've colored blue. The resistors are colored yellow.11 At the top are two large capacitors (red). The capacitors are unused in this DCS circuitry, but can be used to speed up ECL gates.
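As a quick check on these figures, the gate capacity of the 5240-cell master slice for each logic family follows directly from the per-cell numbers above. This is a minimal sketch; the "about half used" factor comes from the discussion of unused cells and is only an approximation.

```python
# Gate capacity of the 5240-cell master slice for each logic family,
# using the per-cell figures quoted in the text.
CELLS = 5240
GATES_PER_CELL = {"ECL": 1.0, "NTL": 2.0, "DCS": 0.5}

capacity = {family: int(CELLS * per_cell) for family, per_cell in GATES_PER_CELL.items()}

# Roughly half of the cells went unused because of the limited I/O pin count,
# so the number of DCS gates actually wired up is only an estimate.
approx_used_dcs = capacity["DCS"] // 2

if __name__ == "__main__":
    for family, gates in capacity.items():
        print(f"{family}: {gates} gates")
    print(f"DCS gates actually used (approx.): {approx_used_dcs}")
```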

Diagram of the cell layout used by the chip. From patent EP0493989A1.

The image below shows six of the chip's 5240 cells after removing the metal layers from the chip. You can see how the layout matches the diagram above. (The cells in the middle are upside down.)

Closeup of the logic cells. I stacked multiple photos after removing the metal layers to get this image.

The logic chips are fabricated with a special technique that allows hundreds of different types of logic chips to be produced from a single set of masks. The transistors and other components in the silicon "master slice" are constructed using masks and photolithography as in most integrated circuits. However, the metal layers are patterned using direct-write electron beam lithography, rather than masks. The electron beam is steered to "write" the desired metal layer patterns on the die to produce the desired type of chip. In other words, the basic pattern of the chip is created using masks, but then the different chip types are manufactured directly from the design files, providing flexibility.

The photo below shows the entire die after dissolving the metal layers. This image shows the grid of cells, as well as three vertical columns holding 360 I/O cells.13 The grid pattern is most clear in the upper-right corner, where I sanded the die down. (Due to the difficulty of removing four layers of metal as well as layers of silicon nitride, I couldn't get the die as clean as I'd like.)

Die after removing the metal. The rounded corners are from my mechanical planarization processing (by which I mean sanding with 600-grit sandpaper). The original die was not rounded.

Differential Current Switch logic (DCS)

The chip is built with an uncommon logic family called DCS (Differential Current Switch).15 As the name suggests, DCS operates on differential signals: each input signal is expressed by two wires carrying both the signal and its complement. The voltage difference between the two wires represents a 0 or 1. Thus, a three-input logic gate will have six input wires, as well as two output wires.

Most logic families implement a NAND or NOR gate as their basic gate. The basic DCS gate, however, is the SELECT operation: it outputs either input A or input B, selected by the S input. In other words, SELECT implements the function if S then A else B, or in Boolean logic, SA+S'B. The SELECT operation is surprisingly flexible; with appropriate inputs, it can implement AND, XOR, or even a latch.14
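The flexibility of SELECT is easy to check with a truth-table model. The sketch below treats signals as plain 0/1 values and assumes constant 0 and 1 inputs are available; it models the "free" inversion of differential logic as swapping the true and complement wires. The helper names (`select`, `inv`, `and_gate`, and so on) are hypothetical, and the latch simply feeds the gate's own output back into the B input, which is one standard way to build a latch from a multiplexer, not necessarily the circuit IBM used.

```python
# Boolean model of the DCS SELECT primitive: out = S ? A : B  (SA + S'B).
# Every DCS signal travels as a true/complement wire pair, so inverting an
# input costs nothing: you just swap the two wires.

def select(s: int, a: int, b: int) -> int:
    """SELECT primitive: if S then A else B."""
    return a if s else b

def inv(x: int) -> int:
    """Free inversion: models swapping the true and complement wires."""
    return 1 - x

def and_gate(x: int, y: int) -> int:
    return select(x, y, 0)        # x ? y : 0

def or_gate(x: int, y: int) -> int:
    return select(x, 1, y)        # x ? 1 : y

def xor_gate(x: int, y: int) -> int:
    return select(x, inv(y), y)   # x ? y' : y

def latch(enable: int, d: int, q: int) -> int:
    """Transparent latch: pass D while enabled, otherwise hold Q (fed back into B)."""
    return select(enable, d, q)

if __name__ == "__main__":
    print("x y AND OR XOR")
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, and_gate(x, y), or_gate(x, y), xor_gate(x, y))
```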

A SELECT gate is shown below at the conceptual level. Three toggle switches are controlled by the S, A, and B inputs. These switches will pull one output to ground, while the other output will be pulled high by a resistor. Starting at the bottom, the S switch will direct the ground current to either the "A" side or the "B" side. With the switches in the indicated positions, the output will be pulled to ground, while the complemented output remains high. But if input A is set to 1, the output levels reverse, with the output pulled high. Now, suppose input S is set to 0, so the current is directed to the B side. In this case, the output is controlled by switch B. You can verify that the output matches A if S is 1 and matches B if S is 0. In other words, the circuit selects between inputs A and B, depending on the value of S. Note that this circuit generates differential outputs: both the output and its complement.
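The toggle-switch picture can also be written as a small switch-level model that tracks where the single sink current goes. This is a sketch under simplifying assumptions: ideal switches, current fully on or off, and a 1 on the data switch routing the current to the complement rail, matching the description above. The function names are hypothetical. Checking all eight input combinations confirms that exactly one rail carries current and that OUT equals A when S is 1 and B when S is 0.

```python
from itertools import product

def steer_current(s: int, a: int, b: int, i_sink: float = 1.0) -> tuple[float, float]:
    """Return (current in the OUT rail, current in the OUT-bar rail)."""
    # Bottom switch: S sends the whole sink current to the A side (S=1) or the
    # B side (S=0). Only the switch on that side carries current, so the other
    # data input has no effect on the outputs.
    data = a if s == 1 else b
    # Upper switch: a 1 diverts the current into the OUT-bar rail, a 0 into OUT.
    return (0.0, i_sink) if data == 1 else (i_sink, 0.0)

def logic_levels(i_out: float, i_out_bar: float) -> tuple[int, int]:
    # A rail carrying current is pulled toward ground (logic 0);
    # an idle rail is pulled high (logic 1) by its resistor.
    return int(i_out == 0.0), int(i_out_bar == 0.0)

if __name__ == "__main__":
    for s, a, b in product((0, 1), repeat=3):
        out, out_bar = logic_levels(*steer_current(s, a, b))
        print(f"S={s} A={a} B={b} -> OUT={out} OUT'={out_bar}")
```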

Conceptually, a DCS gate consists of toggle switches that pull one output high and the other low.

Next, I'll describe how the current switch is implemented with a pair of transistors. At the bottom, a current sink generates a fixed current, which can be switched to either the left side or the right side of the circuit. The idea is that the transistor with a higher input voltage will direct the current to that side, pulling that output low. Thus, the circuit acts like a toggle switch. An important feature of the circuit is that it provides a high degree of amplification: a slight difference in voltages is enough to switch most of the current to one side. (This circuit is essentially the same as the differential amplifier used in an op-amp.) As a result, a voltage swing of just 200 millivolts is enough to distinguish a logical 0 and 1, reducing power consumption. Another important feature of this circuit is that it is activated by the difference between the input voltages, so it is relatively insensitive to electrical noise. In other words, a voltage fluctuation that affects both inputs will cancel out, rather than causing an erroneous 0 or 1.
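The "slight difference in voltages" can be made concrete with the textbook large-signal equation for an idealized bipolar differential pair. Ignoring base current and other second-order effects, the fraction of the tail current through one transistor is 1 / (1 + exp(-ΔV/V_T)), where V_T = kT/q is the thermal voltage, about 26 mV at room temperature. This is a generic model, not IBM's device data; the 300 K temperature and the input values are illustrative assumptions.

```python
import math

K_BOLTZMANN = 1.380649e-23     # J/K (exact SI value)
Q_ELECTRON = 1.602176634e-19   # C (exact SI value)

def thermal_voltage(temp_k: float = 300.0) -> float:
    """V_T = kT/q, in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON

def left_fraction(v_left: float, v_right: float, temp_k: float = 300.0) -> float:
    """Fraction of the tail current steered through the left transistor
    of an ideal bipolar differential pair."""
    dv = v_left - v_right
    return 1.0 / (1.0 + math.exp(-dv / thermal_voltage(temp_k)))

if __name__ == "__main__":
    for dv_mv in (0, 25, 50, 100, 200):
        frac = left_fraction(dv_mv / 1000, 0.0)
        print(f"dV = {dv_mv:3d} mV -> {frac:.1%} of the current on the left")
    # Common-mode noise: shifting both inputs by the same 0.3 V leaves the
    # split essentially unchanged (apart from floating-point rounding).
    print(left_fraction(0.1, 0.0), left_fraction(0.4, 0.3))
```

Under this idealized model, a 100 mV difference already steers about 98% of the current to one side, while a shift that affects both inputs equally leaves the split alone, which is the noise immunity described above.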

A 1 input switches the current through the transistor on the left. A 0 input switches the current through the transistor on the right.

The schematic below shows the implementation of a DCS gate. The three green boxes are current switches, using transistor pairs as described above. The yellow boxes are buffer circuits, called emitter followers. Two emitter followers buffer the outputs, while two more are used on the select inputs. Finally, the blue box is the current sink circuit, providing the fixed current that gets switched by the circuit.

Components of a DCS gate.

The diagram below shows this circuit in action. Starting at the bottom, the S input switches the current to the left. The A input then switches the current to the right. This current pulls the complemented output low, while the pull-up resistor pulls the output high. Note that a 0 input on A would switch the current to the other side, and thus switch the output. The B input has no effect since the current bypasses the B side of the circuit. Pulling the S input low, however, would switch the current to the B side, causing the B input to control the output. Thus, this circuit implements the SELECT operation.

Schematic of a SELECT gate, showing how the current is steered.

Reverse-engineering a DCS gate

In this section, I'll look at how a SELECT gate is implemented on the chip. The diagram below zooms in on a corner of the die, and then zooms again on one logic gate, the rectangle at the bottom. As you can see, each logic gate is very small on the die. Because this gate is at the edge of the die, it has less wiring over it so it is easier to see. Even so, the wiring layers on top partially obscure it. A DCS gate is created from four half-cells; I've highlighted the one I will discuss.

Starting from the die, zooming in on a corner and then on a single logic gate.

The components on the die can be matched against the diagram below. As before, the transistors are colored blue, the resistors yellow, and the unused capacitor red.

A half-cell as shown in the patent.

Below, I've indicated some of the components in the previously-highlighted half-cell. The wiring on the bottom metal layer customizes this cell for a particular function. Looking at this wiring, you can see that the emitters (E) of transistors T-5 and T-6 are connected, as are the emitters of transistors T-7 and T-8. The collectors (C) of transistors T-6 and T-8 are connected to the base of the output transistor T-12. The collector of transistor T-7 is connected to resistor R3. The wiring in the upper metal layers is shadowy and less clear. The vertical wiring along the sides provides power to the circuit. Other faint vertical wires are connected to the bases of transistors T-7 and T-8.

A half-cell as it appears on the die, with components labeled. "B" is a transistor base, "E" emitter, and "C" collector.

By studying the die closely, I traced out the circuitry for the gate and found it was a SELECT gate. The schematic below is from the patent; I modified it to match the gate I traced out. Note that IBM used its own symbol for a transistor as I've indicated at the bottom. I've marked the transistors and resistors from the photo above in red. The circuit has six transistors for testing, in the blue box.16 As you can see, one DCS gate takes a lot of components: 17 transistors and 18 resistors. This is one reason the density of the bipolar logic chips is so low.
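One way to see where the 17 transistors come from is to tally the blocks in the schematic description: three current-switch pairs, four emitter followers, the current sink, and the six test transistors. The sketch below assumes the current sink uses a single transistor, which is my guess rather than something stated in the text; with that assumption the tally comes out to exactly 17.

```python
# Hypothetical breakdown of the transistors in one DCS gate, built from the
# block-level description of the schematic. The single-transistor current
# sink is an assumption, not a figure from the source.
DCS_GATE_TRANSISTORS = {
    "current switches (3 differential pairs)": 3 * 2,
    "emitter followers on the outputs": 2,
    "emitter followers on the select inputs": 2,
    "current sink (assumed single transistor)": 1,
    "test transistors": 6,
}

total = sum(DCS_GATE_TRANSISTORS.values())

if __name__ == "__main__":
    for part, count in DCS_GATE_TRANSISTORS.items():
        print(f"{count:2d}  {part}")
    print(f"{total:2d}  total")
```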

Schematic of the DCS logic gate, as implemented on the chip. Vcc and Vee are the power supplies for the collector and emitter respectively. Vx controls the current sink. Vt is the pull-down voltage for the emitter-followers, but I'm not sure what Vt stands for. The original schematic was for an AND gate; I modified it to show a SELECT gate.

This shows the circuitry of one logic gate. Larger functional blocks such as adders were constructed by combining multiple gates. The full computer contains hundreds of thousands of these gates, implementing the processor and its control circuitry.

Conclusion

This bipolar logic chip illustrates the advanced technology of the ES/9000 mainframe.17 IBM pushed the limits of technology in everything from integrated circuit construction to ceramic modules to cooling systems. After all this effort, however, sales of the ES/9000 were underwhelming and couldn't slow the advance of microcomputers. Two years after the announcement, IBM had installed about 3600 of them, largely the lower-end models.18 In comparison, about 20 million personal computers were being sold per year, about 10,000 times the volume. Mainframes were 21.6% of computer industry revenue and dropping, less than half of personal computer revenue (44.5% of the industry). In 1997, IBM's bipolar processors reached the end of the road as IBM fully moved to CMOS processors.

I announce my latest blog posts on Twitter, so follow me @kenshirriff. I also have an RSS feed.

If you're interested in the TCM, you should definitely watch Dave Jones' teardown video below, as well as the videos where he attempts to remove the chips with hot air and a heating plate before finally succeeding. Thanks to Dave for sending me the chips as well as letting me use his photos.

Notes and references

  1. For more information, see the series of articles describing the IBM ES/9000 type 9121 in detail in the IBM Journal of Research and Development, May 1991. The most relevant articles: The IBM Enterprise System/9000 Type 9121 air-cooled processor describes how the processor was implemented, Differential current switch—High performance at low power describes the Differential Current Switch logic, IBM System/390 air-cooled alumina thermal conduction module describes the structure and manufacturing of the TCM in detail, and see also IBM Enterprise System/9000 Type 9121 Model 320 air-cooled processor technology. The Sept 1992 issue has other relevant articles, including A four-level VLSI bipolar metallization design with chemical-mechanical planarization and Improved performance of IBM Enterprise System/9000 bipolar logic chips. Also see The Design of the ES/9000 module and High performance packaged electronics for the IBM ES9000 mainframe. IBM's announcement of the ES/9000 provides a good summary. 

  2. It's a bit tricky to keep track of IBM's naming and numbering schemes. The first distinction is between the architecture and the computers that implement the architecture. Enterprise Systems Architecture/390 (ESA/390) was IBM's mainframe architecture for the 1990s, continuing the path from System/360 and System/370. The ESA/390 architecture was implemented by several families of computers, including ES/9000, the CMOS-based 9672 Parallel Enterprise Server, the microprocessor-based Enhanced S/390 MicroProcessor Complex, S/390 Integrated Server, and S/390 Multiprise. The ES/9000 had three main processor types: the low-end CMOS 9221 in an air-cooled rack, the midrange 9121 in an air-cooled frame, and the large water-cooled 9021. (Confusingly, bigger numbers indicate a smaller system.) The 9121, the processor type in the middle, is the one I'm discussing in this blog post. Each processor type had several model numbers, as described below.

    The different ES/9000 models, from the reference guide. The two-way and three-way multiprocessors are called "dyadic" and "triadic".

    The ES/9000 family covered an enormous range of performance levels; the largest model provided over 100 times the performance of the smallest. The sizes varied widely too. The rack-mounted 9221 was designed for an office and took about 6 square feet of floor space, while the 9121 in the middle was roughly refrigerator-sized, occupying 15 to 24 square feet. The water-cooled 9021 was the classic room-filling mainframe, sized at 88 to 180 square feet. Roughly speaking, the low-end ES/9000 9221 was a replacement for the IBM 9370 office-environment "super-mini computer", the air-cooled ES/9000 9121 was a replacement for the IBM 4381, while the water-cooled ES/9000 9021 was a replacement for the larger IBM 3090 systems. 

  3. IBM was a leader in industrial design, from their computers to the architecture of their buildings and even their logo, as discussed in the book The Interface: IBM and the Transformation of Corporate Design. In the 1950s and 1960s, the design for IBM's computers concealed the internal circuitry, rather than showing it off like many other systems. Instead, IBM expressed the "inherent drama" of computing through spinning tape drives and other peripherals.

    A large ES/9000 installation with the water-cooled 9021 processor. From IBM ESCON brochure.

    My opinion is that IBM's design style fell apart in the 1980s with the loss of dramatic consoles and tape drives, leaving just the featureless boxes. To make things worse, these boxes were stripped of subtle detailing such as their pedestal bases and accent trim, leaving units that wouldn't look out of place in a Soviet paper mill. The ES/9000 won a 1991 design award, however, so some people must like the design more than I do. 

  4. The ES/9000 had five TCMs. The Central Processor Element (CPE) is the microcoded CPU, the module that executes the instructions. The Buffer Control Element (BCE) implements a 64- or 128-kilobyte high-speed cache with error correction, and also handles virtual memory. The System Control Element (SCE) manages the flow of data between the different parts of the computer. (The System Control Element is especially important in a two- or three-processor system.) The Channel Control Element (CCE) controls the I/O channels and is essentially a separate I/O processor. The system can also have an optional Vector Control Element (VCE) for vector arithmetic.

    I was unable to conclusively determine the function of Dave's TCM. The large number (16) of memory chips suggests the cache in the Buffer Control Element (BCE), but this paper says the BCE has 26 memory chips. Possibly the chips are holding the microcode for the Central Processor Element (CPE). 

  5. The ceramic module has two layers of complex thin-film wiring visible on top. This wiring has a surprising purpose: it allows modifications and bug fixes to be made to the module. By cutting wires with a laser and attaching new wires, signals can be re-routed.

    Closeup of an IC location, showing the thin-film wiring on top. Courtesy of Dave Jones.

    IBM calls the modification of computer wiring an Engineering Change or EC. Back in the 1950s, an engineer could easily perform an engineering change by adding and removing wires from a mainframe's wire-wrapped backplane. The printed-circuit boards of the System/360 made changes more difficult, but IBM developed a special "delete" tool to drill out a trace on the circuit board, allowing modification.

    This diagram shows how an Engineering Change is made to an ES/9000 TCM. Parts of the thin-film wiring are cut with a laser, and a wire is attached to the special EC pads. From this paper.

    The introduction of the ceramic TCM raised the issue of how engineering changes could be made when the wiring was encased in ceramic. (Discarding the expensive module wasn't an attractive choice.) The solution was to put exposed wiring on the surface of the module, wiring that could be modified as necessary. This consisted of two layers of polyimide plastic (Kapton) with thin-film wiring. Instead of connecting the IC to the ceramic wiring directly, each chip signal went to an EC pad on the surface. The original trace could be vaporized with a laser, and a modification wire (gold-plated cadmium-copper alloy) ultrasonically bonded to the EC pad. The photo below shows a chip with some EC wires.

    Closeup of the module showing Engineering Change wires next to the die. The smaller reddish-brown objects are capacitors. Courtesy of Dave Jones.

    In some cases it was necessary to remove a chip from the TCM. As Dave Jones found, unsoldering a chip is very difficult due to the thermal mass of the TCM. IBM invented a focused infrared machine to unsolder a chip. It combined a vacuum chip pick-up tool and infrared heater, along with a bias heater underneath the substrate to heat the whole TCM. A special prism ensured alignment of the new chip while a "mirror substrate" provided temperature feedback. This illustrates how the development of the ES/9000 required the invention of new, specialized tools. 

  6. These bipolar chips were created using an IBM technology called ATX-4 that achieved almost five times the density of IBM's earlier ATX-1 chips. IBM described three advanced features of these transistors. First, they used a polysilicon base contact self-aligned with the emitter, reducing stray capacitance by a factor of 3. Second, the transistors were surrounded by deep trenches that allowed transistors to be closely packed. Third, they used a very thin implant for the base and optimized doping for the collector. These features improved the density and performance of the transistors. 

  7. It's interesting to compare the complexity of the bipolar chip with a CMOS microprocessor at the same time. I did some rough estimates of transistor and gate counts, comparing the ES/9000 to a contemporary microprocessor. Each bipolar chip has 85,000 transistors. A CMOS processor from 1991, such as the MIPS R4000, has 1,350,000 transistors, almost 16 times as many, showing the huge density advantage of MOS over bipolar.

    Looking at gates shows an even larger advantage for CMOS. The bipolar chip implements 2620 DCS gates, of which about half are used. For the CMOS processor, I'll estimate 6 transistors for a 3-input gate; subtracting the 16-kilobyte cache in the MIPS R4000 yields about 100,000 gates, a factor of 70 more than the bipolar chip.

    Comparing a 121-chip TCM to a microprocessor yields a different story, with a TCM a bit more complex than a microprocessor. The TCM has roughly 144,000 gates and 256 kilobytes of cache, compared to 100,000 gates and 16 kilobytes of cache for the microprocessor. Thus, my estimate is that a TCM has 44% more gates than a contemporary microprocessor. Taking into account the R4000's external, off-chip cache, the cache sizes are comparable. The ES/9000 uses five TCMs for the processor, which works out to about 7 times the gates of the R4000.

    Cross-section of a transistor in the IC. From Advancing the state of the art in high performance logic and array technology.

    The four metal layers of the chip are also highly advanced. The wiring in the chips is made from aluminum-copper alloy sandwiched with titanium to support high current density. The wiring layers are double-insulated with silicon dioxide and silicon nitride to prevent shorts from developing over time. Each layer of the chip is polished flat (planarized) with chemical-mechanical polishing. Even the vias between wiring layers are complex, created by a "damascene stud" method. The vias are constructed by creating holes with reactive ion etching, filling them with metal, and then polishing away excess metal. 
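
    The gate-count comparison in this note can be reproduced with a few lines of arithmetic. This is a rough sketch: the transistor and gate figures come from the note, but the 6-transistor SRAM cell used to subtract the R4000's 16 KB cache is my assumption (tag bits and cache peripheral circuitry are ignored), so the result is an estimate rather than a measured gate count.

```python
# Figures quoted in this note
BIPOLAR_TRANSISTORS = 85_000
R4000_TRANSISTORS = 1_350_000
DCS_GATES_USED = 2620 // 2          # "about half" of the 2620 DCS gates are used
TRANSISTORS_PER_GATE = 6            # estimate for a 3-input CMOS gate
TCM_GATES = 144_000
R4000_GATES_ROUNDED = 100_000

# Assumptions for this sketch (not from the note)
CACHE_BITS = 16 * 1024 * 8          # 16 KB of on-chip cache, data bits only
TRANSISTORS_PER_CACHE_BIT = 6       # assumed 6-transistor SRAM cell; tags ignored

transistor_ratio = R4000_TRANSISTORS / BIPOLAR_TRANSISTORS
logic_transistors = R4000_TRANSISTORS - CACHE_BITS * TRANSISTORS_PER_CACHE_BIT
r4000_gates = logic_transistors / TRANSISTORS_PER_GATE
gate_ratio = r4000_gates / DCS_GATES_USED
tcm_extra = TCM_GATES / R4000_GATES_ROUNDED - 1
five_tcm_ratio = 5 * TCM_GATES / R4000_GATES_ROUNDED

print(f"transistors: the R4000 has {transistor_ratio:.1f}x the bipolar chip")
print(f"R4000 logic gates: about {r4000_gates:,.0f} ({gate_ratio:.0f}x the used DCS gates)")
print(f"one TCM: {tcm_extra:.0%} more gates than the R4000")
print(f"five TCMs: {five_tcm_ratio:.1f}x the gates of the R4000")
```

    Under these assumptions the cache subtraction leaves roughly 94,000 gates, which rounds to the note's "about 100,000" and gives a ratio of roughly 70.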

  8. Here's a summary of the chip's parameters from IBM Enterprise System/9000 Type 9121 Model 320 air-cooled processor technology.

    Design parameters of the chip.

  9. If you're familiar with ECL (Emitter-Coupled Logic), DCS is similar except it uses differential inputs instead of reference-controlled inputs. Although ECL and DCS both use a current-switching differential amplifier, ECL inputs are compared to a reference voltage, rather than the complemented input. (In IBM's ECL circuitry, the reference voltage is ground, so a negative signal is a logic 0 and a positive signal is a logic 1.)

    The key performance benefit of ECL and DCS logic is that transistors are never completely turned on, i.e. saturated. A transistor is relatively slow to get out of saturation, so a logic family such as TTL that saturates transistors is slower. 

  10. One fairly obscure logic family supported by the chip is NTL, Non-Threshold Logic. NTL is similar to ECL, but without the reference voltage and reference transistors. As a result, NTL gates don't switch on and off sharply, but change in a more analog fashion with the input voltage. One advantage of NTL is that it uses one half-cell instead of the two used by ECL, so you can fit more NTL gates on a chip. NTL also consumes less power than ECL. However, its performance was poorer and it was more sensitive to noise, so it was rarely used in the ES/9000. NTL is described in more detail in this patent.

  11. Each resistor has multiple taps (gray boxes) allowing 15 different resistance values to be obtained. Gates with various speed/power tradeoffs can be constructed by using different resistances: the DCS family supports high, medium, low, and ultra-low power gates. (Most of the circuitry is low and ultra-low power.) The 0.2 pF capacitors were used for ECL speedup and for delay elements. 

  12. Why do these chips require so many solder balls? There's some theory behind it. In the 1960s, E. F. Rent at IBM noticed a relationship between the number of components in an integrated circuit and the number of pins it required. Specifically, as the number of components increased, the number of pins required also increased, according to a power law. This became known as Rent's rule. As IBM increased the complexity of the logic chips, the number of solder bumps increased correspondingly. Chips in the IBM 3080 computer (1980) had an 11×11 grid of solder balls, while the IBM 3090's chips (1985) had a 17×17 grid. (Numbers from this paper.) The chip I examined has a 27×27 grid, but since half the gates were unused, it seems that the chip was limited by its I/O connections and even this grid was insufficient. 
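
    Rent's rule is usually written as T = t·g^p, where g is the number of gates in a block, T the number of terminals, and t and p empirical constants. The sketch below shows the power law and inverts it to ask how much gate growth a given growth in solder balls would correspond to. The exponent p = 0.6 is an illustrative assumption, not an IBM figure, and solder balls include power and ground as well as signals, so the output is only a rough illustration.

```python
def rent_terminals(gates, t, p):
    """Rent's rule: a block of `gates` gates needs about t * gates**p terminals."""
    return t * gates ** p


def implied_gate_growth(terminal_ratio, p):
    """Invert the power law: the growth in gates implied by a growth in terminals."""
    return terminal_ratio ** (1 / p)


# Solder-ball grids quoted in the text.
GRIDS = [("IBM 3080 (1980)", 11), ("IBM 3090 (1985)", 17), ("ES/9000 chip", 27)]
P = 0.6  # assumed Rent exponent, for illustration only

for (old_name, old_n), (new_name, new_n) in zip(GRIDS, GRIDS[1:]):
    ratio = (new_n * new_n) / (old_n * old_n)
    print(f"{old_name} -> {new_name}: {old_n * old_n} -> {new_n * new_n} balls "
          f"(x{ratio:.2f}), about x{implied_gate_growth(ratio, P):.1f} gates if p = {P}")
```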

  13. The image below shows some cells from the chip's I/O circuitry. These cells have a different structure from the cells for the logic gates. These cells include larger transistors to provide the necessary output current.

    Die photo showing I/O cells. This die photo was formed from a stack of images.

  14. The SELECT operation (SA+S'B) can implement multiple operations. For instance, setting B=0 implements S AND A. (For this gate, the redundant transistors can be omitted.) Setting A=B' implements S XOR B. (XOR is inconvenient to implement in most logic families but simple with DCS.) Wiring the output back to A results in a latch: when S is high, the output value is held, but when S is low, the latch is loaded from B. An inverter is trivial with DCS: because of the differential signaling, a signal can be inverted simply by switching the two lines. 
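
    The SELECT identities in this note can be checked exhaustively with a small Boolean model. With A = B', the output S·B' + S'·B is S XOR B. This is a logic-level sketch only; it models a differential signal as a (true, complement) pair and ignores the transistor-level behavior of a real DCS gate.

```python
def select(s, a, b):
    """DCS SELECT gate at the logic level: output = S·A + S'·B (a 2-to-1 multiplexer)."""
    return (s & a) | ((1 - s) & b)


def invert(pair):
    """A differential signal as (true, complement): inverting just swaps the two wires."""
    true_line, comp_line = pair
    return (comp_line, true_line)


# B = 0 gives S AND A.
and_ok = all(select(s, a, 0) == (s & a) for s in (0, 1) for a in (0, 1))
# A = B' gives S·B' + S'·B, which is S XOR B.
xor_ok = all(select(s, 1 - b, b) == (s ^ b) for s in (0, 1) for b in (0, 1))


def latch(inputs, state=0):
    """Feed the output back to A: hold while S = 1, load B while S = 0."""
    trace = []
    for s, b in inputs:
        state = select(s, state, b)
        trace.append(state)
    return trace


print(and_ok, xor_ok)                           # True True
print(latch([(0, 1), (1, 0), (1, 0), (0, 0)]))  # [1, 1, 1, 0]
```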

  15. Curiously, IBM's articles about the ES/9000 expand the DCS acronym as both Differential Current Switch and Differential Cascode Current Switch. The term cascode refers to "a two-stage amplifier that consists of a common-emitter stage feeding into a common-base stage." Essentially, it refers to how DCS has two layers of switching transistors, compared to the single layer in a typical ECL gate. 

  16. Each DCS gate has about 6 additional transistors for test purposes. The problem is how to detect a faulty logic gate. In most logic families, a faulty gate will typically have the output stuck at 0 or 1. By running various test sequences through the circuit, this stuck bit can be detected. However, since a DCS gate uses differential logic, it can end up with a fault where both differential outputs are approximately the same, for instance, if the current sink fails. This is difficult to detect with tests since it is unpredictable how this signal will be interpreted by other gates. This non-determinism makes it hard to detect a faulty gate. The solution is to add test circuitry to each gate. The test circuitry will force an indeterminate output to a 0 or a 1, depending on which test circuit is activated. This makes the tests deterministic and a faulty gate can be detected. This seems like a weird corner case, but it was important enough for IBM to add a substantial amount of circuitry to each gate. The test circuit is described in more detail in this patent.

  17. In the 1980s, IBM faced the problem that it was the reigning computer company with its advanced mainframes, but it was encountering competition from microcomputers. Although microcomputers were technically inferior and much less powerful, they were much cheaper and rapidly increasing in power. The book The Innovator's Dilemma is the classic guide to this sort of problem. Incumbents often ignore the risk from disruptive technologies, but IBM took the "right" approach and developed the IBM PC (1981) to take advantage of microprocessors. Although the IBM PC was extremely successful, IBM lost control of the PC architecture and personal computers devoured the mainframe market. It will be interesting to see what happens to Intel in the analogous situation as ARM processors gain functionality and cut into the market for technologically-advanced x86 chips. 

  18. According to Computerworld, the adoption of the ES/9000 was slow, with 3600 installed almost two years after introduction. Of the installations, 47% were low-end rack-mounted systems, 36% were air-cooled frame systems, and 17% were high-end water-cooled systems. IBM had over half the mainframe market, well ahead of Fujitsu, Hitachi, and NEC. 

Two dies in one package: Teardown of a vintage ROM with double the storage

In 1971, semiconductor memory was still a new development, so chips couldn't hold a lot of data. To double the storage capacity, IBM used the brute-force approach of putting two silicon dies into a 1-inch square package.1 The photo below shows a module with two face-down silicon dies, storing 4 kilobytes of data. In this blog post, I look inside this package, examine the dies, and explain how this ROM (read-only memory) was implemented. Although I expected the circuitry to be straightforward, the primitive MOS transistors of the time made the circuitry more complicated in several ways.

This IBM integrated circuit contains two silicon dies mounted on a ceramic substrate. Wiring printed on the substrate connects the dies to the pins underneath.

The photo below shows one of the silicon dies under the microscope. The white lines are the chip's metal layer, the wiring that connects the components together. The silicon underneath appears gray. Around the perimeter of the die, the dark circles are the solder balls that connect the die to the ceramic substrate. (Although other manufacturers typically attached tiny bond wires to pads on ICs, IBM soldered the die directly onto the substrate upside down in "flip-chip" style.) The solder balls provide the address lines, output data, and other connections. With 18 output connections, each die stores 1024 words of 18 bits: 9 on the left and 9 on the right. (18 bits may seem like a strange size, but it's a 16-bit word with a parity bit for each byte.2) The data is stored in a matrix of tiny transistors: 128 wide by 144 tall. This matrix is surrounded by the circuitry that selects a particular column and set of rows based on the address, outputting the desired 18 bits.

Die photo of one of the ROM dies. Click this image (or any other) for a larger version.

Die photo of one of the ROM dies. Click this image (or any other) for a larger version.

The integrated circuit is packaged in IBM's characteristic square metal can, below. These metal cans have their roots in the IBM System/360, a groundbreaking computer line introduced in 1964. Because IBM didn't consider the technology of integrated circuits to be mature enough at the time, IBM built these computers from hybrid modules called SLT (Solid Logic Technology). These thumbnail-sized modules consisted of individual transistors, diodes, and resistors encased in a square aluminum can. In 1968, IBM moved to integrated circuits (which they called Monolithic System Technology or MST), but kept the metal-can packaging. These packages give vintage IBM boards a unique look, unlike the rectangular black epoxy integrated circuits used by most manufacturers.

The integrated circuit with the metal package, part number 5864741. The black clip next to the package holds the die, but I don't know if this was for shipping or during use.

To get inside the package, I removed the metal lid from the package with a hacksaw, exposing the two dies inside. To loosen the dies from the substrate, I used a butane torch to melt the solder connections.3 The photo below shows the dies next to the substrate. As you can see, the varnish on the substrate got a bit toasty during the removal process.

The ceramic substrate with the dies removed.

Looking at the substrate closely shows the complex wiring between the pins and the dies. The two chips are wired in parallel, with the substrate wiring connecting corresponding pins on the two dies. The exceptions are the three pins on the left side of each die near the bottom; they are wired separately to the two dies so one die can be selected.4 You can also see the tiny pads where the solder balls on the dies were attached.

This closeup of the substrate shows how the two dies are wired together, mostly in parallel, by wiring underneath the dies.

Transistors in the chip

Next, I'll explain the construction of the chip, starting with the transistors that form its circuitry. The dies use metal-gate MOS transistors, an early type of MOS transistor that was largely replaced by silicon-gate transistors in the 1970s. The diagram below shows the construction of a metal-gate NMOS transistor. At the bottom, two regions of silicon (dark gray) are doped to make them conductive, forming the source and drain of the transistor. The gate is formed by a metal strip between the silicon regions, separated from the silicon by a thin layer of insulating oxide. (These layers—Metal, Oxide, Semiconductor—give the MOS transistor its name.) The transistor can be considered a switch between the source and drain, controlled by the gate. Simplifying somewhat, the transistor turns on when the gate is pulled positive and turns off when the gate is at 0 volts.5

Structure of a metal-gate MOSFET.

In the closeup of the ROM below, you can see the individual bits. Each oval-shaped "bubble" is a transistor, representing a 1 bit. The vertical white stripes are the metal layer. The faint horizontal stripes are doped silicon. A "bubble" is formed by a thin spot in the oxide where the metal is close enough to the silicon to form a transistor gate. (Elsewhere, the thicker oxide layer separates the metal from the silicon so it doesn't have any effect.) These different layers were created by photolithography, projecting light through a patterned mask and then treating the silicon wafer with chemicals. The contents of the chip are fixed during manufacturing and cannot be changed. Since the mask defines the contents of the ROM, it is called a "mask ROM".

A closeup of the ROM showing some of the bits.

The diagram below explains the structure of the ROM. Each vertical metal line selects a column of transistors; there are 128 vertical lines in total. The ovals indicate transistors: each transistor is between a power line and a bit output line, and its gate is formed by the metal column select line above it. To read the ROM, one column is activated by pulling it high (yellow). This turns on the transistors (red) in that column. An activated transistor connects the corresponding bit output line to power, pulling it high.

Diagram showing operation of the ROM matrix.
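
To make the column-select operation concrete, here is a toy model of a mask ROM matrix. It uses 4 bit lines instead of 144, and the contents are made up. As in the text, it assumes that a transistor stores a 1 bit; it also assumes that a bit line with no conducting transistor reads as 0.

```python
# Hypothetical ROM contents: for each column, the set of bit lines that have a
# transistor ("bubble") under that column's metal select line.
# Assumption (as in the text): a transistor stores a 1 bit.
BUBBLES = {
    0: {0, 2},
    1: {1},
    2: set(),
    3: {0, 1, 2, 3},
}
NUM_BIT_LINES = 4


def read_column(bubbles, column, num_lines):
    """Pull one column select line high and report every bit line.

    A transistor under the active column connects its bit line to power (1);
    lines without a transistor there are assumed to read as 0.
    """
    active = bubbles.get(column, set())
    return [1 if line in active else 0 for line in range(num_lines)]


for column in sorted(BUBBLES):
    print(column, read_column(BUBBLES, column, NUM_BIT_LINES))
```

Because only one column is activated at a time, each bit line reflects only the transistors in that column, which is why the mask pattern alone determines the data.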

The matrix produces 144 bits of output on each side of the chip. To select the desired 9 bits, a circuit called a "16-to-1 multiplexer" selects one bit out of each group of 16. To summarize, part of the address fed into the chip is used to select a column, and part of the address is used to select the output bits. Together, the address selects one of the 1024 words stored on the die.
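
The address split can be sketched as follows: 6 bits choose one of 64 columns per side and 4 bits drive the 16-to-1 multiplexers, giving 64 × 16 = 1024 words. Which address bits go to which role is an assumption here; as noted in the footnotes, the actual bit assignment isn't known.

```python
ROW_BITS = 4      # drives the 16-to-1 multiplexers
COLUMN_BITS = 6   # selects one of 64 columns on each side


def split_address(addr):
    """Split a 10-bit word address into (column, row).

    Assumption: the high 6 bits pick the column and the low 4 bits drive the
    multiplexers; which physical address pins play which role is not known.
    """
    if not 0 <= addr < 1 << (COLUMN_BITS + ROW_BITS):
        raise ValueError("address out of range for a 1024-word die")
    return addr >> ROW_BITS, addr & ((1 << ROW_BITS) - 1)


# Every one of the 1024 addresses lands on a distinct (column, row) pair.
pairs = {split_address(a) for a in range(1024)}
print(len(pairs), split_address(0), split_address(1023))
```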

Construction of an inverter

Next, I'll explain some of the logic circuitry. An inverter is the simplest logic gate, used in several places in the chip. The diagram below shows how a basic inverter appears on the die. The metal wiring (white) covers the silicon underneath. The middle diagram shows the conductive silicon in blue, while the transistors are colored green. The inverter is formed from two transistors: a pull-up transistor and a transistor I'll call the inverter transistor. These transistors are controlled by the metal wiring on top of them, which forms the gate.

Implementation of an inverter. At the left is the inverter on the die (somewhat simplified). The middle diagram shows doped silicon in blue, with the transistor channel in green. The schematic on the right shows the wiring of the inverter.

The diagram below shows how the inverter operates. When the input is low (left), the pull-up transistor provides a weak current to pull the output high. (Because the transistor is long and narrow, its current is weak.) When the input is high (right), the lower transistor turns on, connecting the output to ground, resulting in a 0 output. Since this circuit produces a 1 output for a 0 input and vice versa, it acts as an inverter.

Simplified diagram of an inverter. With a 0 input, the pull-up transistor pulls the output high. With a 1 input, the lower transistor pulls the output to ground.

This inverter doesn't perform very well because these early metal-gate transistors had difficulty pulling the output high. The problem is that the transistor's output voltage is about 4 volts lower than its gate voltage because of the transistor's threshold voltage. Thus, if the inverter above is powered with 10 volts, the output voltage will be just 6 volts, not 10.

The solution was the "bootstrap load" shown below: adding a capacitor and a third transistor to the inverter.6 The capacitor acted as a charge pump, boosting the gate voltage and thus the output voltage. The circuit is a bit tricky, but I'll try to explain it. In the first panel, a 1 input turns on the lower transistor, producing a 0 output as before. However, the upper transistor will charge the capacitor with 6 volts, which will be important in the next step.

Illustration of how the bootstrap load works.

Next, suppose we input a 0 to the inverter (middle panel). The pull-up transistor on the right will pull the output to 6 volts, as in the simple inverter. But here's the trick: the capacitor was previously charged to 6 volts, so if we raise the lower side of the capacitor to 6 volts, the high side now rises to 12 volts (because of the 6 volts stored in the capacitor). With 12 volts on the gate, the output transistor can produce 8 volts of output. This extra 2 volts will raise the capacitor even higher, giving more output voltage. This feedback loop continues, until the capacitor reaches 16 volts and the output reaches 10 volts. (The output can't get any higher than the 10 volts supplied to the transistor.) Thus, the output transistor has "pulled itself up by its bootstraps", reaching a nice 10-volt output, rather than the weak 6-volt output from the simpler inverter.
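
The bootstrap feedback loop can be stepped through numerically. This sketch uses the text's illustrative values (a 10-volt supply and a 4-volt threshold, both assumptions rather than measured values) and treats the capacitor as an ideal voltage source that lifts the gate by its stored 6 volts. The real circuit rises continuously; the loop just discretizes that rise.

```python
V_SUPPLY = 10.0     # assumed supply voltage (the text's illustrative value)
V_THRESHOLD = 4.0   # assumed threshold voltage


def pull_up_output(gate_v):
    """Output of the pull-up transistor: one threshold below its gate, capped at the supply."""
    return max(0.0, min(gate_v - V_THRESHOLD, V_SUPPLY))


def simple_inverter_high():
    # Gate tied to the supply, so the output stalls one threshold below it.
    return pull_up_output(V_SUPPLY)


def bootstrap_inverter_high(max_steps=20):
    # Input = 1: the capacitor is precharged to 6 V while the output sits at 0 V.
    stored = pull_up_output(V_SUPPLY)
    # Input = 0: the output first rises to 6 V, like the simple inverter...
    out = pull_up_output(V_SUPPLY)
    steps = []
    for _ in range(max_steps):
        gate = out + stored          # ...and the capacitor lifts the gate above the output
        new_out = pull_up_output(gate)
        steps.append((gate, new_out))
        if new_out == out:           # output has hit the supply rail: loop settles
            break
        out = new_out
    return out, steps


print(simple_inverter_high())        # 6.0
print(bootstrap_inverter_high())     # (10.0, [(12.0, 8.0), (14.0, 10.0), (16.0, 10.0)])
```

The gate climbs 12, 14, 16 volts while the output climbs 8, 10 volts, matching the walkthrough above.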

The diagram below shows an inverter on the die, with 5 transistors and a capacitor. This inverter has a bootstrap load, along with two output transistors to boost the current.7 The capacitor is constructed from a large region of metal over silicon: the metal and silicon form the two plates of the capacitor and hold the charge. Note the large size of the capacitor compared to the transistors. This diagram illustrates that even an inverter required a lot of circuitry when using the primitive transistors of 1971.

An inverter, built from 5 transistors and a capacitor.

Column address decoding

The next circuit I'll describe is the address decoder, which selects the desired column of the ROM based on the input address. Specifically, 6 bits of the address are used to select one of 64 columns. The decoder takes up a fair amount of area on the die, with half the decoder above the ROM matrix and half below. The interesting thing about the decoder is that you can see its binary structure, with two rows of transistors that alternate, then two rows that alternate in groups of 2, groups of 4, and so on.8

Part of the column decoding circuitry. Ground lines are colored blue and output lines are colored green.

Each vertical green line above is one decoder output, corresponding to one particular address. Electrically, each decoder line is wired as a NOR gate: if the line to any transistor is high, the transistor turns on, connecting that output line (green) to ground (blue), pulling it low. If all the corresponding address lines are low, the transistors will remain off, and that column will be activated. Each column of the decoder matches one address bit pattern, so each address selects the desired column.
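
The NOR-style decoder can be modeled in a few lines. For each column and each address bit, a pull-down transistor sits on the true line if the column expects a 0 and on the complement line if it expects a 1, so exactly one column has no transistor turned on. The mapping of address bits to decoder rows here is an assumption, and the sketch ignores the column shuffling described in the footnotes.

```python
ADDRESS_BITS = 6  # selects one of 64 columns


def decoder_transistors(column):
    """Input lines that have a pull-down transistor on this column's output line."""
    lines = set()
    for i in range(ADDRESS_BITS):
        if (column >> i) & 1:
            lines.add(("not", i))   # column wants bit i = 1: pull down if the complement is high
        else:
            lines.add(("true", i))  # column wants bit i = 0: pull down if the true line is high
    return lines


def line_levels(address):
    """Drive each address bit onto a true line and a complement line."""
    levels = {}
    for i in range(ADDRESS_BITS):
        bit = (address >> i) & 1
        levels[("true", i)] = bit
        levels[("not", i)] = 1 - bit
    return levels


def decode(address):
    """Columns whose output stays high: a NOR of every line that has a transistor."""
    levels = line_levels(address)
    return [column for column in range(2 ** ADDRESS_BITS)
            if not any(levels[line] for line in decoder_transistors(column))]


print(decode(37))  # [37]
```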

Each horizontal address line fed into the decoder, along with its complement, is driven from one of the address inputs. Next to each address input is the driver circuit shown below. I won't go into details, but it's essentially a latch driven by the address input, outputting the value and its complement.

The circuitry for each address input.

Row multiplexer circuit

While you might expect each column of the ROM to store one word, the result would be a very tall and skinny ROM that wouldn't fit on the IC die. Instead, each column of the ROM holds 16 words, making the ROM a more efficient rectangle. These 16 words are grouped by bit: the 16 values for bit 0 at the top, followed by the 16 values for bit 1, and so forth. Each output bit has a multiplexer circuit that selects one of these 16 values based on four bits of the address.

Each multiplexer circuit consists of 16 transistors, shown below: one row-select line is activated, turning on the appropriate transistor and connecting that ROM line to the multiplexer output, and thus the output pin. (The row select lines come from a decoder circuit similar to the column address decoder described earlier.) The output driver circuit amplifies the ROM output. Note the large output transistor below the solder ball. Its multiple vertical stripes are multiple gates, allowing it to produce more current for the external signal.

Diagram showing the multiplexer and output circuit.
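
The grouping by bit means each multiplexer picks line `bit × 16 + row` out of the 144 lines from the active column. The sketch below labels each line with its own index to show which lines one row select connects to the 9 outputs of a side; the bit labels themselves are arbitrary, since the actual order of the output bits isn't known.

```python
BITS_PER_SIDE = 9
WORDS_PER_COLUMN = 16


def mux_outputs(column_lines, row):
    """Select one line from each group of 16 to form the 9 output bits of one side.

    column_lines holds the 144 values from the active column, grouped by bit:
    lines 0-15 are the 16 candidates for bit 0, lines 16-31 for bit 1, and so on.
    """
    if len(column_lines) != BITS_PER_SIDE * WORDS_PER_COLUMN:
        raise ValueError("expected 144 lines")
    if not 0 <= row < WORDS_PER_COLUMN:
        raise ValueError("row must select one of 16 lines")
    return [column_lines[bit * WORDS_PER_COLUMN + row] for bit in range(BITS_PER_SIDE)]


# Label each line with its own index to see which lines a row select picks.
print(mux_outputs(list(range(144)), 3))  # [3, 19, 35, 51, 67, 83, 99, 115, 131]
```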

The substrate bias generator

To improve the performance of the transistors, many chips applied a negative "bias" voltage to the silicon die's substrate. The straightforward way to obtain this bias voltage was through an external pin, but this inconveniently required an additional power supply. The IBM ROM chip, instead, has a circuit to generate the negative bias voltage internally, avoiding the extra power supply.9

This substrate bias generator circuit uses a charge pump to create the negative bias voltage from the positive supply voltage, which is a neat trick. The idea is to "pump" electric charge in and out of a capacitor, analogous to a water pump, making the substrate negative. First, the capacitor is charged with 10 volts. Next, the upper side of the capacitor is grounded to 0 volts. Since the capacitor still holds a charge of 10 volts, the lower side of the capacitor must be at -10 volts, producing the desired negative voltage. This cycle is repeated at high speed, driven by an oscillator.

Operation of the charge pump. By grounding alternate sides of the capacitor, a negative voltage is created.

In more detail, the diagram above shows the charge pump driven by a pulse signal and its complement. In the first stage, the two smaller transistors are turned on, charging the capacitor to +10 volts. In the second stage, the large lower transistor is turned on, grounding the left side of the capacitor. This forces the right side of the capacitor to -10 volts, pulling the substrate negative. The diode prevents current from flowing back into the substrate during the first stage.
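
A simple charge-sharing model shows how repeated cycles pull the substrate toward -10 volts. The capacitance values, ideal switches, and optional diode drop are assumptions for illustration, and the `target` parameter stands in for the substrate feedback signal, whose actual threshold I don't know.

```python
def pump_cycle(v_sub, v_supply=10.0, c_pump=1.0, c_sub=10.0, v_diode=0.0):
    """One charge-pump cycle; returns the new substrate voltage.

    Stage 1: the capacitor charges to v_supply with its substrate-side plate at 0 V;
             the diode blocks, so the substrate is untouched.
    Stage 2: the other plate is grounded, so the substrate-side plate drops to
             -v_supply and charge flows through the diode until the nodes meet.
    """
    v_node = -v_supply
    if v_node + v_diode >= v_sub:
        return v_sub  # diode stays off: the substrate is already negative enough
    # Conserve charge: v_node + q/c_pump + v_diode == v_sub - q/c_sub after sharing.
    q = (v_sub - v_node - v_diode) / (1 / c_pump + 1 / c_sub)
    return v_sub - q / c_sub


def run_pump(target=None, cycles=200, **params):
    """Pump from 0 V; if a target is given, stop once the substrate reaches it."""
    v_sub = 0.0
    for n in range(cycles):
        if target is not None and v_sub <= target:
            return v_sub, n  # feedback stops the pump
        v_sub = pump_cycle(v_sub, **params)
    return v_sub, cycles


print(run_pump())             # approaches -10 V without regulation
print(run_pump(target=-5.0))  # stops after a few cycles once below -5 V
```

With a nonzero `v_diode`, the substrate settles at -(10 - v_diode) volts instead, since the diode stops conducting once the gap falls below its forward drop.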

The circuitry that drives the charge pump is shown below. Five inverters are connected into a ring, forming a ring oscillator. If the first inverter has a 1 input, it outputs a 0, so the second outputs a 1, and so forth, until the final inverter outputs a 0. This goes back into the first inverter, flipping its output to 1, and so forth, until the final inverter flips to a 1 output. The process repeats, causing an oscillation. The pulse generator circuit uses these oscillations to drive the charge pump. It also takes a feedback signal from the substrate, stopping the charge pump when the substrate is sufficiently negative.

The circuitry that drives the charge pump.
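
The ring oscillator can be simulated with a unit-delay model: at each step, every inverter outputs the complement of the previous inverter's output. This is a digital approximation of an analog circuit, but it shows why the ring needs an odd number of inverters: an odd ring never settles and repeats every 2 × N gate delays, while an even ring can sit in a stable state.

```python
def step(ring):
    """Advance every inverter by one gate delay; the last output feeds the first input."""
    return [1 - ring[i - 1] for i in range(len(ring))]


def period(ring, max_steps=1000):
    """Number of gate delays until the ring returns to its starting state."""
    state = step(ring)
    for n in range(1, max_steps + 1):
        if state == ring:
            return n
        state = step(state)
    return None


print(period([1, 0, 1, 0, 1]))  # 10: five inverters oscillate with period 2 x 5
print(period([1, 0, 1, 0]))     # 1: four inverters can sit in a stable state
```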

The diagram below shows how the substrate bias generator is implemented on the die. The five inverters are on the right, while the charge pump circuitry is on the left. (These inverters are implemented using the inverter circuit described earlier.) The large capacitor, transistor, and diode for the charge pump are the most visible features.

A closeup of the substrate bias circuitry. It is in the lower-right corner of the die.

Optional circuitry

One interesting characteristic of the chip is that some transistors are not implemented and some wiring connections are omitted. In the diagram below, the metal wire on the left has a contact with the silicon, but the metal wire on the right doesn't have a contact; it just overlaps. With a small change to the mask during manufacturing, the contact can be switched to the other wire. This swaps the function of the two inputs in the upper right corner of the chip, strobe and address. (I'm not sure why this is useful, though. Maybe backward compatibility with two different chips?)

A closeup of contacts that allow the wiring to be customized.

To support that functionality swap, the chip also has unimplemented transistors, as shown below. The upper block has the "bubbles" that indicate working gates. The lower block has the silicon and metal layout of a transistor, but without the gates this circuitry is inert. With a small mask change, the chip can be manufactured with transistors in the lower block and the upper block unused. The point is that the chip was designed so different versions of the chip could easily be manufactured.

Transistors and omitted transistors. The upper rectangular block consists of transistors, while the lower rectangular block has no function.

Conclusion

The 1970s were a time of great change for integrated circuits. Chips based on MOS transistors were rapidly growing in capability, leading to the rise of microprocessors, semiconductor storage, and other applications. But in 1971, the performance of these transistors was still limited, requiring inconvenient workarounds such as capacitors for bootstrap loads. The density of chips was also limited, causing IBM to put multiple dies in one package to store enough data.

The two dies in this package are identical except for the data stored on them. The photo below shows the other die that was in the package. The black globs are some sort of varnish that covered the dies and leaked in around the edges. I couldn't find anything that dissolved the varnish, so I ended up tediously chipping it off under a microscope. (This is why the cleaned die photo at the beginning of the post has some scratches.)

The second ROM die in the package. This photo shows the die after removal from the substrate, with varnish around the edges. Click for full size.

What does the ROM hold? Unfortunately, I don't know. I'm told that it comes from some type of IBM printer so it's probably some sort of interface firmware.

I announce my latest blog posts on Twitter, so follow me @kenshirriff. I also have an RSS feed.

Notes and references

  1. IBM often put multiple silicon dies in a single package, especially to increase memory density. Since the capacity of a single die was limited by the silicon technology at the time, packaging multiple dies together was a straightforward way to increase density. The memory module below shows four silicon RAM dies mounted on two layers of ceramic.

    An 8-kilobit IBM memory module containing four 2-kilobit chips on two levels. More details here.

  2. I checked some of the data in the ROM to verify that the "extra" bits were parity. I confirmed that each 9-bit chunk had odd parity, an odd number of 1 bits. 
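
    The odd-parity check is easy to express in code. This sketch assumes each 18-bit word is stored as two 9-bit chunks of (byte, parity bit); the actual bit positions within the ROM output are not known, so the layout is illustrative.

```python
def odd_parity_bit(byte):
    """Parity bit that gives the byte plus parity an odd number of 1 bits."""
    return 0 if bin(byte & 0xFF).count("1") % 2 else 1


def encode_word(word):
    """Split a 16-bit word into two 9-bit chunks, each (byte, parity bit)."""
    high, low = (word >> 8) & 0xFF, word & 0xFF
    return [(high, odd_parity_bit(high)), (low, odd_parity_bit(low))]


def chunk_has_odd_parity(byte, parity):
    """The check described above: each 9-bit chunk must contain an odd number of 1s."""
    return (bin(byte & 0xFF).count("1") + parity) % 2 == 1


print(encode_word(0x0100))  # [(1, 0), (0, 1)]
```

    Odd parity has the useful property that an all-zeros chunk is invalid, so a dead output line reading 0 on every bit is caught.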

  3. I have disassembled several IBM modules. In most cases, heating the ceramic substrate with a heat gun is sufficient to melt the solder and release the dies. However, the ROM dies were apparently covered with varnish that held them securely, and the heat gun was not sufficient to remove them. Lacking a propane torch, I used a crème brûlée torch which provided enough heat to get the chips off the substrate. The substrate ended up blackened and started smoking in the process, however. At least I didn't need to barbecue the module. 

  4. Each die has 34 solder balls, while the package has 36 pins (4 rows of 9), so I'll explain how the math works out. Most of the pins are connected in parallel to each die. Ground, however, is connected twice to each die. Each die also has 3 pins that are connected separately, allowing the die to be addressed individually. Thus, the package has 30 pins that are shared by both dies, and 6 pins that are connected 3 to each die. 
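
    The pin accounting above can be written out as a quick check. I'm interpreting "connected twice" as one ground pin feeding two solder balls on each die, which is the reading that makes the totals match.

```python
BALLS_PER_DIE = 34
PACKAGE_PINS = 4 * 9            # 4 rows of 9 pins
PRIVATE_PINS_PER_DIE = 3        # wired to one die only, for die selection
GROUND_BALLS_PER_DIE = 2        # ground reaches each die through two balls

# Interpretation: one ground pin feeds both of a die's ground balls.
shared_pins = BALLS_PER_DIE - PRIVATE_PINS_PER_DIE - (GROUND_BALLS_PER_DIE - 1)
private_pins = 2 * PRIVATE_PINS_PER_DIE

print(shared_pins, private_pins, shared_pins + private_pins)  # 30 6 36
```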

  5. Since I have no information about this chip, everything is from reverse-engineering and I had to make some guesses. I want to be honest about what parts are speculative, so I'll summarize in this footnote. I don't know if the chip uses NMOS or PMOS transistors since they look the same under the microscope. Given the early date of this chip, it's very possible that it used PMOS transistors. If so, the explanation of the chip is essentially the same, except the voltage levels are reversed and negative. I illustrate many of the circuits with a supply voltage of 10 volts; I don't know the actual voltage used by these chips. Likewise, the 4-volt threshold voltage is an assumption. The output labels 0-17 are arbitrary since I can't tell what order the bits are in. The labels on the address bits are based on the decoder patterns but I don't know if data was stored row-first or column-first. I'm speculating that a transistor in the ROM indicates a 1 bit, but it could indicate a 0 bit. The explanation of the strobe and enable inputs is based on examining the circuit, but could be wrong. 

  6. The transistor's output voltage is lower than desired due to the large "threshold voltage" of early metal-gate transistors. The transistor turns on when the gate voltage is sufficiently higher than the source voltage, which for the pull-up transistor is the output. This voltage difference is the threshold voltage, which could be several volts. The workaround is to raise the gate voltage a few volts higher to overcome the threshold voltage. I've written about the bootstrap load in the Intel 8008 processor (link) if you want more information about bootstrap loads. 

  7. The bootstrap load produces a higher-voltage output than can be obtained directly. The higher voltage can be traded off to obtain a high-current output. The trick is to use two more transistors to produce the final output, as shown below. The upper transistor, fed by the inverter, pulls the output high, while the lower transistor, fed by the original input, pulls the output low. The point is that it takes five transistors and a capacitor to produce a good inverter. In comparison, just a couple of years later, semiconductor technology had advanced so only two transistors were required.

    Adding two output transistors provides a higher-current output for the inverter.

  8. A few things to note about the column decoders. First, half the decoders are at the top of the ROM and half are at the bottom of the ROM. This is because the decoders are about twice as wide as a ROM cell so they wouldn't all fit on one side of the ROM. Second, the decoders are duplicated for the left and right sides of the ROM since the left and right sides provide two bytes for the same address. (It was more space-efficient to duplicate the decoders than to use one set of decoders with 64 wires between the two sides of the ROM.) Third, if you look carefully, the first rows of transistors don't alternate in the pattern "ABAB ABAB" (as you'd expect for binary), but instead alternate "ABBA ABBA". Thus, the columns are accessed in the order 0, 1, 3, 2, and so forth, instead of the order 0, 1, 2, 3. This is invisible to the user of the ROM, as long as the columns are shuffled appropriately when the ROM is programmed. 

  9. Curiously, the IBM ROM chip has a pin that appears to be tied to the substrate to provide bias, as well as an on-chip bias generator. I don't know why the chip would have both. If you want more information about substrate bias generators, I've written about the substrate bias generators in the 8086 and 8087.

    The bias voltage of the Hewlett-Packard Nanoprocessor was unusual. Due to variability in the manufacturing process, the bias voltage varied from chip to chip. During production, each chip was tested and the proper bias voltage was hand-written on the chip. Each circuit board had to be adjusted to provide the necessary bias voltage.

    The HP Nanoprocessor. Note the hand-written voltage "-2.5 V". The last digit (1) of the part number is also hand-written, indicating the speed of the chip. Photo courtesy of Marc Verdiell.
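To make the column shuffle in note 8 concrete, here is a minimal Python sketch of how a ROM programmer might reorder data into physical column order. It assumes that only the lowest-order decoder row uses the "ABBA" pattern while the higher rows count in ordinary binary, so the order continues 4, 5, 7, 6, and so on; the function names are hypothetical.

```python
def physical_to_logical(p: int) -> int:
    """Return the logical column decoded at physical column position p.

    Assumption: only the lowest decoder row is "ABBA", so bit 0 of the
    logical column is bit 0 XOR bit 1 of the physical position.
    """
    return p ^ ((p >> 1) & 1)


def shuffle_for_programming(logical_data: list) -> list:
    """Reorder per-column data from logical order into physical order."""
    # The mapping only permutes columns within each group of four.
    assert len(logical_data) % 4 == 0, "expects whole groups of four columns"
    return [logical_data[physical_to_logical(p)] for p in range(len(logical_data))]


if __name__ == "__main__":
    print([physical_to_logical(p) for p in range(8)])  # [0, 1, 3, 2, 4, 5, 7, 6]
    print(shuffle_for_programming(["c0", "c1", "c2", "c3"]))  # ['c0', 'c1', 'c3', 'c2']
```

Because the mapping just swaps the last two columns of each group of four, it is its own inverse, so the same function would also convert a physical dump back into logical order.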


Reverse-engineering the standard-cell logic inside a vintage IBM chip

Integrated circuits are often built from standard-cell logic, constructed from standardized building blocks such as NAND gates. Since I've been looking at a chip that uses standard-cell logic, I figured it was a good opportunity to examine standard-cell logic closely by reverse-engineering a simple block of logic on the chip. (It turned out to be a divide-by-16 module.) The diagram below zooms in on the die of an IBM token ring chip from 1993. The chip contains a block of analog network circuitry, but curiously the analog block also contains some standard-cell digital logic. The final zoom shows a single NAND gate in this logic.

Standard cells let automated tools design a complex integrated circuit from a description in a language such as Verilog. These tools select the appropriate cells from a cell library, place them in rows, and route the wiring between the cells to create the desired logic. This is much easier than a fully-custom design with each individual transistor arranged on the die.1 Vendors supply a library of standard cells2 as well as software to create the design.3 While a library may contain hundreds of different types of cells, the circuit I examined only uses five different cell types, which I will explain below.

Zooming in on the die, the analog block, its standard-cell logic, and finally a single gate. Click this photo (or any other) for a larger version.


The chip

I'll give a brief overview of the chip first, before I scare everyone off with CMOS circuit diagrams. The chip is the large (1.5") square integrated circuit on the board below, packaged in IBM's unusual shiny aluminum can. This chip is the controller for this token ring network board. (I recently wrote about a different token ring IC; the current post describes an older (but related) IC on a different token ring board.)

The IBM 4/16 ISA token ring board. The metal-can IC has part number 63F7704.


Removing the metal lid from the IC exposes the silicon die inside. The die is mounted upside-down on a ceramic substrate, connected to 175 pins by thin traces on the substrate. Instead of bond wires, the die is attached by solder balls on its surface.

The die is mounted upside down on the ceramic substrate.


The photo below shows the die under the microscope. The black circles are the solder balls. They form two rows around the perimeter of the die, but there are also rows of solder balls throughout the chip, distributing power and ground.

Die photo of the chip. (Click for a larger version.)


The chip has two layers of metal wiring: thicker yellowish wires on top and thinner gray wires underneath. The underlying silicon appears pinkish in this photo. Brownish polysilicon wiring is also visible on top of the silicon. Most of the chip consists of rows of standard-cell logic, about 24,000 gates.4 The chip contains a custom microprocessor in the upper left corner. In the lower-left is an analog block that interfaces to the network.5 This block contains a small amount of digital standard-cell logic, which is what I'll describe below.

How CMOS logic is implemented

The chip is built with CMOS logic (complementary MOS), which uses two types of transistors, NMOS and PMOS, working together. The diagram below shows how an NMOS transistor is constructed. The transistor can be considered a switch between the source and drain, controlled by the gate. The source and drain (gray) consist of regions of silicon doped with impurities to change their semiconductor properties; this is called N+ silicon. The gate consists of a special type of silicon called polysilicon, separated from the underlying silicon by a very thin insulating oxide layer. The NMOS transistor turns on when the gate is pulled high.

Structure of an NMOS transistor. A PMOS transistor has the same structure, but with N-type and P-type silicon reversed.


A PMOS transistor has the opposite construction from NMOS: the source and drain consist of P+ silicon embedded in a substrate of N silicon. The operation of a PMOS transistor is also opposite from the NMOS transistor: it turns on when the gate is pulled low. Typically PMOS transistors pull the drain (output) high, while NMOS transistors pull the drain low. In CMOS, the transistors act in a complementary fashion, pulling the output high or low as needed.

A NAND gate implemented in CMOS.


The diagram above illustrates how a CMOS NAND gate works. The gate consists of two PMOS transistors at the top and two NMOS transistors at the bottom. The first case shows what happens when an input is 0: the corresponding PMOS transistor turns on, pulling the output high, while the corresponding NMOS transistor turns off, breaking the series path to ground. In the second case, both inputs are 1: both NMOS transistors turn on, pulling the output to ground, while both PMOS transistors turn off, creating a 0 output. Since the output is 0 only when both inputs are 1, the circuit implements the NAND function.

By removing one input and the corresponding pair of transistors, this circuit becomes an inverter. By adding additional inputs and pairs of transistors, this circuit can be extended to create a NAND gate with 3 or more inputs. Note that the PMOS transistors (on top) are wired in parallel, while the NMOS transistors (on the bottom) are wired in series; this will be important for the standard cell layout.
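The switch-level behavior described above can be checked with a small Python sketch. It treats each transistor as an ideal switch (PMOS conducts when its gate is 0, NMOS when its gate is 1), with the PMOS transistors in parallel and the NMOS transistors in series; it ignores voltages, currents, and timing. The function name is hypothetical.

```python
from itertools import product


def cmos_nand(inputs: tuple) -> int:
    """Switch-level model of an n-input CMOS NAND gate (n=1 gives an inverter)."""
    # PMOS transistors conduct when their gate is 0; they are wired in parallel,
    # so any single conducting PMOS connects the output to +5 V.
    pull_up = any(x == 0 for x in inputs)
    # NMOS transistors conduct when their gate is 1; they are wired in series,
    # so the path to ground conducts only if every NMOS transistor is on.
    pull_down = all(x == 1 for x in inputs)
    # In a correct CMOS gate exactly one network conducts: the output is never
    # shorted from +5 V to ground, and it is never left floating.
    assert pull_up != pull_down
    return 1 if pull_up else 0


if __name__ == "__main__":
    for n in (1, 2, 3):
        for bits in product((0, 1), repeat=n):
            print(n, bits, cmos_nand(bits))
```

The single assertion captures the point of the complementary design: for every input combination, one network is on and the other is off.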

The standard cell circuits

The circuit block that I'm examining uses five different types of standard cells (out of the hundreds in the library). In this section, I'll show the construction of each cell type, starting with the 2-input NAND gate, and then the more complex cells. Each cell is constructed as a rectangle that fits between the power rails, with inputs and outputs in a line at the bottom. This standard cell layout allows the gates to be arranged into rows without worrying about the internal construction of the cells. The cells can then be wired together, using the chip's two layers of metal wiring.

NAND

I'll start by examining a 2-input NAND gate cell that implements the NAND circuit described earlier. The photo on the left shows how this NAND gate looks on the die, and the diagram on the right explains the key components. Starting at the bottom, the two inputs are connected to polysilicon wires (red). When these wires cross the N-type silicon (turquoise) at the bottom, they form NMOS transistors. These transistors are connected in series by sharing silicon. At the top, when the polysilicon wires cross the P-type silicon (yellow), they form PMOS transistors.6 These transistors are wired in parallel, with one end connected to +5 volts. The metal wire in the middle connects the PMOS transistors to the second NMOS transistor and the output.

A 2-input NAND gate implemented as a standard cell. The photo on the left shows how it appears on the die, while the diagram on the right explains the construction of the cell.


The schematic below shows the transistors arranged to match their physical layout in the cell. If you trace out the paths, this circuit is the same as the NAND circuit described earlier. The structure of the gate is harder to follow in this schematic because the layout is constrained by the needs of the standard cell.

Schematic of a 2-input NAND gate; the schematic layout matches the physical layout.


Once we have determined the structure of the NAND gate cell, we can find all the instances of this cell. The diagram below shows a detail of the chip with four NAND gates marked. The gates are identical, except the gates in the top row are flipped because the power wire for them is on the bottom, not the top. (Two other gates in this photo don't match the NAND cell; they will be described below.) Note the two inputs and the output for each of these gates.

Part of the circuit, with four NAND gates labeled.


The cells are connected together by metal wiring. The chip has two layers of metal. The bottom metal layer is used for the thick horizontal power and ground wiring, the wiring inside each cell, and horizontal wiring between cells. The second metal layer is used for vertical wiring. Much of this vertical wiring passes over cells; because it uses a different layer than the wiring inside the cell, there is no conflict.

3-input NAND

The circuit also uses 3-input NAND gates. The construction is similar to the smaller NAND gate, except there is another PMOS transistor in parallel on top and another NMOS transistor in series on the bottom. While the NMOS transistors are in a nice row, the PMOS transistors require an additional metal wire to connect them in parallel. (The two thick vertical metal wires are not part of the cell.) The schematic is in a footnote for reference.7

Structure of the 3-input NAND gate.


4-input AND

The next gate is more complex: a 4-input AND gate. An AND gate can't be built directly because a single CMOS gate is inherently inverting: a 1 on an NMOS transistor's gate turns it on, pulling the output low. Instead, an AND gate is built by inverting the output of a NAND gate, as shown below. In other words, this cell contains two gates.

A 4-input AND gate, created from a NAND gate and inverter.


A second complication is that this gate is constructed to output twice the standard current. It is implemented by using pairs of transistors in parallel in the inverter: two NMOS transistors and two PMOS transistors.8
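At the logic level, the two-stage structure of this cell can be sketched in Python as a NAND stage followed by an inverter stage. The function names are hypothetical, and the ×2 drive strength appears only in a comment, since it changes the output current rather than the logic function.

```python
from itertools import product


def nand(*inputs: int) -> int:
    """Logic function of a CMOS NAND gate with any number of inputs."""
    return 0 if all(inputs) else 1


def inverter(x: int) -> int:
    """A 1-input NAND is an inverter."""
    return nand(x)


def and4_x2(a: int, b: int, c: int, d: int) -> int:
    """4-input AND cell: a 4-input NAND stage driving an inverter stage."""
    nand_out = nand(a, b, c, d)  # four parallel PMOS, four series NMOS
    # The inverter uses two transistor pairs in parallel for double output
    # current; logically it is still just an inverter.
    return inverter(nand_out)


if __name__ == "__main__":
    for bits in product((0, 1), repeat=4):
        print(bits, and4_x2(*bits))
```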

The result of those factors is the 4-input AND gate cell below. On the right side of the cell is a 4-input NAND gate. It is similar to the earlier NAND gates, but with the inputs connected to four PMOS transistors on top wired in parallel and four NMOS transistors on the bottom wired in series. The series transistors are packed together in a tight row, but the parallel PMOS transistors have a more complex layout due to the +5 connections and the wiring to connect them together. On the left is the inverter, driven by the NAND gate's output. The inverter has two pairs of transistors to provide the high-current output. For details, see the schematic in the footnote.9

Schematic of the 4-input AND gate. The black dot in the middle indicates the connection between the NAND gate's output (metal) and the inverter's input (polysilicon).

Buffer

Next is a non-inverting buffer with triple-current output, using principles similar to the AND gate. The non-inverting action is achieved by putting the output of an inverter through a second inverter, yielding the original value. The first inverter is on the right, constructed from a PMOS transistor and an NMOS transistor. The output inverter on the left has 3 pairs of transistors to provide high-current output. The H-shaped metal wiring collects the output from the six transistors. The schematic is in the footnote.10

Layout of a non-inverting buffer.


Inverter/driver

The final cell type is an inverter with triple-current output. This could be implemented with a single inverter, but the cell uses three inverters in series. The input goes into the inverter on the left, which is connected to a second inverter in the middle. This drives the inverter on the right, which has three pairs of transistors.11

Layout of the standard-cell inverter/driver.
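With all five cell types identified, they can be summarized as a small table of logic functions, which is roughly what a netlist needs once each cell on the die has been labeled. This Python sketch uses hypothetical cell names (the real library's names aren't known), and the drive strength is recorded only in the name.

```python
from typing import Callable, Dict


def nand(*inputs: int) -> int:
    """Logic function of an n-input CMOS NAND stage."""
    return 0 if all(inputs) else 1


def inv(x: int) -> int:
    """A single inverter stage (a 1-input NAND)."""
    return nand(x)


# Hypothetical names for the five cell types found in this block.
CELLS: Dict[str, Callable[..., int]] = {
    "NAND2": lambda a, b: nand(a, b),
    "NAND3": lambda a, b, c: nand(a, b, c),
    "AND4_X2": lambda a, b, c, d: inv(nand(a, b, c, d)),  # NAND + x2 inverter
    "BUF_X3": lambda a: inv(inv(a)),                      # inverter + x3 inverter
    "INV_X3": lambda a: inv(inv(inv(a))),                 # three inverters in series
}

if __name__ == "__main__":
    for name in ("BUF_X3", "INV_X3"):
        print(name, [CELLS[name](x) for x in (0, 1)])
```

The buffer and the inverter/driver differ only in the number of stages: an even number of inversions leaves the value unchanged, while an odd number inverts it.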


Reverse-engineering the circuit

After determining this set of standard cells, each cell on the chip can be labeled with its function, as in the diagram below. Next, tracing out the wiring between the cells reveals how the circuitry is connected. I noticed a repeated motif of six NAND gates connected as cross-coupled latches; these groups are outlined in black.

The circuit with the cells labeled. The four flip-flops are outlined in black.


The schematic below shows how these 6-cell blocks are wired. After puzzling over this a while, I realized that this circuit was a standard edge-triggered flip-flop. The idea behind an edge-triggered flip-flop is that when the clock signal goes from 0 to 1, the flip-flop latches the value on the data input and holds it until the next rising clock edge. In this way, flip-flops provide synchronization and a form of memory, making them very useful in many applications. The flip-flop outputs the stored value as Q and the complement of this value as Q̅.

An edge-triggered flip-flop built from 6 NAND gates. It is wired as a toggle flip-flop.

In this circuit, the inverted output Q̅ is connected back to the data input, so every rising clock edge causes the flip-flop to toggle between 0 and 1. Since two clock pulses produce a single 0→1→0 cycle on the output, this flip-flop divides the clock frequency by 2.
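The toggle behavior can be checked with a gate-level Python sketch. I'm assuming the classic textbook six-NAND positive-edge-triggered D flip-flop (five 2-input NANDs and one 3-input NAND), which may differ in detail from the traced circuit; the node names n1 to n4 are hypothetical. Because the gates form feedback loops, the sketch re-evaluates them until no node changes.

```python
from typing import Dict, Optional


def nand(*inputs: int) -> int:
    """Logic function of a CMOS NAND gate with any number of inputs."""
    return 0 if all(inputs) else 1


class NandFlipFlop:
    """Textbook six-NAND positive-edge-triggered D flip-flop (a sketch).

    Nodes n1..n4 (hypothetical names) form the input stage; q and qb form
    the output latch.
    """

    def __init__(self) -> None:
        # Start with Q = 0 and let the gates settle with the clock low.
        self.s: Dict[str, int] = {"n1": 1, "n2": 1, "n3": 1, "n4": 1, "q": 0, "qb": 1}
        self.step(clk=0)

    def step(self, clk: int, d: Optional[int] = None) -> int:
        """Apply the clock (and D) level, re-evaluate until stable, return Q.

        If d is None, D is wired to Q-bar, making a toggle flip-flop.
        """
        s = self.s
        for _ in range(20):
            before = dict(s)
            data = s["qb"] if d is None else d
            s["n1"] = nand(s["n4"], s["n2"])
            s["n2"] = nand(s["n1"], clk)
            s["n3"] = nand(s["n2"], clk, s["n4"])  # the one 3-input NAND
            s["n4"] = nand(s["n3"], data)
            s["q"] = nand(s["n2"], s["qb"])
            s["qb"] = nand(s["q"], s["n3"])
            if s == before:
                return s["q"]
        raise RuntimeError("gates did not settle")


if __name__ == "__main__":
    ff = NandFlipFlop()
    outputs = []
    for _ in range(8):
        outputs.append(ff.step(clk=1))  # rising edge: Q toggles
        ff.step(clk=0)                  # falling edge: Q holds
    print(outputs)  # [1, 0, 1, 0, 1, 0, 1, 0]
```

Q changes only on rising edges, and it completes one cycle for every two clock cycles. Passing an explicit `d` to `step` shows the ordinary D flip-flop behavior instead: changes on D while the clock is high are ignored until the next rising edge.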

With the flip-flops recognized, I could create the schematic for the complete block of logic. The four flip-flops are arranged in sequence to divide the input clock by 16. The flip-flop outputs also feed the 4-input AND gate, which creates a pulse once every 16 clock cycles.12

Schematic of the divide-by-16 circuit.
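The counter can also be sketched behaviorally in Python, treating each flip-flop as a zero-delay toggle stage. The exact wiring between stages isn't spelled out in the text, so this sketch assumes a ripple arrangement in which each stage is clocked by the previous stage's Q̅ output, and that the AND gate combines the four Q outputs; the function names are hypothetical.

```python
from typing import List, Tuple


def clock_ripple(q: List[int]) -> None:
    """Apply one rising input-clock edge to stage 0 and let it ripple."""
    for i in range(len(q)):
        q[i] ^= 1      # this stage sees a rising edge, so it toggles
        if q[i] == 1:  # Q rose, so Q-bar fell: the next stage isn't clocked
            break
        # Q fell, so Q-bar rose: continue and toggle the next stage


def divide_by_16(cycles: int) -> Tuple[List[int], List[int]]:
    """Return the 4-bit count and the AND output after each input clock."""
    q = [0, 0, 0, 0]
    counts, pulses = [], []
    for _ in range(cycles):
        clock_ripple(q)
        counts.append(sum(bit << i for i, bit in enumerate(q)))
        pulses.append(int(all(q)))  # 4-input AND of the Q outputs
    return counts, pulses


if __name__ == "__main__":
    counts, pulses = divide_by_16(48)
    print(counts[:16])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0]
    print([i for i, p in enumerate(pulses) if p])  # [14, 30, 46]
```

Each stage halves the frequency, so the last Q output completes one cycle every 16 input clocks, and the AND output is high for one input cycle out of every 16. A real ripple counter has a propagation delay through each stage, which this zero-delay model ignores.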


Conclusion

Standard-cell logic is the mainstream methodology for designing digital logic. In this post, I've reverse-engineered some of the cells used in a vintage IBM chip and determined the circuit implemented by the cells. Although this specific circuit is not very complex, it's interesting to see how standard cells are constructed and how they are used in a real chip. (Although vendors publish specifications of their libraries, it's hard to find details on the physical implementation of the cells.) The chip I examined is from 1993, so its 1µm technology is obsolete compared to modern standard cell libraries that go down to 7 nm and have many layers of metal wiring, but the principles remain the same.

I announce my latest blog posts on Twitter, so follow me @kenshirriff. I also have an RSS feed.

Notes and references

  1. You might think that the standard cell layout is not that different from a custom layout. However, a custom layout can be very tightly packed, with transistors winding all over the place. As an example, the photo below shows part of the 8086 processor. Note that the metal lines on top almost completely occupy the available space, while the transistors underneath wind around in complex patterns. In addition, the sizes of the transistors are carefully optimized for their roles, in contrast to standard-cell logic, where transistors come in a few fixed sizes. The point is that a custom layout can become very complicated as it is optimized for maximum density.

    A closeup of the Intel 8086 die.


  2. For examples of commercial standard-cell libraries, see AMI's databook (1996) or a Samsung library (2000). 

  3. IBM's software for synthesizing Boolean logic was called BoolDozer. Papers on it are here and here.

  4. One unusual thing about this integrated circuit is that the CPU, the analog block, and the general logic all use standard cells, but they use different standard cell libraries with completely different layout styles, as shown below. The CPU's standard cells appear to be the densest, with cells between power and ground lines. Horizontal and vertical routing takes place over the cells. The general logic, on the other hand, has larger cells. Wide horizontal bands are used for routing, so only 1/3 of the space contains cells. The logic in the analog block is the least dense. The cells resemble the general logic cells, but larger. The routing wiring is thicker and less dense, looking like little optimization was performed. It's a surprise to find such a variety of standard cell implementations on one chip.

    Comparison of standard cells in the CPU, general logic, and the analog block.


  5. The chip contains a block of analog circuitry implemented in CMOS. This circuitry "performs signal conversion and clock recovery functions as well as detecting and compensating for line impairments". It includes resistors, capacitors, MOS transistors with special properties, and other components. The analog block uses a variety of circuits such as op-amps, switched-capacitor amplifiers, voltage references, peak detectors, a charge pump, a voltage-controlled oscillator, and a phase-locked loop.

  6. The PMOS transistors must be embedded in an N-type substrate, while the NMOS transistors must be embedded in a P-type substrate. I suspect that the chip as a whole has a P-type substrate, while the PMOS transistors are in a "tub" of N-type silicon. The substrate doping isn't visible under the microscope, so it could be the other way around. I'm ignoring the substrates in the diagrams.

  7. The schematic below shows how the transistors are connected in the 3-input NAND cell. The layout of the schematic matches the physical layout of the cell to make comparison easier. You can verify that the PMOS transistors (top) are in parallel, while the NMOS transistors (bottom) are in series.

    Schematic of the 3-input NAND gate.


  8. Standard-cell libraries typically contain versions of gates with multiple output current levels. A "×2" gate doubles the output transistors, while a ×3 gate has triple output transistors and so forth. Although the different sizes provide flexibility, custom circuitry gives you much more control since transistors can have arbitrary sizes, exactly matching the circuit's need. Typically a gate with higher current output is used if it's driving a long wire or multiple loads. But you don't want to use larger gates unnecessarily, since they have more capacitance and typically take longer to switch. So there are tradeoffs involved. 

  9. Schematic of the 4-input AND gate with double output drive. It is constructed from a 4-input NAND gate on the right, and an inverter/driver on the left.

    Schematic of the 4-input AND.


  10. The schematic below shows the construction of the non-inverting buffer with triple-current output. It is constructed from an inverter (on the right) feeding a triple-current inverter.

    Buffer schematic.


  11. It may seem strange to use three inverters in series when one inverter has the same logical function, but I think there's an explanation. The triple-current inverter has about three times the input capacitance because of its multiple transistors. Driving this inverter directly would put more load on the gate connected to the input, potentially slowing it down. Adding the two-inverter buffer in front ensures that the cell can be driven with a relatively weak signal.

    Schematic of the inverter with ×3 output.


  12. Interestingly, this divide-by-16 circuit has four outputs, but only two are used. My first thought was that the others are for testing (since they are connected to internal pads). However, these outputs are simply the complements of the other outputs, so they wouldn't provide any testing benefit. The other possibility is that the whole divide-by-16 circuit is a standardized block, used in other applications.