Inside the stacked RAM modules used in the Apple III

In 1978, a memory chip stored just 16 kilobits of data. To make a 32-kilobit memory chip, Mostek came up with the idea of putting two 16K chips onto a carrier the size of a standard integrated circuit, creating the first memory module, the MK4332 "RAM-pak". This module allowed computer manufacturers to double the density of their memory systems and by 1982, Mostek had sold over 3 million modules. The Apple III is the best-known system that used these memory modules.

The MK4332 memory module combined two 16-kilobit memory chips on a ceramic substrate.

This module was built from two 16-kilobit memory chips, constructed from the standard MK4116 dynamic RAM (DRAM) chip packaged in a leadless ceramic chip carrier; these are the golden rectangles on top of the carrier.

You might wonder why customers didn't simply use these surface-mount packages directly. At the time, soldering surface-mount components was still a challenge for many customers. Mounting two leadless chips on a dual in-line package (DIP) carrier allowed customers to double their memory density while still using their standard through-hole soldering techniques.

The purple carrier holding the chips was a ceramic substrate designed for thermal compatibility with the chips.1 There is no circuitry inside the ceramic carrier except wiring between the chips and the eighteen DIP pins. The two memory chips were wired in parallel except for their two select lines, which were kept separate. This allowed the desired memory chip to be selected. As a result, the MK4332 module has 18 pins, compared to 16 pins for the chips on top. Mostek used the same module design with the next generation of RAM chips, creating a 128-kilobit RAM module (MK4528) from two 64-kilobit RAM chips (MK4564).
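
The wiring described above can be modeled in a few lines of Python. This is only a sketch, not Mostek's pinout: the class names `Chip16K` and `Module32K` and the `select` argument are invented for illustration, and the real select lines are timing signals rather than a simple index.

```python
class Chip16K:
    """Toy model of one 16,384 x 1-bit memory chip."""
    SIZE = 16 * 1024

    def __init__(self):
        self.bits = [0] * self.SIZE

    def read(self, addr):
        return self.bits[addr]

    def write(self, addr, bit):
        self.bits[addr] = bit & 1


class Module32K:
    """Two chips sharing address and data wiring, with separate selects."""

    def __init__(self):
        self.chips = [Chip16K(), Chip16K()]

    def read(self, select, addr):
        # 'select' (0 or 1) is a hypothetical stand-in for asserting
        # one chip's select line.
        return self.chips[select].read(addr)

    def write(self, select, addr, bit):
        self.chips[select].write(addr, bit)


module = Module32K()
module.write(0, 1234, 1)   # store a 1 in the first chip
module.write(1, 1234, 0)   # same address, second chip
total_bits = sum(chip.SIZE for chip in module.chips)
print(total_bits)          # 32768
```

Because everything except the select lines is shared, the same address reaches both chips, and only the selected one responds.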

Inside the 4116 memory chip

Although you might expect a complex mounting technique, the two 4116 chips are simply soldered onto the substrate with standard reflow techniques. For the photo below, I removed the metal lid from the left chip with a chisel and unsoldered the right chip with a hot air gun. On the left, you can see the rectangular silicon die inside the leadless carrier package. On the right are the 16 solder pads on the ceramic substrate. The wiring between the solder pads and the DIP pins is inside the ceramic substrate.

The MK4332 with the left package opened and the right package unsoldered.

I created the die photo below from multiple microscope images. The white lines are the metal wiring on top of the chip, while the silicon underneath appears dark red. The two large rectangular regions are the 16,384 memory cells, arranged as a 128×128 matrix, split in two. The circuitry in between these regions consists of 128 sense amplifiers to amplify the bits read from memory, and selection circuitry to select one bit out of the 128. (Externally, the chip is accessed as 16,384×1, outputting a single bit. Typically, eight of these chips were used to store bytes.) The control and interface circuitry is at the left and right, connected to the external pads via tiny bond wires.

Die photo of the 4116 memory chip.

In dynamic RAM, a bit is stored in a capacitor, with a transistor providing access to the capacitor. The value of the bit is represented by the presence or absence of charge on the capacitor. The advantage of dynamic RAM is that each memory cell is very small, constructed from just two components,2 allowing a high memory density. (In comparison, static RAM may require six transistors per cell.) The downside of dynamic RAM is that the charge on a capacitor leaks away after a few milliseconds. To avoid losing data, dynamic RAM must be constantly refreshed: bits are read from the capacitors, amplified, and then written back to the capacitors. For this particular chip, all the data must be refreshed every two milliseconds.
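
To get a feel for what the two-millisecond requirement means in practice, here is a rough calculation. It assumes that refresh is performed one row at a time across the 128 rows of the cell matrix, and it uses a 375-nanosecond memory cycle purely as an illustrative figure; the actual cycle time depends on the chip's speed grade.

```python
ROWS = 128                 # rows in the 128x128 cell matrix
REFRESH_PERIOD_S = 2e-3    # every row must be refreshed within 2 ms
CYCLE_TIME_S = 375e-9      # ASSUMPTION: illustrative memory cycle time

# Average time between row refreshes if they are spread out evenly.
interval_s = REFRESH_PERIOD_S / ROWS

# Fraction of memory cycles spent on refresh instead of useful accesses.
overhead = ROWS * CYCLE_TIME_S / REFRESH_PERIOD_S

print(f"one row every {interval_s * 1e6:.3f} microseconds")
print(f"refresh overhead: {overhead:.1%}")
```

Under these assumptions, a row needs attention roughly every 16 microseconds, and refresh consumes a few percent of the available memory cycles.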

The diagram below illustrates the wiring of the memory cells, showing two of the 128 rows and columns. To read or write data, a row select line is energized. The transistors in that row turn on, connecting that row's capacitors to the data in/out lines. The data from that row is read out of the capacitors and amplified. At that point, the data can either be written back to refresh the row, or a new bit can be written. Note that although the chip accesses 128 bits in parallel internally, the chip provides access to one bit at a time externally, selecting one of the 128 bits to read or write.

Structure of the memory cells, based on the patent.
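
The read, write-back, and refresh behavior can be sketched as a toy simulation. The charge levels, the 0.5 sensing threshold, and the decay factor are arbitrary assumptions chosen to make the behavior visible; they are not measurements of the 4116.

```python
ROWS, COLS = 128, 128


class DramArray:
    """Toy model of a 128x128 DRAM cell matrix with one bit in and out."""

    def __init__(self):
        # Each cell holds a charge level between 0.0 and 1.0.
        self.charge = [[0.0] * COLS for _ in range(ROWS)]

    def leak(self, factor=0.7):
        # Charge drains away over time; here every cell simply decays.
        for row in self.charge:
            for col in range(COLS):
                row[col] *= factor

    def _sense(self, row):
        # Sense amplifiers turn the weak charges into clean 0/1 bits.
        return [1 if c > 0.5 else 0 for c in self.charge[row]]

    def _write_back(self, row, bits):
        # Restore full charge levels for the entire row.
        self.charge[row] = [float(b) for b in bits]

    def read(self, row, col):
        bits = self._sense(row)
        self._write_back(row, bits)   # reading also refreshes the row
        return bits[col]              # only one of the 128 bits leaves the chip

    def write(self, row, col, bit):
        bits = self._sense(row)
        bits[col] = bit & 1           # replace just the selected bit
        self._write_back(row, bits)

    def refresh(self):
        for row in range(ROWS):
            self._write_back(row, self._sense(row))


dram = DramArray()
dram.write(5, 17, 1)
dram.leak()
dram.refresh()
dram.leak()
print(dram.read(5, 17))   # 1: the refresh arrived in time
dram.leak()
dram.leak()
print(dram.read(5, 17))   # 0: two decay steps without refresh lose the bit
```

Note that every access handles a full row internally, even though a single bit is read or written from the outside.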

The magnified photo below shows some of the storage cells, densely packed together. It's a bit hard to visualize what's going on because the chip is constructed from multiple layers. The bottom layer is the grayish silicon die. On top of the silicon are two layers of polysilicon. Above this is the metal wiring, which was removed for this photo. The photo shows three sense lines (data in/out) in the silicon, with bulb-shaped storage cells connected on either side. Vertical strips of polysilicon (poly 1) over the storage cells implement capacitors: the silicon forms the lower plate, while the polysilicon forms the upper plate. The second layer of polysilicon (poly 2) is arranged in diagonal regions to implement the selection transistors. Square notches in the poly 1 layer allow the poly 2 layer to approach the silicon to form transistors. Horizontal metal wiring (not visible) is connected to the poly 2 regions to select a row by driving the transistors. Note that the rows are staggered and interlocking (kind of like a zipper) due to the highly-optimized layout. At the time, fitting this much memory on a chip was a challenge that pushed the limits of integrated circuit technology.

A closeup of the memory chip under the microscope, showing individual storage cells.

Memory chips in the Apple III

Apple was a major customer of these memory modules, using them in the Apple III computer (1980). The Apple III was marketed as a business computer to follow the popular Apple II. Unfortunately, the Apple III was a business failure due to reliability issues and competition from the IBM PC introduced a year later.

Apple III Plus computer. Photo by Bilby, CC BY 3.0.

As was usual for the time, the Apple III's memory board3 was stuffed with memory chips to achieve more capacity. An unusual part of the design is that it used three rows of memory chips (instead of a power of two), mixing 16-kilobit chips and 32-kilobit modules to achieve 128 kilobytes of storage. (The Apple III's case was designed before the boards, so the boards had to be designed to fit the available space.) In the photo below, the top row holds MK4332 memory modules, while the bottom two rows hold 16-kilobit MK4116 chips.4
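
The capacity arithmetic for the three-row layout can be checked quickly. This sketch assumes 16 chip positions per row, a figure inferred from the 128-kilobyte total rather than stated in the text.

```python
KBIT = 1024
CHIPS_PER_ROW = 16   # ASSUMPTION: inferred from the 128-kilobyte total


def board_kilobytes(row_chip_kbits):
    """Total capacity in kilobytes, given the chip size (kilobits) per row."""
    total_bits = sum(kbits * KBIT * CHIPS_PER_ROW for kbits in row_chip_kbits)
    return total_bits // 8 // 1024


print(board_kilobytes([32, 16, 16]))   # 128
print(board_kilobytes([16, 16, 16]))   # 96
```

The second line corresponds to populating all three rows with 16-kilobit chips, which matches the lower-cost 96-kilobyte configuration mentioned in the notes.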

Apple III main memory card. Photo courtesy of DigiBarn, CC BY-NC 3.0

A brief history of memory

Memory is an under-appreciated part of computing. The CPU usually gets the attention, but memory was often the limiting factor. The problem with memory is that storing a single bit is easy, but most approaches are impractical when you try to scale up to thousands or millions of bits.

The early ENIAC computer (1946) used vacuum tubes for storage, but these were bulky and expensive, limiting ENIAC to just 20 words (of 10 digits) stored in its accumulators. Early computers such as EDSAC (1949) used mercury delay lines for memory, sending pulse trains of sound waves through tubes of mercury. Although EDSAC could store 512 words, you had to wait for bits to circulate serially through the mercury. An improvement was the random-access Williams tube which stored data as spots on a cathode-ray tube screen. Although they were temperamental, Williams tubes were used in the Manchester Mark 1 (1949) and the commercial IBM 701 (1952).

The introduction of core memory revolutionized computing, providing fast, cheap, and reliable storage, storing each bit in a tiny magnetized ferrite ring. Core memory was introduced in the Whirlwind computer (1953) and used in most computers of the late 1950s and 1960s. However, since each bit required a separate physical ferrite core, memory sizes were limited to a few megabytes for even the largest customers. For example, memory cabinets for the IBM System/360 (1969) held 256 kilobytes but weighed over a ton each (below).

Magnetic core memory was relatively bulky. This photo shows an IBM System/360 Model 85 installation. The cabinets in the front are IBM 2365 Processor Storage, each holding 256 kilobytes. The double-H cabinet in the center is the CPU. Photo from IBM.

Semiconductor memory led to another dramatic shift. At first, semiconductor memory was costly and had very small capacity; Intel's first product was a memory chip holding just 64 bits and costing $99.50. In 1968, Dennard at IBM invented cost-effective dynamic RAM and semiconductor DRAM technology advanced quickly at various companies. Intel introduced the first commercially available DRAM chip in 1970, the i1103 holding 1K bits. This chip was nicknamed the "core killer" because of its impact on the magnetic core memory industry.

Computer storage rapidly moved from core memory to DRAM as the capacity of DRAM increased and the price fell.5 Mostek introduced the 4-kilobit MK4096 chip in 1973, followed by the 16-kilobit MK4116 in 1976. In 1978, Fujitsu introduced the first commercial 64-kilobit DRAM chip and Japan took the lead in DRAM manufacturing.6 Intel left the DRAM industry in 1985 due to decreasing market share and profits, followed by the remaining US DRAM manufacturers.

Fifty years after the introduction of DRAM, it is still the dominant technology for main storage, a remarkably long lifetime. Compared to the 16-kilobit chip I described, Samsung's recent 16-gigabit DRAMs are a factor of a million larger, showing the incredible increase in density. It remains to be seen if anything will challenge the long storage leadership of DRAM.

Thanks to Mike Braden for suggesting the MK4332 chip to me.

Notes and references

  1. For details on the construction of the memory modules, see Rectangular chip-carriers double memory-board density, Electronics, 1982. 

  2. Early dynamic RAMs such as the Intel 1103 used three transistors per cell and used separate lines for reading and writing data. Improvements in memory technology shrunk the circuit to a single transistor and a single data line. 

  3. The Apple III memory board pictured is the "12 volt memory board", given that name because the memory chips required 12 volts (as well as +5 and -5). It was upgraded by the "5 volt memory board", which used only a 5 volt supply. The 5 volt memory board used more modern 64-kilobit memory chips (4864) giving it a larger capacity of 128 or 256 kilobytes. Inconveniently, the power supply required a 12-volt load to operate, so the 5-volt memory board has a power resistor to draw 0.4 amps from the otherwise-unused 12-volt supply. Details are in the Apple III reference manual

  4. The Apple III memory board was also available in a lower-cost 96-kilobyte module. In that configuration, the 4332 memory modules were replaced with the 16-kilobit (MK4116) chips used on the rest of the board. One clever feature of the 4332 module is that the two "extra" select pins are on the end of the package. The result is that a memory board (such as the Apple III's) can be designed to accept either the 16-pin 16-kilobit chips or the 18-pin 32-kilobit modules, depending on how much memory is desired. With the smaller chips, the two extra pins are unused. It's strange, however, that the Apple III memory board only accepted the larger modules in one of the three rows of chips. 

  5. The industry switch from magnetic core memory to semiconductor memory wasn't as straightforward as superior semiconductor memory overthrowing inferior core memory. Instead, there was a time period where they co-existed, due to tradeoffs. For instance, in 1972, a customer could select core memory, semiconductor memory, or a mixture for the D-112 minicomputer (a PDP-8 clone); semiconductor memory was 5 times faster, but core memory supplied four times the capacity per board. By 1973, industry publications were reporting that "Semiconductor memories are taking over data-storage applications". As late as 1980, core memory manufacturers were advertising the benefits of core memory, battling the "myths" that semiconductor was better.

    Was the overthrow of magnetic core by semiconductor memory inevitable? My view is that "technological determinism" acts in some ways; the development of DRAM memory was almost unavoidable following the development of MOS transistors. However, "economic determinism" was more responsible for the success of semiconductor memory: if magnetic core had remained the lower-cost option, it probably would have remained dominant. As a counterexample, CCD (charge-coupled device) memory and bubble memory were hyped as storage technologies of the future, but couldn't achieve the price-performance to dislodge either semiconductor memory or hard disks. 

  6. Note that the capacity of memory chips increased by a factor of 4 each generation (1-, 4-, 16-, 64-kilobit) rather than a factor of 2. The reason is that each address pin was multiplexed to provide two address bits, so each additional address pin resulted in a factor of four increase. By reusing each address pin for both a row address and a column address, the number of address pins was kept low so compact 16-pin packages could be used even as memory sizes expanded to 256-kilobit. Conveniently, as technology improved, memory chips required fewer voltages, freeing up pins formerly used for power. One consequence, though, was the ordering of address pins on the chip was essentially random as new address pins were assigned based on which pins were available, rather than sequentially. The multiplexed address system was introduced in the Mostek MK4096 chip and meant that the 256-kilobit 41256 chip used fewer pins than the original 1-kilobit Intel 1103 (16 pins vs 18). 
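
The pin arithmetic in note 6 can be sketched as follows. The function names are hypothetical, and the choice of which half of the address is sent as the row is an arbitrary assumption for illustration.

```python
def capacity_bits(address_pins):
    """Bits addressable when each pin carries a row bit, then a column bit."""
    return 2 ** (2 * address_pins)


def split_address(addr, address_pins):
    """Split a full address into the (row, column) halves sent over the pins."""
    mask = (1 << address_pins) - 1
    row = (addr >> address_pins) & mask   # ASSUMPTION: high bits form the row
    col = addr & mask
    return row, col


for pins in (6, 7, 8, 9):
    print(pins, "pins ->", capacity_bits(pins) // 1024, "kilobits")

print(split_address(0x3FFF, 7))   # (127, 127): last cell of a 16-kilobit chip
```

Each step from 6 to 9 address pins multiplies the capacity by four, giving 4, 16, 64, and 256 kilobits.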

How the bootstrap load made the historic Intel 8008 processor possible

In 1972, Intel introduced their first 8-bit microprocessor, the 8008. Decades later, this processor still influences computing; you probably use an x86 processor that is a descendant of the 8008. One unusual feature of the 8008 processor is its use of a "bootstrap load" or "bootstrap capacitor", a special capacitor circuit to improve performance.1 Federico Faggin, who led the development of the 8008, is the main character in this story; he invented a new way to fabricate bootstrap capacitors for the Intel 4004 and 8008 processors and says it "proved essential to the microprocessor realization" and "without [the bootstrap load], there was no microprocessor."

Die photo of the 8008 microprocessor. The initials HF appear on the top right for Hal Feeney, who did the chip's logic design and physical layout.

My photo above shows the tiny silicon die inside the 8008 package. You can barely see the wires and transistors that make up the chip. There are 90 bootstrap capacitors, visible as small yellow rectangles, especially in the upper center. The squares around the outside are the 18 pads that are connected to the external pins by tiny bond wires. 18 pins is a very small number for a microprocessor, but Intel was bizarrely committed to small packages at the time.2 This required inconvenient tradeoffs; the lack of multiple power pins was one factor forcing the use of bootstrap loads.

The 8008 processor's history is more complex than you might expect. Its roots are the Datapoint 2200, a popular computer introduced in 1970 as a programmable terminal. Created before the microprocessor, the Datapoint 2200 contained a board-sized CPU built from individual TTL chips. Datapoint talked with both Intel and Texas Instruments about replacing the processor board with a single MOS chip. Texas Instruments created the TMX 1795 processor in March 1971, while Intel created the 8008 around the end of 1971, but Datapoint rejected both chips for a variety of reasons. Texas Instruments abandoned the TMX 1795 after their attempts to market it failed. Intel, on the other hand, marketed the 8008 as a general-purpose microprocessor, creating the microprocessor industry.

(You might wonder how the Intel 4004 fits into this story. The Intel 4004 is architecturally unrelated to the 8008 in almost every way; despite the similar names, the 8008 is not an 8-bit version of the 4-bit 4004. After the Intel 4004 was launched in 1971, much of the 4004 team (including Faggin, Hoff, Mazor, and Feeney) moved over to the 8008 project. Because the 4004 and 8008 processors were built by the same team with the same PMOS3 process, they have some layout and circuit-level similarities, in particular the bootstrap load circuit.)

Why the bootstrap load?

The purpose of the bootstrap load is to get extra voltage out of a transistor when necessary. To explain this, I'll start by showing how an inverter works when implemented in a processor. The diagram below shows an inverter, built from a PMOS3 transistor and a load resistor (which is actually a transistor). If the input to the inverter is 0 (low), the lower transistor turns on, pulling the output high (1). But if the input is 1 (high), the lower transistor turns off. In that case, the load resistor pulls the output low (0). Thus, the input signal is inverted.

How an inverter is constructed from PMOS transistors. The upper symbol indicates a PMOS transistor that is acting as a load resistor. (Based on the 8008 datasheet.)

The diagram below shows the physical implementation of an inverter in the 8008 processor. The first die photo shows the inverter as it appears in the chip. The horizontal metal wiring on top provides VDD and the input to the circuit. For the second photo, I dissolved the metal layer to reveal the two transistors that form the circuit. The schematic on the right matches the physical layout of the transistors on the die but otherwise corresponds to the schematic above. Because creating resistors in an integrated circuit is inconvenient, the load resistor is implemented by a transistor.

How an inverter appears in the 8008 processor.

There's a complication from using a transistor as a load resistor: these MOS transistors have a property called the threshold voltage VT. The problem is that when you try to pull a signal low, the transistor can't pull it all the way low. Although you'd like the signal to get pulled down to VDD (-9 volts), the threshold voltage (say -5 volts)9 means that you can only get the signal down to -4 volts. (This is one of the reasons why the 8008 requires a much larger voltage (14 volts overall) than modern integrated circuits; if you tried to run it at 5 volts, the threshold voltage would consume the entire signal.)

The diagram below explains the threshold voltage in more detail. VD, VG, and VS are the voltages on the drain, gate, and source respectively. VGS is the voltage between the gate and the source. The transistor will turn on if VGS < VT, the threshold voltage. (Inconveniently, most of these voltages are negative in a PMOS transistor, which makes things confusing.) The problem is that with a gate voltage of -9 volts and a threshold voltage of -5 volts, the transistor will only be on if VS is higher than -4 volts. Thus, the transistor can't pull VS lower than -4 volts. The only way to get VS lower is if you had a more-negative gate voltage, at least -14 volts in this case. Some chips solve this by using an additional voltage supply to provide more voltage to the gate, such as the Intel 8080 or the HP Nanoprocessor.

VD, VG, and VS are the voltages on the transistor's drain, gate, and source respectively. VGS is the voltage difference between the gate and source.
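
The threshold-voltage arithmetic can be written out as a small calculation. This is a simplified, idealized model (the transistor is treated as a switch that is on whenever VGS is more negative than VT), and the function names are invented for illustration.

```python
def lowest_source_voltage(v_gate, v_drain, v_threshold):
    """Most negative source voltage a PMOS transistor can pull to.

    The transistor conducts only while v_gate - v_source < v_threshold,
    so the source stops falling at v_gate - v_threshold, and it can never
    go below the drain voltage.
    """
    return max(v_drain, v_gate - v_threshold)


def gate_needed(v_target, v_threshold):
    """Gate voltage required to pull the source all the way to v_target."""
    return v_target + v_threshold


print(lowest_source_voltage(-9, -9, -5))    # -4
print(lowest_source_voltage(-14, -9, -5))   # -9
print(gate_needed(-9, -5))                  # -14
```

With the gate at -9 volts the source stalls at -4 volts, while a gate at -14 volts lets it reach the full -9 volts.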

The threshold voltage isn't much of a problem when you're dealing with inverters and other gates, because the voltage levels are restored by each gate. However, there are two places where the threshold voltage is a problem: superbuffers and pass transistor logic. In these circuits (described in the footnote4), the threshold voltage drop happens twice, yielding an output that is too weak. Since these circuits are common in processors, a solution was needed: the bootstrap load. It is a way of generating more voltage for the gate to overcome the threshold voltage so the transistor can pull its output all the way to VD.

How the bootstrap load works

The bootstrap load is essentially a charge pump circuit that uses a bootstrap capacitor to boost the gate voltage. The diagram below shows the basic idea of a charge pump. On the left, a capacitor is charged to -9 volts from a voltage source. If you disconnect the voltage source and then re-connect the negative side to the capacitor as shown on the right, the capacitor retains its charge of -9 volts. However, since the lower side of the capacitor is now at -9 volts, the upper side of the capacitor is now at -18 volts. The bootstrap load uses this -18 volts as the gate voltage, sufficient to overcome the threshold voltage.

A charge pump. On the left, the capacitor is charged to -9 volts. On the right, the bottom of the capacitor is connected to -9 volts, yielding -18 volts on top of the capacitor.
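
The two steps of the charge pump reduce to simple bookkeeping of the capacitor's plate voltages. The sketch below assumes an ideal capacitor with no load and no leakage.

```python
def charge_pump(v_supply):
    # Step 1: bottom plate at 0 V, top plate charged to the supply voltage.
    bottom, top = 0.0, v_supply
    v_cap = top - bottom            # voltage stored across the capacitor

    # Step 2: disconnect the supply, then drive the bottom plate to it.
    # The stored voltage cannot change instantly, so the top plate follows.
    bottom = v_supply
    top = bottom + v_cap
    return top


print(charge_pump(-9.0))   # -18.0
```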

The diagram below shows the bootstrap load circuit. The circuit is similar to the inverter described earlier, but with the addition of a capacitor and a transistor. In the first diagram, a 0 input turns on the lower transistor (Q1), yielding a 1 output (+5 volts). Meanwhile, Q3 acts as a load resistor, pulling the top of the capacitor to -4 volts (not -9 volts due to the threshold voltage.) This results in -9 volts stored across the capacitor.

How the bootstrap load circuit works.

The second and third diagrams show what happens with a 1 input. The lower transistor Q1 turns off, allowing Q2 to pull the output low. With a regular inverter, -4 volts is as low as the output can go (second diagram). However, as explained earlier, the capacitor still holds -9 volts, so the top of the capacitor must be -13 volts. With -13 volts on the gate of Q2, Q2 will continue to pull the output lower, until the circuit ends up as shown on the right, with the output pulled all the way down to -9 volts. Note that the source can't get pulled down any lower than the drain, regardless of the gate voltage. (In comparison, the simple inverter described earlier could only pull the output down to -4 volts.)5
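
The sequence above can be traced step by step in a short simulation. This is an idealized sketch using the voltages from the text (+5 and -9 volt rails, -5 volt threshold); it ignores leakage and parasitic capacitance, which would reduce the boost in a real circuit, and the function names are hypothetical.

```python
V_CC, V_DD, V_T = 5.0, -9.0, -5.0   # supply rails and threshold from the text


def plain_load_low():
    """Lowest output with the load transistor's gate tied to V_DD."""
    return max(V_DD, V_DD - V_T)


def bootstrap_load_low(step=0.5):
    """Follow the output as it falls with a bootstrap capacitor on the gate."""
    # Input 0: Q1 holds the output at V_CC while Q3 charges the capacitor.
    out = V_CC
    gate = V_DD - V_T               # Q3 loses one threshold: -4 V
    v_cap = gate - out              # -9 V stored across the capacitor

    # Input 1: Q1 turns off and Q2 pulls the output down. The capacitor
    # keeps the gate a fixed v_cap below the output, so Q2 stays on.
    trace = []
    while out > V_DD and (gate - out) < V_T:
        out = max(V_DD, out - step)
        gate = out + v_cap
        trace.append((out, gate))
    return out, gate, trace


print(plain_load_low())             # -4.0
out, gate, trace = bootstrap_load_low()
print(out, gate)                    # -9.0 -18.0
```

The trace passes through the intermediate state described above, with the output at -4 volts and the gate at -13 volts, before settling with the output at -9 volts.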

The image below shows part of Intel's schematic for the 4004 processor, showing the circuit for a standard load and the circuit for the bootstrap load, indicated by a "B" next to the resistor.

Representation of the bootstrap load on the Intel 4004 schematic. The resistor with "B" symbolizes the bootstrap load circuit next to it.

The silicon-gate bootstrap load

So far, I've discussed the bootstrap load, which was extensively used with MOS circuitry, and was patented by North American Rockwell in 1966. The invention necessary for the 4004 and 8008 processors was the extension of the bootstrap load to silicon-gate integrated circuits.

One of the key inventions that made the 8008 practical was the self-aligning silicon gate transistor.6 The diagram below shows the structure of an MOS transistor. Early MOS integrated circuits used metal-gate7 transistors, which used metal, typically aluminum, instead of polysilicon for the gate. But at Fairchild in 1968, Faggin and Klein invented a practical way to make transistors with silicon gates. This may seem like a trivial difference, but silicon-gate transistors were better than metal-gate transistors in three important ways. First, the electrical properties of silicon-gate transistors are much better than metal-gate transistors, running faster and at lower power. Second, polysilicon provided a second layer for routing signals, making integrated circuit layouts much more compact.

Structure of a PMOS transistor.

Finally, polysilicon permitted construction of self-aligned transistors, which play an important part in the bootstrap capacitor story. Integrated circuits are constructed through a sequence of processing steps, using optical masks and photo-sensitive resist to create patterns on the surface. An integrated circuit with metal-gate transistors is constructed from the bottom up. First, the source and drain regions are doped with impurities to form P-type silicon, as shown below. In a later step, the metal gate is created between the source and the drain, using a different mask. The tricky part is making sure the gate is lined up with the source and the drain; if there's a gap, the transistor won't work. Thus, a metal gate is made larger than necessary so it will still cover the gate channel, even if the alignment of the layers is slightly off. Unfortunately, this overlap creates capacitance and harms performance.

How a photomask is used to dope regions of silicon.

On the other hand, the self-aligned gate is created in the opposite order. The polysilicon gate is created first. In a later step, the source and drain regions are doped. However, a mask isn't used to separate the source and drain from the gate. Instead, the gate itself blocks doping of the region in between the source and drain. Thus, the source and drain are automatically "self-aligned" with the gate, eliminating the excess capacitance from a too-large gate. (Why couldn't metal gates be self-aligned? Because doping the silicon requires high temperatures that would melt the metal, but polysilicon can handle the heat.)

Although self-aligned silicon gates are a major improvement over metal gates, there was one drawback: capacitors. With metal-gate transistors, a capacitor could be easily constructed by using metal and doped silicon as the plates: a large metal layer on top, doped silicon underneath, and a thin insulating oxide layer in between. (In other words, a transistor with a large gate is used as a capacitor.) With self-aligned gates, the polysilicon gate could be used as a capacitor plate in place of the metal layer. However, in the self-aligned process, the polysilicon gate blocks doping of the silicon underneath, which is good for a transistor but bad for a capacitor, since you can't dope the silicon under the polysilicon plate. (You could use an extra manufacturing step to dope the capacitor plates before creating the polysilicon gate, but this extra step would increase the cost.)

Faggin invented a solution that made capacitors practical with self-aligned gates.8 He realized that if you bias the capacitor correctly, the charge on the upper plate will create a conductive region in the silicon underneath it, even without any doping. He tried this at Fairchild and discovered that it worked. This solved the problem of how to use a bootstrap load with self-aligned silicon-gate transistors.

Closeup of a bootstrap load circuit in the 8008.

The photo above zooms in on one of the bootstrap load circuits in the 8008, used in an inverter. The diagram below shows the underlying silicon after removing the metal layer. The bootstrap capacitor is constructed by a layer of polysilicon (pinkish) over the underlying silicon, forming the capacitor plates. The transistor on the right inverts the input. The capacitor is charged by the transistor in the lower left. The load transistor is in the middle; the capacitor provides the boosted voltage to its gate. The transistors have varying sizes depending on their roles. The inverting transistor is the largest since it provides the most current. The transistor that charges the capacitor is very small in comparison because a small current can keep the capacitor charged.

The circuitry of an inverter with a bootstrap load.

This bootstrap load technique was extensively used in the 4004 and 8008 processors. The diagram below shows the bootstrap loads in the 8008 processor, indicated with red boxes. The 8008 has 90 bootstrap loads, so it is a significant circuit. Many bootstrap loads are around the periphery of the chip to help drive the output pins. The instruction register (upper center) uses bootstrap loads to drive the relatively large instruction decoder (center). At the right, bootstrap loads drive the register storage (upper right) and stack storage (lower right). Other miscellaneous circuits throughout the processor also use bootstrap loads.

The bootstrap loads in the 8008 are indicated by red boxes.

Conclusion

A final question is whether the bootstrap load was a key invention that made the microprocessor possible (as embodied in the 4004 and 8008) or whether the microprocessor was inevitable regardless of features such as the bootstrap load. One view is that "the buried contact and particularly the bootstrap load, were indispensable to obtain the required speed within the available power budget." Feeney said in an 8008 oral history "that being limited on pins, limited on power supplies, whatever, that the bootstrap load became very, very critical." On the other hand, the development of the microprocessor seemed an inevitable, incremental process to many. Lee Boysel of Four-Phase Systems said in 1970,10 "The computer-on-a-chip is no big deal. It's almost here now... I've no doubt the whole computer will be on one chip within five years." Hal Feeney of Intel said, "at the time in the early 1970s, late 1960s, the industry was ripe for the invention of the microprocessor."

In the narrow sense, the bootstrap load made the 4004 and 8008 possible with their given size, performance, and power consumption. The bootstrap load also illustrates how the microprocessor is not a single invention, but the aggregation of many smaller inventions that made it possible. However, looking at the broader picture, microprocessors would have been only slightly hampered if the bootstrap capacitor didn't exist. There were many alternatives such as four-phase logic, static logic, higher gate voltages, an additional power supply, or using an extra mask for the capacitors. The Texas Instruments TMX 1795 provides a direct comparison, since it was built at the same time as the 8008 with the same architecture, but using metal-gate transistors instead of silicon-gate. The diagram below shows that the TMX 1795 was considerably larger than the 8008, and it had somewhat worse performance, but the point is that microprocessors would have proceeded essentially the same without the bootstrap load. In any case, by 1974, the switch to NMOS transistors and improvements in threshold voltages made bootstrap loads unnecessary. My conclusion is that the bootstrap load was a helpful innovation, but microprocessors would have proceeded along a similar path even without this invention. Once technology permitted a few thousand transistors to be constructed on an integrated circuit, the single-chip CPU was inevitable.

Comparative die sizes of the TMX 1795, 4004 and 8008 microprocessors. Note that the 4004 and 8008 are nearly the same size, while the TMX 1795 is more than twice as large. The top third of the TMX 1795 is instruction decoding and control logic, the middle is the 8-bit ALU, and the bottom is storage (stack and registers). TMX 1795 die photo courtesy of Computer History Museum.

If you're interested in the 8008, my previous article has a detailed discussion of the 8008's architecture and more die photos; I also explain the 8008's ALU. I announce my latest blog posts on Twitter, so follow me at kenshirriff. I also have an RSS feed.

Notes and references

  1. Bootstrap loads in the Intel 4004 are discussed by Insanity 4004 here and here

  2. In his oral history, Faggin describes Intel's fixation on 16-pin packages. When a memory chip required 18 pins instead of 16, it was "like the sky had dropped from heaven. I never seen so [many] long faces at Intel, over this issue, because it was a religion in Intel; everything had to be 16 pins, in those days. Everything had to be 16 pins... It was a completely silly requirements to have 16 pins." At the time, other manufacturers were using 40- and 48-pin packages, so there was no technical limitation, just a minor cost saving from the smaller package. 

  3. The classic microprocessors such as the 8080, 6502, and Z-80 were built with NMOS transistors. The earlier 4004 and 8008 used PMOS transistors, which were easier to manufacture but had poorer performance. If you're familiar with NMOS logic, PMOS logic is a mirror world, where everything is backward. PMOS used negative voltages, which were also significantly higher than the 5 volts used by standard TTL. For compatibility with TTL levels, the 8008 ran with Vcc at +5V and Vdd at -9V, so it could produce TTL-compatible outputs of roughly 0 volts and 5 volts. (See the datasheet for more details.) The 4004 required -15 volts, typically Vdd = -10V and Vss = +5V. Confusingly, the 4004 defined logic "0" as the more positive voltage and logic "1" as the more negative voltage (datasheet). 

  4. The "superbuffer" replaces the load resistor with an active transistor and is used when more current is required, for instance to drive an internal bus or an output pin. The upper transistor is driven by an inverter, so it is on when the lower transistor is off. Instead of the weak current from the load resistor/transistor, this transistor provides a high current. The problem is that the threshold voltage limits the voltage from the upper transistor. With a regular inverter, the inverter output loses VT, so it will provide -4 volts to the upper transistor's gate. Losing another VT there yields an insufficient output voltage of +1 volt instead of the desired -9 volts.

    A superbuffer provides a fast, high-current output in both directions.

    The second case where the threshold voltage drop is a problem is with a pass transistor, used for dynamic logic. The diagram below illustrates a simple pass transistor circuit. When the control signal is low, the transistor is active, passing the input signal through to the output. But when the control signal is high, the transistor stops passing the input. Instead, the previous value is held by the circuit's capacitance (shown in gray) so the output holds its previous value. Thus, pass transistors provide an efficient way of implementing temporary storage. The problem with pass transistors is the threshold voltage. If the control signal on the gate comes from a regular gate, the "on" voltage will be -4 volts due to the threshold voltage loss. The pass transistor causes a second threshold voltage loss, so the lowest it can pull its output is +1 volt, not enough for reliable operation.

    A simple pass-transistor circuit.

    The bootstrap load fixes these problems. By putting a bootstrap load on the inverter in the superbuffer or on the circuit controlling the pass transistor, the drive voltage will be close to -9 volts. Now there is only a single threshold voltage drop, leaving the output at -4 volts, sufficiently negative for reliable operation.

  5. This discussion of the bootstrap load is a simplified explanation. The real circuit is affected by stray capacitance, transistor leakage, and other factors, so the output wouldn't be all the way to VDD. One thing I'd like to point out, though, is that you might expect the capacitor's charge to leak out through Q3 as fast as it charged. Although Q3 is treated as a resistor, it also acts as a diode, blocking the capacitor from discharging. (With the capacitor more negative, the roles of Q3's source and drain are reversed and it no longer conducts.) 

  6. The silicon-gate bootstrap capacitor exemplifies the paths of information between companies at the dawn of the microprocessor era. Practical silicon gate technology was created at Fairchild (with some earlier roots). When employees (including Faggin) left Fairchild for Intel, they took this knowledge with them. (And in some cases took "lots and lots of Fairchild internal confidential documents", see Shima oral history). From Intel, ideas spread to other companies, such as when Faggin left Intel to found Zilog, basing the Zilog Z80 on the Intel 8080.

  7. Interestingly, in 2007 Intel started using metal gates again in order to scale transistors further (details). In a way, semiconductor technology has gone full circle, back to metal gates, although now unusual metals such as hafnium are used. 

  8. In the making of the first microprocessor, Federico Faggin says, "bootstrap load was a very popular circuit design trick used in just about all MOS dynamic circuits of that time. It made possible an output signal swing that was not only equal to the power supply voltage, but was also faster than possible with normal MOS loads for the same power dissipation." Faggin describes how he invented the bootstrap load in the 4004 oral history (p11) and the 8008 oral history (p8). Also see Faggin's The MOS silicon gate technology and the first microprocessors. He describes how the bootstrap load is needed for a two-phase design, and how silicon gate technology didn't support capacitors. Faggin's site describes the bootstrap load. Bootstrap load is also described at mosgate

  9. The threshold voltage depends on various properties of the integrated circuit including the gate material and the oxide thickness. I couldn't find a specific value for the threshold voltage in the 8008 processor, but -5 volts seems like the right ballpark (and is a conveniently round number). The book MOSFET in Circuit Design discusses threshold voltages for P-channel devices.  

  10. The bootstrap load illustrates the social process through which people are assigned credit for inventions and the construction of reputation. Although Faggin had a key role in the 4004 and 8008 processors, "when he left to found Zilog he got temporarily written outside of the Intel history." (See Intel disowns Faggin and Interview with Stan Mazor.) Faggin states, "They tried to erase my name from all of my contributions, including the silicon gate technology and the first microprocessor, and attribute them to others." After lobbying efforts by Faggin's wife and the pro-Faggin website intel4004.com, Intel reluctantly gave Faggin more credit. Faggin eventually received various awards including the National Medal of Technology and Innovation in 2010, so in the end he received his (deserved) recognition.

    The point is that credit is not assigned objectively, but is a dynamic force depending on various corporate and personal forces and who tells the story. (Wikipedia is one modern arena for these conflicts.) One corrective is the book History of semiconductor engineering, which covers many of the key people in the history of integrated circuits, with little regard for the "generally accepted" history. I should make it clear that I am drawing most heavily on Faggin's writings for background on the bootstrap load, so this blog post should not be viewed as an "objective" view of who should get credit for it. It looks like the silicon-gate bootstrap load was invented simultaneously at National Semiconductor; patent 3912948 filed in 1971 by Dilip Bapat describes an identical silicon-gate bootstrap load circuit.
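
As an aside, the threshold-voltage arithmetic in note 4 can be checked with a few lines of Python. This is only a rough sketch with idealized numbers: it assumes Vdd = -9 volts (from note 3) and the ballpark threshold voltage of -5 volts (from note 9), and it ignores stray capacitance and leakage. The helper name `after_threshold_drop` is made up for illustration.

```python
# Idealized values; the real circuit won't hit these exactly.
VDD = -9.0  # volts, the 8008's negative supply (note 3)
VT = -5.0   # volts, assumed PMOS threshold voltage (ballpark from note 9)


def after_threshold_drop(gate_voltage, vt=VT):
    """Most negative voltage a PMOS transistor can deliver for a given gate voltage."""
    return gate_voltage - vt


# Regular load: the inverter output loses one VT, and the transistor it
# drives (superbuffer upper transistor or pass transistor) loses another.
regular_drive = after_threshold_drop(VDD)
regular_output = after_threshold_drop(regular_drive)

# Bootstrap load: the drive is close to VDD (treated as exactly VDD here),
# so only one threshold drop remains.
bootstrap_drive = VDD
bootstrap_output = after_threshold_drop(bootstrap_drive)

print(f"regular load:   drive {regular_drive:+.0f} V, output {regular_output:+.0f} V")
print(f"bootstrap load: drive {bootstrap_drive:+.0f} V, output {bootstrap_output:+.0f} V")
```

With these assumed values, the regular load produces a drive of -4 volts and an output of +1 volt, while the bootstrap load keeps the drive near -9 volts so the output only loses one threshold voltage.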