A bug fix in the 8086 microprocessor, revealed in the die's silicon

The 8086 microprocessor was a groundbreaking processor introduced by Intel in 1978. It led to the x86 architecture that still dominates desktop and server computing. While reverse-engineering the 8086 from die photos, a particular circuit caught my eye because its physical layout on the die didn't match the surrounding circuitry. This circuit turns out to implement special functionality for a couple of instructions, subtly changing the way they interact with interrupts. Some web searching revealed that this behavior was changed by Intel in 1978 to fix a problem with early versions of the 8086 chip. By studying the die, we can get an idea of how Intel dealt with bugs in the 8086 microprocessor.

In modern CPUs, bugs can often be fixed through a microcode patch that updates the CPU during boot.1 However, prior to the Pentium Pro (1995), a bug in a microprocessor could only be fixed by changing the design and producing new silicon. This became a big problem for Intel with the famous Pentium floating-point division bug, which resulted in rare but serious errors when dividing. Intel recalled the defective processors in 1994 and replaced them, at a cost of $475 million.

The circuit on the die

The microscope photo below shows the 8086 die with the main functional blocks labeled. This photo shows the metal layer on top of the silicon. While modern chips can have more than a dozen layers of metal, the 8086 has a single layer. Even so, the metal mostly obscures the underlying silicon. Around the outside of the die, you can see the bond wires that connect pads on the chip to the 40 external pins.

The 8086 die with main functional blocks labeled. Click this image (or any other) for a larger version.

The relevant part of the chip is the Group Decode ROM in the upper center. The purpose of this circuit is to categorize instructions into groups that control how they are decoded and processed. For instance, very simple instructions (such as setting a flag) can be performed directly in one cycle. Other instructions are not complete instructions, but a prefix that modifies the following instruction. The remainder of the instructions are implemented in microcode, which is stored in the lower-right corner of the chip. Many of these instructions have a second byte, the "Mod R/M" byte that specifies a register and the memory addressing scheme. Some instructions have two versions: one for an 8-bit operand and one for a 16-bit operand. Some operations have a bit to swap the source and destination. The Group Decode ROM is responsible for looking at the 8 bits of the instruction and deciding which groups the instruction falls into.

A closeup of the Group Decode ROM. This image is a composite showing the metal, polysilicon, and silicon layers.

The photo above shows the Group Decode ROM in more detail. Strictly speaking, the Group Decode ROM is more of a PLA (Programmable Logic Array) than a ROM, but Intel calls it a ROM. It is a regular grid of logic, allowing gates to be packed together densely. The lower half consists of NOR gates that match various instruction patterns. The instruction bits are fed horizontally from the left, and each NOR gate is arranged vertically. The outputs from these NOR gates feed into a set of horizontal NOR gates in the upper half, combining signals from the lower half to produce the group outputs. These NOR gates have vertical inputs and horizontal outputs.

The diagram below is a closeup of the Group Decode ROM, showing how the NOR gates are constructed. The pinkish regions are silicon, doped with impurities to make it a semiconductor. The gray horizontal lines are polysilicon, a special type of silicon on top. Where a polysilicon crosses conductive silicon, it forms a transistor. The transistors are wired together by metal wiring on top. (I dissolved the metal layer with acid to show the silicon; the blue lines show where two of the metal wires were.) When an input is high, it turns on the corresponding transistors, pulling the vertical lines low. This creates NOR gates with multiple inputs. The key idea of the PLA is that at each point where horizontal and vertical lines cross, a transistor can be present or absent, to select the desired gate inputs. By doping the silicon in the desired pattern, transistors can be created or omitted as needed. In the diagram below, two of the transistors are highlighted. You can see that some of the other locations have transistors, while others do not. Thus, the PLA provides a dense, flexible way to produce a set of outputs from a set of inputs.

Closeup of part of the Group Decode ROM showing a few of the transistors. I dissolved the metal layer for this image, to reveal the silicon and polysilicon underneath.
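
The NOR-NOR structure described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the contents of the 8086's Group Decode ROM: the signal names (`a`, `a_n`, `b`, `b_n`), the assumption that each input is available in both true and complemented form, and the tiny two-column array are all hypothetical.

```python
def nor(*inputs):
    """Multi-input NOR gate: the output is 1 only if every input is 0."""
    return int(not any(inputs))


def pla(inputs, lower_plane, upper_plane):
    """Evaluate a NOR-NOR PLA.

    inputs:      dict mapping input line name -> 0 or 1
    lower_plane: one list per vertical NOR gate, naming the input lines
                 where a transistor is present
    upper_plane: one list per output, giving the indices of the vertical
                 NOR gates where a transistor is present
    """
    columns = [nor(*(inputs[name] for name in column)) for column in lower_plane]
    return [nor(*(columns[i] for i in row)) for row in upper_plane]


def example(a, b):
    """Hypothetical two-input PLA (illustrative, not taken from the die)."""
    lines = {"a": a, "a_n": 1 - a, "b": b, "b_n": 1 - b}
    lower = [["a_n", "b"],   # column 0 goes high only for a=1, b=0
             ["a", "b_n"]]   # column 1 goes high only for a=0, b=1
    upper = [[0, 1]]         # output goes low if either column matched
    return pla(lines, lower, upper)[0]
```

Placing or omitting a "transistor" is just a matter of which names appear in each list, which mirrors how the doping pattern selects the inputs of each gate.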

Zooming out a bit, the PLA is connected to some unusual circuitry, shown below. The last two columns in the PLA are a bit peculiar. The upper half is unused. Instead, two signals leave the side of the PLA horizontally and bypass the top of the PLA. These signals go to a NOR gate and an inverter that are kind of in the middle of nowhere, separated from the rest of the logic. The output from these gates goes to a three-input NOR gate, which is curiously split into two pieces. The lower part is a normal two-input NOR gate, but then the transistor for the third input (the one we're looking at) is some distance away. It's unusual for a gate to be split across a distance like this.

The circuitry as it appears on the die.

It can be hard to keep track of the scale of these diagrams. The highlighted box in the image below corresponds to the region above. As you can see, the circuit under discussion spans a fairly large fraction of the die.

The red rectangle in this figure highlights the region in the diagram above.

My next question was what instructions were affected by this mystery circuitry. By looking at the transistor pattern in the Group Decode ROM, I determined that the two curious columns matched instructions with bits 10001110 and 000xx111. A look at the 8086 reference shows that the first bit pattern corresponds to the instructions MOV sr,xxx, which loads a value into a segment register. The second bit pattern corresponds to the instructions POP sr, which pops a value from the stack into a segment register. But why did these instructions need special handling?
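
To see which opcode bytes those two bit patterns cover, here is a small Python sketch that treats each pattern as a mask-and-compare test, with `x` as a don't-care bit. The function and dictionary names are made up for illustration; only the two bit patterns come from the die.

```python
# Bit patterns read from the two unusual PLA columns ("x" = don't care).
PATTERNS = {
    "MOV sr,xxx": "10001110",
    "POP sr": "000xx111",
}


def to_mask_value(pattern):
    """Convert a pattern string into (mask, value) integers."""
    mask = int("".join("0" if c == "x" else "1" for c in pattern), 2)
    value = int("".join("0" if c == "x" else c for c in pattern), 2)
    return mask, value


def matches(opcode, pattern):
    """True if the opcode byte agrees with every fixed bit of the pattern."""
    mask, value = to_mask_value(pattern)
    return (opcode & mask) == value


def matched_opcodes():
    """All 8-bit opcodes matched by either pattern."""
    return [op for op in range(256)
            if any(matches(op, p) for p in PATTERNS.values())]


# The matched opcode bytes are 0x07, 0x0F, 0x17, 0x1F and 0x8E.
print([hex(op) for op in matched_opcodes()])
```

The two don't-care bits in the POP pattern select the segment register, so a single column covers four opcodes.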

The interrupt bug

After searching for information on these instructions, I came across errata stating: "Interrupts Following MOV SS,xxx and POP SS Instructions May Corrupt Memory. On early Intel 8088 processors (marked “INTEL ‘78” or “(C) 1978”), if an interrupt occurs immediately after a MOV SS,xxx or POP SS instruction, data may be pushed using an incorrect stack address, resulting in memory corruption." The fix to this bug turns out to be the mystery circuitry.

I'll give a bit of background. The 8086, like most processors, has an interrupt feature where an external signal, such as a timer or input/output, can interrupt the current program. The processor starts running different code to handle the interrupt, and then returns to the original program, continuing where it left off. When interrupted, the processor uses its stack in memory to keep track of what it was doing in the original program so it can continue. The stack pointer (SP) is a register that keeps track of where the stack is in memory.

A complication is that the 8086 uses "segmented memory", where memory is divided into chunks (segments) with different purposes. On the 8086, there are four segments: the Code Segment, Data Segment, Stack Segment, and Extra Segment. Each segment has an associated segment register that holds the starting memory address for that segment. Suppose you want to change the location of the stack in memory, maybe because you're starting a new program. You need to change the Stack Segment register (called SS) to point to the new location for the stack segment. And you also need to change the Stack Pointer register (SP) to point to the stack's current position within the stack segment.

A problem arises if the processor receives an interrupt after the Stack Segment register has been changed, but before the Stack Pointer register has been changed. The processor will store information on the stack using the old stack pointer address but in the new segment. Thus, the information is stored into essentially a random location in memory, which is bad.2 Intel's fix was to delay an interrupt after an update to the stack segment register, so you had a chance to update the stack pointer.3 The stack segment register could be changed in two ways. First, you could move a value to the register ("MOV SS, xxx" in assembly language). Second, you could pop a value off the stack into the stack segment register ("POP SS"). These are the two instructions affected by the mystery circuitry. Thus, we can see that Intel added circuitry to delay an interrupt immediately after one of these instructions and avoid the bug.
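
A worked example makes the corruption concrete. The sketch below uses the 8086's address calculation (segment shifted left by 4 bits plus offset, wrapped to 20 bits) and the fact that an interrupt pushes three words: the flags, CS and IP. The specific segment and pointer values are invented for illustration.

```python
def physical_address(segment, offset):
    """8086 address calculation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF


def interrupt_push_addresses(ss, sp):
    """Addresses written when an interrupt pushes flags, CS and IP.

    Each push decrements SP by 2 and then stores a word at SS:SP.
    """
    addresses = []
    for _ in range(3):
        sp = (sp - 2) & 0xFFFF
        addresses.append(physical_address(ss, sp))
    return addresses


# Hypothetical values: the program is switching from an old stack to a new one.
old_ss, old_sp = 0x1000, 0x0100
new_ss, new_sp = 0x2000, 0xFFFE

# Interrupt arrives after SS is updated but before SP is updated:
corrupted = interrupt_push_addresses(new_ss, old_sp)
# Interrupt arrives after both registers are updated:
intended = interrupt_push_addresses(new_ss, new_sp)

print([hex(a) for a in corrupted])  # 0x200fe, 0x200fc, 0x200fa
print([hex(a) for a in intended])   # 0x2fffc, 0x2fffa, 0x2fff8
```

In the first case the three words land near the bottom of the new segment, overwriting whatever happens to be stored there, rather than at the top of the new stack.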

Conclusions

One of the interesting things about reverse-engineering the 8086 is when I find a curious feature on the die and then find that it matches an obscure part of the 8086 documentation. Most of these are deliberate design decisions, but they show how complex and ad-hoc the 8086 architecture is, with many special cases. Each of these cases results in some circuitry and gates, complicating the chip. (In comparison, I've reverse-engineered the ARM1 processor, a RISC processor that started the ARM architecture. The ARM1 has a much simpler architecture with very few corner cases. This is reflected in circuitry that is much simpler.)

The case of the segment registers and interrupts, however, is the first circuit that I've found on the 8086 die that is part of a bug fix. This fix appears to have been fairly tricky, with multiple gates scattered in unused parts of the chip. It would be interesting to get a die photo of a very early 8086 chip, prior to this bug fix, to confirm the change and see if anything else was modified.

If you're interested in the 8086, I wrote about the 8086 die, its die shrink process and the 8086 registers earlier. I plan to write more about the 8086 so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @[email protected].

Notes and references

  1. The modern microcode update process is more complicated than I expected with updates possible before the BIOS is involved, during boot, or even while applications are running. Intel provides details here. Apparently Intel originally added patchable microcode to the Pentium Pro for chip debugging and testing, but realized that it would be a useful feature to fix bugs in the field (details). 

  2. The obvious workaround for this problem is to disable interrupts while you're changing the Stack Segment register, and then turn interrupts back on when you're done. This is the standard way to prevent interrupts from happening at a "bad time". The problem is that the 8086 (like most microprocessors) has a non-maskable interrupt (NMI), an interrupt for very important things that can't be disabled. 

  3. Intel documents the behavior in a footnote on page 2-24 of the User's Manual:

    There are a few cases in which an interrupt request is not recognized until after the following instruction. Repeat, LOCK and segment override prefixes are considered "part of" the instructions they prefix; no interrupt is recognized between execution of a prefix and an instruction. A MOV (move) to segment register instruction and a POP segment register instruction are treated similarly: no interrupt is recognized until after the following instruction. This mechanism protects a program that is changing to a new stack (by updating SS and SP). If an interrupt were recognized after SS had been changed, but before SP had been altered, the processor would push the flags, CS and IP into the wrong area of memory. It follows from this that whenever a segment register and another value must be updated together, the segment register should be changed first, followed immediately by the instruction that changes the other value. There are also two cases, WAIT and repeated string instructions, where an interrupt request is recognized in the middle of an instruction. In these cases, interrupts are accepted after any completed primitive operation or wait test cycle.

    Curiously, the fix on the chip is unnecessarily broad: a MOV or POP for any segment register delays interrupts. There was no hardware reason for this: the structure of the PLA means that all the necessary instruction bits were present and it would be no harder to test for the Stack Segment register specifically. The fix of delaying interrupts after a POP or MOV remains in the x86 architecture today. However, it has been cleaned up so only instructions affecting the Stack Segment register cause the delay; operations on other segment registers have no effect. 

The unusual bootstrap drivers inside the 8086 microprocessor chip

The 8086 microprocessor is one of the most important chips ever created; it started the x86 architecture that still dominates desktop and server computing today. I've been reverse-engineering its circuitry by studying its silicon die. One of the most unusual circuits I found is a "bootstrap driver", a way to boost internal signals to improve performance.1

The bootstrap driver circuit from the 8086 processor.

This circuit consists of just three NMOS transistors, amplifying an input signal to produce an output signal, but it doesn't resemble typical NMOS logic circuits and puzzled me for a long time. Eventually, I stumbled across an explanation:2 the "bootstrap driver" uses the transistor's capacitance to boost its voltage. It produces control pulses with higher current and higher voltage than otherwise possible, increasing performance. In this blog post, I'll attempt to explain how the tricky bootstrap driver circuit works.

A die photo of the 8086 processor. The metal layer on top of the silicon is visible. Around the edge of the chip, bond wires provide connections to the chip's external pins. Click this image (or any other) for a larger version.

NMOS transistors

The 8086 is built from MOS transistors (MOSFETs), specifically NMOS transistors. Understanding the bootstrap driver requires some understanding of these transistors. If you're familiar with MOSFETs as components, they have source and drain pins and current flows from the drain to the source, controlled by the gate pin. Most of the time I treat an NMOS transistor as a digital switch between the drain and the source: a 1 input turns the transistor on, closing the switch, while a 0 turns the transistor off. However, for the bootstrap driver, we must consider the MOSFET in a bit more detail.

A MOSFET switches current from the drain to the source, under control of the gate.

The important aspect of the gate is the difference between the gate voltage and the (typically lower) source voltage; this is denoted as Vgs. Without going into semiconductor physics, a slightly more accurate model is that the transistor turns on when the voltage between the gate and the source exceeds the fixed threshold voltage, Vth. This creates a conducting channel between the transistor's source and drain. Thus, if Vgs > Vth, the transistor turns on and current flows. Otherwise, the transistor turns off and no current flows.

The voltage between the gate and the source (Vgs) controls the transistor.

The threshold voltage has an important consequence for a chip such as the 8086. The 8086, like most chips of that era, used a 5-volt power supply. The threshold voltage depends on manufacturing characteristics, but I'll use 1 volt as a typical value.3 The result is that if you put 5 volts on the drain and on the gate, the transistor can pull the source up to about 4 volts, but then Vgs falls to the threshold voltage and the transistor stops conducting. Thus, the transistor can't pull the source all the way up to the 5-volt supply, but falls short by a volt on the output. In some circumstances this is a problem, and this is the problem that the bootstrap driver fixes.

Due to the threshold voltage, the transistor doesn't pull the source all the way to the drain's voltage, but "loses" a volt.
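
The threshold drop can be captured in a one-line model. This is a deliberately idealized sketch that uses the 1-volt threshold assumed above and ignores everything else about real transistor behavior; the function name is made up.

```python
def pass_output(v_drain, v_gate, v_th=1.0):
    """Highest voltage an NMOS transistor can pull its source up to.

    The source follows the drain, but conduction stops once
    Vgs = v_gate - v_source falls to the threshold voltage.
    """
    return max(0.0, min(v_drain, v_gate - v_th))


print(pass_output(5.0, 5.0))  # 4.0: one volt is "lost"
print(pass_output(5.0, 4.0))  # 3.0: a weaker gate signal loses even more
print(pass_output(2.0, 5.0))  # 2.0: low voltages pass through unchanged
```

The second case shows why the voltage on the gate matters: if the gate signal has already lost a volt, the output ends up two volts below the supply.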

If you get a transistor as a physical component, the source and drain are not interchangeable. However, in an integrated circuit, there is no difference between the source and the drain, and this will be important.4 The diagram below shows how a MOSFET is constructed on the silicon die. The source and drain consist of regions of silicon doped with impurities to change their property. Between them is a channel of undoped silicon, which normally does not conduct. Above the channel is the gate, made of a special type of silicon called polysilicon. The voltage on the gate controls the conductivity of the channel. A very thin insulating layer separates the gate from the channel. As a side effect, the insulating layer creates some capacitance between the gate and the underlying silicon.

Diagram of an NMOS transistor in an integrated circuit.

Basic NMOS circuits

Before getting to the bootstrap driver, I'll explain how a basic inverter is implemented in an NMOS chip like the 8086. The inverter is built from two transistors: a normal transistor on the bottom, and a special load transistor on top that acts like a pull-up resistor, providing a small constant current.5 With a 1 input, the lower transistor turns on, pulling the output to ground to produce a 0 output. With a 0 input, the lower transistor turns off and the current from the upper transistor drives the output high to produce a 1 output. Thus, the circuit implements an inverter: producing a 1 when the input is 0 and vice versa.

A standard NMOS inverter is built from two transistors. The upper transistor is a "depletion load" transistor.

The disadvantage of this inverter circuit is that when it produces a 0 output, current continuously flows through the load transistor and the lower transistor to ground. This wastes power, leading to high power consumption for NMOS circuitry. (To solve this, CMOS circuitry took over in the 1980s and is used in modern microprocessors.) This also limits the current that the inverter can provide.

If a gate needs to provide a relatively large current, for instance to drive a long bus inside the chip, a more complex circuit is used, the "superbuffer". The superbuffer uses one transistor to pull the output high and a second transistor to pull the output low.6 Because only one transistor is on at a time, a high-current output can be produced without wasting power. There are two disadvantages of the superbuffer, though. First, the superbuffer requires an inverter to control the high-side transistor, so it uses considerably more space on the die. Second, the superbuffer can't pull the high output all the way up; it loses a volt due to the threshold voltage as described earlier.

Combining two output transistors with an inverter produces a higher-current output, known as a superbuffer.

The bootstrap driver

In some circumstances, you want both a high-current output, and the full output voltage. One example is connecting a register to an internal bus. Since the 8086 is a 16-bit chip, it uses 16 transistors for the bus connection. Driving 16 transistors in parallel requires a fairly high current. But the bus transistors are "pass" transistors, which lose a volt due to the threshold voltage, so you want to start with the full voltage, not already down one volt. To provide both high current and the full voltage, bootstrap drivers are used to control the buses, as well as similar tasks such as ALU control.

The concept behind the bootstrap driver is to drive the gate voltage significantly higher than 5 volts, so even after losing the threshold voltage, the transistor can produce the full 5-volt output.7 The higher voltage is generated by a charge pump, as illustrated below. Suppose you charge a capacitor with 5 volts. Now, disconnect the bottom of the capacitor from ground, and connect it to +5 volts. The capacitor is still charged with 5 volts, so now the high side is at +10 volts with respect to ground. Thus, a capacitor can be used to create a higher voltage by "pumping" the charge to a higher level.

On the left, the "flying capacitor" is charged to 5 volts. By switching the lower terminal to +5 volts, the capacitor now outputs +10 volts.
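
The arithmetic of the charge pump is simple enough to write down directly. The sketch below also includes an optional stray capacitance from the top plate to ground, which is an added assumption (standard capacitive charge sharing) to show why a real circuit falls short of the ideal 10 volts. The function and parameter names are illustrative.

```python
def charge_pump_top_voltage(v_supply=5.0, c_pump=1.0, c_stray=0.0):
    """Voltage on the capacitor's top plate after the bottom plate is raised.

    Phase 1: bottom plate at ground, top plate charged to v_supply.
    Phase 2: top plate disconnected, bottom plate raised to v_supply.
    Any stray capacitance from the top plate to ground absorbs part of
    the boost, forming a capacitive divider.
    """
    boost = v_supply * c_pump / (c_pump + c_stray)
    return v_supply + boost


print(charge_pump_top_voltage())             # 10.0 in the ideal case
print(charge_pump_top_voltage(c_stray=1.0))  # 7.5 if the stray capacitance equals the pump capacitance
```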

The idea of the bootstrap driver is to attach a capacitor to the gate and charge it to 5 volts. Then, the low side of the capacitor is raised to 5 volts, boosting the gate side of the capacitor to 10 volts. With this high voltage on the gate, the threshold voltage is easily exceeded and the transistor can pass the full 5 volts from the drain to the source, producing a 5-volt output.

With a large voltage on the gate, the threshold voltage is exceeded and the transistor remains on until the source reaches 5 volts.

In the 8086 bootstrap driver,8 an explicit capacitor is not used.9 Instead, the transistor's inherent capacitance is sufficient. Due to the thin insulating oxide layer between the gate and the underlying silicon, the gate acts as the plate of a capacitor relative to the source and drain. This "parasitic" capacitance is usually a bad thing, but the bootstrap driver takes advantage of it.

The diagrams below show how the bootstrap driver works. Unlike an inverter, the bootstrap driver is controlled by the chip's clock, generating an output only when the clock is high. In the first diagram, we assume that the input is a 1 and the clock is low (0). Two things happen. First, the inverted clock turns on the bottom transistor, pulling the output to ground. Second, the 5V input passes through the first transistor; the left side of the transistor acts as the drain and the right side as the source. Due to the threshold voltage, a volt is "lost" so about 4 volts reaches the gate of the second transistor. Since the source and drain of the second transistor are at 0 volts, the gate capacitors are charged with 4 volts. (Recall that these are not explicit capacitors, but are parasitic capacitors.)

The first step in the operation of the bootstrap driver. The gate capacitance is charged by the input.

In the next step, the clock switches state and things become more interesting. The second transistor is on due to the voltage on the gate, so current flows from the clock to the output. In a "normal" circuit, the output would rise to 4 volts, losing a volt due to the threshold voltage of the second transistor. However, as the output voltage rises, it boosts the voltage on the gate capacitors and thus raises the gate voltage. The increased gate voltage allows the output voltage to rise above 4 volts, pushing the gate voltage even higher, until the output reaches 5 volts.10 Thus, the bootstrap driver produces a high-current output with the full 5 volts.

The second step in the operation of the bootstrap driver. As the output rises, it boosts the gate voltage even higher.
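
The positive feedback between the output and the gate can be illustrated with a toy model in Python. It assumes the gate is precharged to one threshold below the input, that a fraction `coupling` of any rise in the output is coupled onto the gate through the gate capacitance, and that the first transistor blocks any charge from leaking away. These are simplifications for illustration; the real voltages depend on the relative capacitances in the circuit.

```python
def bootstrap_driver(v_in=5.0, v_clk=5.0, v_th=1.0, coupling=1.0, steps=50):
    """Return (gate voltage, output voltage) after the clock goes high.

    coupling=1.0 models ideal bootstrapping; coupling=0.0 models a gate
    with no capacitive boost at all.
    """
    v_precharge = v_in - v_th          # gate voltage before the clock rises
    v_out = 0.0                        # output was pulled to ground
    for _ in range(steps):
        v_gate = v_precharge + coupling * v_out            # rising output lifts the gate
        v_out = min(v_clk, max(0.0, v_gate - v_th))        # gate limits the output
    v_gate = v_precharge + coupling * v_out
    return v_gate, v_out


print(bootstrap_driver())              # (9.0, 5.0): full 5-volt output
print(bootstrap_driver(coupling=0.0))  # (4.0, 3.0): output stuck below the gate
```

With ideal coupling, each increase in the output raises the gate by the same amount, so the output keeps climbing until it is limited by the clock voltage rather than by the threshold.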

An important factor is that the first transistor now has a higher voltage on the right than on the left, so the source and drain switch roles. Since the transistor has 5 volts on the gate and on the (now) source, Vgs is 0 and current can't flow. Thus the first transistor blocks current flow from the gate, keeping the gate at its higher voltage. This is the critical role of the first transistor in the bootstrap driver, acting as a diode to block current flow out of the gate.
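
The role swap can be expressed as a short Python sketch: in an NMOS integrated circuit, whichever terminal is at the lower voltage acts as the source. The function name and the example voltages (including the boosted 9 volts) are illustrative values under the 1-volt threshold assumption, not measurements.

```python
def first_transistor_conducts(v_left, v_right, v_gate=5.0, v_th=1.0):
    """True if the "diode" transistor (gate tied to +5V) is turned on.

    The terminal at the lower voltage acts as the source, so Vgs is
    measured against that terminal.
    """
    v_source = min(v_left, v_right)
    return (v_gate - v_source) > v_th


# Charging: input at 5V, gate node of the second transistor at 0V.
print(first_transistor_conducts(5.0, 0.0))  # True
# Gate node has reached 4V: Vgs equals the threshold, so charging stops.
print(first_transistor_conducts(5.0, 4.0))  # False
# Gate node boosted above the input: the charge is trapped.
print(first_transistor_conducts(5.0, 9.0))  # False
# Input drops to 0V: the transistor turns on and the gate node discharges.
print(first_transistor_conducts(0.0, 9.0))  # True
```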

The diagram below shows what happens when the clock switches state again, assuming a low input. Now the first transistor's source voltage drops, making Vgs large and turning the transistor on. This allows the charge on the second transistor's gate to flow out, dropping the gate voltage. Note that the first transistor is no longer acting as a diode, since current can flow in the "reverse" direction. The other important action in this clock phase is that the bottom transistor turns on, pulling the output low. These actions discharge the gate capacitance, preparing it for the next bootstrap cycle.

When the clock switches off, the driver is discharged, preparing it for the next cycle.

The 8086 die

Now that I've explained the theory, how do bootstrap drivers appear on the silicon die of the 8086? The diagram below shows six drivers that control the ALU operation.11 There's a lot happening in this diagram, but I'll try to explain what's going on. For this photo, I removed the metal layer with acid to reveal the silicon underneath; the yellow lines show where the metal wiring was. The large pinkish regions are doped silicon, while the gray speckled lines are polysilicon on top. The greenish and reddish regions are undoped silicon, which doesn't conduct and can be ignored. A transistor is formed where a polysilicon line crosses silicon, with the source and drain on opposite sides. Note that some transistors share the source or drain region with a neighboring transistor, saving space. The circles are vias, connections between the metal and a lower layer.

Six bootstrap drivers as they appear on the chip.

The drivers start with six inputs at the right. Each input goes through a "diode" transistor with the gate tied to +5V. I've labeled two of these transistors and the other four are scattered around the image. Next, each signal goes to the gate of one of the drive transistors. These six large transistors pass the clock to the output when turned on. Note that the clock signal flows through large silicon regions, rather than "wires". Finally, each output has a pull-down transistor on the left, connecting it to ground (another large silicon region) under control of the inverted clock. The drive transistors are much larger than the other transistors, so they can provide much more current. Their size also provides the gate capacitance necessary for the operation of the bootstrap driver.

Although the six drivers in this diagram are electrically identical, each one has a different layout instead of repeating the same layout six times. This demonstrates how the layout has been optimized, moving transistors around to use space most efficiently.

In total, the 8086 has 81 bootstrap drivers, mostly controlling the register file and the ALU (arithmetic-logic unit). The die photo below shows the location of the drivers, indicated with red dots. Most of them are in the center-left of the chip, between the registers and ALU on the left and the control circuitry in the center.

The 8086 die with main functional blocks labeled. The bootstrap drivers are indicated with red dots.

Conclusions

For the most part, the 8086 uses standard NMOS logic circuits. However, a few of its circuits are unusual, and the bootstrap driver is one of them. This driver is a tricky circuit, depending on some subtle characteristics of MOS transistors, so I hope my explanation made sense. This driver illustrates how Intel used complex, special-case circuitry when necessary to get as much performance from the chip as possible.

If you're interested in the 8086, I wrote about the 8086 die, its die shrink process and the 8086 registers earlier. I plan to write more about the 8086 so follow me on Twitter @kenshirriff or RSS for updates.

Notes and references

  1. Intel used a "bootstrap load" circuit in the 4004 and 8008 processors. The bootstrap load has many similarities to the bootstrap driver, using capacitance to boost the output voltage. But it is a different circuit, used in a different role. The bootstrap load was designed for PMOS circuits to boost the voltage from a pull-up transistor, using explicit capacitors, built with a process invented by Federico Faggin. I wrote about the bootstrap load here.

  2. The only explanation of a bootstrap driver that I could find is in section 2.3.1 of DRAM Circuit Design: A Tutorial. The 8086 transistors with the gate wired to +5V puzzled me for the longest time. It seemed to me that this transistor would always be on, and thus had no function. However, the high voltage of the bootstrap driver gives it a function. I was randomly reading the DRAM book and suddenly recognized that one of the circuits in that book was similar to the mysterious 8086 circuit. 

  3. The threshold voltage was considerably higher for older PMOS transistors. To get around this, old chips used considerably higher supply voltages, so "losing" the threshold voltage wasn't as much of a problem. For instance, the Intel 4004 used a 15-volt supply. 

  4. The reason that MOSFETs are symmetrical in an integrated circuit and asymmetrical as physical components is that MOSFETs really have four terminals: source, gate, drain, and the substrate (the underlying silicon on which the transistor is constructed). In component MOSFETs, the substrate is internally connected to the source, so the transistor has three pins. However, the source-substrate connection creates a diode, making the component MOSFET asymmetrical. Four-terminal MOSFETs such as the 3N155 exist but are rare. The MOnSter 6502 made use of 4-terminal MOSFET modules to implement the 6502's pass transistors. 

  5. The load transistor is a special type of transistor, a depletion transistor that is doped differently. The doping produces a negative threshold voltage, so the transistor remains on and provides a relatively constant current. See Wikipedia for more on depletion loads. 

  6. The superbuffer has some similarity with a CMOS gate. Both use separate transistors to pull the signal high or low, with only one transistor on at a time. The difference is that CMOS uses a complementary transistor, i.e. PMOS, to pull the signal high. PMOS performs better in this role than NMOS. Moreover, a PMOS transistor is turned on by a 0 on the gate. This behavior eliminates the need for the inverter in a superbuffer. 

  7. The 8086 processor also uses completely different charge pumps to create a negative voltage for a substrate bias. I discuss that use of charge pumps here.

  8. Why is it called a bootstrap driver? The term originates with footwear: boots often had boot straps on the top, physical straps to help pull the boots on. In the 1800s, the saying "No man can lift himself by his own boot straps" was used as a metaphor for the impossibility of improvement solely through one's own effort. (Pulling on the straps on your boots superficially seems like it should lift you off the ground, but is of course physically impossible.) By the mid-1940s, "bootstrap" was used in electronics to describe a circuit that started itself up through positive feedback, metaphorically pulling itself up by its bootstraps. The bootstrap driver continues this tradition, pulling itself up to a higher voltage. 

  9. Some circuits in the 8086 use physical capacitors on the die, constructed from a metal layer over silicon. The substrate bias generators use relatively large capacitors. There are also some small capacitors that appear to be used for timing reasons. 

  10. The exact voltage on the gate will depend on the relative capacitances of different parts of the circuit, but I'm ignoring these factors. The voltages that I show in the diagram are illustrations of the principle, not accurate values. 

  11. Some of the 8086's bootstrap drivers pre-discharge when the clock is low and produce an output when the clock is high, while other drivers operate on the opposite clock phases. The ALU drivers in the die photo operate on the opposite phases, but I've labeled the diagram to match the previous discussion.