Reverse-engineering the Intel 8086 processor's HALT circuits

The 8086 processor was introduced in 1978 and has greatly influenced modern computing through the x86 architecture. One unusual instruction in this processor is HLT, which stops the processor and puts it in a halt state. In this blog post, I explain in detail how the halt circuitry is implemented and how it interacts with the 8086's architecture.

The die photo below shows the 8086 microprocessor under a microscope. The metal layer on top of the chip is visible, with the silicon and polysilicon mostly hidden underneath. Around the edges of the die, bond wires connect pads to the chip's 40 external pins. I've labeled the key functional blocks; the ones that are important to this discussion are darker and will be discussed in detail below. Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below. The BIU handles memory accesses, while the Execution Unit (EU) executes instructions. Both are stopped by a halt instruction.

The 8086 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath. Click on this image (or any other) for a larger version.

Halt processing in the Execution Unit

In this section, I'll explain how the HLT instruction is decoded and handled in the Execution Unit. The 8086 uses a combination of lookup ROMs, logic, and microcode to implement instructions. The process starts with the loader, a state machine that provides synchronization between the prefetch queue and the decoding circuitry. When an instruction byte is available, the loader provides a signal called First Clock that loads the instruction into the Instruction Register and starts the instruction decoding process.

Before microcode gets involved, the Group Decode ROM classifies instructions by producing about 15 signals, indicating properties such as instructions with a Mod R/M byte, instructions with a byte/word bit, instructions that always act on a byte, and so forth. For the HLT instruction, the Group Decode ROM provides two important signals. The first is one-byte logic (1BL), indicating that the instruction is one byte long and is implemented with logic circuitry rather than microcode.1 The second signal is produced for the HLT instruction specifically and generates the internal HALT signal. This signal travels to various parts of the 8086 to halt the processor.
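
To make the decode step concrete, here is a tiny Python sketch of the two Group Decode ROM outputs relevant to HLT. The HLT opcode value (0xF4) comes from the 8086 instruction set; the function itself is only an illustration of the signals described above, not the actual ROM contents.

```python
def group_decode(opcode):
    """Toy model of two Group Decode ROM outputs relevant to HLT.

    The real ROM produces about 15 classification signals; only the
    HLT-related ones are modeled here.
    """
    signals = {"HALT": False, "1BL": False}
    if opcode == 0xF4:          # HLT
        # 1BL: one byte long, implemented in logic rather than microcode.
        signals["HALT"] = True
        signals["1BL"] = True
    # The prefix and flag instructions (segment prefixes, LOCK, REP,
    # CMC/CLC/STC/CLI/STI/CLD/STD) also assert 1BL; omitted for brevity.
    return signals

print(group_decode(0xF4))   # {'HALT': True, '1BL': True}
print(group_decode(0x90))   # NOP: {'HALT': False, '1BL': False}
```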

The Group Decode ROM. The yellow rectangle detects the HLT instruction, with an output at the bottom. The red rectangle generates the 1BL (one-byte logic) signal.

In the Execution Unit, the HALT signal blocks the reading of new instructions from the prefetch queue. This causes the loader to wait indefinitely and stops execution of new instructions. Since no new instruction replaces HLT, the Group Decode ROM continues to generate the HALT signal. The HALT signal also blocks most of the other outputs from the Group Decode ROM, preventing other decoding actions.

Thus, the Execution Unit sits idle as a result of the HLT instruction, unable to start a new instruction. Modern processors often have low-power halt modes, where part of the processor is shut down or a clock domain is stopped to reduce power consumption. The 8086, however, doesn't do anything clever to minimize power consumption in the halt mode, since this wasn't a concern for processors in the 1970s.

Halt processing in the Bus Interface Unit

Memory and I/O devices are connected to the 8086 chip through a bus that transmits address, data, and control information. The 8086's Bus Interface Unit handles reads and writes over this bus, running independently from the Execution Unit. A complete bus cycle for a read or write takes four clock periods, called T1, T2, T3, and T4,2 with specific signals on the bus for each time state.

A HLT instruction stops the Bus Interface Unit, but this takes several steps. First, the Bus Interface Unit must complete any currently-running bus cycle. Any new bus cycle must be blocked. Finally, the processor indicates the HALT state to any devices on the bus by issuing a special T1 cycle over the bus.

The main HALT control signal inside the Bus Interface Unit is something I call halt-not-hold, indicating a HALT is active, but not a HOLD. (Ignore the HOLD part for now.) This signal is activated by the HLT instruction signal from the Group Decode ROM, except it is blocked by any bus operations in progress. Once any current bus operation reaches T2, halt-not-hold gets activated and starts the halt process while the current bus cycle finishes up.

To prevent new bus activity, the halt-not-hold signal blocks new prefetch requests. The only other source of bus activity is an instruction that performs reads or writes. But the current instruction is HLT, so it won't generate any bus traffic. Thus, the Bus Interface Unit will remain idle.
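
The gating described in the last few paragraphs can be summarized as a small boolean sketch. Treat this as an illustration of the behavior in the text rather than the actual gates on the die; the input names (other than halt-not-hold and HALT) are labels I made up for the conditions involved.

```python
def biu_halt_gating(halt_decoded, bus_cycle_active, bus_cycle_before_t2,
                    hold_active, prefetch_wanted):
    """Illustrative model of the Bus Interface Unit's halt gating.

    halt_decoded:        HALT signal from the Group Decode ROM
    bus_cycle_active:    a read/write bus cycle is in progress
    bus_cycle_before_t2: that cycle has not yet reached state T2
    hold_active:         an external device holds the bus via HOLD
    prefetch_wanted:     the prefetch circuitry wants to fetch a byte
    """
    # halt-not-hold: HALT is active and there is no bus hold, but it is
    # blocked until any in-progress bus cycle reaches T2.
    halt_not_hold = (halt_decoded and not hold_active
                     and not (bus_cycle_active and bus_cycle_before_t2))

    # halt-not-hold blocks new prefetch requests, so the BIU goes idle
    # once the current bus cycle finishes.
    prefetch_request = prefetch_wanted and not halt_not_hold
    return halt_not_hold, prefetch_request

# HLT decoded, the current bus cycle already past T2, no HOLD:
print(biu_halt_gating(True, True, False, False, True))   # (True, False)
```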

The read/write control circuitry on the die with the flip-flops labeled. Metal and polysilicon were removed to show the underlying silicon.

The circuitry to control the bus cycle is complicated with many flip-flops and logic gates; the diagram above shows the flip-flops. I plan to write about the bus cycle circuitry in detail later, but for now, I'll give an extremely simplified description. Internally, there is a T0 state before T1 to provide a cycle to set up the bus operation. The bus timing states are controlled by a chain of flip-flops configured like a shift register with additional logic: the output from the T0 flip-flop is connected to the input of the T1 flip-flop and likewise with T2 and T3, forming a chain. A bus cycle is started by putting a 1 into the input of the T0 flip-flop.3 When the CPU's clock transitions, the flip-flop latches this signal, indicating the (internal) T0 bus state. On the next clock cycle, this 1 signal goes from the T0 flip-flop to the T1 flip-flop, creating the externally-visible T1 state. Likewise, the signal passes to the T2 and T3 flip-flops in sequence, creating the bus cycle.

A slightly different path is used to generate the special T1 signal that indicates a HALT. Once any bus activity is completed, the halt-not-hold signal puts a 1 into the T1 flip-flop through some gates. This generates the T1 signal, bypassing T0. Moreover, this signal does not propagate to the T2 flip-flop because it is blocked by halt-not-hold and some gates. Another flip-flop blocks this T1 cycle after the first cycle so halt-not-hold doesn't repeatedly trigger it. Overall, this special HALT T1 state looks like a special case that was hacked into the circuitry.
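
Here is a simplified Python model of the T-state chain and the special HALT T1 described above. The real circuit uses inverted signals, extra gating, and optional Tw wait states (see the notes), so this is only a sketch of the shift-register idea.

```python
def bus_state_machine(cycles, start_cycle_at=None, halt_at=None):
    """Simulate the internal T0 -> T1 -> T2 -> T3 shift-register chain.

    start_cycle_at: clock on which a 1 is injected into T0 (a normal
    bus cycle).  halt_at: clock on which halt-not-hold injects a 1
    directly into T1 (the special HALT T1), which is blocked from
    propagating to T2.
    """
    t0 = t1 = t2 = t3 = 0
    t1_is_halt = False                 # marks the special HALT T1
    history = []
    for clk in range(cycles):
        next_t0 = 1 if clk == start_cycle_at else 0
        if clk == halt_at:
            next_t1, next_halt = 1, True        # bypasses T0
        else:
            next_t1, next_halt = t0, False
        next_t2 = 0 if t1_is_halt else t1       # HALT T1 doesn't propagate
        next_t3 = t2
        t0, t1, t2, t3, t1_is_halt = next_t0, next_t1, next_t2, next_t3, next_halt
        history.append((clk, t0, t1, t2, t3))
    return history

print("normal bus cycle:")
for row in bus_state_machine(5, start_cycle_at=0):
    print(row)              # the 1 marches through T0, T1, T2, T3
print("HALT's special T1:")
for row in bus_state_machine(3, halt_at=0):
    print(row)              # T1 appears for one clock, then nothing
```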

One complication is the bus hold feature. The 8086 supports complex bus configurations, where external devices may take control of the bus. For instance, peripherals may use the bus for direct memory access, bypassing the CPU. A device can request control of the bus, a "bus hold", through the 8086's HOLD pin.4 This causes the 8086 to electrically stop putting signals on the bus (i.e. a high-impedance, tri-state off state). This allows another device to use the bus until it releases HOLD.

Even when the CPU is halted, the CPU still has "ownership" of the bus and drives the bus with idle signals.5 If a device requests a bus hold when the CPU is halted, the halt-not-hold signal is blocked. When the device releases the hold, halt-not-hold is unblocked. This causes the 8086 to go through the special T1 cycle again, using the same flip-flop process described above. This lets listeners on the bus know that the CPU is still halted.

Exiting the halt state

The processor exits the halt state when it receives a reset, interrupt, or non-maskable interrupt. To implement this, an interrupt unblocks the instruction decoder by overriding the queue-unavailable signal. This causes the loader, which controls instruction decoding, to move into the First Clock state. Meanwhile, the interrupt causes the microcode address register to be loaded with the hardcoded microcode address of the appropriate interrupt routine. Thus, the microcode engine starts running the interrupt handler microcode.

The Instruction Register holds the 8-bit opcode that is currently being processed. It has a ninth bit that indicates if an interrupt is being processed. The Instruction Register (including the interrupt bit) is loaded on First Clock (described above). It outputs the instruction and interrupt bit to the Group Decode ROM one clock cycle later. The interrupt bit blocks regular instruction decoding by the Group Decode ROM. In particular, the HLT instruction will no longer be decoded, dropping the HALT signal throughout the CPU. In the Execution Unit, this reactivates the prefetch queue. This will allow instruction execution once the microcode finishes executing the interrupt handling code. In the Bus Interface Unit, dropping the HALT signal causes halt-not-hold to drop. This enables bus activity from the Bus Interface Unit.6

History of HALT and x86

Historically, computers usually had some sort of "stop" or "wait" instruction to stop execution at the end of a program. This goes back to the electromechanical Harvard Mark I (1944), EDSAC (1949), and Univac I (1951), among other machines. Most (but not all) mainframes and minicomputers continued this approach.7

The HLT instruction in the 8086, like many other features, derives from the Datapoint 2200, and there's an interesting story behind that. The Datapoint 2200 was a desktop computer announced in 1970, and sold as a "programmable terminal". The processor of the Datapoint 2200 was implemented with a board of TTL integrated circuits, since this was before microprocessors. The Datapoint manufacturer talked to Intel and Texas Instruments about replacing the board of chips with a single processor chip. Texas Instruments produced the TMX 1795 microprocessor chip and Intel produced the 8008 shortly after,8 both copying the Datapoint 2200's architecture and instruction set. Datapoint didn't like the performance of these chips and decided to stick with a TTL-based processor. Texas Instruments couldn't find a customer for the TMX 1795 and abandoned it. Intel, on the other hand, sold the 8008 as an 8-bit microprocessor, creating the microprocessor market in the process. Intel improved the 8008 to create the popular 8080 microprocessor (1974). Zilog produced the more powerful Z80 (1976), backward-compatible with the 8080.

The Datapoint 2200. This is the later Model II with an improved TTL processor using the 74181 ALU chip.

Intel started designing the iAPX 432 in 1975 to be their high-end 32-bit processor, a "micromainframe" that supported garbage collection and objects in the processor. The iAPX 432 was too complex for the time and as the schedule slipped, Intel decided to produce a stopgap 16-bit processor to compete with Zilog and Motorola: this processor became the 8086. To make it easier for Intel customers to move to the 8086, the processor was designed for compatibility with 8080 assembly language so it inherited much of the architecture and instruction set, although extended from 8 bits to 16 bits.9

The consequence of this history is that the 8086 inherited many features from the Datapoint 2200. The Datapoint 2200 used cheaper shift-register memory so it had a serial processor that operated on one bit at a time. This required the Datapoint 2200 to be little-endian, a feature that lives on in the x86 architecture. Since the Datapoint 2200 was marketed as a programmable terminal, it had parity calculation built into the hardware. Thus, the 8008 and descendants have a parity flag, in contrast to contemporary processors such as the 6800 and 6502 that omitted this moderately complex feature. The use of I/O ports instead of memory-mapped I/O is another feature of the Datapoint 2200 that persists in the x86, but was not used in the 6800 and 6502 and their descendants. The opcodes of the Datapoint 2200 were based on octal 3-bit fields for hardware reasons. The x86 instructions are still designed around octal, but the usual hexadecimal display obscures their structure. Finally, the Datapoint 2200's HALT instruction was exactly copied by the 800810 and persists in x86.

Conclusions

The HLT instruction seems like a simple function, but its implementation touches many parts of the 8086. It is implemented in logic circuitry, completely bypassing the microcode. The implementation became more complicated because of the 8086's four-step bus protocol, as well as interaction between halting and the bus hold feature. This illustrates how complexity creates more complexity, something the RISC processors of the 1980s tried to counter.

I've written multiple posts on the 8086 so far and plan to continue reverse-engineering the 8086 die so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @[email protected]. Thanks to monocasa for suggesting this topic.

Notes and references

  1. The instructions implemented outside microcode are the segment register prefixes (ES:, CS:, SS:, DS:), the other prefixes (LOCK, REPNZ, REPZ), the simple flag instructions (CMC, CLC, STC, CLI, STI, CLD, STD), and HLT. These instructions are indicated by the 1BL (one-byte logic) output from the Group Decode ROM. 

  2. The bus cycle may also include optional Tw wait states after T3 for slow memory. The memory (or I/O device) lowers the READY pin until it is ready to proceed and the Bus Interface Unit waits. I'm ignoring Tw states in this discussion to keep things simpler. 

  3. For some reason, the T-state flip-flops all hold inverted signals, so strictly speaking a 0 bit goes through the flip-flops. 

  4. The 8086 has a separate prioritized "request/grant" mechanism for a device to obtain a bus hold, but it doesn't change the underlying hold behavior. 

  5. During a HALT, the 8086 is not actively using the bus, but it does not release the bus either; it is still electrically driving the bus. Otherwise, the bus would float to random voltages, confusing attached memory chips or other circuitry. 

  6. When the Bus Interface Unit is unhalted due to an interrupt, you might expect it to immediately start prefetching, accessing unwanted instructions. It turns out that the prefetch circuitry does try to start prefetching and reaches the internal T0 bus state. But it then gets preempted by the interrupt handler microcode, which uses the bus to send two interrupt acknowledge cycles. Immediately after, the microcode routine suspends prefetching. Thus, prefetching doesn't run until the interrupt microcode finishes and reenables prefetching. There's a lot of tricky timing in the 8086 to make everything work. 

  7. For more history of the stop instruction, see "Computer Architecture", Blaauw and Brooks, page 349. (This is the same Brooks who wrote "The Mythical Man-Month" and "No Silver Bullet".) 

  8. You might wonder how the Intel 4004 fits into this history. Although many of the same people worked on both chips, they have completely different architectures. The 8008 is not at all an 8-bit version of the 4-bit 4004. 

  9. Assembly code for the 8-bit 8080 processor couldn't run directly on the 16-bit 8086. Instead, a translation program converted the 8080 assembly language to be compatible with the 8086, making some changes in the process. The 8086 dropped some of the less-useful instructions of the 8080, replacing them with multiple instructions in the translation. For instance, the 8080 had conditional subroutine call and return instructions (inherited from the Datapoint 2200), but the 8086 dropped them. 

  10. To see that the 8008 copied the Datapoint 2200's HALT instruction, note that the Datapoint had three opcodes for HALT (00, 01, and FF), which is a bit unusual. The 8008 also has three opcodes for HLT: 00, 01, and FF. Most instructions in the 8008 used the same opcode values as the Datapoint, with a few minor changes. 

Reverse-engineering the conditional jump circuitry in the 8086 processor

Intel introduced the 8086 microprocessor in 1978 and it had a huge influence on computing. I'm reverse-engineering the 8086 by examining the circuitry on its silicon die and in this blog post I take a look at how conditional jumps are implemented. Conditional jumps are an important part of any instruction set, changing the flow of execution based on a condition. Although this instruction may seem simple, it involves many parts of the CPU: the 8086 uses microcode along with special-purpose condition logic.

The die photo below shows the 8086 microprocessor under a microscope. The metal layer on top of the chip is visible, with the silicon and polysilicon mostly hidden underneath. Around the edges of the die, bond wires connect pads to the chip's 40 external pins. I've labeled the key functional blocks; the ones that are important to this discussion are darker and will be discussed in detail below. Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below. The BIU handles memory accesses, while the Execution Unit (EU) executes instructions. Most of the relevant circuitry is in the Execution Unit, such as the condition evaluation circuitry near the center, and the microcode in the lower right. But the Bus Interface Unit plays a part too, holding and modifying the program counter.

The 8086 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath. Click on this image (or any other) for a larger version.

Microcode

Most people think of machine instructions as the basic steps that a computer performs. However, many processors (including the 8086) have another layer of software underneath: microcode. One of the hardest parts of computer design is creating the control logic that directs the processor for each step of an instruction. The straightforward approach is to build a circuit from flip-flops and gates that moves through the various steps and generates the control signals. However, this circuitry is complicated, error-prone, and hard to design.

The alternative is microcode: instead of building the control circuitry from complex logic gates, the control logic is largely replaced with code. To execute a machine instruction, the computer internally executes several simpler micro-instructions, specified by the microcode. In other words, microcode forms another layer between the machine instructions and the hardware. The main advantage of microcode is that it turns design of control circuitry into a programming task instead of a difficult logic design task.

The 8086 uses a hybrid approach: although the 8086 uses microcode, much of the instruction functionality is implemented with gate logic. This approach removed duplication from the microcode and kept the microcode small enough for 1978 technology. In a sense, the microcode is parameterized. For instance, the microcode can specify a generic Arithmetic/Logic Unit (ALU) operation, and the gate logic determines from the instruction which ALU operation to perform. More relevant to this blog post, the microcode can specify a generic conditional test and the gate logic determines which condition to use. Although this made the 8086's gate logic more complicated, the tradeoff was worthwhile.

Microcode for conditional jumps

The 8086 processor has six status flags: carry, parity, auxiliary carry, zero, sign, and overflow.1 These flags are updated by arithmetic and logic operations based on the result. The 8086 has sixteen different conditional jump instructions2 that test status flags and jump if conditions are satisfied, such as zero, less than, or odd parity. These instructions are very important since they permit if statements, loops, comparisons, and so forth.

In machine language, a conditional jump opcode is followed by a signed offset byte which specifies a location relative to the current program counter, from 127 bytes ahead to 128 bytes back. This is a fairly small range, but the benefit is that the offset fits in a single byte, reducing the code size.3 For typical applications such as loops or conditional code, jumps usually stay in the same neighborhood of code, so the tradeoff is worthwhile.
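
As a small illustration of the offset arithmetic (the actual addition is done by the ALU, as the microcode below shows), here is the computation in Python:

```python
def apply_jump_offset(pc, offset_byte):
    """Add a conditional jump's signed 8-bit offset to a 16-bit address.

    The offset byte is sign-extended to 16 bits, giving a range of
    -128 to +127 bytes, and the sum wraps around at 16 bits.
    """
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (pc + offset) & 0xFFFF

print(hex(apply_jump_offset(0x1234, 0x10)))   # 0x1244: 16 bytes forward
print(hex(apply_jump_offset(0x1234, 0xf0)))   # 0x1224: 16 bytes back
```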

The 8086's microcode was disassembled by Andrew Jenner (link) from my die photos, so we can see exactly what micro-instructions the 8086 is running for each machine instruction. The microcode below implements conditional jumps. In brief, the conditional jump code (Jcond) gets the branch offset byte. It tests the appropriate condition and, if satisfied, jumps to the relative jump microcode (RELJMP). The RELJMP code adds the offset to the program counter. In either case, the microcode routine ends when it runs the next instruction (RNI).

   move       action
Jcond:
1 Q→tmpBL
2          XC    RELJMP                    
3          RNI                       

RELJMP:
4          SUSP
5          CORR                      
6 PC→tmpA  ADD   tmpA
7 Σ→PC     FLUSH RNI                       

In more detail, micro-instruction 1 (arbitrary numbering) moves a byte from the prefetch queue (Q) across the queue bus to the ALU's temporary B register.4 (Arguments for ALU operations are first stored in temporary registers, invisible to the programmer.) Instruction 2 tests the appropriate condition with XC, and jumps to the RELJMP routine if the condition is satisfied.5 Otherwise, RNI (Run Next Instruction) ends this sequence and loads the next machine instruction without jumping.

If the condition is satisfied, the relative jump routine starts with instruction 4, which suspends prefetching.6 Instruction 5 corrects the program counter value, since it normally points to the next byte to prefetch, not the next byte to execute. Instruction 6 moves the corrected program counter address to the ALU's temporary A register. It also starts an ALU operation to add temporary A and temporary B. Instruction 7 moves the sum (Σ) to the program counter. It flushes the prefetch queue, which starts up prefetching from the new PC value. Finally, RNI runs the next instruction, from the updated address.
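
Putting the sequence together, here is a minimal Python sketch of the Jcond/RELJMP flow described above. The dictionary-based "CPU" and the way I model the queue and the correction are stand-ins of my own; the point is the order of the micro-operations, not the real data paths.

```python
def jcond(cpu, condition_satisfied):
    """Illustrative walk-through of the Jcond / RELJMP microcode."""
    # 1: Q -> tmpBL: take the offset byte from the prefetch queue,
    #    sign-extended into the ALU's temporary B register.
    offset = cpu["queue"].pop(0)
    tmp_b = offset - 0x100 if offset & 0x80 else offset

    # 2: XC RELJMP: jump to the RELJMP routine if the condition holds.
    if not condition_satisfied:
        return "RNI"                                  # 3: run next instruction

    # RELJMP:
    cpu["prefetching"] = False                        # 4: SUSP
    cpu["pc"] = (cpu["pc"] - len(cpu["queue"])) & 0xFFFF   # 5: CORR
    tmp_a = cpu["pc"]                                 # 6: PC -> tmpA, start ADD
    cpu["pc"] = (tmp_a + tmp_b) & 0xFFFF              # 7: sigma -> PC
    cpu["queue"].clear()                              # 7: FLUSH the queue
    cpu["prefetching"] = True                         #    (prefetching restarts)
    return "RNI"

cpu = {"pc": 0x0105, "queue": [0x10, 0xAA, 0xBB], "prefetching": True}
print(jcond(cpu, condition_satisfied=True), hex(cpu["pc"]))   # RNI 0x113
```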

This code supports all 16 conditional jumps because the microcode tests the generic "XC" condition. This indicates that the specific test depends on the four low bits of the opcode, and the hardware determines exactly what to test. It's important to keep the two levels straight: the machine instruction is doing a conditional jump to a different memory address, while the microcode that implements this instruction is performing a conditional jump to a different micro-address.

The timing for conditional jumps

The RNI (Run Next Instruction) micro-operation initiates processing of the next machine instruction. However, it takes a clock cycle to get the next instruction from the prefetch queue, decode it, and start the appropriate micro-instruction. This causes a wasted clock cycle before the next micro-instruction executes. To avoid this delay, most microcode routines issue a NXT micro-operation one cycle before they end. This gives the 8086 time to decode the next machine instruction so micro-instructions can run uninterrupted.

Unfortunately, the conditional jump instructions can't take advantage of NXT. The problem is that the control flow in the microcode depends on whether the conditional jump is taken or not. By the time the microcode knows it is not taking the branch, it's too late to issue NXT.

The datasheet gives the timing of a conditional jump as 4 clock cycles if the jump is not taken, and 8 clock cycles if the jump is taken. Looking at the microcode explains these timings. There are 3 micro-instructions executed if the jump is not taken, and 7 if it is taken. Because of the RNI, there is one wasted clock cycle, resulting in the documented 4 or 8 cycles in total.

The conditions

At this point I will review the 8086's conditional jumps. The 8086 implements 16 conditional jumps. (This is a large number compared to earlier CPUs: the 8080, 6502, and Z80 all had 8 conditional jumps, specified by 3 bits.) The table below shows which flags are tested for each condition, specified by the low four bits of the opcode. Some jump instructions have multiple names depending on the programmer's interpretation, but they map to the same machine instruction.7

| Condition | Bits | Condition true | Condition false |
|-----------|------|----------------|-----------------|
| Overflow Flag (OF)=1 | 000x | overflow (JO) | not overflow (JNO) |
| Carry Flag (CF)=1 | 001x | carry (JC), below (JB), not above or equal (JNAE) | not carry (JNC), not below (JNB), above or equal (JAE) |
| Zero Flag (ZF)=1 | 010x | zero (JZ), equal (JE) | not zero (JNZ), not equal (JNE) |
| CF=1 or ZF=1 | 011x | below or equal (JBE), not above (JNA) | not below or equal (JNBE), above (JA) |
| Sign Flag (SF)=1 | 100x | sign (JS) | not sign (JNS) |
| Parity Flag (PF)=1 | 101x | parity (JP), parity even (JPE) | not parity (JNP), parity odd (JPO) |
| SF ≠ OF | 110x | less (JL), not greater or equal (JNGE) | not less (JNL), greater or equal (JGE) |
| ZF=1 or SF ≠ OF | 111x | less or equal (JLE), not greater (JNG) | not less or equal (JNLE), greater (JG) |

From the hardware perspective, the important thing is that there are eight different condition flag tests. Each test has two jump instructions associated with it: one that jumps if the condition is true, and one that jumps if the condition is false. The low bit of the opcode selects "if true" or "if false".

The image below shows the condition evaluation circuitry as it appears on the die. There isn't much structure to it; it's just a bunch of gates. This image shows the doped silicon regions that form transistors. The numerous small polygons with a circle inside are connections between the metal layer and the polysilicon layer. Many of these connections use the silicon layer to optimize the layout.

The circuitry to compute conditions as it appears on the die. The metal and polysilicon layers have been removed for this image, showing the silicon underneath.

This circuitry evaluates each condition by getting the instruction bits from the Instruction Register, checking the bits to match each condition, and testing if the condition is satisfied. For instance, the overflow condition (with instruction bits 000x) is computed by a NOR gate: NOR(IR3, IR2, IR1, OF'), which will be true if instruction register bits 3, 2, and 1 are zero and the Overflow Flag is 1.

The results from the individual condition tests are combined with a 7-input NOR gate, producing a result that is 0 if the specified 3-bit condition is satisfied. Finally, the "if true" and "if false" cases are handled by flipping this signal depending on the low bit of the instruction. This final result indicates if the 4-bit condition in the instruction is satisfied, and this signal is passed on to the microcode control circuitry.

One unexpected feature of the implementation is that a 7-input NOR gate combines the various conditions to test if the selected condition is satisfied. You'd expect that with eight conditions, there would be eight inputs to the NOR gate. However, there is a clever optimization that takes advantage of conditions that are combinations of clauses, for example, "less or equal". Specifically, the zero flag is tested for bit pattern 01xx (where x indicates a 0 or 1), which covers two conditions with one gate. Likewise, SF≠OF is tested for bit pattern 11xx and CF=1 is tested for bit pattern 0x1x. With these optimizations, the eight conditions are covered with seven checks. (This shows that the opcodes weren't assigned arbitrarily: the bit patterns needed to be carefully assigned for this to work.)
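
The same evaluation can be written out in Python: bits 3-1 of the opcode select one of the flag tests (using the three combined checks just described), and bit 0 flips the result for the "if false" instructions. The flag names are the architectural ones; the code illustrates the logic, not the gate-level implementation.

```python
def jcc_taken(opcode, cf, pf, zf, sf, of):
    """Decide whether a conditional jump (opcodes 0x70-0x7F) is taken.

    Bits 3-1 of the opcode select the flag test; bit 0 selects
    "jump if true" (0) or "jump if false" (1).
    """
    tests = {
        0b000: of,                  # JO / JNO
        0b001: cf,                  # JC, JB, JNAE / JNC, JNB, JAE
        0b010: zf,                  # JZ, JE / JNZ, JNE
        0b011: cf or zf,            # JBE, JNA / JNBE, JA
        0b100: sf,                  # JS / JNS
        0b101: pf,                  # JP, JPE / JNP, JPO
        0b110: sf != of,            # JL, JNGE / JNL, JGE
        0b111: zf or (sf != of),    # JLE, JNG / JNLE, JG
    }
    result = tests[(opcode >> 1) & 0x7]
    return not result if opcode & 1 else result

# JZ (0x74) with the zero flag set is taken; JNZ (0x75) is not.
flags = dict(cf=False, pf=False, zf=True, sf=False, of=False)
print(jcc_taken(0x74, **flags), jcc_taken(0x75, **flags))   # True False
```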

Back to the microcode

Before explaining how the microcode jump circuitry works, I'll briefly discuss the microcode format. A micro-instruction is encoded into 21 bits as shown below. Every micro-instruction contains a move from a source register to a destination register, each specified with 5 bits. The meaning of the remaining bits is a bit tricky since it depends on the type field, which is two or three bits long. The "short jump" (type 0) is a conditional jump within the current block of 16 micro-instructions. The ALU operation (type 1) sets up the arithmetic-logic unit to perform an operation. Bookkeeping operations (type 4) are anything from flushing the prefetch queue to ending the current instruction. A memory read or write is type 6. A "long jump" (type 5) is a conditional jump to any of 16 fixed microcode locations (specified in an external table). Finally, a "long call" (type 7) is a conditional subroutine call to one of 16 locations (different from the jump targets).

The encoding of a micro-instruction into 21 bits. Based on NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright?

I'm going to focus on the XC RELJMP micro-instruction that we saw in the microcode earlier. This is a "long jump" with XC as the condition and RELJMP as the target tag. Another layer of hardware is required to implement the microcode conditions. The microcode supports 16 conditions, which are completely different from the 16 programmer-level conditions.8 Some microcode conditions test special-purpose internal flags, while others test conditions such as an interrupt, the chip's TEST pin, bit 3 of the opcode, or if the instruction has a one-byte address offset. The XC condition is one of these 16 conditions, number 15 specifically.

The conditions are evaluated by the condition PLA (Programmable Logic Array, a grid of gates), shown below. The four condition bits from the micro-instruction, along with their complements, are fed into the columns. The PLA has 16 rows, one for each condition. Each row is a NOR gate matching one bit combination (i.e. selecting a condition) and the corresponding signal value to test.9 Thus, if a particular condition is specified and is satisfied, that row will be 1. The 16 row outputs are combined by the 16-input NOR gate at the left. Thus, if the specified condition is satisfied, this output will be 0, and if the condition is unsatisfied, the output will be 1. This signal controls the jump or call micro-instruction: if the condition is satisfied, the new micro-address is loaded into the microcode address register. If the condition is not satisfied, the microcode proceeds sequentially.
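
A minimal sketch of the condition PLA's job: the 4-bit condition field from the micro-instruction selects one of 16 test signals, and the output is active-low (0 means "condition satisfied"), matching the NOR-gate structure described above. The list of test signals here is a placeholder; only condition 15 (XC) is identified in this post.

```python
def condition_pla(cond_field, test_signals):
    """Model of the microcode condition PLA.

    cond_field:   4-bit condition number from the micro-instruction
    test_signals: 16 booleans, one per microcode condition
                  (condition 15 is XC, the programmer-level flag test)
    Returns 0 if the selected condition is satisfied, 1 otherwise,
    mirroring the active-low output of the 16-input NOR gate.
    """
    return 0 if test_signals[cond_field & 0xF] else 1

signals = [False] * 16
signals[15] = True                 # XC: the selected flag test passed
print(condition_pla(15, signals))  # 0 -> load the target micro-address
print(condition_pla(3, signals))   # 1 -> continue sequentially
```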

The condition PLA evaluates microcode conditionals.

Conclusions

To summarize, the 8086 processor implements 16 conditional jump instructions. One piece of microcode efficiently implements all 16 instructions, with gate logic determining which flags to test, depending on bits in the machine instruction. The result of this test is used by the microcode XC conditional jump, one of 16 completely different microcode-level conditions. If the XC condition is satisfied, the program counter is updated by adding the offset, jumping to the new location.

Conditional jumps are relatively straightforward instructions from the programmer's perspective, but they interact with most parts of the 8086 processor including the prefetch queue, the address adder, the ALU, microcode, and the Translation ROM. The diagram below shows the interactions for each step of the jump.

The conditional jump involves many parts of the die, shown in this diagram.

I've written multiple posts on the 8086 so far and plan to continue reverse-engineering the 8086 die so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @[email protected].

Notes and references

  1. In addition to the six status flags, the 8086 has three control flags: trap, direction, and interrupt enable. These flags aren't tested by conditional branches so I won't discuss them further. 

  2. Strictly speaking, the 8086 has a few more conditional jumps. The JCXZ instruction tests if the CX register is zero. The LOOP, LOOPNZ, and LOOPZ instructions decrement the CX register and loop if it is nonzero. The last two only loop if the zero flag indicates nonzero or zero, respectively. I'm ignoring these instructions in the blog post. 

  3. Although a conditional jump only supports a small range, it's still possible to conditionally jump to a distant location by using two instructions. A conditional jump with the opposite condition can skip over a longer unconditional jump instruction. The 80386 removed this restriction by providing long-displacement conditional jumps, which could perform a 16-bit or 32-bit relative jump. 

  4. The relative offset byte is sign-extended when it is moved to the temporary B register. That is, if the top bit is high, the high byte is set to all 1's to produce a 16-bit negative value. 

  5. The details of how the microcode jumps to the RELJMP routine are interesting, but a bit of a tangent, so I've put this discussion in a footnote. For long jumps (and long calls) in microcode, the target micro-addresses are stored in the Translation ROM, and the 4-bit target tag indexes into this ROM. The motivation for this structure is that micro-addresses are 13 bits, which is a lot of bits to try to fit into a 21-bit micro-instruction. Using a 4-bit tag keeps the microcode compact, but at the cost of requiring a small ROM in the 8086.

    The translation ROM on the die.

    Above is a view of the Translation ROM, with the row for RELJMP highlighted. The left half decodes the tags, while the right half provides the corresponding microcode address. 

  6. Much of this microcode snippet deals with the prefetch queue. To increase efficiency, the 8086 processor fetches instructions from memory before they are needed and stores them in a 6-byte prefetch queue. In most processors, the program counter points to the memory address of the next instruction to execute. However, in the 8086, the program counter advances during prefetching, so it points to the memory address of the next instruction to fetch. This discrepancy is invisible to the programmer, but the microcode needs to handle it.

    First, the microcode issues a SUSP micro-operation to suspend prefetching. This ensures that the program counter will not be changed due to more prefetching. Next, the CORR micro-operation corrects the program counter to point to the next address to execute. This correction is performed by subtracting the number of unused bytes in the prefetch queue. You might expect this correction to be performed by the Arithmetic/Logic Unit (ALU). However, the 8086 has a separate adder that is used for memory address computations: each memory access in the 8086 requires a segment register base address to be added to an offset address. This address adder is also used for program counter correction. The constant ROM holds the values -1 through -6, the appropriate constant is selected based on the number of bytes in the prefetch queue, and this constant is added to the program counter. (Interestingly, the address adder is used for program counter correction, while the ALU is used to modify the program counter for the relative jump computation.)

    The address adder has multiple uses. It is also used for updating the program counter during prefetching. It updates addresses when performing block copy operations. Finally, it updates addresses when performing an unaligned word operation. The constant ROM holds constants for these operations.

    At the end of the microcode sequence, the FLUSH micro-operation flushes the stale bytes from the prefetch queue, resets the prefetch queue pointers, and restarts prefetching. I wrote about prefetching in detail here. The sketch below illustrates the CORR correction arithmetic.
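
    As a sketch of the CORR arithmetic, assuming the description above (the correction is really done by the address adder and the constant ROM, not the ALU):

```python
# Constants -1 through -6, as held in the constant ROM: one per possible
# count of unused bytes in the prefetch queue.
CORRECTION_CONSTANTS = {n: -n for n in range(1, 7)}

def correct_pc(prefetch_pc, unused_queue_bytes):
    """Model the CORR micro-operation: add a negative constant to turn
    the prefetch program counter into the address of the next
    instruction to execute."""
    if unused_queue_bytes == 0:
        return prefetch_pc
    return (prefetch_pc + CORRECTION_CONSTANTS[unused_queue_bytes]) & 0xFFFF

# The prefetcher has run 4 bytes ahead of execution:
print(hex(correct_pc(0x0110, 4)))   # 0x10c
```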

  7. Often, the compare (CMP) instruction will be executed to compare two numbers by subtracting and discarding the result but keeping the condition codes. One complication is that some tests make sense for signed numbers, while other tests make sense for unsigned numbers. Specifically, "greater", "greater or equal", "less", and "less or equal" make sense for signed comparisons. On the other hand, "above", "above or equal", "below", and "below or equal" make sense for unsigned comparisons.

    The 8086 supports both signed and unsigned numbers. The arithmetic operations are the same for both; it's just the programmer's interpretation that differs. For instance, consider adding hex numbers 0xfe and 0x01. Treating them as unsigned numbers, the sum is 254 + 1 = 255. But as signed numbers, -2 + 1 = -1. In either case, the processor computes the same result, 0xff, but the interpretation is different.

    The signed vs unsigned distinction matters for comparisons. For instance, as unsigned numbers, 0xfe (254) is above 0x01 (1). But as signed numbers, 0xfe (-2) is less than 0x01 (1). This is why different instructions are used to compare unsigned versus signed numbers.

    Another important factor is that the carry flag indicates an unsigned result is too large for its byte (or word), while the overflow flag indicates that a signed result is too large for its byte (or word). For instance, adding unsigned bytes 0xff (255) and 0x02 (2) yields 0x01 (1) and a carry, indicating the result is too big for a byte. However, as signed bytes this corresponds to -1 + 2 = 1, which fits in a byte, so the overflow flag is not set. Conversely, 0x7f + 0x01 = 0x80. As unsigned bytes, this corresponds to 127 + 1 = 128 which is fine. But as signed bytes, this corresponds to 127 + 1, which unexpectedly yields -128 due to overflow. Thus, the carry flag is not set, but the overflow flag is set in this case. 
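
    The distinction can be checked with a few lines of Python that reproduce the examples above (my own sketch of the flag computation, using the standard same-sign/different-sign rule for overflow):

```python
def add8_flags(a, b):
    """Add two bytes and compute the carry and overflow flags.

    Carry: the unsigned result doesn't fit in a byte.
    Overflow: the operands have the same sign but the signed result
    has the opposite sign, so it doesn't fit in a byte.
    """
    result = (a + b) & 0xFF
    carry = (a + b) > 0xFF
    overflow = ((a ^ result) & (b ^ result) & 0x80) != 0
    return result, carry, overflow

print(add8_flags(0xFF, 0x02))  # (1, True, False): 255+2 carries; -1+2=1 fits
print(add8_flags(0x7F, 0x01))  # (128, False, True): 127+1 overflows to -128
```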

  8. Short jumps have four bits to specify the condition, so they can access 16 conditions. For long jumps and long calls, one bit is "stolen" from the condition to indicate the type, so they can only access eight of the conditions. Thus, the conditions need to be assigned carefully so the necessary ones are available. 

  9. PLAs are typically uniform grids, but the grid pattern breaks down a bit in the condition PLA. The reason is that each test uses a separate signal, so there is a different signal into each row (unlike a typical PLA where each row receives the same signals). Moreover, some of the test signals are processed at the left, distorting the 16-input NOR gate. This illustrates the degree of layout optimization in the 8086, squeezing transistors in to save a bit of space. 

Inside the Globus INK: a mechanical navigation computer for Soviet spaceflight

The Soviet space program used completely different controls and instruments from American spacecraft. One of the most interesting navigation instruments onboard Soyuz spacecraft was the Globus, which used a rotating globe to indicate the spacecraft's position above the Earth. This navigation instrument was an electromechanical analog computer that used an elaborate system of gears, cams, and differentials to compute the spacecraft's position. Officially, the unit was called a "space navigation indicator" with the Russian acronym ИНК (INK),1 but I'll use the more descriptive nickname "Globus".

The INK-2S "Globus" space navigation indicator. Coincidentally, the latitude indicator matches the Ukrainian flag.

We recently received a Globus from a collector and opened it up for repair and reverse engineering. In this blog post, I explain how it operated, show its internal mechanisms, and describe what I've learned so far from reverse engineering. The photo below gives an idea of the mechanical complexity of this device, which also has a few relays, solenoids, and other electrical components.

Side view of the Globus INK. Click this (or any other image) for a larger version.

Functionality

The primary purpose of the Globus was to indicate the spacecraft's position. The globe rotated while fixed crosshairs on the plastic dome indicated the spacecraft's position. Thus, the globe matched the cosmonauts' view of the Earth, allowing them to confirm their location. Latitude and longitude dials next to the globe provided a numerical indication of location. Meanwhile, a light/shadow dial at the bottom showed when the spacecraft would be illuminated by the sun or in shadow, important information for docking. The Globus also had an orbit counter, indicating the number of orbits.

The Globus had a second mode, indicating where the spacecraft would land if they fired the retrorockets to initiate a landing. Flipping a switch caused the globe to rotate until the landing position was under the crosshairs and the cosmonauts could evaluate the suitability of this landing site.

The cosmonauts configured the Globus by turning knobs to set the spacecraft's initial position and orbital period. From there, the Globus electromechanically tracked the orbit. Unlike the Apollo Guidance Computer, the Globus did not receive navigational information from an inertial measurement unit (IMU) or other sources, so it did not know the spacecraft's real position. It was purely a display of the predicted position.

A close-up of the complex gear trains in the Globus.

The globe

The globe itself is detailed for its small size, showing terrain features such as mountains, lakes, and rivers. These features on the map helped cosmonauts compare their position with the geographic features they could see on Earth. These features were also important for selecting a landing site, so they could see what kind of terrain they would be landing on. For the most part, the map doesn't show political boundaries, except for thick red and purple lines. These lines show the borders of the USSR, as well as the boundaries between communist and non-communist countries, also important for selecting a landing site. The globe also has numbered circles 1 through 8 that indicate radio sites for communication with the spacecraft, allowing the cosmonauts to determine what ground stations could be contacted.

A view of the globe showing Asia.

Controlling the globe

On seeing the Globus, one might wonder how the globe is rotated. It may seem that the globe must be free-floating so it can rotate in two axes. Instead, a clever mechanism attaches the globe to the unit. The key is that the globe's equator is a solid piece of metal that rotates around the horizontal axis of the unit. A second gear mechanism inside the globe rotates the globe around the North-South axis. The two rotations are controlled by concentric shafts that are fixed to the unit, allowing two rotational degrees of freedom through fixed shafts.

The photo below shows the frame that holds and controls the globe. The dotted axis is fixed horizontally in the unit and rotations are fed through the two gears at the left. One gear rotates the globe and frame around the dotted axis, while the gear train causes the globe to rotate around the vertical polar axis (while the equator remains fixed).

The axis of the globe is at 51.8° to support that orbital inclination.

The angle above is 51.8°, which is very important: this is the inclination of the standard Soyuz orbit. As a result, simply rotating the globe around the dotted line causes the crosshair to trace the standard orbit.2 Rotating the two halves of the globe around the poles yields the different 51.8° orbits over the Earth's surface as the Earth rotates. (Why 51.8 degrees? The Baikonur Cosmodrome, the launching point for Soyuz, is at 45.97° N latitude, so 45.97° would be the most efficient inclination. However, to prevent the launch from passing over western China, the rocket must be angled towards the north, resulting in 51.8° (details).)

One important consequence of this design is that the orbital inclination is fixed by the angle of the globe mechanism. Different Globus units needed to be built for different orbits. Moreover, this design only handles circular orbits, making it useless during orbit changes such as rendezvous and docking. These were such significant limitations that some cosmonauts wanted the Globus removed from the control panel, but it remained until it was replaced by a computer display in Soyuz-TMA (2002).3

A closeup of the gears that drive the motion of the two halves of the globe around the polar axis, leaving the equator fixed.

This Globus had clearly suffered some damage. The back of the case had some large dents.7 More importantly, the globe's shaft had been knocked loose from its proper position and no longer meshed with the gears. This also put a gouge into Africa, where the globe hit internal components. Fortunately, CuriousMarc was able to get the globe back into position while ensuring that the gears had the right timing. (Putting the globe back arbitrarily would mess up the latitude and longitude.)

Orbital speed and the "cone"

An orbit of Soyuz takes approximately 90 minutes, but the time varies according to altitude.4 The Globus has an adjustment knob (below) to adjust the orbital period in minutes, tenths of minutes, and hundredths of minutes. The outer knob has three positions and points to the digit that changes when the inner knob is turned. The mechanism provides an adjustment of ±5 minutes from the nominal period of 91.85 minutes.3

The control to adjust the orbital period.

The orbital speed feature is implemented by increasing or decreasing the speed at which the globe rotates around the orbital (horizontal) axis. Generating a variable speed is tricky, since the Globus runs on fixed 1-hertz pulses. The solution is to start with a base speed, and then add three increments: one for the minutes setting, one for the tenths-of-minutes setting, and one for the hundredths-of-minutes setting.5 These four speeds are added (as shaft rotation speeds) using differential gears to obtain the overall rotation speed.
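
As a rough numeric illustration (my own arithmetic, not from Globus documentation): the globe must make one revolution around the orbital axis per orbital period, and the differentials obtain that rate by simply summing a base shaft speed with the three knob-controlled increments.

```python
def required_rate(period_minutes):
    """Degrees of globe rotation per second for one revolution per orbit."""
    return 360.0 / (period_minutes * 60.0)

def summed_rate(base, inc_minutes, inc_tenths, inc_hundredths):
    """What the differential gears do: add four shaft speeds.

    Each increment comes from a follower riding on the spiral cone
    described below; its magnitude is set by the follower's position,
    chosen by the corresponding digit of the period knob.
    """
    return base + inc_minutes + inc_tenths + inc_hundredths

print(round(required_rate(91.85), 5))   # ~0.06532 deg/s for the nominal orbit
print(round(required_rate(96.85), 5))   # a +5 minute setting turns slightly slower
```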

The Globus uses numerous differential gears to add or subtract rotations. The photo below shows two sets of differential gears, side-by-side.

Two differential gears in the Globus.

The problem is how to generate these three variable rotation speeds from the fixed input. The solution is a special cam, shaped like a cone with a spiral cross-section. Three followers ride on the cam; as the cam rotates, each follower is pushed outward and rotates on its shaft. If a follower is near the narrow part of the cam, it moves a small distance and makes a small rotation. But if it is near the wide part of the cam, it moves a larger distance and makes a larger rotation. Thus, moving a follower to a particular point on the cam selects its rotational speed.

A diagram showing the orbital speed control mechanism. The cone has three followers, but only two are visible from this angle. The "transmission" gears are moved in and out by the outer knob to select which follower is adjusted by the inner knob.

Obviously, the cam can't spiral out forever. Instead, at the end of one revolution, its cross-section drops back sharply to the starting diameter. This causes the follower to snap back to its original position. To prevent this from jerking the globe backward, the follower is connected to the differential gearing via a slip clutch and ratchet. Thus, when the follower snaps back, the ratchet holds the drive shaft stationary. The drive shaft then continues its rotation as the follower starts cycling out again. Thus, the output is a (mostly) smooth rotation at a speed that depends on the position of the follower.

Latitude and longitude

The indicators at the left and the top of the globe indicate the spacecraft's latitude and longitude respectively. These are defined by surprisingly complex functions, generated by the orbit's projection onto the globe.6

The latitude and longitude functions are implemented through the shape of metal cams; the photo below shows the longitude mechanism. Each function has two cams: one cam implements the desired function, while the other cam has the "opposite" shape to maintain tension on the jaw-like tracking mechanism.

The cam mechanism to compute longitude.

The latitude cam drives the latitude dial, causing it to oscillate between 51.8° N and 51.8° S. Longitude is more complicated because the Earth's rotation causes it to constantly vary. The longitude output on the dial is produced by adding the cam's value to the Earth's rotation through a differential gear.
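
For reference, here's roughly what the cams have to compute. These are the standard ground-track formulas for a circular orbit at 51.8° inclination (my derivation, not anything documented for the Globus); the cams encode these functions mechanically.

```python
import math

INCLINATION = math.radians(51.8)
EARTH_RATE = 360.0 / 86164.0          # deg/s, one sidereal rotation per day

def ground_track(orbit_angle_deg, elapsed_s, node_lon_deg=0.0):
    """Latitude/longitude under the spacecraft for a circular orbit.

    orbit_angle_deg: angle traveled along the orbit from the ascending node
    elapsed_s:       time since crossing the ascending node
    node_lon_deg:    longitude of the ascending node at that crossing
    """
    u = math.radians(orbit_angle_deg)
    lat = math.degrees(math.asin(math.sin(INCLINATION) * math.sin(u)))
    # Longitude relative to the node (what the longitude cam encodes) ...
    dlon = math.degrees(math.atan2(math.cos(INCLINATION) * math.sin(u),
                                   math.cos(u)))
    # ... minus the Earth's rotation since the node crossing (the term
    # added in through the differential gear).
    lon = (node_lon_deg + dlon - EARTH_RATE * elapsed_s + 180) % 360 - 180
    return lat, lon

# A quarter of a 91.85-minute orbit after crossing the equator northbound:
print(ground_track(90, 91.85 * 60 / 4))   # latitude peaks near 51.8 degrees
```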

Light and shadow

The Globus has an indicator to show when the spacecraft will enter light or shadow. The dial consists of two concentric dials, configured by the two knobs. These dials move with the spacecraft's orbit, while the red legend remains fixed. I think these dials are geared to the longitude dial, but I'm still investigating.

The light and shadow indicator is controlled by two knobs.

The landing location mechanism

The Globus can display where the spacecraft would land if you started a re-entry burn now, with an accuracy of 150 km. This is computed by projecting the current orbit forward by a partial orbit, depending on how long it would take to land. The cosmonaut specifies this value as the "landing angle", the fraction of an orbit expressed as an angle. An electroluminescent indicator in the upper-left corner of the unit shows "Место посадки" (Landing place) to indicate this mode.

The landing angle control.

To obtain the landing position, a motor spins the globe until it is stopped after rotating through the specified angle. The mechanism to implement this is shown below. The adjustment knob on the panel turns the adjustment shaft, which moves the limit switch to the desired angle via the worm gear. The wiring is wrapped around a wheel so it stays controlled during this movement. When the drive motor is activated, it rotates the globe and the swing arm at the same time. Since the motor stops when the swing arm hits the angle limit switch, the globe rotates through the desired angle. The fixed limit switch is used when returning the globe to its regular orbital position.
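
A toy model of this motor-and-limit-switch arrangement, under my reading of the mechanism described above:

```python
def rotate_to_landing_angle(landing_angle_deg, step_deg=0.5):
    """Advance the globe around the orbital axis until the swing arm,
    which turns with the globe, reaches the limit switch that the
    adjustment knob positioned at the landing angle."""
    globe_rotation = 0.0
    swing_arm = 0.0
    while swing_arm < landing_angle_deg:   # limit switch not yet reached
        globe_rotation += step_deg         # the motor drives the globe...
        swing_arm += step_deg              # ...and the swing arm together
    return globe_rotation

# With a 25-degree landing angle, the globe advances 25 degrees and the
# crosshairs now sit over the predicted landing point.
print(rotate_to_landing_angle(25.0))   # 25.0
```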

The landing angle function uses a complex mechanism.

The landing location mode is activated by a three-position rotary switch. The first position "МП" (место посадки, landing site) selects the landing site, the second position "З" (Земля, Earth) shows the position over the Earth, and the third position "Откл" (off) undoes the landing angle rotation and turns off the mechanism.

The rotary switch to select the landing angle mode.

Electronics

Although the Globus is mostly mechanical, it has an electronics board with four relays and a transistor, as well as resistors and diodes. I think that most of these relays control the landing location mechanism, driving the motor forward or backward and stopping at the limit switch. The diodes are flyback diodes, two diodes in series across each relay coil to eliminate the inductive kick when the coil is disconnected.

The electronics circuit board.

A 360° potentiometer (below) converts the spacecraft's orbital position into a voltage. Sources indicate that the Globus provides this voltage signal to other units on the spacecraft. My theory is that the transistor on the electronics board amplifies this voltage, but I am still investigating.

The potentiometer converts the orbital position into a voltage. To the right is the cam that produces the longitude display. Antarctica is visible on the globe.

The photo below shows the multiple wiring bundles in the Globus, at the front and the left. The electronics board is at the front right. The Globus contains a surprising amount of wiring for a device that is mostly mechanical. Inconveniently, all the wires to the box's external connector (upper left) were cut.7 Perhaps this was part of decommissioning the unit. However, one of the screws on the case is covered with a tamper-resistant wax seal with insignia, and this wax seal was intact. This indicates that the unit was officially re-sealed after cutting the wires, which doesn't make sense for a decommissioned unit.

This view shows the back and underside of the Globus. The round connector at the back left provided the interface with the rest of the spacecraft. The black wires under this connector were all cut.

The drive solenoids

The unit is driven by two ratchet solenoids: one for the orbital rotation and one for the Earth's rotation. These solenoids take 27-volt pulses at 1 hertz.3 Each pulse causes the solenoid to advance the gear by one tooth; a pawl keeps the gear from slipping back. These small rotations drive the gears throughout the Globus and result in a tiny movement of the globe.

One of the driving solenoids in the Globus. The wheels to indicate orbital time are underneath.

The other driving solenoid in the Globus.

Apollo-Soyuz

If you look closely at the globe, it has a bunch of pink dots added, along with three-letter labels in Latin (not Cyrillic) characters.8 In the photo below, you can see GDS (Goldstone), MIL (Merritt Island), BDA (Bermuda), and NFL (Newfoundland). These are NASA tracking sites, which implies that this Globus was built for the Apollo-Soyuz Test Project, a 1975 mission where an Apollo spacecraft docked with a Soyuz capsule.

North America as it appears on the globe. The US border is marked in red. The selection of cities seems a bit random, such as El Paso as the only western city until the coast.

Further confirmation of the Apollo-Soyuz connection is the VAN sticker in the middle of the Pacific Ocean (not visible above). The USNS Vanguard was a NASA tracking ship that was used in the Apollo program to fill in gaps in radio coverage. It was an oil tanker from World War II, converted postwar to a missile tracking ship and then used for Apollo. In the photo below, you can see the large tracking antennas on its deck. During the Apollo-Soyuz mission, Vanguard was stationed at 25° S, 155° W, exactly matching the location of the VAN dot on the globe.

The USNS Vanguard with a NASA C-54 plane overhead. (source).

History

The Globus has a long history, going back to the beginnings of Soviet crewed spaceflight. The first version was simpler and had the Russian acronym ИМП (IMP).9 Development of the IMP started in 1960 for the Vostok (1961) and Voskhod (1964) spaceflights.

The Globus IMP. Photo from Francoisguay (CC BY-SA 3.0).

The basic functions of the earlier Globus IMP are similar to the INK's, showing the spacecraft's position and the landing position. It has an orbit counter in the lower right. The latitude and longitude displays at the top were added for the Voskhod flights. The large correction knob allows the orbital period to be adjusted. The main differences are that the IMP doesn't have a display at the bottom for sun and shade, and it doesn't have a control to set the landing angle.9 Also, unlike the INK, the mode (orbit vs. landing position) was selected by external switches rather than a switch on the unit.

The more complex INK model (described in this blog post) was created for the Soyuz flights, starting in 1967. It was part of the "Sirius" information display system (IDS). The Neptun IDS used on Soyuz-T (1976) and the Neptun-M for Soyuz-TM (1986) modernized much of the console but kept the Globus INK. The photo below shows the Globus mounted in the upper-right of a Soyuz-TM console.

The Neptun-M IDS for the Soyuz-TM (source).

The Soyuz-TMA (2002) upgraded to the Neptun-ME system,3 which used digital display screens. In particular, the Globus was replaced with the graphical display below.

A computer display from the Neptun-ME display system used in the Soyuz-TMA spaceship. The Soyuz consoles are much simpler than the Apollo or Space Shuttle consoles, and built with completely different design principles. From Information Display Systems for Soyuz Spaceships.

Conclusions

The Globus INK is a remarkable piece of machinery, an analog computer that calculates orbits through an intricate system of gears, cams, and differentials. It provided cosmonauts with a high-resolution, full-color display of the spacecraft's position, way beyond what an electronic space computer could provide in the 1960s.

Although the Globus is an amazing piece of mechanical computation, its functionality is limited. Its parameters must be manually configured: the spacecraft's starting position, the orbital speed, the light/shadow regions, and the landing angle. It doesn't take any external guidance inputs, such as an IMU (inertial measurement unit), so it's not particularly accurate. Finally, it only supports a circular orbit at a fixed angle. While the more modern digital display lacks the physical charm of a rotating globe, the digital solution provides much more capability.

I plan to continue reverse-engineering the Globus and hope to get it operational, so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @[email protected]. Many thanks to Marcel for providing the Globus. Thanks to Stack Overflow for orbit information and my Twitter followers for translation assistance.

I should give a disclaimer that I am still reverse-engineering the Globus, so what I described is subject to change. Also, I don't read Russian, so any errors are the fault of Google Translate. :-)

With the case removed, the complex internals of the Globus are visible.

Notes and references

  1. In Russian, the name for the device is "Индикатор Навигационный Космический" abbreviated as ИНК (INK). This translates to "space navigation indicator." The name Globus (Глобус) seems to be a nickname, and I suspect it's more commonly used in English than Russian. 

  2. To see how the angle between the poles and the globe's rotation axis results in the desired orbital inclination, consider two limit cases. First, suppose the angle between them is 90°. In this case, the globe is "straight" with the equator horizontal. Rotating the globe along the horizontal axis, flipping the poles end-over-end, will cause the crosshair to trace a polar orbit, giving the expected inclination of 90°. On the other hand, suppose the angle is 0°. In this case, the globe is "sideways" with the equator vertical. Rotating the globe will cause the crosshair to remain over the equator, corresponding to an equatorial orbit with 0° inclination.
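
    As a numerical check of this geometric argument, the short Python sketch below (my own verification, not from any Globus documentation) rotates a point about an axis tilted 51.8° from the poles and confirms that the maximum latitude the point reaches, which is the orbital inclination, equals the tilt angle.

        import numpy as np

        theta = np.radians(51.8)   # angle between the globe's poles and its rotation axis

        P = np.array([0.0, 0.0, 1.0])                        # polar axis of the globe
        A = np.array([np.sin(theta), 0.0, np.cos(theta)])    # rotation axis, tilted by theta

        # The crosshair sits 90 degrees from the rotation axis, so start with
        # any unit vector perpendicular to A.
        C = np.cross(A, P)
        C /= np.linalg.norm(C)

        def rotate(v, axis, angle):
            """Rodrigues' rotation of vector v about a unit axis."""
            return (v * np.cos(angle) + np.cross(axis, v) * np.sin(angle)
                    + axis * np.dot(axis, v) * (1 - np.cos(angle)))

        # Sweep one full rotation and record the latitude relative to the poles.
        lats = [np.degrees(np.arcsin(np.dot(rotate(C, A, a), P)))
                for a in np.linspace(0, 2 * np.pi, 1000)]
        print(f"tilt {np.degrees(theta):.1f} deg, max latitude {max(lats):.1f} deg")   # both 51.8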

  3. A detailed description of the Globus in Russian is in this document, in Section 5.

  4. Or conversely, the altitude varies according to the speed. 

  5. Note that the panel control adjusts the period of the orbit, while the implementation adjusts the speed of the orbit. These are reciprocals, so linear changes in the period result in hyperbolic changes in the speed. The mechanism, however, changes the speed linearly, which seems like it shouldn't work. However, since the period is large relative to the change in the period, this linear approximation works and the error is small, about 1%. It's possible that the cone has a nonlinear shape to correct this, but I couldn't detect any nonlinearity in photographs.
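
    To see why the linear approximation is good enough, here is a rough numeric check (the 90-minute nominal period and the adjustment amounts are illustrative assumptions, not measurements from the unit). The error between the true reciprocal relationship and a linear change in speed grows as the square of the relative adjustment:

        T0 = 90.0                      # nominal orbital period in minutes (assumed)
        for dT in (1, 3, 5, 9):        # adjustment away from nominal, in minutes
            true_speed = 1 / (T0 + dT)             # speed the adjusted orbit actually needs
            linear_speed = (1 - dT / T0) / T0      # speed from a linear (tangent) change
            err = abs(true_speed - linear_speed) / true_speed
            print(f"dT = {dT} min: speed error = {err:.2%}")   # error ~ (dT/T0)**2, roughly 1% at most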

  6. The latitude is given by arcsin(sin i * sin (2πt/T)), while the longitude is given by λ = arctan (cos i * tan(2πt/T)) + Ωt + λ0, where t is the spaceship's flight time starting at the equator, i is the angle of inclination (51.8°), T is the orbital period, Ω is the angular velocity of the Earth's rotation, and λ0 is the longitude of the ascending node.3

    The formula for latitude is simpler than longitude because the latitude repeats every orbit. The longitude, however, continually changes as the Earth rotates under the spacecraft. 
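
    As a small sketch of these formulas (the inclination is from the text; the orbital period, flight time, and ascending-node longitude are arbitrary example values, and the Ωt term follows the sign convention of the formula above), the sub-satellite point can be computed like this:

        import math

        i = math.radians(51.8)               # orbital inclination
        T = 92 * 60                          # orbital period in seconds (illustrative)
        omega = 2 * math.pi / 86164          # Earth's rotation rate, rad/s (sidereal day)
        lambda0 = math.radians(30.0)         # longitude of the ascending node (example)

        def ground_track(t):
            """Sub-satellite latitude and longitude in degrees at flight time t (seconds)."""
            phase = 2 * math.pi * t / T
            lat = math.asin(math.sin(i) * math.sin(phase))
            # atan2 is the quadrant-corrected form of arctan(cos i * tan(phase)).
            lon = math.atan2(math.cos(i) * math.sin(phase), math.cos(phase)) + omega * t + lambda0
            lon_deg = (math.degrees(lon) + 180) % 360 - 180    # wrap to -180..180
            return math.degrees(lat), lon_deg

        for minutes in (0, 15, 30, 45):
            lat, lon = ground_track(minutes * 60)
            print(f"t = {minutes:2d} min: lat = {lat:6.1f}, lon = {lon:7.1f}")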

  7. The back of the Globus has a 32-pin connector, a standard RS32TV Soviet military design. The case also has some dents visible; the dents were much larger before CuriousMarc smoothed them out.

    The back of the Globus.

  8. The NASA tracking sites marked with dots are CYI (Grand Canary Island), ACN (Ascension), MAD (Madrid, Spain), TAN (Tananarive, Madagascar), GWM (Guam), ORR (Orroral, Australia), HAW (Hawaii), GDS (Goldstone, California), MIL (Merritt Island, Florida), QUI (Quito, Ecuador), AGO (Santiago, Chile), BDA (Bermuda), NFL (Newfoundland, Canada), and VAN (Vanguard tracking ship). Most of these sites were part of the Spacecraft Tracking and Data Network. The numbers 1-7 are apparently USSR communication sites, although I'm puzzled by 8 in Nova Scotia and 9 in Honduras. 

  9. Details on the earlier Globus IMP are at this site, including a discussion of the four different versions IMP-1 through IMP-4. Wikipedia also has information. 

Counting the transistors in the 8086 processor: it's harder than you might think

How many transistors are in Intel's 8086 processor? This seems like a straightforward question, but it doesn't have a straightforward answer. Most sources say that this processor has 29,000 transistors.1 However, I have traced out every transistor from die photos, and my count is 19,618. What accounts for the 9,382 missing transistors?

The explanation is that when manufacturers report the transistor count of a chip, they typically report the number of "potential" transistors. Chips that include a ROM will have different numbers of transistors depending on the values stored in the ROM. Since marketing doesn't want to publish varying numbers depending on the number of 1 bits and 0 bits, they often count ROM sites: places that could have a transistor, but might not. A PLA (Programmable Logic Array) has similar issues; the transistor count depends on the desired logic functions.

What are these potential transistor sites? ROMs are typically constructed as a grid of cells, with a transistor at a cell for a 1 bit, and no transistor for a 0 bit.2 In the 8086, transistors are created or not through the pattern of silicon doping. The photo below shows a closeup of the silicon layer for part of the 8086's microcode ROM. The empty regions are undoped silicon, while the other regions are doped silicon. Transistor gates are formed where vertical polysilicon lines (removed for the photo) passed over the doped silicon. Thus, depending on the data encoded into the ROM during manufacturing, the number of transistors varies.

A closeup of part of the microcode ROM. The dark circles indicate vias between the silicon and the metal on top.

The diagram below provides more detail, showing the microcode ROM up close. Green T's indicate transistors, while red X's indicate positions with no transistor. As you can see, the potential transistor positions form a grid, but only some of the positions are occupied by transistors. The common method for counting transistors counts all the potential positions (18 below) rather than the actual transistors that are implemented (12 below).

An extreme closeup of the microcode ROM. Green T's indicate transistors, while red X's indicate positions with no transistor.
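
A toy example makes the distinction concrete. In the (made-up) grid below, 'T' marks a transistor and 'X' marks an empty site; the usual published count takes every site, not just the Ts.

    grid = [
        "TXTTXT",
        "TTXTXT",
        "XTTXTT",
    ]
    sites = sum(len(row) for row in grid)                  # 18 potential positions
    transistors = sum(row.count("T") for row in grid)      # 12 actual transistors
    print(sites, transistors)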

I found an Intel history that confirmed that the 8086 transistor count includes potential sites, saying "This is 29,000 transistors if all ROM and PLA available placement sites are counted." That paper gives the approximate number of (physical) transistors in the 8086 as 20,000. This number is close to my count of 19,618.

To get a transistor count that includes empty sites, I counted the number of transistor sites in the various ROMs and PLAs in the 8086 chip. This is harder than you might expect because the smaller ROMs, such as the constant ROM, have some layout optimization. The photo below shows a closeup of the constant ROM. It is essentially a grid, but it has been "squeezed" slightly to optimize its layout, making it slightly irregular. I'm counting its "potential" transistors, but one could argue that they shouldn't be counted, since filling in all the sites might run into layout problems.

Closeup of the constant ROM showing the silicon and polysilicon.

The following table breaks down the ROM and PLA counts by subcomponent. I found a total of approximately 9659 vacant transistor sites. If you add those to my transistor count, it works out to 29,277 transistors.

Component           Transistor sites   Transistors   Vacancies
Microcode               13904             6210         7694
Group Decode ROM         1254              603          651
Translation ROM          1050              431          619
Register PLAs             465              182          283
ALU PLA                   354              170          184
Constant ROM              203              109           94
Condition PLA             160               74           86
Segment PLA                90               42           48
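
As a sanity check on the table, each vacancy count is simply sites minus transistors, and the vacancies sum to the 9659 figure above; a few lines of Python (my own tabulation, not Intel's) confirm the arithmetic:

    rows = {
        "Microcode":        (13904, 6210),
        "Group Decode ROM": (1254, 603),
        "Translation ROM":  (1050, 431),
        "Register PLAs":    (465, 182),
        "ALU PLA":          (354, 170),
        "Constant ROM":     (203, 109),
        "Condition PLA":    (160, 74),
        "Segment PLA":      (90, 42),
    }
    vacancies = sum(sites - used for sites, used in rows.values())
    print(vacancies)            # 9659
    print(19618 + vacancies)    # 29277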

The image below shows these ROMs and PLAs on the die and how much the vacancies increase the transistor count. Not surprisingly, the large microcode ROM and its decoding PLA are responsible for most of the vacancies.

The 8086 die with transistor vacancy counts and how much they contribute to the final transistor count. (Click this image or any other for a larger version.)

Potential exclusions

So are my counts of 19,618 transistors and 29,277 transistor sites correct? There are some subtleties that could lower this count slightly. First, the output pins use large, high-current transistors. Each output transistor is constructed from more than a dozen transistors wired in parallel. Should this be counted as a dozen transistors or a single transistor? I'm counting the component transistors.

An output pad with a bond wire attached. Driver transistors next to the pad are constructed from multiple transistors in parallel.

The 8086 has about 43 transistors wired as diodes for various purposes. Some are input protection diodes, while others are used in the charge pump for the substrate bias generator. Should these be excluded from the transistor count? Physically they are transistors but functionally they aren't.

The 8086 is built with NMOS logic, which builds gates out of active "enhancement" transistors along with "depletion" transistors that basically act as pull-up resistors. I count 2689 depletion-mode transistors, but you could exclude them from the count as not "real" transistors.

Conclusions

The number of transistors in a chip is harder to define than you might expect. The 8086 is commonly described as having 29,000 transistors when including empty sites in ROMs and PLAs that potentially could have a transistor. The published number of physical transistors in the 8086 is "approximately 20,000". From my counts, the 8086 has 19,618 physical transistors and 29,277 transistors when including empty sites. Given the potential uncertainties in counting, it's not surprising that Intel rounded the numbers to the nearest thousand.

The practice of counting empty transistor sites may seem like an exaggeration of the real transistor count, but there are some good reasons to count this way. Including empty sites gives a better measure of the size and complexity of the chip, since these sites take up area whether or not they are used. This number also lets one count the number of transistors before the microcode is written, and it is also stable as the microcode changes. But when looking at transistor counts, it's good to know exactly what is getting counted.

I plan to continue reverse-engineering the 8086 die so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @[email protected]. I discussed the transistor count in the 6502 processor here.

Notes and references

  1. For example, The 8086 Family Users Manual says on page A-210: "The central processor for the iSBC 86/12 board is Intel's 8086, a powerful 16-bit H-MOS device. The 225 sq. mil chip contains 29,000 transistors and has a clock rate of 5MHz." 

  2. ROMs can also be constructed the other way around, with a transistor indicating a 0. It's essentially an arbitrary decision, depending on whether the output buffer inverts the bit or not. Other ROM technologies may have transistors at all the sites but only connect the desired ones. 

Reverse-engineering an airspeed/Mach indicator from 1977

How does a vintage airspeed indicator work? CuriousMarc picked one up for a project, but it didn't have any documentation, so I reverse-engineered it. This indicator was used in the cockpit panel for business jets such as the Gulfstream G-III, Cessna Citation, and Bombardier Challenger CL600. It was probably manufactured in 1977 based on the dates on its transistors.

You might expect that the indicators on an aircraft control panel are simple dials. But behind this dial is a large, 2.8-pound box with a complex system of motors, gears, and feedback potentiometers, controlled by two boards of electronics. But for all this complexity, the indicator doesn't have any smarts: the pointers just indicate voltages fed into it from an air data computer. This is a quick blog post to summarize what I found.

Front view of the indicator.

The dial has two rotating pointers: the white pointer indicates airspeed in knots while the striped pointer indicates the maximum airspeed (which varies depending on altitude). The "digital" indicator at the top shows Mach number from 0.10 to 0.99, implemented with rotating digit wheels. When the unit is operating, the OFF indicator flag switches to black. The flag switches to a bright VMO warning if the pilot exceeds the maximum airspeed.1 On the rim of the dial, two small markers called "bugs" can be manually moved to indicate critical speeds such as takeoff speed.

In use, the indicator is connected to a Sperry air data computer and receives voltage signals to control the dial positions.3 The air data computer measures the static and dynamic air pressure from pitot tubes and determines the airspeed, Mach number, altitude, and other parameters. (These calculations become nontrivial near Mach 1 as air compresses and the fluid dynamics change.) Since we didn't have the air data computer or its specifications, I needed to figure out the connections from the computer to the display.

With the unit's cover removed, you can see the internal mechanisms and circuitry. Each of the three indicators is controlled by a small DC motor with a potentiometer providing feedback. To the right, two circuit boards provide the electronics to drive the indicators.4 At the upper right, the black blob is a 26-volt 400-Hertz transformer to power the unit. Some power supply components are in front of it. Below the transformer is an orangish flexible printed-circuit board, which seems advanced for the timeframe. This flexible ribbon connects the transformer, the external connector, and the printed-circuit board sockets, providing the backplane for the system.

A side view of the unit shows the gears to control the indicators.

The diagram below shows the principle behind the servo mechanism that controls each indicator. The goal is to rotate the indicator to a position corresponding to the input voltage. A feedback loop is used to achieve this. The potentiometer provides a voltage proportional to its rotation. The input voltage and the feedback voltage are inputs to an op amp, which generates an error signal based on the difference between the inputs. The error signal rotates the DC motor in the appropriate direction until the potentiometer voltage matches the input voltage. Because the indicator and the potentiometer are geared together, the indicator will be in the correct position. As the input voltage changes, the system will continuously track the changes and keep the indicator updated.

A diagram illustrating the servo feedback loop.
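
To make the feedback idea concrete, here is a minimal simulation of such a loop (the gain, step count, and voltages are illustrative assumptions, not measurements from the unit): the dial position converges to the commanded input voltage.

    target = 3.0          # input voltage from the air data computer (example)
    position = 0.0        # feedback voltage from the potentiometer (tracks the dial)
    gain = 0.2            # loop gain per time step (assumed)
    for _ in range(40):
        error = target - position      # op amp compares input and feedback
        position += gain * error       # motor turns the dial (and the pot) to reduce the error
    print(round(position, 3))          # ~3.0: the dial has settled at the commanded position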

Because the DC motor spins much faster than the dial moves, reduction gears slow the rotation. The photo below shows the gear train in the unit. A potentiometer is at the upper-right with three wires attached.

A closeup of the gear train. A potentiometer is on the right.

The Mach number has additional gearing to rotate the numbered wheels. When the low-digit wheel cycles around, it advances the high-digit wheel, similar to an odometer.

The mechanism to rotate the digit wheels for the Mach number.

Fault checking

One interesting feature of the indicator unit is that it implements fault checking to alert the pilot if something goes wrong. The front panel has a three-position flag. By default it's in the OFF position. Powering the coil in one direction rotates the flag to the blank side. Powering the coil in the other direction rotates the flag to the "VMO" position which indicates that the pilot has exceeded the maximum operating speed.

I figured that powering up the unit would move the flag out of the OFF position, but it's more complicated than that. First, the unit checks that the air data computer is providing a suitable reference voltage. Second, the unit verifies that the motor voltages for the two needles are within limits; this ensures that the servo loop is operating successfully. Third, the unit checks that signals are received on status pins K and L. The unit only moves out of the OFF state if all these conditions are satisfied.5 Thus, if the unit receives bad signals or is malfunctioning, the pilot will be alerted by the OFF indicator, rather than trusting the faulty display.
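
My reading of this logic, written out as a short Python sketch (the pin names come from the pinout in note 3, but the exact precedence and thresholds are my interpretation from reverse engineering, not from documentation):

    def flag_position(ref_ok, motors_in_limits, pin_k_enable, pin_l_speed_ok):
        """Return the state of the three-position flag."""
        if not (ref_ok and motors_in_limits and pin_k_enable):
            return "OFF"      # a check failed: warn the pilot not to trust the display
        if not pin_l_speed_ok:
            return "VMO"      # the air data computer signals an overspeed (see note 5)
        return "BLANK"        # everything checks out: normal display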

The circuitry

The unit is powered by 26 volts, 400 Hz, a standard voltage for aviation. A small transformer provides multiple outputs for the various internal voltages. The unit has four power supplies: three on the first board and one on the back wall of the unit. One power supply is for the status indicator, one is for the op amps, one powers the 41.7V motors, and the fourth provides other power.

One subtlety is how the feedback potentiometers are powered. The servo loop compares the potentiometer voltage with the input voltage. But this only works if the potentiometer and the input voltage are using the same reference. One solution would be for the indicator unit and the air data computer to contain matching precision voltage regulators. Instead, the system uses a simpler, more reliable approach: the air data computer provides a reference voltage that the indicator unit uses to power the potentiometers.6 With this approach, the air data computer's voltage reference can fluctuate and the indicator will still reach the right position. (In other words, a 5V input with a 10V reference and a 6V input with a 12V reference are both 50%.)

The diagram below shows the board with the servo circuitry. The board uses dual op-amp integrated circuits, packaged in 10-pin metal cans that protected against interference.7 The ICs and some of the other components have obscure military part numbers; I don't know if this unit was built for military use or if military-grade parts were used for reliability.

The servo board is full of transistors, resistors, capacitors, diodes, and op-amp integrated circuits.

The circuitry in the lower-left corner handles the reference voltage from the air data computer. The board buffers this voltage with an op amp to power the three feedback potentiometers. The op amp also ensures that the reference voltage is at least 10 volts. If not, the indicator unit shows the "OFF" flag to alert the pilot.

The schematic below shows one of the servo circuits; the three circuits are roughly the same. The heart of the circuit is the error op amp in the center. It compares the voltage from the potentiometer with the input voltage and generates an error output that moves the motor appropriately. A positive error output will turn on the upper transistor, driving the motor with a positive voltage. Conversely, a negative error output will turn on the lower transistor, driving the motor with a negative voltage. The motor drive circuit has clamp diodes to limit the transistor base voltages.

Schematic of one of the servo circuits.

The op amp also receives a feedback signal from the motor output. I don't entirely understand this signal, which goes through a filter circuit with resistors, diodes, and a capacitor. I think it dampens the motor signal so the motor doesn't overshoot the desired position. I think it also keeps the transistor drive signal biased relative to the emitter voltage (i.e. the motor output).

On the input side, the potentiometer voltage goes through an op amp follower buffer, which simply outputs its input voltage. This may seem pointless, but the op amp provides a high-impedance input so the potentiometer's voltage doesn't get distorted.

The external input voltage goes through a resistor/capacitor circuit to scale it and filter out noise. Curiously, the circuit board was modified by cutting a trace and adding a resistor and capacitor to change the input circuit for one of the inputs. In the photo below, you can see the added resistor and capacitor; the cut trace is just to the right of the capacitor. I don't know if this modification changed the scale factor or if it filtered out noise. A label on the box says that Honeywell performed a modification on November 8, 1991, which presumably was this circuit.

A closeup of the circuit board showing the modification.

The second board implements three power supplies as well as the circuitry for the OFF/VMO flag. The power supplies are simple and unregulated, just diode bridges to convert AC to DC, along with filter capacitors. Most of the circuitry on the board controls the status flag. Two dual op amps check the motor voltages against upper and lower limits to ensure that the motors are tracking the inputs. These outputs, along with other logic status signals, are combined with diode-transistor logic to determine the flag status. Driver transistors provide +18 or -18 volts to the flag's coil to drive it to the desired position.

This board has power supply circuitry and the control circuitry for the indicator flag.

Conclusions

After reverse-engineering the pinout, I connected the airspeed indicator to a stack of power supplies and succeeded in getting the indicators to operate (video). This unit is much more complex than I expected for a simple display, with servoed motors controlled by two boards of electronics. Air safety regulations probably account for much of the complexity, ensuring that the display provides the pilot with accurate information. For all that complexity, the unit is essentially a voltmeter, indicating three voltages on its display. This airspeed indicator is a bit different from most of the hardware I examine, but hopefully you found this look at its internal circuitry interesting.

With the case removed, the internal circuitry is visible.

You can follow me on Twitter @kenshirriff or RSS. I've also started experimenting with Mastodon recently as @[email protected].

Notes and references

  1. Since the unit has airspeed and maximum airspeed indicators, you might expect it to display the maximum airspeed warning flag based on the two speed inputs. Instead, the flag is controlled by input pin "L". In other words, the air data computer, not the indicator unit, determines when the maximum airspeed is exceeded. 

  2. This unit is a "Mach Airspeed Indicator", part number 4018366, apparently also called the SI-225.

    Product label with part number 4018366-901.

    Note that the label says Sperry. In 1986, Sperry attempted to buy Honeywell but instead Burroughs made a hostile takeover bid. The merger of Sperry and Burroughs formed Unisys. A couple of months after the merger, the Sperry Aerospace Group was sold to Honeywell for $1.025 billion. Thus, the indicator became a Honeywell product. This corporate history explains why the unit has a Honeywell product support sticker.

    Labels on top of the unit indicate that it worked with the Sperry 4013242 and 4013244 air data computers. These became the Honeywell AZ-242 and AZ-244.

  3. The connector is a 32-pin MIL Spec round connector. Most of the 32 pins are unused. The connector has complex keying with 5 slots. I assume the keying is specific to this indicator, so the wrong indicator doesn't get connected.

    A closeup of the 32-pin connector, probably a MIL Spec 18-32.

    For reference, here is the pinout of the unit. Since this is based on reverse engineering, I don't guarantee it 100%. Don't use this for flight!

    Pin   Use
    A     5V illumination
    B     Chassis ground
    C     AC ground
    E     26V 400 Hz
    F     26V 400 Hz
    K     Enable
    L     Speed ok
    M     Signal ground
    N     Ref. voltage
    P     Vmax control voltage
    R     Airspeed control voltage
    S     Mach control voltage
    V     Chassis ground

    Pins D, G, H, J, T, U, W, X, Y, Z, a, b, c, d, e, f, g, h, and j are unused. 

  4. The chassis has an empty slot for a third circuit board. My guess is that this chassis was used for multiple types of indicators and others required a third board. 

  5. If the L pin goes low, the indicator will move to the VMO position. 

  6. My hypothesis is that the correct reference voltage is 11.7 volts. This yields a scale factor of 1 volt equals 50 knots. It also matches up the display's change in scale at 250 knots with the measured scale change. 

  7. The meter uses three different integrated circuits in 10-pin metal cans with mysterious military markings: "FHL 24988", "JM38510/10102BIC 27014", and "SL14040". These appear to all be equivalent to uA747 dual op amps. (Note that JM38510 is not a part number; it is a general military specification for integrated circuits. The number after it is the relevant part number.)