Reverse-engineering the register codes for the 8086 processor's microcode

Like most processors, the Intel 8086 (1978) provides registers that are faster than main memory. As well as the registers that are visible to the programmer, the 8086 has a handful of internal registers that are hidden from the user. Internally, the 8086 has a complicated scheme to select which register to use, with a combination of microcode and hardware. Registers are assigned a 5-bit identifying number, either from the machine instruction or from the microcode. In this blog post, I explain how this register system works.

My analysis is based on reverse-engineering the 8086 from die photos. The die photo below shows the chip under a microscope. For this die photo, I removed the the metal and polysilicon layers, revealing the silicon underneath. I've labeled the key functional blocks; the ones that are important to this post are darker. In particular, the registers and the Arithmetic/Logic Unit (ALU) are at the left and the microcode ROM is in the lower right. Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below. The BIU handles bus and memory activity as well as instruction prefetching, while the Execution Unit (EU) executes the instructions.

The 8086 die under a microscope, with main functional blocks labeled.  Click on this image (or any other) for a larger version.

The 8086 die under a microscope, with main functional blocks labeled. Click on this image (or any other) for a larger version.

Microcode

Most people think of machine instructions as the basic steps that a computer performs. However, many processors (including the 8086) have another layer of software underneath: microcode. With microcode, instead of building the control circuitry from complex logic gates, the control logic is largely replaced with code. To execute a machine instruction, the computer internally executes several simpler micro-instructions, specified by the microcode.

The 8086 uses a hybrid approach: although it uses microcode, much of the instruction functionality is implemented with gate logic. This approach removed duplication from the microcode and kept the microcode small enough for 1978 technology. In a sense, the microcode is parameterized. For instance, the microcode can specify a generic Arithmetic/Logic Unit (ALU) operation and a generic register. The gate logic examines the instruction to determine which specific operation to perform and the appropriate register.

A micro-instruction in the 8086 is encoded into 21 bits as shown below. Every micro-instruction has a move from a source register to a destination register, each specified with 5 bits; this encoding is the main topic of this blog post. The meaning of the remaining bits depends on the type field and can be anything from an ALU operation to a memory read or write to a change of microcode control flow. For more about 8086 microcode, see my microcode blog post.

The encoding of a micro-instruction into 21 bits. Based on NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright?

The encoding of a micro-instruction into 21 bits. Based on NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright?

Let's look at how the machine instruction XCHG AX,reg is implemented in microcode. This instruction exchanges AX with the register specified in the low 3 bits of the instruction.1 The microcode for this instruction consists of three micro-instructions, so the instruction takes three clock cycles. Each micro-instruction contains a move, which is the interesting part.2 The specified register is moved to the ALU's temporary B register, the AX register is moved to the specified register, and finally the temporary B is moved to AX, completing the swap.

 move       action
M → tmpB           XCHG AX,rw: move reg to tmpB
AX → M      NXT     move AX to reg, Next to last
tmpB → AX   RNI     move tmpB to AX, Run Next Instruction

The key part for this discussion is how M indicates the desired register. Suppose the instruction is XCHG AX,DX. The bottom three bits of the instruction are 010, indicating DX. During the first clock cycle of instruction execution, the opcode byte is transferred from the prefetch queue over the queue bus. The M register is loaded with the DX number (which happens to be 26), based on the bottom three bits of the instruction. After a second clock cycle, the microcode starts. The first micro-instruction puts M's value (26) onto the source bus and the number for tmpB (13) on the destination bus, causing the transfer from DX to tmpB. The second micro-instruction puts the AX number (24) onto the source bus and the M value (26) onto the destination bus, causing the transfer from AX to DX. The third micro-instruction puts tmpB number (13) onto the source bus and the AX number (24) onto the destination bus, causing the transfer from tmpB to AX.

Thus, the values on the source and destination bus control the data transfer during each micro-instruction. Microcode can either specify these values explicitly (as for AX and tmpB) or can specify the M register to use the register defined in the instruction. Thus, the same microcode implements all the XCHG instructions and the microcode doesn't need to know which register is involved.

The register encoding

The microcode above illustrated how different numbers specified different registers. The table below shows how the number 0-31 maps onto a register. Some numbers have a different meaning for a source register or a destination register; a slash separates these entries.

0ES8AL16AH24AX
1CS9CL17CH25CX
2SS10DL18DH(M)26DX
3DS11BL19BH(N)27BX
4PC12tmpA20Σ/tmpAL28SP
5IND13tmpB21ONES/tmpBL29BP
6OPR14tmpC22CR/tmpAH30SI
7Q/-15F23ZERO/tmpBH31DI

Most of these entries are programmer-visible registers: the segment registers are in green, the 8-bit registers in blue, and the 16-bit registers in red. Some internal registers and pseudo-registers are also accessible: IND (Indirect register), holding the memory address for a read or write; OPR (Operand register), holding the data for a read or write; Q (Queue), reading a byte from the instruction prefetch queue; ALU temporary registers A, B, and C, along with low (L) and (H) bytes; F, Flags register; Σ, the ALU output; ONES, all ones; CR, the three low bits of the microcode address; and ZERO, the value zero. The M and N entries can only be specified from microcode, taking the place of DH and BH.

The table is kind of complicated, but there are reasons for its structure. First, machine instructions in the 8086 encode registers according to the system below. The 5-bit register number above is essentially an extension of the instruction encoding. Moreover, the AX/CX/DX/BX registers (red) are lined up with their upper-byte and lower-byte versions (blue). This simplifies the hardware since the low three bits of the register number select the register, while the upper two bits perform the byte versus word selection.3 The internal registers fit into available spots in the table.

The register assignments, from MCS-86 Assembly Language Reference Guide.

The register assignments, from MCS-86 Assembly Language Reference Guide.

The ModR/M byte

Many of the 8086 instructions use a second byte called the ModR/M byte to specify the addressing modes.4 The ModR/M byte gives the 8086 a lot of flexibility in how an instruction accesses its operands. The byte specifies a register for one operand and either a register or memory for the other operand. The diagram below shows how the byte is split into three fields: mod selects the overall mode, reg selects a register, and r/m selects either a register or memory mode. For a ModR/M byte, the reg and the r/m fields are read into the N and M registers respectively, so the registers specified in the ModR/M byte can be accessed by the microcode.

modregr/m
76543210

Let's look at the instruction SUB AX,BX which subtracts BX from AX. In the 8086, some important processing steps take place before the microcode starts. In particular, the "Group Decode ROM" categorizes the instruction into over a dozen categories that affect how it is processed, such as instructions that are implemented without microcode, one-byte instructions, or instructions with a ModR/M byte. The Group Decode ROM also indicates the structure of instructions, such as instructions that have a W bit selecting byte versus word operations, or a D bit reversing the direction of the operands. In this case, the Group Decode ROM classifies the instruction as containing a D bit, a W bit, an ALU operation, and a ModR/M byte.

Based on the Group Decode ROM's signals, fields from the opcode and ModR/M bytes are extracted and stored in various internal registers. The ALU operation type (SUB) is stored in the ALU opr register. The ModR/M byte specifies BX in the reg field and AX in the r/m field so the reg register number (BX, 27) is stored in the N register, and the r/m register number (AX, 24) is stored in the M register.

Once the preliminary decoding is done, the microcode below for this ALU instruction is executed.5 There are three micro-instructions, so the instruction takes three clock cycles. First, the register specified by M (i.e. AX) is moved to the ALU's temporary A register (tmpA). Meanwhile, XI configures the ALU to perform the operation specified by the instruction bits, i.e. SUB. The second micro-instruction moves the register specified by N (i.e. BX) to the ALU's tmpB register. The last micro-instruction stores the ALU's result (Σ, number 20) in the register indicated by M (i.e. AX).

  move       action
M → tmpA     XI tmpA   ALU rm↔r: AX to tmpA
N → tmpB     NXT       BX to tmpB
Σ → M        RNI F     result to AX, update flags

One of the interesting features of the 8086 is that many instructions contain a D bit that reverses the direction of the operation, swapping the source and the destination. If we keep the ModR/M byte but use the SUB instruction with the D bit set, the instruction becomes SUB BX,AX, subtracting AX from BX, the opposite of before. (Swapping the source and destination is more useful when one argument is in memory. But I'll use an example with two registers to keep it simple.) This instruction runs exactly the same microcode as before. The difference is that when the microcode accesses M, due to the direction bit it gets the value in N, i.e. BX instead of AX. The access to N is similarly swapped. The result is that AX is subtracted from BX, and the change of direction is transparent to the microcode.

The M and N registers

Now let's take a closer look at how the M and N registers are implemented. Each register holds a 5-bit register number, expanded from the three bits of the instruction. The M register is loaded from the three least significant bits of the instruction or ModR/M byte, while the N register is loaded with bits three through five. Most commonly, the registers are specified by the ModR/M byte, but some instructions specify the register in the opcode.6

The table below shows how the bits in the instruction's opcode or ModR/M byte (i5, i4, i3) are converted to a 5-bit number for the N register. There are three cases: a 16-bit register, an 8-bit register, and a segment register. The mappings below may seem random, but they result in the entries shown in the 5-bit register encoding table earlier. I've colored the entries so you can see the correspondence.

Mode43210
16-bit reg11i5i4i3
8-bit regi5i5'0i4i3
segment reg000i4i3

I'll go through the three cases in more detail. Many 8086 instructions have two versions, one that acts on bytes and one that acts on words, distinguished by the W bit (bit 0) in the instruction. If the Group Decode ROM indicates that the instruction has a W bit and the W bit is 0, then the instruction is a byte instruction.7 If the instruction has a ModR/M byte and the instruction operates on a byte, the N register is loaded with the 5-bit number for the specified byte register. This happens during "Second Clock", the clock cycle when the ModR/M byte is fetched from the instruction queue. The second case is similar; if the instruction operates on a word, the N register is loaded with the number for the word register specified in the ModR/M byte.

The third case handles a segment register. The N register is loaded with a segment register number during Second Clock if the Group Decode ROM indicates the instruction has a ModR/M byte with a segment-register field (specifically the segment register MOV instructions). A bit surprisingly, a segment register number is also loaded during First Clock. This supports the PUSH and POP segment register instructions, which have the segment register encoded in bits 3 and 4 of the opcode.8

The table below shows how the bits are assigned in the M register, which uses instruction bits i2, i1, and i0. The cases are a bit more complicated than the N register. First, a 16-bit register number is loaded from the opcode byte during First Clock to support instructions that specify the register in the low bits. During Second Clock, this value may be replaced.

For a ModR/M byte using register mode, the M register is reloaded with the specified 8-bit or a 16-bit register, depending on the byte mode signal described earlier. However, for a ModR/M byte that uses a memory mode, the M register is loaded with OPR (Operand), the internal register that holds the word that is read or written to memory.

Mode43210
16-bit reg11i2i1i0
8-bit regi2i2'0i1i0
OPR00110
AX/ALbyte'1000
convert to 8-bitm2m2'0m1m0

Many instructions use the AX or AL register, such as the ALU immediate instructions, the input and output instructions, and string instructions. For these, the Group Decode ROM triggers the AX or AL register number specifically to be loaded into the M register during Second Clock. The top bit is set for a word operation and cleared for a byte operation providing AX or AL as appropriate.

The final M register case is a bit tricky. For an immediate move instruction such as MOV BX,imm, bit 3 switches between a byte and a word move (rather than bit 0), because bits 2-0 specify the register. Unfortunately, the Group Decode ROM outputs aren't available during First Clock to indicate this case. Instead, M is loaded during First Clock with the assumption of a 16-bit register. If that turns out to be wrong, the M register is converted to an 8-bit register number during Second Clock by shuffling a few bits.

Producing the source and destination values

There are three cases for the number that goes on the source or destination buses: the register number can come from the micro-instruction, the value can come from the M or N register as specified in the micro-instruction, or the value can come from the M and N register with the roles swapped by the D bit. (Note that the source and destination can be different cases and are selected with separate circuitry.)

The first case is the default case, where the 5 bits from the micro-instruction source or destination specify the register directly. For instance, in the micro-instruction tmpB→AX, the microcode knows which registers are being used and specifies them directly.

The second and third cases involve more logic. Consider the source in M→tmpB. For an instruction without a D bit, the register number is taken from M. Likewise if the D bit is 0. But if the instruction uses a D bit and the D bit is 1, then the register number is taken from N. Multiplexers between the M and N registers select the appropriate register to put on the bus.

The M and N registers as they appear on the die. The metal layer has been removed from this image to show the silicon and polysilicon underneath.

The M and N registers as they appear on the die. The metal layer has been removed from this image to show the silicon and polysilicon underneath.

The diagram above shows how the M and N register circuitry is implemented on the die, with the N register at the top and the M register below. Each register has an input multiplexer that implements the tables above, selecting the appropriate 5 bits depending on the mode. The registers themselves are implemented as dynamic latches driven by the clock. In the middle, a crossover multiplexer drives the source and destination buses, selecting the M and N registers as appropriate and amplifying the signals with relatively large transistors. The third output from the multiplexer, the bits from the micro-instruction, is implemented in circuitry physically separated and closer to the microcode ROM.

The register selection hardware

How does the 5-bit number select a register? The 8086 has a bunch of logic that turns a register number into a control line that enables reading or writing of the register. For the most part, this logic is implemented with NOR gates that match a particular register number and generate a select signal. The signal goes through a special bootstrap driver to boost its voltage since it needs to control 16 register bits.

The 8086 registers are separated into two main groups. The "upper registers" are in the upper left of the chip, in the Bus Interface Unit. These are the registers that are directly involved with memory accesses. The "lower registers" are in the lower left of the chip, in the Execution Unit. From bottom to top, they are AX, CX, DX, BX, SP, BP, SI, and DI; their physical order matches their order in the instruction set.9 A separate PLA (Programmable Logic Array) selects the ALU temporary registers or flags as destination. Just below it, a PLA selects the source from ALU temporary registers, flags, or the ALU result (Σ).10 I've written about the 8086's registers and their low-level implementation here if you want more information.

Some history

The 8086's system of selecting registers with 3-bit codes originates with the Datapoint 2200,11 a desktop computer announced in 1970. The processor of the Datapoint 2200 was implemented with a board of TTL integrated circuits, since this was before microprocessors. Many of the Datapoint's instructions used a 3-bit code to select a register, with a destination register specification in bits 5-3 of the instruction and a source register in bits 2-0. (This layout is essentially the same as in 8086 instructions and the ModR/M byte.)12 The eight values of this code selected one of 7 registers, with the eighth value indicating a memory access. Intel copied the Datapoint 2200 architecture for the 800813 microprocessor (1972) and cleaned it up for the 8080 (1974), but kept the basic instruction layout and register/memory selection bits.

The 8086's use of a numbering system for all the registers goes considerably beyond this pattern, partly because its registers function both as general-purpose registers and special-purpose registers.14 Many instructions can act on the AX, BX, etc. registers interchangeably, treating them as general-purpose registers. But these registers each have their own special purposes for other instructions, so the microcode must be able to access them specifically. This motivates the 8086's approach where registers can be treated as general-purpose registers that are selected from instruction bits, or as special-purpose registers selected by the microcode.

The Motorola 68000 (1979) makes an interesting comparison to the 8086 since they were competitors. The 68000 uses much wider microcode (85-bit microinstructions compared to 21 bits in the 8086). It has two main internal buses, but instead of providing generic source and destination transfers like the 8086, the 68000 has a much more complicated system: about two dozen microcode fields that connect registers and other components to the bus in various ways.15

Conclusions

Internally, the 8086 represents registers with a 5-bit number. This is unusual compared to previous microprocessors, which usually selected registers directly from the instruction or control circuitry. Three factors motivated this design in the 8086. First, it used microcode, so a uniform method of specifying registers (both programmer-visible and internal) was useful. Second, being able to swap the source and destination in an instruction motivated a level of indirection in register specification, provided by the M and N registers. Finally, the flexibility of the ModR/M byte, in particular supporting byte, word, and segment registers, meant that the register specification needed 5 bits.

I've written multiple posts on the 8086 so far and plan to continue reverse-engineering the 8086 die so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff.

Notes and references

  1. As an aside, the NOP instruction (no operation) in the 8086 is really XCHG AX,AX. Exchanging the AX register with itself accomplishes nothing but takes 3 clock cycles. 

  2. The action part of the micro-instructions indicates the second-last micro-instruction (NXT, next) and the last (RNI, Run Next Instruction), so execution of the next machine instruction can start. 

  3. Note that register #18 can refer both to DH and the destination register. This doesn't cause a conflict because it refers to DH if loaded from the instruction, and refers to the destination register if specified in the micro-instruction. The only issue is that a micro-instruction can't refer to the DH register explicitly (or the BH register similarly). This restriction isn't a problem because the microcode never needs to do this. 

  4. I discuss the 8086's ModR/M byte in detail here

  5. The microcode listings are based on Andrew Jenner's disassembly. I have made some modifications to (hopefully) make it easier to understand. 

  6. There are a few instructions that specify a register in the opcode rather than the ModR/M byte. For 16-bit registers, the INC, DEC, XCHG, PUSH, and POP instructions specify the register in the low three bits of the opcode. The MOV immediate instructions specify either an 8-bit or 16-bit register in the low three bits. On the other hand, the segment is specified by bits 3 and 4 of the segment prefixes, PUSH, and POP instructions. 

  7. A few instructions only have byte versions (DAA, AAA, DAS, AAS, AAM, AAD, XLAT). This is indicated by a Group Decode ROM output and forces instruction execution into byte mode. Thus, these instructions would load a byte register into N, but since these instructions don't have a register specification, it doesn't matter. 

  8. The segment prefixes use the same instruction bits (3 and 4) as PUSH and POP to select the segment register, so you might expect the prefixes to also load the N register. However, the prefixes are implemented in hardware, rather than microcode. Thus, they do not use the N register and the N register is not loaded with the segment register number. 

  9. You might wonder why the BX register is out of sequence with the other registers, both physically on the chip and in the instruction set. The 8086 was designed so 8080 assembly code could be translated to 8086 code. Originally, the 8086 registers had different names: XA, BC, DE, HL, SP, MP, IJ, and IK. The first four names matched the registers in the Intel 8080 processor, while MP was Memory Pointer and IJ and IK were Index registers. However, when the 8086 was released the registers were given names that corresponded to their functions in the 8086, abandoning the 8080 names. XA became the Accumulator AX, The BC register was used for counting, so it became the Count register CX. The DE register was a data register, so it became the Data register DX. The HL register was used as a base for memory accesses, so it became the Base register BX. The result is that the BX register ended up last.

    A program CONV-86 allowed 8080 assembly programs to be translated into 8086 assembly programs, with 8080 registers replaced with the corresponding 8086 registers. The old 8086 register names can be seen in the 8086 patent, while the Accumulator, Base, Count, Data names are in the MCS-86 Assembly Language Reference Guide. See also this Stack Exchange discussion

  10. The all-ones source doesn't have any decoding; the ALU bus is precharged to the high state, so it is all ones by default. 

  11. The system of using bit fields in instructions to select registers is much older, of course. The groundbreaking IBM System/360 architecture (1964), for instance, used 4-bit fields in instructions to select one of the 16 general-purpose registers. 

  12. Note that with this instruction layout, the instruction set maps cleanly onto octal. The Datapoint 2200 used octal to describe the instruction set, but Intel switched to hexadecimal for its processors. Hexadecimal was becoming more popular than octal at the time, but the move to hexadecimal hides most of the inherent structure of the instructions. See x86 is an octal machine for details. 

  13. The Datapoint manufacturer talked to Intel and Texas Instruments about replacing the board of chips with a single processor chip. Texas Instruments produced the TMX 1795 microprocessor chip and Intel produced the 8008 shortly after, both copying the Datapoint 2200's architecture and instruction set. Datapoint didn't like the performance of these chips and decided to stick with a TTL-based processor. Texas Instruments couldn't find a customer for the TMX 1795 and abandoned it. Intel, on the other hand, sold the 8008 as an 8-bit microprocessor, creating the microprocessor market in the process. Register selection in these processors was pretty simple: the 3 instruction bits were decoded into 8 select lines that selected the appropriate register (or memory). Since these processors had hard-coded control instead of microcode, the control circuity generated other register selection lines directly. 

  14. While the 8086 has eight registers that can be viewed as general-purpose, they all have some specific purposes. The AX register acts as the accumulator and has several special functions, such as its use in the XCHG (Exchange) operation, I/O operations, multiplication, and division. The BX register has a special role as a base register for memory accesses. The CX register acts as the counter for string operations and for the JCXZ (Jump if CX Zero) instruction. The DX register can specify the port for I/O operations and is used for CWD (Convert Word to Doubleword) and in multiplication and division. The SP register has a unique role as the stack pointer. The SI and DI registers are used as index registers for string operations and memory accesses. Finally, the BP register has a unique role as base pointer into the stack segment. On the 8-bit side, AX, BX, CX, and DX can be accessed as 8-bit registers, while the other registers cannot. The 8-bit AL register is used specifically for XLAT (Translate) while AH is used for the flag operations LAHF and SAHF. Thus, the 8086's registers are not completely orthogonal, and each one has some special cases, often for historical reasons. 

  15. Another way of looking at the Motorola 68000's microcode is that the register controls come from "horizontal" microcode, a micro-instruction with many bits, and fields that control functional elements directly. The 8086's microcode is more "vertical"; the micro-instructions have relatively few bits and the fields are highly encoded. In particular, the 8086's source and destination register fields are highly encoded, while the 68000 has fields that control the connection of an individual register to the bus. 

3 comments:

Toivo Henningsson said...

It's interesting to see that they made the codes for e g AL and AH differ in the fourth and fifth bits, while they differ in the third bit instead in the instruction encoding. But I suppose it makes the hardware that needs to decode the register numbers simpler.

Unknown said...

I think your Mastodon address should be @[email protected]

Jecel said...

I mentioned on Hacker News an article that explained the microcode / nanocode thing for the 68000. I have now found it, but can't post to that thread.

https://dl.acm.org/doi/10.1145/1014198.804299

"Microprogrammed implementation of a single chip microprocessor"
Skip Stritter and Nick Tredennick
SIGMICRO Newsletter, Vol 9, Number 4, December 1978