Friday, July 19, 2013

Reverse-engineering the 8085's ALU and its hidden registers

This article describes how the ALU of the 8085 microprocessor works and how it interacts with the rest of the chip, based on reverse-engineering of the silicon. (This is part 2 of my ALU reverse-engineering; part 1 described the circuit for a single ALU bit.) Along with the accumulator, the ALU uses two undocumented registers - ACT and TMP - and this article describes how they work in detail, as well as how the ALU is controlled.

The arithmetic-logic unit is a key part of the microprocessor, performing operations and comparisons on data. In the 8085, the ALU is also a key part of the data path for moving data. The ALU and associated registers take up a fairly large part of the chip, the upper left of the photomicrograph image below. The control circuitry for the ALU is in the top center of the image. The data bus (dbus) is indicated in blue.

Photograph of the 8085 chip showing the location of the ALU, flags, and registers.

Photograph of the 8085 chip showing the location of the ALU, flags, and registers.

The real architecture of the 8085 ALU

The following architecture diagram shows how the ALU interacts with the rest of the 8085 at the block-diagram level. The data bus (dbus) conneccts the ALU and associated registers with the rest of the 8085 microprocessor. There are also numerous control lines, which are not shown.

The ALU uses two temporary registers that are not directly visible to the programmer. The Accumulator Temporary register (ACT) holds the accumulator value while an ALU operation is performed. This allows the accumulator to be updated with the new value without causing a race condition. The second temporary register (TMP) holds the other argument for the ALU operation. The TMP register typically holds a value from memory or another register.

Architecture of the 8085 ALU as determined from reverse-engineering.

Architecture of the 8085 ALU as determined from reverse-engineering.

The 8085 datasheet has an architecture diagram that is simplified and not quite correct. In particular, the ACT register is omitted and a data path from the data bus to the accumulator is shown, even though that path doesn't exist.

The accumulator and ACT registers

To the programmer, the accumulator is the key register for arithmetic operations. Reverse-engineering, however, shows the accumulator is not connected directly to the ALU, but works closely with the ACT (accumulator temporary) register.

The ACT register has several important functions. First, it holds the input to the ALU. This allows the results from the ALU to be written back to the accumulator without disturbing the input, which would cause instability. Second, the ACT can hold constant values (e.g. for incrementing or decrementing, or decimal adjustment) without affecting the accumulator. Finally, the ACT allows ALU operations that don't use the accumulator.

The accumulator and ACT (Accumulator Temporary) registers and their control lines in the 8085 microprocessor.

The accumulator and ACT (Accumulator Temporary) registers and their control lines in the 8085 microprocessor.

The diagram above shows how the accumulator and ACT registers are connected, and the control lines that affect them. One surprise is that the only way to put a value into the accumulator is through the ALU. This is controlled by the alu_to_a control line. You might expect that if you load a value into the accumulator, it would go directly from the data bus to the accumulator. Instead, the value is OR'd with 0 in the ALU and the result is stored in the accumulator.

The accumulator has two status outputs: a_hi_ge_10, if the four high-order bits are ≥ 10, and a_lo_ge_10, if the four low-order bits are ≥ 10. These outputs are used for decimal arithmetic, and will be explained in another article.

The accumulator value or the ALU result can be written to the databus through the sel_alu_a control (which selects between the ALU result and the accumulator), and the alu/a_to_dbus control line, which enables the superbuffer to write the value to the data bus. (Because the data bus is large and connects many parts of the chip, it requires high-current signals to overcome its capacitance. A "superbuffer" provides this high-current output.)

The ACT register can hold a variety of different values. In a typical arithmetic operation, the accumulator value is loaded into the ACT via the a_to_act control. The ACT can also load a value from the data bus via dbus_to_act. This is used for the ARHL/DAD/DSUB/LDHI/LDSI/RDEL instructions (all of which are undocumented except DAD). These instructions perform arithmetic operations without involving the accumulator, so they require a path into the ALU that bypasses the accumulator.

The control lines allow the ACT register to be loaded with a variety of constants. The 0/fe_to_act control line loads either 0 or 0xfe into the ACT; the value is selected by the sel_0_fe control line. The value 0 has a variety of uses. ORing a value with 0 allows the value to pass through the ALU unchanged. If the carry is set, ADDing to 0 performs an increment. The value 0xfe (signed -2) is used only for the DCR (decrement by 1) instruction. You might think the value 0xff (signed -1) would be more appropriate, but if the carry is set, ADDing 0xfe decrements by 1. I think the motivation is so both increments and decrements have the carry set, and thus can use the same logic to control the carry.

Since the 8085 has a 16-bit increment/decrement circuit, you might wonder why the ALU is also used for increment/decrement. The main reason is that using the ALU allows the condition flags to be set by INR and DCR. In contrast, the 16-bit increment and decrement instructions (INX and DCX) use the incrementer/decrementer, and as a consequence the flags are not updated.

To support BCD, the ACT can be loaded with decimal adjustment values 0x00, 0x06, 0x60, or 0x66. The top and bottom four bits of ACT are loaded with the value 6 with the 6x_to_act and x6_to_act control lines respectively.

It turns out that the decimal adjustment values are easily visible in the silicon. The following image shows the silicon that implements the ACT register. Each of the large pink structures is one bit. The eight bits are arranged with bit 7 on the left and bit 0 on the right. Note that half of the bits have pink loops at the top, in the pattern 0110 0110. These loops pull the associated bit high, and are used to set the high and/or low four bits to 6 (binary 0110).

The ACT register in the 8085. This image shows the silicon that implements the 8-bit register. Each of the large pink structures is one bit.  Bit 7 is on the left and bit 0 on the right.

The ACT register in the 8085. This image shows the silicon that implements the 8-bit register.

Building the 8-bit ALU from single-bit slices

In my previous article on the 8085 ALU I described how each bit of the ALU is implemented. Each bit slice of the ALU takes two inputs and performs a simple operation: or, add, xor and, shift right, complement, or subtract. The ALU has a shift right input and a carry input, and generates a carry output. In addition, each slice of the ALU contributes to the parity and zero calculations. The ALU has five control lines to select the operation.

One bit of the ALU in the 8085 microprocessor.

One bit of the ALU in the 8085 microprocessor

The ALU has seven basic operations: or, add, xor, and, shift right, complement, and subtract. The following table shows the five control lines that select the operation, and the meaning of the carry line for the operation. Note that the meaning of carry in and carry out is different for each operations. For bit operations, the implementation of the ALU circuitry depends on a particular carry in value, even though carry is meaningless for these operations.

Operationselect_neg_in2select_op1select_op2select_shift_rightselect_ncarry_1Carry in/out
or000011
add01000/carry
xor010011
and011010
shift right001110
complement100011
subtract11000borrow

The eight-bit ALU is formed by linking eight single-bit ALUs as shown below. The high-order bit is on the left, and the low-order bit on the right, matching the layout in silicon. The carry, parity, and zero values propagate through each ALU to form the final values on the left. The right shift input is simply the bit from the right, with the exception of the topmost bit which uses a special shift right input. The auxiliary carry is simply the carry out of bit three. The control lines to select the operation are fed into all eight ALU slices. By combining eight of these ALU slices, the whole 8-bit ALU is created. The values from the top bit are used to control the parity, zero, carry, and sign flags (as well as the undocumented K and V flags). Bit 3 generates the half carry flag.

The 8-bit ALU in the 8085 is formed by combining eight 1-bit slices.

The 8-bit ALU in the 8085 is formed by combining eight 1-bit slices.

The control lines

The ALU uses 29 control lines that are generated by a PLA that activates the right control lines based on the opcode and the position in the instruction cycle. For reference, the following table lists the 29 ALU control lines and the instructions that affect them.
Control lineRelevant instructions
ad_latch_dbus, write_dbus_to_alu_tmp, /ad_dbus IN/LDA/LHLD
/ad_dbus ARHL/DAD/DSUB/LDHI/LDSI/RDEL
/alu/a_to_dbus all
/dbus_to_act ARHL/DAD/DSUB/LDHI/LDSI/RDEL
a_to_act ACI/ADC/ADD/ADI/ANA/ANI/CMP/CPI/ORA/ORI/RAL/RAR/RLC/RRC/SBB/SBI/SUB/SUI/XRA/XRI
0/fe_to_act all
sel_alu_a all
alu_to_a ACI/ADC/ADD/ADI/ANA/ANI/CMA/CMC/DAA/DCR/IN/INR/LDA/LDAX/MOV/MVI/ORA/ORI/POP/RAL/RAR/RIM/RLC/RRC/SBB/SBI/SIM/STC/SUB/SUI/XRA/XRI
/daa DAA
sel_0_fe DCR
store_v_flag ACI/ADC/ADD/ADI/ANA/ANI/ARHL/CMP/CPI/DAA/DCR/INR/ORA/ORI/RAL/RAR/RLC/RRC/SBB/SBI/SUB/SUI/XRA/XRI
select_shift_right ARHL/RAR/RRC
arith_to_flags ACI/ADC/ADD/ADI/ANA/ANI/CMP/CPI/DAA/DCR/DSUB/INR/ORA/ORI/SBB/SBI/SUB/SUI/XRA/XRI
bus_to_flags POP PSW
/zero_flag_combine DAD/DSUB
/flags_to_bus ACI/ADC/ADD/ADI/ANA/ANI/ARHL/CALL/CC/CM/CMA/CMC/CMP/CNC/CNZ/CP/CPE/CPI//CPO/CZ/DAA/DAD/DCR/DCX/DI/DSUB/EI/HLT/IN/INR/INX/JC/JK/JM/JMP/JNC/JNK/JNZ/JP/JPE/JPO/JZ/LDA/LDAX/LDHI/LDSI/LHLD/LHLX/LXI/MOV/MVI/NOP/ORA/ORI/OUT/PCHL/POP/PUSH/RAL/RAR/RC/RDEL/RET/RIM/RLC/RM/RNC/RNZ/RP/RPE/RPO/RRC/RST/RSTV/RZ/SBB/SBI/SHLD/SHLX/SIM/SPHL/STA/STAX/STC/SUB/SUI/XCHG/XRA/XRI/XTHL
shift_right_in_select ARHL
xor_carry_in ANA/ANI/ARHL/CMP/CPI/DCR/DSUB/INR/RAR/RRC/SBB/SBI/SUB/SUI
select_op2 ANA/ANI/ARHL/RAR/RRC
/use_latched_carry /rotate_carry LDHI/LDSI/RLC/RRC
/carry_in_0 0 except for ACI/ADC/DAD/DSUB/LDHI/LDSI/RAL/RDEL/RLC/SBB/SBI
select_op1 ACI/ADC/ADD/ADI/ANA/ANI/CMP/CPI/DAA/DAD/DCR/DSUB/INR/LDHI/LDSI/RAL/RDEL/RLC/SBB/SBI/SUB/SUI/XRA/XRI
select_ncarry_1 ACI/ADC/ADD/ADI/CMP/CPI/DAA/DAD/DCR/DSUB/INR/LDHI/LDSI/RAL/RDEL/RLC/SBB/SBI/SUB/SUI
In combination with first control line, write_dbus_to_alu_tmp ADC/ADD/ANA/CMA/CMC/CMP/DAA/DCR/INR/MOV/ORA/RAL/RAR/RIM/RLC/RRC/SBB/SIM/STC/SUB/XRA
select_neg_in2 CMA/CMP/CPI/DSUB/SBB/SBI/SUB/SUI
carry_to_k_flag DCX/INX
store_carry_flag ACI/ADC/ADD/ADI/ANA/ANI/ARHL/CMC/CMP/CPI/DAA/DAD/DSUB/ORA/ORI/RAL/RAR/RDEL/RLC/RRC/SBB/SBI/STC/SUB/SUI/XRA/XRI
xor_carry_result xor for ANA/ANI/CMC/CMP/CPI/DSUB/SBB/SBI/STC/SUB/SUI
/latch_carry use_carry_flag CMC/LDHI/LDSI

Conclusions

By reverse-engineering the 8085, we can see how the ALU actually works at the gate and silicon level. The ALU uses many standard techniques, but there are also some surprises and tricks. There are two registers (ACT and TMP) that are invisible to the programmer. You'd expect a direct path from the data bus to the accumulator, but instead the data passes through the ALU. The increment/decrement logic uses the unexpected constant 0xfe, and there are two totally different ways of performing increment/decrement. Several undocumented instructions perform ALU operations without involving the accumulator at all.

This information builds on the 8085 reverse-engineering done by the visual 6502 team. This team dissolves chips in acid to remove the packaging and then takes many close-up photographs of the die inside. Pavel Zima converted these photographs into mask layer images, generated a transistor net from the layers, and wrote a transistor-level 8085 simulator.

2 comments:

Anonymous said...

沙发

Anonymous said...

Very interesting, I guess the 8086 has a very similar datapath since it takes 3 cycles for register-register ALU operations. At the time it looks like Intel were more concerned about reducing chip area than making things fast, hence single ALU, single-ported register file, big common databus, etc.