Showing posts with label intel. Show all posts
Showing posts with label intel. Show all posts

Inside the Intel 1405: die photos of a shift register memory from 1970

In 1970, MOS memory chips were just becoming popular, but were still very expensive. Intel had released their first product the previous year, the 3101 RAM chip with 64 bits of storage.[1] For this chip (with enough storage to hold the word "aardvark") you'd pay $99.50.[2] To avoid these astronomical prices, some computers used the cheaper alternative of shift register memory. Intel's 1405 shift register provided 512 bits of storage — 8 times as much as their RAM chip — at a significantly lower price.[3][4] In a shift register memory, the bits go around and around in a circle, with one bit available at each step. The big disadvantage is that you need to wait for the bit you want to come around, which can take half a millisecond.

One computer that used shift register memory is the Datapoint 2200 computer. (This is a very interesting computer — the 8008 was created for it following the architecture specified by Datapoint — but that's a topic for another blog post.) In the Datapoint 2200, each memory board had 32 shift registers, providing 2K of storage. The processor board used a counter to keep track of the shift register position, and would stop processing until the right bits were available. (Kind of like a cache miss in modern processors.)

I got a display board from a Datapoint 2200[5], which uses Intel 1405 shift registers for the display storage. This board uses 14 shift registers and holds 896 bytes.[6] Shift-register memory was convenient for a video display board, since the circuitry needed to access each character in sequence to display it.

Intel 1405 shift registers provide memory storage for a Datapoint 2200 display.

Intel 1405 shift registers provide memory storage for a Datapoint 2200 display.

I opened up one of the shift register chips with a hacksaw and looked at it under a metallurgical microscope to get some die photos. Since the shift registers are in metal cans, they are easy to open up, unlike the plastic packages used by most chips. The following photo shows the die. The chip is fairly simple, with most of the chip taken up with the shift register cells. Around the outside of the chip you can see the nine pads with black wires connected.

The die shows some of the reasons that shift registers were cheaper than RAM chips. Unlike a RAM chip, the chip does not need to form a regular grid — the rows in the middle are shorter than the others because of the pin on the right. In addition, the chip doesn't need any address decoding logic. Thus, more bits can be fit onto a chip. Because there are no address lines, the chip has fewer pins than a RAM chip and can fit into a smaller package.

Die shot of the Intel 1405 MOS 512-bit shift register memory.

Die shot of the Intel 1405 MOS 512-bit shift register memory.

The diagram below shows the flow of bits through the shift register, in yellow. Bits enter through the input pin at the bottom. They zig-zag through the 20 rows of the shift register and exit at the top through the output pin. Bits recirculate back to the input along the left. The clock lines are at the right and are connected to each cell of the shift register.

Labeled die shot of the Intel 1405 MOS 512-bit shift register memory.

Labeled die shot of the Intel 1405 MOS 512-bit shift register memory.

In the lower left is the circuit to control input to the shift register, which consists of a few gates. Either a new bit can be written to the shift register each cycle, or the exiting bit can recirculate and re-enter the shift register. The photo below zooms in on this circuit. The four vertical wires at the left are the chip select 2, chip select 1, recirculated bit, and Vdd.

Input circuit of the 1405 shift register.

Input circuit of the 1405 shift register.

The image below shows the circuit to control the output from the shift register, which is in the upper left of the chip. The chip has two chip select inputs, which makes it convenient to arrange the shift registers in a grid with one set of lines enabling a row and a perpendicular set of lines enabling a column.

Output circuit of the 1405 shift register.

Output circuit of the 1405 shift register.
The image below shows the shift register cells at high magnification. On the left is the actual die photo, while the right labels the components of the die. Bits flow to the right through the bottom half of the picture, and then back to the left in the top half.

The large U shapes at the bottom are transistors (red T's) that form inverters (drawn in yellow). Between each inverter is a pass transistor that controls the flow of bits from inverter to inverter. The first T is connected to clock 1, allowing the bit to flow from the first inverter to the second when clock 1 is activated. The next T is connected to clock 2, passing the bit along another step on clock 2. As the clock lines are triggered in sequence, the bits pass step-by-step through the shift register.

The chip uses silicon-gate technology. This was an important innovation in chip design that was developed in 1968 at Fairchild by Federico Faggin (who also developed the Z80), and became a core technology at Intel. With this technology, polysilicon is used as the gates for transistors instead of aluminum metal, as previous MOS integrated circuits used. For various reasons, this made chips much faster and easier to manufacture.

In the picture below, polysilicon is indicated in blue. Where it overlaps the underlying doped silicon, a transistor is formed (red T). The horizontal gray lines are the metal layer, with the voltage supplies and the clocks. The circles show connections between the different layers.[7]

Close up of the cells in an Intel 1405 512-bit shift register memory. The actual photo is on the left, and the circuit is drawn on the right.

Close up of the cells in an Intel 1405 512-bit shift register memory. The actual photo is on the left, and the circuit is drawn on the right.

The clock driver

The display circuit board below has 14 shift registers in round metal cans. But there's a huge metal can at the right — what is this IC? That turns out to be the driver chip that provides the clock signals for the shift registers, and it's pretty interesting inside.

The shift registers require two alternating clock signals to shift. These signals must not overlap, or else the data will get messed up. In addition, the shift registers require up to 30 volts in the clock, due to their old technology. Finally, a lot of current (500mA) is needed in the clock signals to drive all the chips. To meet these requirements, a special clock driver chip is used to generate the clock signals. This is the Fairchild SH0013-C "Two phase MOS clock driver".[8]

1405 shift registers provide 896 bytes of storage on a Datapoint 2200 display card.

1405 shift registers provide 896 bytes of storage on a Datapoint 2200 display card.

I expected to find an IC with big transistors inside the clock driver chip, but opening it up revealed something entirely different. Inside is a hybrid integrated circuit made up of eight separate silicon dies mounted on a tiny circuit board and connected with gold traces and gold wires. In addition, there are thick film resistors printed onto the board — these are the black "E" shapes in the picture below.

Interactive viewer

The image and schematic[8] below are an interactive exploration of the SH0013 clock driver. Click a component to see its location on the board and in the schematic highlighted. The box below will give an explanation of the component.

Click image below for details.

Conclusion

While using shift registers as memory seems bizarre now, it was a cost-effective way to implement storage in 1970. Looking inside the shift register chips shows how they work and how they could be implemented more cheaply than RAM. Providing the high-power clock signals required a special driver chip, which turns out to be a hybrid circuit with tiny semiconductors and resistors on a circuit board in a large metal IC package.

Notes and references

[1] Intel didn't invent the memory chip, of course. There were many companies making memory chips in the 1960s. For instance, Texas Instruments announced the SN5481 bipolar memory chip in 1966 (Electronics, V39 #1, p151) and Transitron had the TMC 3162 and 3164 16-bit RAM (Electrical Design News, Volume 11, p14). In 1968, RCA made 72-bit CMOS memories for the Air Force (document, photo). Lee Boysel built 256-bit dynamic RAMs at Fairchild in 1968 and 1K dynamic RAMs at Four Phase Systems in 1969 (1970 — MOS Dynamic RAM Competes with Magnetic Core Memory on Price and Boysel presentation). For more information on the history of memory technology, see 1966 — Semiconductor RAMs Serve High-speed Storage Needs and History of Semiconductor Engineering, p215. Another source for memory history is To the Digital Age: Research Labs, Start-up Companies, and the Rise of MOS Technology, p193.

[2] Memory chips started out very expensive, but prices rapidly dropped. Computer Design Volume 9 page 28, 1970, announced a price drop of the 3101 from $99.50 to $40 in small volumes. Electrical Design News Volume 15, 1970 gave the initial price of the 1405 as $13.30 in quantities of 100. Ironically, the Intel 3101 is now a collector's item and costs much more than the original price on eBay — hundreds of dollars for the right package.

[3] The datasheet for the 1405A shift register is available at Intel-vintage.info or Intel's data catalog 1976 (at archive.org).

[4] Many companies made shift register memories. For instance, in 1969 Philco (an electronics manufacturer owned by Ford Motor Company) claimed to have the longest commercially available shift register at 256 bits (Electronic Design, Volume 17, p251). For lots more information on shift register memory, see Don Lancaster's December 1974 Radio-Electronics article, " How it works: IC MOS shift registers.

[5] I obtained the Datapoint display board on eBay from Zuigadrummer, who currently has other Datapoint boards for sale. She was very helpful to me and I recommend her.

[6] The Datapoint 2200's display provided 12 lines of 80 characters. The display memory held 1024 7-bit ASCII characters. A pair of shift registers provided 1024 bits of storage, with 7 pairs in total.

[7] For those who want to know more details of the layout... The resistor symbols are not actually resistors, but clocked precharge transistors that pull the inverter outputs high. A few years later, MOS chips would use depletion transistors instead.

The metal rectangles form connections between the silicon layer and the polysilicon layer. This technique was soon obsoleted by buried contacts which connected the two layers directly without using the metal layer. This made chip layout easier, since the metal layer could be used for interconnections without being interrupted by these connections.

The gray blobs show the undoped silicon, which can be considered non-conductive. The doped silicon is conductive, except where the polysilicon crosses it and forms a transistor. Doped and undoped silicon are hard to distinguish in the die photo, but the boundary between them is visible as a faint black line. The polysilicon is much more visible in the die photo; it is orange, or red when it forms a transistor. The colors are due to the thicknesses of the layers.

[8] A datasheet for the SH0013 clock driver is in the 1973 Fairchild Linear Integrated Circuits Data Catalog, page 6-126. A datasheet for the equivalent MH0013 is in the 1972 National MOS Integrated Circuits databook, page 123.

Intel x86 documentation has more pages than the 6502 has transistors

Microprocessors have become immensely more complex thanks to Moore's Law, but one thing that has been lost is the ability to fully understand them. The 6502 microprocessor was simple enough that its instruction set could almost be memorized. But now processors are so complex that understanding their architecture and instruction set even at a superficial level is a huge task. I've been reverse-engineering parts of the 6502, and with some work you can understand the role of each transistor in the 6502. After studying the x86 instruction set, I started wondering which was bigger: the number of transistors in the 6502 or the number of pages of documentation for the x86.

It turns out that Intel's Intel® 64 and IA-32 Architectures Software Developer Manuals (2011) have 4181 pages in total, while the 6502 has 3510 transistors. There are actually more pages of documentation for the x86 than the number of individual transistors in the 6502.

The above photo shows Intel's IA-32 software developer's manuals from 2004 on top of the 6502 chip's schematic. Since then the manuals have expanded to 7 volumes.

The 6502 has 3510 transistors, or 4528, or 6630, or maybe 9000?

As a slight tangent, it's actually hard to define the transistor count of a chip. The 6502 is usually reported as having 3510 transistors. This comes from the Visual 6502 team, which dissolved a 6502 chip in acid, photographed the die (below), traced every transistor in the image, and built a transistor-level simulator that runs 6502 code (which you really should try). Their number is 3510 transistors.

The 6502 processor chip

One complication is the 6502 is built with NMOS logic which builds gates out of active "enhancement" transistors as well as pull-up "depletion" transistors which basically act as resistors. The count of 3510 is just the enhancement transistors. If you include the 2102 1018 depletion transistors, the total transistor count is 5612 4528.

A second complication is that when manufacturers report the transistor count of chips, they often report "potential" transistors. Chips that include a ROM or PLA will have different numbers of transistors depending on the values stored in the ROM. Since marketing doesn't want to publish different transistor numbers depending on the number of 1 bits and 0 bits programmed into the chip, they often count ROM or PLA sites: places that could have transistors, but might not. By my count, the 6502 decode PLA has 21×131=2751 PLA sites, of which 649 actually have transistors. Adding these 2102 "potential" transistors yields a count of 6630 transistors.

Finally, some sources such as Microsoft Encarta and A History of the Personal Computer state the 6502 contains 9000 transistors, but I don't know how they could have come up with that value.

(The number of pages of Intel documentation is also not constant; the latest 2013 Software Developer Manuals have shrunk to 3251 pages.)

Thus, the x86 has more pages of documentation than the 6502 has transistors, but it depends how you count.

Reverse-engineering the flag circuits in the 8085 processor

Processors all have status flags to keep track of conditions such as a zero value, a carry, or a negative value. Whenever you write a loop or conditional, these flags ultimately are in control. But how are these flags implemented in the chip's silicon? I've reverse-engineered the flag circuits in the 8085 microprocessor and explain what is really going on.

The photograph below is a highly magnified image of the 8085's silicon, showing the relevant parts of the chip. In the upper-left, the arithmetic logic unit (ALU) performs 8-bit arithmetic operations. The status flag circuitry is below the ALU and the flags are connected to the data bus (indicated in blue). To the right of the ALU, the control PLA decodes the instructions into control lines that control the operations of the ALU and flag circuits.

Photograph of the 8085 chip showing the location of the ALU, flags, and registers.

The 8085 has seven status flags.
  • Bit 7 is the sign flag, indicating a negative two's-complement value, which is simply a byte with the top bit set.
  • Bit 6 is the zero flag, indicating a value that is all zeros.
  • Bit 5 is the undocumented K (or X5) flag, indicating either a carry from the 16-bit incrementer/decrementer or the result of a signed comparison. See my article on the undocumented K and V flags.
  • Bit 4 is the auxiliary carry, indicating a carry out of the 4 low-order bits. This is typically used for BCD (binary-coded decimal) arithmetic.
  • Bit 3 is unused and set to 0. Interestingly, a fairly large transistor drives the data bus line to 0 when reading the flags, so this unused flag bit doesn't come for free.
  • Bit 2 is the parity flag, which is set if the result has an even number of 1 bits.
  • Bit 1 is the undocumented signed overflow flag V (details).
  • Bit 0 is the carry flag.

The image below zooms in on the flag silicon, showing individual transistors. The large transistors labeled with the flag name drive the flag value onto the data bus. From the data bus, the flag values control the results of conditional jumps, calls, and returns. The complex circuits above these transistors compute and store the flag values.

The silicon that implements the flags in the 8085 microprocessor.

The schematic below shows the flag circuit that is implemented in the silicon above.

Schematic of the flag storage in the 8085 microprocessor.

Schematic of the flag storage in the 8085 microprocessor.

Each flag bit has a latch and control lines to write a value to the latch. Most flags are updated by the same arithmetic instructions and controlled by the arith_to_flags control line. The carry flag is affected by additional instructions and has its own control line. The undocumented K and V flags are updated in different circumstances and have their own control lines.

The bus_to_flags control loads the flags from the data bus for the POP PSW instruction, while the flags_to_bus control sends the flag values over the data bus for the PUSH PSW instruction or for conditional branches.

The circuitry to compute most flag values is straightforward. The sign flag is set based on bit 7 of the result. The auxiliary carry flag is set on the carry out of bit 3. The K and V flags are set based on the top two bits (details). The zero flag is normally set from the alu_zero signal that indicates all bits are zero.

The zero flag has support for multi-byte zero: at each step it can AND the existing zero flag with the current ALU zero value, so the zero flag will be set if both bytes are zero. This is only used for the (undocumented) DSUB 16-bit subtract instruction. Strangely, this circuit is also activated for the 16-bit DAD instructions, but the result is not stored in the flag.

If you look at the chip photograph at the top of the article, the flags are arranged in apparently-random order, not in their bit order as you might expect. Presumably the layout used is more efficient. Also notice that the carry flag C is off to the right of the ALU. Because of the complexity of the carry logic, which will be discussed next, the circuitry wouldn't fit under the ALU with the rest of the flag logic.

The carry logic

The schematic below shows the circuit for the carry flag. The logic for carry is more complex than for the other flags because carry is used in a variety of ways.

Schematic of the carry circuitry in the 8085 microprocessor.

Schematic of the carry circuitry in the 8085 microprocessor.

The value stored in the carry flag

The top part of the circuit computes carry_result, the value stored in the carry flag. This value has several different meanings depending on the instruction:
  • For arithmetic operations, the carry flag is loaded with the value generated by the ALU. That is, alu_carry_7 (the high-order carry from bit 7 of the ALU) is used. (See Inside the ALU of the 8085 microprocessor for details on how this is computed.)
  • For DAA (decimal adjust accumulator), the carry flag is set if the high-order digit is >= 10. This value is alu_hi_ge_10, which is selected by the daa control line.
  • For CMC (complement carry), the carry flag value is complemented. To compute this, the previous carry flag value c_flag is selected by use_carry_flag and complemented by the xor_carry_result control line.
  • For ARHL/RAR/RRC (rotate right operations), bit 0 of the rotated value goes into the carry. In the circuit, reg_act_0 (the low-order bit in the undocumented ACT (accumulator temp) register) is selected by the alu_shift_right control line.
The xor_carry_result control inverts the carry value in a few cases. For subtraction and comparison, it flips the carry bit to be the borrow bit. For STC (set carry), the xor_carry_result control forces the carry to 1. For AND operations, it forces the carry to 0.

Generating the carry input signal

The middle part of the circuit selects the appropriate carry_in value that is supplied to the ALU.
  • The first option is to set the carry in to either 0 or 1, by using carry_in_0 and optionally xor_carry_in. This is used for most instructions.
  • The next option is to use the current carry flag value as an input for additions or subtractions (allowing multi-byte arithmetic). For subtraction, this is inverted to convert borrow to carry; the xor_carry_in control does this.
  • The final option uses the carry latch to temporarily hold the carry for the undocumented LDHI and LDSI instructions. These instructions add a constant to a 16-bit register pair, so they need to add the carry from of the low-order sum to the high-order byte. The carry latch temporarily holds the carry, and this value is selected by the use_latched_carry control line. You might wonder why not just use the normal carry flag; the LDHI and LDSI instructions are designed to leave the carry flag unchanged, so they need somewhere else to temporarily store the carry. The surprising conclusion that Intel deliberately included circuitry in the 8085 specifically to support these undocumented instructions, and then decided not to support these instructions. (In contrast, the 6502's unsupported instructions are just random consequences of unsupported opcodes.)

    Generating the shift_right input signal

    Each bit of the ALU has a shift right input. For most of the bits, the input comes from the bit to the left, but the high-order bit uses different inputs depending on the instruction. The bottom circuit in the schematic below generates the shift right input for the ALU. This circuit has two simple options.
    • Normally the carry flag is fed into shift_right_in. For the ARHL and RAR instructions, this causes the carry flag to go into the high-order bit.
    • For the RRC and RLC instructions (rotate A left/right), the rotate_carry control selects bit 0 as the shift right input.

    Conclusions

    By reverse-engineering the 8085, we can see how the flag circuits in the 8085 actually works at the gate and silicon level. One interesting feature is the circuitry to implement undocumented instructions and flags. Another interesting feature is the complexity of the carry flag compared to the other flags.

    This information is based on the 8085 reverse-engineering done by the visual 6502 team. This team dissolves chips in acid to remove the packaging and then takes many close-up photographs of the die inside. Pavel Zima converted these photographs into mask layer images, generated a transistor net from the layers, and wrote a transistor-level 8085 simulator.

    Footnotes on rotate

    I recommend you skip this section, but there are few confusing things about the rotate logic that I wanted to write down.

    For some reason the rotate operations are named very strangely in the 8080 and 8085. RRC is the "rotate accumulator right" instruction and RAR is the "rotate accumulator right through carry" instruction. Based on the abbreviations, the names seem reversed. The left rotates RLC and RAL are similar. The Z-80 processor has a similar RRC instruction, but calls it "rotate right circular", making the abbreviation slightly less nonsensical.

    Bit 0 of ACT is fed into shift_right_in for both RRC and RLC. However, this input is just ignored for RLC since the rotation is the other direction, so I assume this is just a result of the control logic treating RRC and RLC the same.)

    To reduce the control circuitry, the rotate_carry and use_latched_carry control lines are actually the same control line since the instructions that use them don't conflict. In other words, there is just one control line, but it has two distinct functions.

  • Silicon reverse engineering: The 8085's undocumented flags

    The 8085 microprocessor has two undocumented status flags: V and K. These flags can be reverse-engineered by looking at the silicon of the chip, and their function turns out to be different from previous explanations. In addition, the implementation of these flags shows that they were deliberately implemented, which raises the question of why there were not documented or supported by Intel. Finally, examining how these flag circuits were implemented in silicon provides an interesting look at how microprocessors are physically implemented.

    Like most microprocessors, the 8085 has a flag register that holds status information on the results of an operation. The flag register is 8 bits: bit 0 holds the carry flag, bit 2 holds the parity, bit 3 is always 0, bit 4 holds the half-carry, bit 6 holds the zero status, and bit 7 holds the sign. But what about the missing bits: 1 and 5?

    Back in 1979, users of the 8085 determined that these flag bits had real functions.[1] Bit 1 is a signed-number overflow flag, called V, indicating that the result of a signed add or subtract won't fit in a byte.[2] Bit 5 of the flag is poorly understood and has been given the names K, X5, or UI. For an increment/decrement operation it simply indicates 16-bit overflow or underflow. But it has a totally diffrent value for arithmetic operations. The flag has been described[1][3] as:

    K =  O1·O2 + O1·R + O2·R, where:
    O1 = sign of operand 1
    O2 = sign of operand 2
    R = sign of result
    For subtraction and comparisons, replace O2 with complement of O2.
    
    As I will show, that published description is mistaken. The K flag actually is the V flag exclusive-ored with the sign of the result. And the purpose of the K flag is to compare signed numbers.

    The circuit for the K and V flags

    The following schematic shows the reverse-engineered circuit for the K and V flags in the 8085. The V flag is simply the exclusive-or of the carry into the top bit and the carry out of the top bit. This is a standard formula for computing overflow[2] for signed addition and subtraction. (The 6502 computes the same overflow value through different logic.) The V flag has values for other arithmetic operations, but the values aren't useful.[4] A latch stores the value of the V flag. The computed V value is stored in the latch under the control of a store_v_flag control signal. Alternatively, the flag value can be read off the bus and stored in the latch under the control of the bus_to_flags control signal; this is how the POP PSW instruction, which pops the flags from the stack, is implemented. Finally, a tri-state superbuffer (the large triangle) writes the flag value to the bus when needed.

    The K flag circuitry is on the right. The first function of the K flag is overflow/underflow for an INX/DEX instruction. This is implemented simply: the carry_to_k_flag control line sets the K flag according to the carry from the incrementer/decrementer. The next function of K flag is reading from the databus for the POP PSW instruction, which is the same as for the V flag. The final function of the K flag is the result of a signed comparison. The K flag is the exclusive-or of the V flag and the sign bit of the result. For subtraction and comparison, the K flag is 1 if the second value is larger than the first.[5] The K flag is set for other arithmetic operations, but doesn't have a useful value except for signed comparison and subtraction.[4]

    The circuit in the 8085 for the undocumented V and K flags. The flags are generated from the carries and results from the ALU. The K flag can also be set by the carry from the incrementer/decrementer.

    The circuit in the 8085 for the undocumented V and K flags. The flags are generated from the carries and results from the ALU. The K flag can also be set by the carry from the incrementer/decrementer.
    One mystery was the purpose of the K flag: "It does not resemble any normal flag bit."[1] Its use for increment and decrement is clear, but for arithmetic operations why would you want the exclusive-or of the overflow and sign? It turns out the the K flag is useful for signed comparisons. If you're comparing two signed values, the first is smaller if the exclusive-or of the sign and overflow is 1.[6] This is exactly what the K flag computes.

    From the circuit above, it is clear that the V and K flags were deliberately added to the chip. (This is in contrast to the 6502, where undocumented opcodes have arbitrary results due to how the circuitry just happens to work for unexpected inputs.[7]) Why would Intel add the above circuitry to the chip and then not document or support it? My theory is that Intel decided they didn't want to support K or (8-bit) V flags in the 8086, so in order to make the 8086 source-compatible with the 8085, they dropped those flags from the 8085 documentation, but the circuitry remained in the chip.

    The silicon

    The 8085 microprocessor showing the data bus, ALU, flag logic, registers, and incrementer/decrementer.
    The 8085 microprocessor showing the data bus, ALU, flag logic, registers, and incrementer/decrementer.
    The remainder of this article will show how the V and K flag circuits work, diving all the way down to the silicon circuits. The above image of the 8085 chip shows the layout of the chip and the components that are important to the discussion. In the upper left of the chip is the ALU (arithmetic-logic unit), where computations happen (details). The data bus is the main interconnect in the chip, connects the data pins (upper left), the ALU, the data registers, the flag register, and the instruction decoding (upper right). In the lower left of the chip is the 16-bit register file. Underneath the register file is a 16-bit increment/decrement circuit which handles incrementing the program counter, as well as supporting 16-bint increment and decrement instructions. The increment/decrement circuit has a carry-out in the lower right corner - this will be important for the discussion of the K flag. For some reason, the ALU has the low-order bit on the right, while the registers have the low-order bit on the left.

    The flag logic circuitry sits underneath the ALU, with high-current drivers right on top of the data bus. The flags are arranged in apparently-random order with bit 7 (sign) on the left and bit 6 (zero) on the right. Because the carry logic is much more complicated (handling not only arithmetic operations but shifts and rotates, carry complement, and decimal adjust), the carry logic is stuck off to the right of the ALU where there was enough room.

    Zooming in

    Next we will zoom in on the V flag circuitry, labeled V1 above. Looking at the die under a microscope shows the metal layer of the chip, consisting of mostly-horizontal metal interconnects, which are the white lines below. The bottom part of the chip has the 8-bit data bus. Other wires are the VCC power supply, ground, and a variety of signals. While modern processors can have ten or more metal layers, the 8085 only has a single layer. Some of the circuitry underneath the metal is visible.

    The metal layer of the 8085 microprocessor, zoomed in on the V flag circuit.

    The metal layer of the 8085 microprocessor, zoomed in on the V flag circuit.
    If the metal is removed from the chip, the silicon layer becomes visible. The blotchy green/purple is plain silicon. The pink regions are N-type doped silicon. The grayish regions are polysilicon, which can be considered as simply conductive wires. When polysilicon crosses doped silicon, it forms a transistor, which appears light green in this image. Note that transistors form a fairly small portion of the chip; there is a lot more connection and wiring than actual transistors. The small squares are vias, connections to the metal layer.

    The V flag circuit in the 8085 CPU. This is the silicon/polysilicon after the metal layer has been removed. The data bus is not visible as it is in the metal layer, but it is in the lower third of the image. The rectangles at the bottom connect the data bus to the registers.

    The V flag circuit in the 8085 CPU. This is the silicon/polysilicon after the metal layer has been removed. The data bus is not visible as it is in the metal layer, but it is in the lower third of the image. The rectangles at the bottom connect the data bus to the registers.

    MOSFET transistors

    For this discussion, a MOSFET can be considered simply a switch that closes if the gate input is 1 and opens if the gate input is 0. A MOSFET transistor is implemented by separating two diffusion regions, and putting a polysilicon wire over the gate. An insulating layer prevents any current from flowing between the gate and the rest of the transistor. In the following diagram, the n+ diffusion regions are pink, the polysilicon gate conductor is dull green, and the insulating oxide layer is turquoise.

    Structure of a MOSFET transistor. The n+ diffusion regions are pink, the polysilicon gate conductor is dull green, and the insulating oxide layer is turquoise.

    NOR gate

    The NOR gate is a fundamental building block in the 8085, since it is a very simple gate that can form more complex logic. A NOR gate is implemented through two transistors and a pullup transistor. If either input (or both) is 1, the corresponding transistor connects the output to ground. Otherwise, the transistors are open, and the pullup pulls the output high. The pullup is shown as a resistor in the schematic, but it is actually a type of transistor called a depletion-mode transistor for better performance.

    A NOR gate is implemented through two transistors and a pullup transistor. If either (or both) input is 1, the corresponding transistor connects the output to ground. Otherwise, the transistors are open, and the pullup pulls the output high.

    By zooming in to a single NOR gate in the 8085, we can see how the gate is actually implemented. One surprise is that the circuit is almost all wiring; the transistors form a very small part of the circuit. The two transistors are connected to ground on the left, and tied together on the right. The pullup transistor is much larger than the other transistors for technical reasons.[8]

    To understand the circuit, trace the path from ground to each transistor, across the gate, and to the output. In this way you can see there are two paths from ground to the output, and if either input is 1 the output will be 0.

    The layout of the gate is intended to be as efficient as possible, given the constraints of where the power (VCC), ground, and other connections are, yielding a layout that looks a bit unusual. The power, ground, and input signals are all in the metal layer above (not shown here), and are connected to this circuit through vias between the metal and the silicon below.

    A NOR gate in the 8085 microprocessor, showing the components.If either input is high, the associated transistor will connect the output to ground. Otherwise the pullup transistor will pull the output high.

    A NOR gate in the 8085 microprocessor, showing the components.If either input is high, the associated transistor will connect the output to ground. Otherwise the pullup transistor will pull the output high.

    Exclusive-or gates

    The exclusive-or circuit (which outputs a 1 if exactly one input is 1) is a key component of the flag circuitry, and illustrates how more complex logic can be formed out of simpler gates. The schematic below shows how the exclusive-or is built from a NOR gate and an AND-NOR gate; it is straightforward to verify that if both inputs are 0 or both inputs are 1, the output is will be 0.

    You may wonder why the 8085 uses so many "strange" gates such as a combined AND-NOR, instead of "normal" gates like AND. The transistor-level schematic shows that an AND-NOR gate can actually be implemented very simply with MOSFETs, in fact simpler than a plain AND gate. The two rightmost transistors form the "AND" - if they both have 1 inputs, they connect the output to ground. The transistor to the left forms the other part of the NOR - if it has a 1 input, it pulls the output to ground.

    The exclusive-or circuit used in the 8085: gate-level and transistor-level.

    The following diagram shows an XOR circuit in the 8085 that matches the schematic above. (This is the XOR gate that generates the K flag.) On the left is the NOR gate discussed above, and on the right is the AND-NOR circuit, both outlined with a dotted line. As before, the circuit is mostly wiring, with the transistors forming a small part of the circuit (the green regions between pink diffusion regions).

    An XOR gate in the 8085 microprocessor, formed from a NOR gate and an AND-NOR gate. If both inputs are 0, the NOR gate output will be 1, and the NOR transistor will pull the output to 0. If both inputs are 1, the AND transistors will pull the output to 0. Otherwise the pullup transistor will pull the output 1.

    An XOR gate in the 8085 microprocessor, formed from a NOR gate and an AND-NOR gate. If both inputs are 0, the NOR gate output will be 1, and the NOR transistor will pull the output to 0. If both inputs are 1, the AND transistors will pull the output to 0. Otherwise the pullup transistor will pull the output 1.

    The flag latch

    Each flag bit is stored in a simple latch circuit made up of two inverters. To store a 1, the inverter on the right outputs a 0, which is fed into the inverter on the left, which outputs a 1, which is fed back to the inverter on the right. A zero is stored in a similar (but opposite) manner. When the clock input is low, the pass transistor opens, breaking the feedback loop, and new data can be written into the latch. The complemented output (/out) is taken from the inverter.

    You might wonder why the latch doesn't lose its data whenever the clock goes low. There's an interesting trick here called dynamic logic. Because the gate of a MOSFET consists of an insulating layer it has very high resistance. Thus, any electrical charge on the gate will remain there for some time[9] when the pass transistor opens. When the pass transistor closes, the charge is refreshed.

    The latch used in the 8085 to store a flag value. The latch uses two inverters to store the data. When the clock is low, a new value can be written to the latch.

    The latch used in the 8085 to store a flag value. The latch uses two inverters to store the data. When the clock is low, a new value can be written to the latch.

    The following part of the 8085 chip shows the implementation of the latch for the V flag. The circuit closely matches the schematic above. The two inverters are outlined with dotted lines. The red arrows show the flow of data through the circuit. As before, the wiring and pullup transistors take up most of the silicon real estate.

    Each flag in the 8085 uses a two-inverter latch to store the flag. This shows the latch for the undocumented V flag. The red arrows show the flow of data.

    Each flag in the 8085 uses a two-inverter latch to store the flag. This shows the latch for the undocumented V flag. The red arrows show the flow of data.

    Driving the data bus with a superbuffer

    Another interesting feature of the flag circuit is the "superbuffer". Most transistors in the 8085 only send a signal a short distance. However, to send a signal on the data bus across the whole chip takes a lot more power, so a superbuffer is used. In the superbuffer, one transistor is driven to pull the output low, while a second transistor is driven to pull the output high. (This is in contrast to a regular gate, which uses a depletion-mode pullup transistor to pull the output high.) In addition, these transistors are considerably larger, to provide more current.[8] These two transistors are shown at the bottom the schematic below.

    The other feature of this superbuffer is that it is tri-state. In addition to a 0 or 1 output, it has a third state, which basically consists of providing no output. This way, the flags do not affect the data bus except when desired. In the schematic, it can be seen that if the control input is 1, both NOR gates will output 0, and both transistors will do nothing.

    The superbuffer used in the 8085 to drive the data bus.

    The superbuffer used in the 8085 to drive the data bus.

    The following diagram shows the two drive transistors, as well as the line used to read the flag from the data bus. (The NOR gates are not shown.) Note the size of these transistors compared to transistors seen earlier. Each flag bit requires a superbuffer such as this. Even flag bit 3, which is always 0, requires a large transistor to drive the 0 onto the bus - it's surprising that a do-nothing flag still takes up a fair bit of silicon.

    Each flag in the 8085 uses a superbuffer to drive the value onto the data bus. This figure shows the two large transistors that drive the V flag onto bit 1 of the data bus.

    Each flag in the 8085 uses a superbuffer to drive the value onto the data bus. This figure shows the two large transistors that drive the V flag onto bit 1 of the data bus.

    Putting it all together

    The above discussion has shown the details of the XOR gate that computes the K flag, and the latch and superbuffer for the V flag. The following diagram shows how these pieces fit into the overall circuitry. The latch and driver for the K flag are outside this image, to the right. The circuits below are tied together by the metal layer, which isn't shown. Compare this diagram with the schematic at the top of the article to see how the components are implemented. The two XOR circuits look totally different, since their layouts have been optimized to fit with the signals they need.

    The 8085 circuits to implement the undocumented V and K flags. The ALU provides /carry6, /carry7, and result7. The XOR circuit on the left generates V, and the XOR circuit in the middle generates K. On the right are the latch for the V flag, and the superbuffer that outputs the flag to the data bus. The K flag latch and superbuffer are to the right, not shown.

    The 8085 circuits to implement the undocumented V and K flags. The ALU provides /carry6, /carry7, and result7. The XOR circuit on the left generates V, and the XOR circuit in the middle generates K. On the right are the latch for the V flag, and the superbuffer that outputs the flag to the data bus. The K flag latch and superbuffer are to the right, not shown.
    By looking at the silicon chip carefully, the transistors, gates, and complex circuits start to make sense. It's amazing to think that the complex computers we use are built out of these simple components. Of course, processors now are way more complex than the 8085, with billions of transistors instead of thousands, but the basic principles are still the same.

    If you found this discussion interesting, check out my earlier analysis of the 6502's overflow flag and the 8085's ALU. You may also be interested in the book The Elements of Computing Systems, which describes how to build a computer starting with Boolean logic.

    Credits

    The chip images are from visual6502.org. The visual6502 team did the hard work of dissolving chips in acid to remove the packaging and then taking many close-up photographs of the die inside. Pavel Zima converted these photographs into mask layer images, a transistor net, and an 8085 simulator.

    Notes and references

    [1] The undocumented instructions and flags of the 8085 were discovered by Wolfgang Sehnhardt and Villy M. Sorensen in the process of writing an 8085 assembler, and were written up in the article Unspecified 8085 op codes enhance programming, Engineer's Notebook, "Electronics" magazine, Jan 18, 1979 p 144-145.

    [2] See my article The 6502 overflow flag explained mathematically for details on overflow. There are multiple ways of computing overflow, and the 6502 uses a different technique.

    [3] Tundra Semiconductor sold the CA80C85B, a CMOS version of the 8085. Interestingly, the undocumented opcodes and flags are described in the datasheet for this part: CA80C85B datasheet, 8000-series components.

    The interesting thing about the Tundra datasheet is the descriptions of the "new" flags and instructions are copied almost exactly from Dehnhardt's article except for the introduction of errors, missing parentheses, and renaming the K flag as UI. In addition, as I described earlier, the published K/UI flag formula doesn't always work. Thus, it appears that despite manufacturing the chip, Tundra didn't actually know how these circuits worked.

    [4] The V flag makes sense for signed addition and subtraction, and the K flag makes sense for signed subtraction and comparison. Many other operations affect these flags, but the flags may not have any useful meaning.

    The V flag is 0 for RRC, RAR, AND, OR, and XOR operations, since these operations have constant carry values inside the ALU (details). The RLC and RAL operations add the accumulator to itself, so they can be treated the same as addition: V is set if the signed result is too big for a byte. The V flag for DAA can also be understood in terms of the underlying addition: V will only be set if the top digit goes from 7 to 8. However, since BCD digits are unsigned, V has no useful meaning with DAA. DAD is an interesting case, since the V flag indicates 16-bit signed overflow; it is actually computed from the result of the high-order addition. For INR, the only overflow case is going from 0x7f to 0x80 (127 to -128); note that going from 0xff to 0x00 corresponds to -1 to 0, which is not signed overflow even though it is unsigned overflow. Likewise, DCR sets the V flag going from hex 80 to 7f (-128 to 127); likewise 0x00 to 0xff is not signed overflow.

    The K flag has a few special cases. For AND, OR, and XOR, the K flag is the same as the sign, since the V flag is 0. Note that the K flag is computed entirely differently for INR/DCR compared to INX/DCX. For INR and DCR, the K flag is S^V, which almost always is S. The K flag is set for DAA if S^V is true, which doesn't have any useful meaning since BCD values are unsigned.

    The published formula for the K flag gives the wrong value for XOR if both arguments are negative.

    [5] The following table illustrates the 8 possible cases when comparing signed numbers A and B. The inputs are the top bit of A, the top bit of B, and the carry from bit 6 when subtracting B from A. The outputs are the carry, borrow (complement of carry), sign, overflow, and K flags. An example is given for each row. Note that the K flag is set if A is less than B when treated as signed numbers.

    InputsOutputsExample
    A7B7C6CBSVKHexSigned comparison
    010010000x50 - 0xf0 = 0x6080 - -16 = 96
    011011100x50 - 0xb0 = 0xa080 - -80 = -96
    000011010x50 - 0x70 = 0xe080 - 112 = -32
    001100000x50 - 0x30 = 0x12080 - 48 = 32
    110011010xd0 - 0xf0 = 0xe0-48 - -16 = -32
    111100000xd0 - 0xb0 = 0x120-48 - -80 = 32
    100100110xd0 - 0x70 = 0x160-48 - 112 = 96
    101101010xd0 - 0x30 = 0x1a0-48 - 48 = -96

    [6] A detailed explanation of signed comparisons is given in Beyond 8-bit Unsigned Comparisons by Bruce Clark, section 5. While this article is in the context of the 6502, the discussion applies equally to the 8085.

    [7] The illegal opcodes in the 6502 are discussed in detail in How MOS 6502 Illegal Opcodes really work. In the 6502, the operations performed by illegal opcodes are unintended, just chance based on what the chip logic happens to do with unexpected inputs. In contrast, the undocumented opcodes in the 8085, like the undocumented flags, are deliberately implemented.

    [8] The key parameter in the performance of a MOSFET transistor is the width to length ratio of the gate. Oversimplifying slightly, the current provided by the transistors is proportional to this ratio. (Width is the width of the source or drain, and length is the length across the gate from source to drain.) For an inverter, the W/L ratio of the pullup should be approximately 1/4 the W/L ratio of the input transistor for best performance. (See Introduction to VLSI Systems, Mead, Conway, p 8.) The result is that pullup transistors are big and blocky compared to pulldown transistors. Another consequence is that high-current transistors in a superbuffer have a very wide gate. The 8085 register file has some transistors where the W/L ratios are carefully configured so one transistor will "win" over the other if both are on at the same time. (This is why the 8085 simulator is more complex than the 6502 simulator, needing to take transistor sizes into account.)

    [9] One effect of using pass-transistor dynamic buffers is that if the clock speed is too small, the charge will eventually drain away causing data loss. As a result the 8085 has a minimum clock speed of 500 kHz. Likewise, the 6502 has a minimum clock speed. The Z-80 in contrast is designed with static logic, so it has no minimum clock speed - the clock can be stepped as slowly as desired.