Inside the Intel 1405: die photos of a shift register memory from 1970

In 1970, MOS memory chips were just becoming popular, but were still very expensive. Intel had released their first product the previous year, the 3101 RAM chip with 64 bits of storage.[1] For this chip (with enough storage to hold the word "aardvark") you'd pay $99.50.[2] To avoid these astronomical prices, some computers used the cheaper alternative of shift register memory. Intel's 1405 shift register provided 512 bits of storage — 8 times as much as their RAM chip — at a significantly lower price.[3][4] In a shift register memory, the bits go around and around in a circle, with one bit available at each step. The big disadvantage is that you need to wait for the bit you want to come around, which can take half a millisecond.

One computer that used shift register memory was the Datapoint 2200. (This is a very interesting computer — the 8008 was created for it following the architecture specified by Datapoint — but that's a topic for another blog post.) In the Datapoint 2200, each memory board had 32 shift registers, providing 2K of storage. The processor board used a counter to keep track of the shift register position, and would stop processing until the right bits were available. (Kind of like a cache miss in modern processors.)
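
To make the access pattern concrete, here's a minimal Python sketch of a recirculating shift-register memory with a position counter. (This is just an illustration with made-up names, not Datapoint firmware; a read stalls until the wanted bit rotates around to the output.)

class ShiftRegisterMemory:
    def __init__(self, nbits=512):
        self.bits = [0] * nbits
        self.pos = 0                      # bit currently at the output

    def tick(self):
        # one clock step: the next bit comes around to the output
        self.pos = (self.pos + 1) % len(self.bits)

    def read(self, address):
        cycles = 0
        while self.pos != address:        # stall until the wanted bit arrives
            self.tick()
            cycles += 1
        return self.bits[address], cycles

mem = ShiftRegisterMemory()
value, waited = mem.read(300)             # may wait up to 511 cycles

At the 512-bit length and the roughly 1 MHz shift rate implied above, a worst-case wait of 511 cycles works out to about half a millisecond.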

I got a display board from a Datapoint 2200[5], which uses Intel 1405 shift registers for the display storage. This board uses 14 shift registers and holds 896 bytes.[6] Shift-register memory was convenient for a video display board, since the circuitry needed to access each character in sequence to display it.

Intel 1405 shift registers provide memory storage for a Datapoint 2200 display.

I opened up one of the shift register chips with a hacksaw and looked at it under a metallurgical microscope to get some die photos. Since the shift registers are in metal cans, they are easy to open up, unlike the plastic packages used by most chips. The following photo shows the die. The chip is fairly simple, with most of the chip taken up with the shift register cells. Around the outside of the chip you can see the nine pads with black wires connected.

The die shows some of the reasons that shift registers were cheaper than RAM chips. Unlike a RAM chip, the storage cells don't need to form a regular grid — the rows in the middle are shorter than the others to make room for the pad on the right. In addition, the chip doesn't need any address decoding logic. Thus, more bits can fit onto a chip. Because there are no address lines, the chip has fewer pins than a RAM chip and can fit into a smaller package.

Die shot of the Intel 1405 MOS 512-bit shift register memory.

The diagram below shows the flow of bits through the shift register, in yellow. Bits enter through the input pin at the bottom. They zig-zag through the 20 rows of the shift register and exit at the top through the output pin. Bits recirculate back to the input along the left. The clock lines are at the right and are connected to each cell of the shift register.

Labeled die shot of the Intel 1405 MOS 512-bit shift register memory.

In the lower left is the circuit to control input to the shift register, which consists of a few gates. Either a new bit can be written to the shift register each cycle, or the exiting bit can recirculate and re-enter the shift register. The photo below zooms in on this circuit. The four vertical wires at the left are the chip select 2, chip select 1, recirculated bit, and Vdd.

Input circuit of the 1405 shift register.

The image below shows the circuit to control the output from the shift register, which is in the upper left of the chip. The chip has two chip select inputs, which makes it convenient to arrange the shift registers in a grid with one set of lines enabling a row and a perpendicular set of lines enabling a column.

Output circuit of the 1405 shift register.

The image below shows the shift register cells at high magnification. On the left is the actual die photo, while the right labels the components of the die. Bits flow to the right through the bottom half of the picture, and then back to the left in the top half.

The large U shapes at the bottom are transistors (red T's) that form inverters (drawn in yellow). Between each inverter is a pass transistor that controls the flow of bits from inverter to inverter. The first T is connected to clock 1, allowing the bit to flow from the first inverter to the second when clock 1 is activated. The next T is connected to clock 2, passing the bit along another step on clock 2. As the clock lines are triggered in sequence, the bits pass step-by-step through the shift register.
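
As a rough behavioral model of that two-phase action (this is a sketch with invented names, not a circuit simulation), each cell can be treated as two inverters separated by pass transistors driven by the two clocks:

class Cell:
    def __init__(self):
        self.a = 0        # node after the first inverter
        self.b = 0        # node after the second inverter (cell output)

    def clock1(self, bit_in):
        # pass transistor on clock 1 lets the incoming bit reach the first inverter
        self.a = 1 - bit_in

    def clock2(self):
        # pass transistor on clock 2 lets the inverted bit reach the second
        # inverter, restoring the original value at the cell's output
        self.b = 1 - self.a

def shift(cells, bit_in):
    # advance every cell one position using the two clock phases
    prev = bit_in
    for cell in cells:
        handoff = cell.b                  # old output, handed to the next cell
        cell.clock1(prev)
        cell.clock2()
        prev = handoff
    return prev                           # bit that falls off the end (recirculated on-chip)

cells = [Cell() for _ in range(512)]
out = shift(cells, 1)                     # shift a 1 into the 512-bit register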

The chip uses silicon-gate technology. This was an important innovation in chip design that was developed in 1968 at Fairchild by Federico Faggin (who also developed the Z80), and became a core technology at Intel. With this technology, polysilicon is used as the gates for transistors instead of aluminum metal, as previous MOS integrated circuits used. For various reasons, this made chips much faster and easier to manufacture.

In the picture below, polysilicon is indicated in blue. Where it overlaps the underlying doped silicon, a transistor is formed (red T). The horizontal gray lines are the metal layer, with the voltage supplies and the clocks. The circles show connections between the different layers.[7]

Close up of the cells in an Intel 1405 512-bit shift register memory. The actual photo is on the left, and the circuit is drawn on the right.

The clock driver

The display circuit board below has 14 shift registers in round metal cans. But there's a huge metal can at the right — what is this IC? That turns out to be the driver chip that provides the clock signals for the shift registers, and it's pretty interesting inside.

The shift registers require two alternating clock signals to shift. These signals must not overlap, or else the data will get messed up. In addition, the shift registers require clock swings of up to 30 volts, due to their old technology. Finally, a lot of current (500 mA) is needed in the clock signals to drive all the chips. To meet these requirements, a special clock driver chip is used to generate the clock signals. This is the Fairchild SH0013-C "Two phase MOS clock driver".[8]
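
The essential property is that the two phases are never high at the same time. Here's a toy Python sketch of such a sequence (the timing pattern is illustrative only, not the SH0013's actual specifications):

def two_phase_clock(cycles):
    # each cycle: phase 1 high, dead time, phase 2 high, dead time
    for _ in range(cycles):
        yield (1, 0)                      # clock 1 active
        yield (0, 0)                      # both low, so the phases can't overlap
        yield (0, 1)                      # clock 2 active
        yield (0, 0)

for phi1, phi2 in two_phase_clock(2):
    assert not (phi1 and phi2)            # never high at the same time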

1405 shift registers provide 896 bytes of storage on a Datapoint 2200 display card.

I expected to find an IC with big transistors inside the clock driver chip, but opening it up revealed something entirely different. Inside is a hybrid integrated circuit made up of eight separate silicon dies mounted on a tiny circuit board and connected with gold traces and gold wires. In addition, there are thick film resistors printed onto the board — these are the black "E" shapes in the picture below.

Interactive viewer

The image and schematic[8] below are an interactive exploration of the SH0013 clock driver. Click a component to see its location on the board and in the schematic highlighted. The box below will give an explanation of the component.


Conclusion

While using shift registers as memory seems bizarre now, it was a cost-effective way to implement storage in 1970. Looking inside the shift register chips shows how they work and how they could be implemented more cheaply than RAM. Providing the high-power clock signals required a special driver chip, which turns out to be a hybrid circuit with tiny semiconductors and resistors on a circuit board in a large metal IC package.

Notes and references

[1] Intel didn't invent the memory chip, of course. There were many companies making memory chips in the 1960s. For instance, Texas Instruments announced the SN5481 bipolar memory chip in 1966 (Electronics, V39 #1, p151) and Transitron had the TMC 3162 and 3164 16-bit RAM (Electrical Design News, Volume 11, p14). In 1968, RCA made 72-bit CMOS memories for the Air Force (document, photo). Lee Boysel built 256-bit dynamic RAMs at Fairchild in 1968 and 1K dynamic RAMs at Four Phase Systems in 1969 (1970 — MOS Dynamic RAM Competes with Magnetic Core Memory on Price and Boysel presentation). For more information on the history of memory technology, see 1966 — Semiconductor RAMs Serve High-speed Storage Needs and History of Semiconductor Engineering, p215. Another source for memory history is To the Digital Age: Research Labs, Start-up Companies, and the Rise of MOS Technology, p193.

[2] Memory chips started out very expensive, but prices rapidly dropped. Computer Design Volume 9 page 28, 1970, announced a price drop of the 3101 from $99.50 to $40 in small volumes. Electrical Design News Volume 15, 1970 gave the initial price of the 1405 as $13.30 in quantities of 100. Ironically, the Intel 3101 is now a collector's item and costs much more than the original price on eBay — hundreds of dollars for the right package.

[3] The datasheet for the 1405A shift register is available at Intel-vintage.info or Intel's data catalog 1976 (at archive.org).

[4] Many companies made shift register memories. For instance, in 1969 Philco (an electronics manufacturer owned by Ford Motor Company) claimed to have the longest commercially available shift register at 256 bits (Electronic Design, Volume 17, p251). For lots more information on shift register memory, see Don Lancaster's December 1974 Radio-Electronics article, "How it works: IC MOS shift registers".

[5] I obtained the Datapoint display board on eBay from Zuigadrummer, who currently has other Datapoint boards for sale. She was very helpful to me and I recommend her.

[6] The Datapoint 2200's display provided 12 lines of 80 characters. The display memory held 1024 7-bit ASCII characters. A pair of shift registers provided 1024 bits of storage, with 7 pairs in total.

[7] For those who want to know more details of the layout... The resistor symbols are not actually resistors, but clocked precharge transistors that pull the inverter outputs high. A few years later, MOS chips would use depletion transistors instead.

The metal rectangles form connections between the silicon layer and the polysilicon layer. This technique was soon obsoleted by buried contacts which connected the two layers directly without using the metal layer. This made chip layout easier, since the metal layer could be used for interconnections without being interrupted by these connections.

The gray blobs show the undoped silicon, which can be considered non-conductive. The doped silicon is conductive, except where the polysilicon crosses it and forms a transistor. Doped and undoped silicon are hard to distinguish in the die photo, but the boundary between them is visible as a faint black line. The polysilicon is much more visible in the die photo; it is orange, or red when it forms a transistor. The colors are due to the thicknesses of the layers.

[8] A datasheet for the SH0013 clock driver is in the 1973 Fairchild Linear Integrated Circuits Data Catalog, page 6-126. A datasheet for the equivalent MH0013 is in the 1972 National MOS Integrated Circuits databook, page 123.

Down to the silicon: how the Z80's registers are implemented

The 8-bit Z80 microprocessor is famed for use in many early personal computers such as the Osborne 1, TRS-80, and Sinclair ZX Spectrum. The Z80 has an innovative design for its internal registers, with two sets of general-purpose registers. The diagram below shows a highly-magnified photo of the Z80 chip, from the Visual 6502 team. Zooming in on the register file at the right, the transistors that make up the registers are visible (with difficulty). Each register is in a column, with the low bit on top and high bit on the bottom. This article explains the details of the Z80's register structure: its architecture, how it works, and exactly how it is implemented, based on my reverse-engineering of the chip.

The die of the Z80 microprocessor, zooming in on the register file. Each register is stored vertically, with bit 0 at the top and bit 15 at the bottom. There are two sets of AF/BC/DE/HL registers. At the right, drivers connect the registers to the data buses. At the top, circuitry selects a register.

The Z80's architecture is often described with the diagram below, which shows the programmer's model of the chip.[1][2] But as we will see, the Z80's actual register and bus organization differs from this diagram in many ways. For instance, the data bus on the real chip has multiple segments. The diagram shows a separate incrementer for the refresh register (IR), an adder for IX and IY offsets, and a W'Z' register but those don't exist on the real chip. The Z80 shows that the physical implementation of a chip may be very different from how it appears logically.

Programmer's model of Z80 architecture from Wikipedia. Diagram by Appaloosa CC BY-SA 3.0. Original by Rodnay Zaks.

Register overview and layout

The diagram below shows how the Z80's registers are physically arranged on the chip, matching the die photo above. The register file consists of 14 pairs of 8-bit registers. In many cases, a pair of 8-bit registers is treated as a single 16-bit register. The bits are ordered from 0 at the top to 15 at the bottom, so the low-order byte is on the top and the high-order byte is on the bottom.

At the right of the register file are the 8-bit accumulator (A) and 8-bit flag register (F). The accumulator holds the result of arithmetic and logic operations, so it is a very important register. The flag register holds condition flags, for instance indicating a zero value, negative value, overflow value or other conditions.

Note that there are two A registers and two F registers, along with two of BC, DE, and HL. The Z80 is described as having a main register set (AF, BC, DE, and HL) and an alternate register set (A'F', B'C', D'E', and H'L'), and this is how the architecture diagram earlier is drawn. It turns out, though, that this is not how the Z80 is actually implemented. There isn't a main register set and an alternate register set. Instead, there are two of each register and either one can be the main or alternate. This will be explained in more detail below.

Structure of the Z-80's register file as implemented on the chip. The address is 16 bits wide, while the data buses are 8 bits wide. Gray lines show switches between bus segments.

To the left of the AF registers are the two general-purpose BC registers. These can be used as 8-bit registers (B or C), or a 16-bit register (BC). Next to them are the similar DE and HL registers. The HL register is often used to reference a location in memory; H holds the high byte of the address, and L holds the low byte. This register structure is based on the earlier 8080 microprocessor. (As will be explained later, DE and HL can swap roles, so these registers should really be labeled H/D and L/E.)

Next to the left are the 16-bit IX and IY index registers. These are used to point to the start of a region in memory, such as a table of data. The 16-bit stack pointer SP is to the left of the index registers. The stack pointer indicates the top of the stack in memory. Data is pushed and popped from the stack, for instance in subroutine calls. To the left of the stack pointer are the 8-bit W and Z registers. As will be discussed below, these are internal registers used for temporary storage and are invisible to the programmer.

Separated from the previous registers is the special-purpose memory refresh register R, which simplifies the hardware when dynamic memory is used.[3] The interrupt page address register I is below R, and is used for interrupt handling. (It provides the high-order byte of an interrupt handler address.)

Finally, at the left is the 16-bit PC (Program Counter), which steps through memory to fetch instructions. Since it is 16 bits, the Z80 can address 64K of memory. Its position next to the incrementer/decrementer is important and will be discussed below.

The Z80's register buses

An important part of the Z80's architecture is how the registers are connected to other parts of the system by various buses. The Z80 is described as having a 16-bit address bus and an 8-bit data bus, but the implementation is more complicated.[3][4] The point of this complexity is to permit multiple register activities at the same time, so the chip can execute faster.

The PC and IR registers are separated from the rest of the registers. As the diagram above shows, these registers are connected to the other registers through a 16-bit bus (thick black line). However, this bus can be connected or disconnected as needed (by pass transistors indicated by the vertical gray line). When disconnected, the PC and R registers can be updated while registers on the right are in use.

The internal register bus connects the PC and IR registers to an incrementer/decrementer/latch circuit. It has multiple uses, but the main purpose is to step the PC from one instruction to the next, and to increment the R register to refresh memory. The resulting address goes to the address pins via the address bus (magenta). I describe the incrementer/decrementer/latch in detail here.

At the right, separate 8-bit data buses connect to the low-order and high-order registers. These two buses can be connected or disconnected as needed. The lower bus (orange) provides access to the ALU (arithmetic logic unit). The upper bus (green) connects to another data bus (red) that accesses the data pins and instruction decoder.

Photo of the Z80 die. The address bus is indicated in purple. The data bus segments are in red, green, and orange.

Specifying registers in the opcodes

The Z80 uses 8-bit opcodes to specify its instructions, and these instructions are carefully designed to efficiently specify which registers to use. Register instructions normally use three bits to specify the register used: 000=B, 001=C, 010=D, 011=E, 100=H, 101=L, 110=indirect through HL, 111=A.[5] For instance, the ADD instructions have the 8-bit binary values 10000rrr, where the rrr bits specify the register to use as above. Note that in this pattern the two high-order bits specify the register pair, while the low order bit specifies which half of the pair to use; for example 00x is BC, 000 is B, and 001 is C. For instructions operating on a register pair (such as 16-bit increment INC), the opcode uses just the two bits to specify the pair.

By using this structure for opcodes, the instruction decoding logic is simplified since the same circuitry can be reused to select a register or register pair for many different instructions. Instruction decode circuitry located above the register file uses the two bits to select the register pair and then uses the third bit to pick the lower or upper half of the register file.

The register selection bits can be in bits 2-0 of the instruction, for example AND; in bits 5-3 of the instruction, for example DEC (decrement); or in both positions, for example register-to-register LD.[6] To handle this, a multiplexer selects the appropriate group of bits and feeds them into the register select logic. Thus, the same circuit efficiently handles register bits in either position. By designing the instruction set in this way, the Z80 combines the ability to use a large register set with a compact hardware implementation.
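
Here's a small Python sketch of that selection (the function names are mine, standing in for the decode logic rather than reproducing it), showing how the same three-bit field picks a register whether it sits in bits 2-0 or bits 5-3 of the opcode:

REGISTERS = {0b000: "B", 0b001: "C", 0b010: "D", 0b011: "E",
             0b100: "H", 0b101: "L", 0b110: "(HL)", 0b111: "A"}

def source_register(opcode):
    # register field in bits 2-0, e.g. ADD A,r = 10000rrr
    return REGISTERS[opcode & 0b111]

def destination_register(opcode):
    # register field in bits 5-3, e.g. DEC r = 00rrr101
    return REGISTERS[(opcode >> 3) & 0b111]

print(source_register(0b10000010))       # ADD A,D  -> "D"
print(destination_register(0b00110101))  # DEC (HL) -> "(HL)"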

Swapping registers through register renaming

The Z80 has several instructions to swap registers or register sets. The EX DE, HL instruction exchanges the DE and HL registers. The EX AF, AF' instruction exchanges the AF and AF' registers. The EXX instruction exchanges the BC, DE, and HL registers with the BC', DE', and HL' registers. These instructions complete very quickly, which raises the question of how multiple 16-bit register values can move around the chip at once.

It turns out that these instructions don't move anything. They just toggle a bit that renames the appropriate registers. For example, consider exchanging the DE and HL registers. If the DE/HL bit is set, an instruction acting on DE uses the first register and an instruction acting on HL uses the second register. If the bit is cleared, a DE instruction uses the second register and a HL instruction uses the first register. Thus, from the programmer's perspective, it looks like the values in the registers have been swapped, but in fact just the meanings/names/labels of the registers have been swapped. Likewise, a bit selects between AF and AF', and a bit selects between BC, DE, HL and the alternates. In all, there are four registers that can be used for DE or HL; physically there aren't separate DE and HL registers.

The hardware to implement register renaming is interesting, using four toggle flip flops.[7] These flip flops are toggled by the appropriate EX and EXX instructions. One flip flop handles AF/AF'. The second flip flop handles BC/DE/HL vs BC'/DE'/HL'. The last two flip flops handle DE vs HL and DE' vs HL'. Note that two flip flops are required since DE and HL can be swapped independently in either register bank.
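
The following Python sketch models the renaming behaviorally (the data structures and names are mine, not the chip's). There are two physical copies of each register, and the exchange instructions merely flip selection bits, so no data ever moves:

class RegisterFile:
    def __init__(self):
        self.af = [0x0000, 0x0000]            # two physical AF registers
        # two banks, each with BC and two interchangeable DE/HL registers
        self.banks = [{"BC": 0, "R1": 0, "R2": 0},
                      {"BC": 0, "R1": 0, "R2": 0}]
        self.af_sel = 0        # which AF copy is currently "AF"
        self.bank_sel = 0      # which bank is currently the main set
        self.dehl = [0, 0]     # per bank: does DE map to R1 or R2?

    def ex_af_af(self):  self.af_sel ^= 1                  # EX AF,AF'
    def exx(self):       self.bank_sel ^= 1                # EXX
    def ex_de_hl(self):  self.dehl[self.bank_sel] ^= 1     # EX DE,HL

    def get(self, name):
        if name == "AF":
            return self.af[self.af_sel]
        bank = self.banks[self.bank_sel]
        if name == "BC":
            return bank["BC"]
        # DE and HL are just names for whichever physical register is selected
        swap = self.dehl[self.bank_sel]
        key = {"DE": ("R1", "R2"), "HL": ("R2", "R1")}[name][swap]
        return bank[key]

rf = RegisterFile()
rf.banks[0]["R1"] = 0x1234          # pretend DE holds 0x1234
print(hex(rf.get("DE")))            # 0x1234
rf.ex_de_hl()                       # swap DE and HL: no data moves
print(hex(rf.get("HL")))            # now HL reads 0x1234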

The flags

The flags have a dual existence. The flags are stored inside the register file, but at the start of every instruction,[8] they are copied into latches above the ALU. From this location, the flags can be used and modified by the ALU. (For example, add or shift operations use the carry flag.) At the end of an instruction that affects flags, the flags are copied from the latches back to the register file.

Most of the flags are generated by the ALU (details here). The circuitry to set and use the carry is complicated, since it is used in different ways by shifts and rotates, as well as arithmetic. Conditional operations are another important use of the flags.[9]

The WZ temporary registers

The Z80 (like the 8080 and 8085) has a WZ register pair that is used for temporary storage but is invisible to the programmer. The primary use of WZ is to hold an operand from a two or three byte instruction until it can be used.[10]

The JP (jump) instruction shows why the WZ registers are necessary. This instruction reads a two-byte address following the opcode and jumps to that address. Since the Z80 only reads one byte at a time, the address bytes must be stored somewhere while being read in, before the jump takes place. (If you read the bytes directly into the program counter, you'd end up jumping to a half-old half-new address.) The WZ register pair is used to hold the target address as it gets read in. The CALL (subroutine call) instruction is similar.
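
Here's a simplified sketch of a JP in Python (invented names, ignoring timing and flags), showing why the address bytes are staged in W and Z before the program counter is replaced in one step:

def execute_jp(memory, pc):
    z = memory[pc + 1]        # low byte of the target address, read first
    w = memory[pc + 2]        # high byte of the target address, read next
    # only now, with both bytes buffered in W and Z, is PC replaced at once
    return (w << 8) | z

memory = {0x0100: 0xC3, 0x0101: 0x34, 0x0102: 0x12}   # JP 0x1234
print(hex(execute_jp(memory, 0x0100)))                 # 0x1234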

Another example is EX (SP), HL which exchanges two bytes on the stack with the HL register. The WZ register pair holds the values at (SP+1) and (SP) temporarily during the exchange.

How the registers are implemented in silicon

The building block for the registers is a simple circuit to store one bit, consisting of two inverters in a feedback loop. In the diagram below, if the top wire has a 0, the right inverter will output a 1 to the bottom wire. The left inverter will then output a 0 to the top wire, completing the cycle. Thus, the circuit is stable and will "remember" the 0. Likewise, if the top wire is a 1, this will get inverted to a 0 at the bottom wire, and back to a 1 at the top. Thus, this circuit can store either a 0 or a 1, forming a 1-bit memory.[11]

In the Z80, two coupled inverters hold a single bit in the register. This circuit is stable in either the 0 or 1 state.

How does a value get stored into this inverter pair? Surprisingly, the Z80 just puts stronger signals on the wires, forcing the inverters to take the new values.[12] There's no logic involved, just "might makes right". (In contrast, the 6502 uses an additional transistor in the inverter feedback loop to break the feedback loop when writing a new value.)

To support multiple registers, each register bit is connected to bus lines by two pass transistors. These transistors act as switches that turn on to connect one register to the bus. Each register has a separate bus control signal, connecting the register to the bus when needed. Note that there are two bus lines for each bit - the value and its complement. As explained above, to write a new value to the bit, the new value is forced into the inverters. There are 16 pairs of bus lines running horizontally through the register file, one for each bit.

Each bit of register storage is connected to the bus by pass transistors, allowing the bit to be read or written.

Next, to see how an inverter works, the schematic below shows how an inverter is implemented in the Z80. The Z80 uses NMOS transistors, which can be viewed as simple switches. If a 1 is input to the transistor's gate, the switch closes, connecting ground (0) to the output. If a 0 is input to the gate, the switch opens and the output is pulled high (1) by the resistor. Thus, the output is the opposite of the input.[13]

Implementation of an inverter in NMOS.

Putting this all together - the two inverters and the pass transistors - yields the following schematic for a single bit in the register file. The layout of the schematic matches the actual silicon where the inverters are positioned to minimize the space they take up. The bus lines and ground run horizontally. The control line to connect a register to the buses runs vertically, along with the 5V power line.

Schematic of one bit inside the Z80's register file.

The diagram below shows the physical implementation of a register bit in the Z80, superimposed on a photo of the die. It's tricky to understand this, but comparing with the schematic above should help. The silicon is in green, the polysilicon is in red, and the metal lines are in blue. Transistors occur where the polysilicon (red) crosses the silicon (green). The X in a box indicates a contact connecting two layers. Note the large area taken up by the resistors (which are formed from depletion-mode transistors). Additional register bits can be seen in the photo, surrounding the bit illustrated.

This diagram shows the layout on silicon of one bit of register storage. Green indicates silicon, red indicates polysilicon, and blue is the metal layer.

Zooming out, the picture below shows the upper right part of the register file. Each bit consists of a structure like the one above. Each column is a separate register, with a separate control line, and each row is one of the bits. The columns are in groups of two, with the register control lines between the pairs of columns. Zooming out more, the image at the top of the article shows the full register file and its location in the chip. Thus, you can see how the entire register file is built up from simple transistors.

A detail of the Z80 chip, showing part of the register file.

Comparison with the 6502 and 8085

While the Z80's register complement is tiny compared to current processors, it has a solid register set by 1976 standards - about twice as many registers as the 8085 and about four times as many registers as the 6502. Because they share the 8080 heritage, many of the 8085's registers are similar to the Z80, but the Z80 adds the IX and IY index registers, as well as the second set of registers.

The physical structure of the Z80's register file is similar to the 8085 register file. Both use 6-transistor static latches arranged into a 16-bit wide grid. The 8085, however, uses complex differential sense amplifiers to read the values from the registers. The Z80, by contrast, just uses regular gates. I suspect the 8085's designers saved space by making the register transistors as small as possible, requiring extra circuitry to read the weak values on the bus lines.

The 6502, on the other hand, doesn't have a separate register file. Instead, registers are put on the chip where it turns out to be convenient. Since the 6502 has fewer registers, the register circuitry doesn't need to be as optimized and each bit is more complex. The 6502 adds a transistor to each bit so it is clocked, and separate pass transistors for read and write. One consequence is that direct register-to-register transfers are possible on the 6502, since the source and destination registers can be distinguished. Instead of a separate incrementer unit, the 6502's program counter is tangled in with the incrementer circuitry.

Conclusion

By looking at the silicon of the Z80 in detail, we can determine exactly how it works. The Z80's register file has more complexity than you'd expect and the hardware implementation is different from published architecture diagrams. By splitting the register file in two, the Z80 runs faster since registers can be updated in parallel. The Z80 includes a WZ register pair for temporary storage that isn't visible to the programmer. The Z80's register storage has many similarities to the 8085, both in the registers provided and their hardware implementation, but is very different from the 6502.

Credits: This couldn't have been done without the Visual 6502 team, especially Chris Smith, Ed Spittles, Pavel Zima, Phil Mainwaring, and Julien Oster. All die photos are from the Visual 6502 team.

Notes and references

[1] There are many variants of that architecture diagram; the one above is from Wikipedia. The original source of the common Z80 architecture diagram is the book Programming the Z80 by Rodnay Zaks, page 65 (HTML or PDF). The book is an extremely detailed guide to the Z80, down to the instruction cycles. I don't mean to criticize the architecture diagram by pointing out differences between it and the actual silicon. After all, it is a logic-level diagram intended for use by programmers, not a hardware reference. But it is interesting to see the differences between the programmer's view and the hardware implementation.

[2] Zilog's Z80 CPU user manual is a key reference on the instruction set and operation of the Z80, but it doesn't provide any information on the internal architecture.

[3] The Z80's memory refresh feature is described in patent 4332008. Figure 15 in the patent shows the segmented data bus used by the Z80, although it is a mirror image of the actual die.

[4] I wrote more about the data buses in the Z-80 in Why the Z-80's data pins are scrambled.

[5] The bit pattern 110 is an exception to the encoding of registers in instructions, since it refers to a memory location indexed by the HL register pair, rather than a register. Likewise the bit pattern 11x referring to a register pair is also an exception. It can indicate the SP register, for example in 16-bit LD, INC and DEC instructions.

[6] The Z80 specifies registers in instruction bits 0-2 and bits 3-5. This maps cleanly onto octal, but not hexadecimal. One consequence is the opcodes are more logical if you arrange them in octal (like this), instead of hexadecimal (like this). Perhaps the designers of the Z80 were thinking in octal and not hex.

[7] The toggle flip flops are unlike standard flip flops formed from gates. Instead they use pass transistors; this lets it hold the previous state while toggling to avoid oscillation. Because the pass transistor circuits depend on capacitance holding the values, you have to keep the clock running. This is one reason the clock in the Z80 can't stop for more than a couple microseconds. (The CMOS version is different and the clock can stop arbitrarily long.) From looking at the silicon, it appears that these flip flops required some modifications to work reliably, probably to ensure they toggled exactly once.

These flip flops have no reset logic, so it is unpredictable how the registers get assigned on power-up. Since there's no way to tell which register is which, this doesn't matter.

The active DE vs HL flip flop swaps the DE and HL register control lines using pass-gate multiplexers. The main vs alternate register set flip flops direct each AF/BC/DE/HL register control line to one of the two registers in the pair.

[8] Like many processors of its era, the Z80 starts fetching a new instruction before the previous instruction is finished; this is known as fetch/execute overlap. As a result, the flags are actually written from the latches to the register file three cycles into the next instruction (i.e. T3), and the flags are read from the register file into the latches four cycles into the instruction (i.e. T4).

[9] I'll explain briefly how conditional instructions such as jump (JP) work with the flags. Bits 4 and 5 of the opcode select the flag to use (via a multiplexer just to the right of the registers). Bit 3 of the opcode indicates the desired value (clear or set); this bit is XORed with the selected flag's value. The result indicates if the desired condition is satisfied or not, and is fed into the control logic to determine the appropriate action. The JR and DJNZ instructions don't exactly fit the pattern, so a couple of additional gates adjust their bits to pick the right flags.

[10] For more explanation of the WZ registers, see Programming the Z80, pages 87-91.

[11] The register storage in the Z80 is called "static" memory, since it will store data as long as the circuit has power. In contrast, your computer uses dynamic memory, which will lose data in milliseconds if the data isn't constantly refreshed. The advantage of dynamic memory is it is much simpler (a transistor and a capacitor), and thus each cell is much smaller. (This is how DRAM can fit gigabits onto a single chip.) Another alternative is flash memory, which has the big advantage of keeping its contents while the power is turned off.

[12] If you've built electronic circuits, it may seem dodgy to force the inverters to change values by overpowering the outputs. But this is a standard technique in chips. To understand what happens, remember that in an NMOS circuit, a 0 output is created by a transistor to ground, while a 1 output is made by a much weaker resistor. So if one of the inverters is outputting a 1 and a 0 is connected to the output, the 0 will "win". This will cause the other inverter to immediately switch to 1. At this point, the original inverter will switch to output 0 and the inverter pair is now stable with the new values.

To improve speed, and to prevent a low voltage on the bus from accidentally clearing a bit while reading a register, the bus lines are all precharged to +5 every clock cycle. A low output from an inverter will have no trouble pulling the bus line low, and a high output will leave the bus line high. The precharging is done through transistors in the space between the IR and WZ registers.

[13] One disadvantage of NMOS logic is that the pull-up resistors waste power. In addition, the output is fairly slow (by computer standards) to change from 0 to 1 because of the limited current through the resistor. For these reasons, NMOS has been almost entirely replaced by CMOS logic, which uses complementary transistors instead of resistors to pull the output high. (As a result, CMOS uses almost no power except while switching outputs from one state to another. For this reason, CMOS power usage scales up with frequency, which is why CPUs are hitting clock limits - they're too hot to run any faster.)

Mining Bitcoin with pencil and paper: 0.67 hashes per day

I decided to see how practical it would be to mine Bitcoin with pencil and paper. It turns out that the SHA-256 algorithm used for mining is pretty simple and can in fact be done by hand. Not surprisingly, the process is extremely slow compared to hardware mining and is entirely impractical. But performing the algorithm manually is a good way to understand exactly how it works.

A pencil-and-paper round of SHA-256

The mining process

Bitcoin mining is a key part of the security of the Bitcoin system. The idea is that Bitcoin miners group a bunch of Bitcoin transactions into a block, then repeatedly perform a cryptographic operation called hashing zillions of times until someone finds a special extremely rare hash value. At this point, the block has been mined and becomes part of the Bitcoin block chain. The hashing task doesn't accomplish anything useful in itself, but because finding a successful block is so difficult, it ensures that no individual has the resources to take over the Bitcoin system. For more details on mining, see my Bitcoin mining article.

A cryptographic hash function takes a block of input data and creates a smaller, unpredictable output. The hash function is designed so there's no "short cut" to get the desired output - you just have to keep hashing blocks until you find one by brute force that works. For Bitcoin, the hash function is a function called SHA-256. To provide additional security, Bitcoin applies the SHA-256 function twice, a process known as double-SHA-256.

In Bitcoin, a successful hash is one that starts with enough zeros.[1] Just as it is rare to find a phone number or license plate ending in multiple zeros, it is rare to find a hash starting with multiple zeros. But Bitcoin is exponentially harder. Currently, a successful hash must start with approximately 17 zeros, so only one out of 1.4x10^20 hashes will be successful. In other words, finding a successful hash is harder than finding a particular grain of sand out of all the grains of sand on Earth.

The following diagram shows a block in the Bitcoin blockchain along with its hash. The yellow bytes are hashed to generate the block hash. In this case, the resulting hash starts with enough zeros so mining was successful. However, the hash will almost always be unsuccessful. In that case, the miner changes the nonce value or other block contents and tries again.

Structure of a Bitcoin block
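
For comparison with the manual process, here's a minimal Python sketch of the same brute-force search using the standard hashlib library. The header contents and the difficulty here are toy values, not a real Bitcoin block:

import hashlib, struct

def double_sha256(data):
    # Bitcoin applies SHA-256 twice
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

header_prefix = b"\x01" * 76          # stand-in for version/prev hash/merkle root/time/bits
target_prefix = b"\x00\x00"           # toy difficulty: hash must start with 2 zero bytes

nonce = 0
while True:
    header = header_prefix + struct.pack("<I", nonce)
    # Bitcoin displays hashes byte-reversed; checking the reversed hash for
    # leading zeros is why the raw SHA-256 result on paper shows its zeros at the end
    h = double_sha256(header)[::-1]
    if h.startswith(target_prefix):
        print("mined! nonce =", nonce, "hash =", h.hex())
        break
    nonce += 1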

The SHA-256 hash algorithm used by Bitcoin

The SHA-256 hash algorithm takes input blocks of 512 bits (i.e. 64 bytes), combines the data cryptographically, and generates a 256-bit (32 byte) output. The SHA-256 algorithm consists of a relatively simple round repeated 64 times. The diagram below shows one round, which takes eight 4-byte inputs, A through H, performs a few operations, and generates new values of A through H.

One round of the SHA-256 algorithm showing the 8 input blocks A-H, the processing steps, and the new blocks. Diagram created by kockmeyer, CC BY-SA 3.0.

The blue boxes mix up the values in non-linear ways that are hard to analyze cryptographically. Since the algorithm uses several different functions, discovering an attack is harder. (If you could figure out a mathematical shortcut to generate successful hashes, you could take over Bitcoin mining.)

The Ma majority box looks at the bits of A, B, and C. For each position, if the majority of the bits are 0, it outputs 0. Otherwise it outputs 1. That is, for each position in A, B, and C, look at the number of 1 bits. If it is zero or one, output 0. If it is two or three, output 1.

The Σ0 box rotates the bits of A to form three rotated versions, and then sums them together modulo 2. In other words, if the number of 1 bits is odd, the sum is 1; otherwise, it is 0. The three values in the sum are A rotated right by 2 bits, 13 bits, and 22 bits.

The Ch "choose" box chooses output bits based on the value of input E. If a bit of E is 1, the output bit is the corresponding bit of F. If a bit of E is 0, the output bit is the corresponding bit of G. In this way, the bits of F and G are shuffled together based on the value of E.

The next box Σ1 rotates and sums the bits of E, similar to Σ0 except the shifts are 6, 11, and 25 bits.

The red boxes perform 32-bit addition, generating new values for A and E. The input Wt is based on the input data, slightly processed. (This is where the input block gets fed into the algorithm.) The input Kt is a constant defined for each round.[2]

As can be seen from the diagram above, only A and E are changed in a round. The other values pass through unchanged, with the old A value becoming the new B value, the old B value becoming the new C value and so forth. Although each round of SHA-256 doesn't change the data much, after 64 rounds the input data will be completely scrambled.[3]
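
The operations above translate almost directly into code. This Python sketch implements one round using the standard SHA-256 definitions of Ch, Maj, Σ0, and Σ1; the message-schedule word Wt and the round constant Kt are taken as inputs rather than computed:

MASK = 0xFFFFFFFF                     # keep everything as 32-bit values

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256_round(a, b, c, d, e, f, g, h, wt, kt):
    s1  = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)      # Σ1: rotate and sum E
    ch  = (e & f) ^ (~e & g)                          # Ch: E chooses bits from F or G
    t1  = (h + s1 + ch + kt + wt) & MASK
    s0  = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)      # Σ0: rotate and sum A
    maj = (a & b) ^ (a & c) ^ (b & c)                 # Maj: majority of A, B, C
    t2  = (s0 + maj) & MASK
    # only A and E get genuinely new values; the rest just shift down one slot
    return (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g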

Manual mining

The video below shows how the SHA-256 hashing steps described above can be performed with pencil and paper. I perform the first round of hashing to mine a block. Completing this round took me 16 minutes, 45 seconds.

To explain what's on the paper: I've written each block A through H in hex on a separate row and put the binary value below. The maj operation appears below C, and the shifts and Σ0 appear above row A. Likewise, the choose operation appears below G, and the shifts and Σ1 above E. In the lower right, a bunch of terms are added together, corresponding to the first three red sum boxes. In the upper right, this sum is used to generate the new A value, and in the middle right, this sum is used to generate the new E value. These steps all correspond to the diagram and discussion above.

I also manually performed another hash round, the last round to finish hashing the Bitcoin block. In the image below, the hash result is highlighted in yellow. The zeroes in this hash show that it is a successful hash. Note that the zeroes are at the end of the hash. The reason is that Bitcoin inconveniently reverses all the bytes generated by SHA-256.[4]

Last pencil-and-paper round of SHA-256, showing a successfully-mined Bitcoin block.

What this means for mining hardware

Each step of SHA-256 is very easy to implement in digital logic - simple Boolean operations and 32-bit addition. (If you've studied electronics, you can probably visualize the circuits already.) For this reason, custom ASIC chips can implement the SHA-256 algorithm very efficiently in hardware, putting hundreds of rounds on a chip in parallel. The image below shows a mining chip that runs at 2-3 billion hashes/second; Zeptobars has more photos.

The silicon die inside a Bitfury ASIC chip. This chip mines Bitcoin at 2-3 Ghash/second. Image from Zeptobars. (CC BY 3.0)

In contrast, Litecoin, Dogecoin, and similar altcoins use the scrypt hash algorithm, which is intentionally designed to be difficult to implement in hardware. It stores 1024 different hash values into memory, and then combines them in unpredictable ways to get the final result. As a result, much more circuitry and memory is required for scrypt than for SHA-256 hashes. You can see the impact by looking at mining hardware, which is thousands of times slower for scrypt (Litecoin, etc) than for SHA-256 (Bitcoin).

Conclusion

The SHA-256 algorithm is surprisingly simple, easy enough to do by hand. (The elliptic curve algorithm for signing Bitcoin transactions would be very painful to do by hand since it has lots of multiplication of 32-byte integers.) Doing one round of SHA-256 by hand took me 16 minutes, 45 seconds. At this rate, hashing a full Bitcoin block (128 rounds)[3] would take 1.49 days, for a hash rate of 0.67 hashes per day (although I would probably get faster with practice). In comparison, current Bitcoin mining hardware does several terahashes per second, about a quintillion times faster than my manual hashing. Needless to say, manual Bitcoin mining is not at all practical.[5]

A Reddit reader asked about my energy consumption. There's not much physical exertion, so assuming a resting metabolic rate of 1500kcal/day, manual hashing works out to almost 10 megajoules/hash. A typical energy consumption for mining hardware is 1000 megahashes/joule. So I'm less energy efficient by a factor of 10^16, or 10 quadrillion. The next question is the energy cost. A cheap source of food energy is donuts at $0.23 for 200 kcalories. Electricity here is $0.15/kilowatt-hour, which is cheaper by a factor of 6.7 - closer than I expected. Thus my energy cost per hash is about 67 quadrillion times that of mining hardware. It's clear I'm not going to make my fortune off manual mining, and I haven't even included the cost of all the paper and pencils I'll need.
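
If you want to check the arithmetic, a few lines of Python reproduce the two ratios from the approximate figures quoted above:

kcal_per_day    = 1500
hashes_per_day  = 0.67
joules_per_hash = kcal_per_day * 4184 / hashes_per_day   # roughly 10 megajoules per hash
hw_joules_per_hash = 1 / 1000e6                           # hardware: 1000 megahashes per joule
print(joules_per_hash / hw_joules_per_hash)               # about 1e16 (10 quadrillion)

donut_cost_per_joule = 0.23 / (200 * 4184)                # dollars per joule of food energy
elec_cost_per_joule  = 0.15 / 3.6e6                       # dollars per joule of electricity
print(donut_cost_per_joule / elec_cost_per_joule)         # about 6.7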

2017 edit: My Bitcoin mining on paper system is part of the book The Objects That Power the Global Economy, so take a look.


Notes

[1] It's not exactly the number of zeros at the start of the hash that matters. To be precise, the hash must be less than a particular value that depends on the current Bitcoin difficulty level.

[2] The source of the constants used in SHA-256 is interesting. The NSA designed the SHA-256 algorithm and picked the values for these constants, so how do you know they didn't pick special values that let them break the hash? To avoid suspicion, the initial hash values come from the square roots of the first 8 primes, and the Kt values come from the cube roots of the first 64 primes. Since these constants come from a simple formula, you can trust that the NSA didn't do anything shady (at least with the constants).

[3] Unfortunately the SHA-256 hash works on a block of 512 bits, but the Bitcoin block header is more than 512 bits. Thus, a second set of 64 SHA-256 hash rounds is required on the second half of the Bitcoin block. Next, Bitcoin uses double-SHA-256, so a second application of SHA-256 (64 rounds) is done to the result. Adding this up, hashing an arbitrary Bitcoin block takes 192 rounds in total. However there is a shortcut. Mining involves hashing the same block over and over, just changing the nonce which appears in the second half of the block. Thus, mining can reuse the result of hashing the first 512 bits, and hashing a Bitcoin block typically only requires 128 rounds.

[4] Obviously I didn't just have incredible good fortune to end up with a successful hash. I started the hashing process with a block that had already been successfully mined. In particular I used the one displayed earlier in this article, #286819.

[5] Another problem with manual mining is new blocks are mined about every 10 minutes, so even if I did succeed in mining a block, it would be totally obsolete (orphaned) by the time I finished.

Why the Z-80's data pins are scrambled

If you look closely at the datasheet for a Z-80 chip, you'll notice the data pins are in a random-looking order. The address pins (A) are nicely arranged in order counterclockwise from 0 to 15, but the data pins (D) are all shuffled around.[1] After studying the internals of the chip, I have a hypothesis to explain this.

Pinout of the Z-80, from the Zilog Data Book.

I have been reverse-engineering the Z-80 processor using images and data from the Visual 6502 team. The image below is a photograph of the Z-80 die. Around the outside of the chip are the pads that connect to the external pins. (The die photo is rotated 180° compared to the datasheet pinout, if you try to match up the pins.) At the right are the 8 data pins for the Z-80's 8-bit data bus in a strange order.

The 8-bit data bus in the Z-80 is used for communication among the different parts of the chip. But instead of a single data bus, the Z-80 has a complex data bus that is split into 3 segments. The first segment of the data bus (in red) connects the data pins to the instruction decode logic. The first segment is also connected to the second segment (green). The green data bus provides access to the lower byte of registers and is also connected to the third segment of the data bus (orange). The orange data bus is connected to the high byte of registers and also to the ALU (Arithmetic Logic Unit)[2]. Note that because the green segment splits off from the red segment, only half of the red bus (4 bits) goes down to the lower part of the chip. [There was an extra segment in an earlier version of this article.]

The Z-80's silicon die, showing the data and address pins, data buses and other internal components.

The motivation behind splitting the data bus is to allow the chip to perform activities in parallel. For instance an instruction can be read from the data pins into the instruction logic at the same time that data is being copied between the ALU and registers. The partitioned data bus is described briefly in the Z-80 oral history[3], but doesn't appear in architecture diagrams.

The complex structure of the data buses is closely connected to the ordering of the data pins. But before explaining the data pin layout, a few more features of the Z-80 need to be discussed.

How the Z-80 processes instructions

To execute an instruction, the Z-80 loads the instruction from memory through the data pins and feeds it into the instruction decode logic via the red segment of the data bus. First, the instruction is stored from the data bus into the instruction register, which is a simple latch that holds the instruction while it is being executed. This feeds the instruction into the PLA (Programmable Logic Array), which decodes the instruction into approximately 98 different categories (details). The instruction logic below the PLA combines these signals with timing signals and determines exactly what should happen at what step. This logic generates the control signals that control the operation of the register file, ALU, and other parts of the chip.

Since the Z-80 is an 8-bit processor, instruction op codes are 8 bits long. Many of the instructions have the bits arranged as follows:

ggiiirrr

In that arrangement, the two gg bits select a group of instructions (e.g. load or arithmetic), the next three iii bits select the particular instruction, and the last three rrr bits select the register to use. There are many exceptions to this format, but it provides an underlying structure. (This instruction structure was inherited from the 8080 microprocessor, since the Z-80 was designed to be backwards compatible with it.)

Bit instructions in the Z-80

One feature the Z-80 has that goes beyond the 8080 is instructions to set, clear, or test a single bit in a register.[4] These instructions fit the pattern described above, with the top two bits in the instruction selecting the test, clear, or set function, the next three bits in the instruction selecting which bit in the byte to operate on, and the final three bits selecting the register. That is, bits 5, 4, and 3 of the instruction select which of the eight bits in the register to operate on.

The processing of the Z-80's bit operations is unusual compared to other instructions. While most of the instruction execution is controlled by the same instruction decoding logic described above, the bit selection is done by feeding the three instruction bits directly into the ALU, bypassing the instruction decoding logic entirely. That is, there are simple circuits (at the right side of the ALU) to select one of the 8 bits, depending on the instruction that was read in. In the diagram of the chip, you can see the connection between the data bus (red) and the ALU to accomplish this.

The hardware to do the bit selection is fairly straightforward. There are eight 3-input NOR gates, each looking at a different combination of the instruction bits, either inverted or non-inverted. For example, an instruction that operates on bit 2 will have an opcode of the form gg010rrr (since 010 is binary 2). Instruction bit 5 is 0, instruction bit 4 is 1, and instruction bit 3 is 0. (Don't confuse the bits in the instruction with the bit being selected.) In logic, this becomes:

modify_data_bit_2 = (NOT bit5) AND bit4 AND (NOT bit3).

It turns out that NOR gates are easiest to build in silicon (as will be explained below), so the logic in hardware is the equivalent:

modify_data_bit_2 = bit5 NOR (NOT bit4) NOR bit3.

Selection of the other 7 bits is done with similar functions of the instruction bits bit5, bit4, and bit3.
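
The same decode can be written out in a few lines of Python (the signal names are mine, not Zilog's), generating all eight select lines from instruction bits 5, 4, and 3:

def bit_select_lines(opcode):
    bit5 = (opcode >> 5) & 1
    bit4 = (opcode >> 4) & 1
    bit3 = (opcode >> 3) & 1
    lines = []
    for n in range(8):
        want5, want4, want3 = (n >> 2) & 1, (n >> 1) & 1, n & 1
        # NOR of the mismatches: the line is 1 only when all three bits match n,
        # equivalent to the 3-input NOR gates on the chip
        line = not ((bit5 != want5) or (bit4 != want4) or (bit3 != want3))
        lines.append(int(line))
    return lines

print(bit_select_lines(0b01010000))   # BIT 2,B: only the "modify bit 2" line is set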

The hardware implementation of bit instructions

For a bit operation, one of the 8 bits will be selected and fed into the ALU. The ALU will then test, clear, or set that bit in the appropriate register. Below is a zoomed-in look at the portion of the die that selects bit 2. This is in the lower right corner of the chip, to the right of the ALU by the D3 pad. The white vertical stripes are metal lines, providing the data lines, control lines, and power and ground. Underneath the metal is the polysilicon layer. Underneath this is the silicon layer, where the transistors are. It's hard to make out the polysilicon and silicon structures in this photo, but at the left you can see the horizontal polysilicon bus lines for ALU bits 5 and 2. These lines provide data flow through the ALU, and are how the selected bit is fed into the ALU.

The circuitry in the Z-80 to handle bit operations on bit 2.

The data bus provides bits 5, 4, and 3 to this part of the chip. (Just to make things confusing, the data on the Z-80's data bus is inverted, which is indicated by a slash.) Underneath this bus is the NOR gate (outlined in blue) that computes the function described earlier: bit5 NOR (NOT bit4) NOR bit3. The inverter to bit3 from the inverted bit3 on the data bus is also visible (outlined in yellow). A buffer (green) strengthens this signal. The "ALU load bit value" control line is activated by the instruction decode logic; this control line allows the selected bit to pass into the ALU only for a bit operation.

A NOR gate is a simple circuit when implemented with MOS transistors, as the schematic below shows. The transistors can be thought of as switches that close if their gate (middle connection) receives a 1 input. In the circuit below, if any of the inputs are 1, the corresponding transistor will connect the output to ground, and the output will be 0. Otherwise, the resistor (which is actually a special depletion-mode transistor) will pull the output high and the output is 1. Thus, the output is the NOR of the three inputs.

Schematic of a 3-input NOR gate in the Z-80.

The diagram below shows how the above NOR gate is actually implemented in silicon. The diagram is a zoomed-in version of the image above, focusing on the NOR gate (blue outline). Instead of a photograph, the diagram shows the different layers in the chip as extracted by the Visual 6502 team: blue is metal, brown is polysilicon, green is silicon, and orange is a connection between layers. A transistor is formed when polysilicon crosses silicon.

Implementation of a 3-input NOR gate in the Z-80 chip.

The "T" symbols indicate the three transistors that are connected to ground (as shown by yellow arrows). The transistors are all connected together in the middle, and the final yellow arrow shows the connection to the output. Finally, the pull-up resistor is at the lower left. The cyan outline matches the outline in the die photo and with difficulty you can find the structures in the photo.

The important thing to notice in the diagram above is that everything is packed together as tightly as possible to get the Z-80 to fit on the available silicon. The layout of the Z-80 was done by hand, with each transistor and connection manually positioned. Every possible trick was used to minimize space - for example, each transistor above is oriented in a different direction. Drafting this layout was an extremely time-consuming task that took Zilog founder Federico Faggin 3 1/2 months of 80-hour weeks[3]. (Yes, the CEO drafted the chip himself!) But you can see from the result that there is very little wasted space in the chip.

The data pins

This article has looked at many different aspects of the Z-80 design, and now it's time to see how they constrain the position of the data pins. First, because the Z-80 splits the data bus into multiple segments, only four data lines run to the lower right corner of the chip. And because the Z-80 was very tight for space, running additional lines would be undesirable. Next, the BIT instructions use instruction bits 3, 4, and 5 to select a particular bit. This was motivated by the instruction structure the Z-80 inherited from the 8080. Finally, the Z-80's ALU requires direct access to instruction bits 3, 4, and 5 to select the particular data bit. Putting these factors together, data pins 3, 4, and 5 are constrained to be in the lower right corner of the chip next to the ALU. This forces the data pins to be out of sequence, and that's why the Z-80 has out-of-order data pins.[5]

Credits: The chip analysis couldn't have been done without the Visual 6502 team, especially Chris Smith, Ed Spittles, Pavel Zima, Phil Mainwaring, and Julien Oster.

Notes and references

[1] Even though the Z-80 has out-of-order data pins, it is an improvement over the 8080, where both address and data pins are in a strange order. The 6502, on the other hand, has a nice linear order for its pins.

[2] Unexpectedly, the Z-80's ALU is 4 bits wide. I've written up details here.

[3] The Computer History Museum created an oral history of the Z-80, which is very interesting. A couple parts of it are especially relevant to this article. Page 10 discusses the segmented data bus. Pages 5, 9, and 19 discuss Zilog CEO Federico Faggin laying out the chip over several months. One interesting story is how he ran out of room and had to erase two weeks of work and start over. In the end he completed the layout with only a couple mils of space left.

[4] The Z-80 has multiple operations to set, clear, or test a single bit. These instructions are expressed by two bytes. The first byte is the prefix CB, and the second byte is the specific instruction. The top two bits (ii) of the instruction are 01 for BIT (test bit), 10 for RES (reset bit), and 11 for SET (set bit). For more details, see the Z-80 User Manual, page 240.
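
To make the encoding concrete, here is a small illustrative helper (my own sketch) that assembles the two bytes of a bit instruction; the register codes B=0 through A=7 are the usual Z-80 register encoding rather than something spelled out in this note.

```python
# Assemble a bit instruction from the encoding described here: a CB prefix,
# then ii bbb rrr. Register codes (B=0, C=1, D=2, E=3, H=4, L=5, (HL)=6, A=7)
# follow the usual Z-80 convention.

OPS = {"BIT": 0b01, "RES": 0b10, "SET": 0b11}

def bit_instruction(op, bit, reg_code):
    return bytes([0xCB, (OPS[op] << 6) | (bit << 3) | reg_code])

print(bit_instruction("BIT", 2, 0).hex())   # cb50 - test bit 2 of register B
print(bit_instruction("SET", 7, 7).hex())   # cbff - set bit 7 of register A
```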

[5] Even with pins 3, 4, and 5 out of order, the Z-80 could have used a "semi-linear" sequence such as 0,1,2,6,7,3,4,5. Why didn't the Z-80 do this? My hypothesis is that once some pins were forced out of sequence, the Z-80's designers decided to take advantage of any other micro-optimizations from reordering the pins. For example, pins D0 and D1 have their drivers in order on the chip, but the routing from the drivers to the pins swaps the order to avoid crossing. Pin D7 is probably where it is because its driver lines up well with bit 7 in the PLA. Switching the positions of pins D3 and D4 would make the routing a tiny bit longer.

There are a bunch of good comments on this article at Hacker News.

Reverse engineering a counterfeit 7805 voltage regulator

Update: It turns out my 7805 isn't counterfeit. eclectro did an in-depth search (details on reddit) and found an old 7805 datasheet from Thomson Semiconductors that exactly matches my chip. And Thomson is the T in STMicroelectronics. So that explains how this die ended up with an ST label.

Under a microscope, a silicon chip is a mysterious world with puzzling shapes and meandering lines zigzagging around, as in the magnified image of a 7805 voltage regulator below. But if you study the chip closely, you can identify the transistors, resistors, diodes, and capacitors that make it work and even understand how these components function together. This article explains how the 7805 voltage regulator works, all the way down to how the transistors on the silicon operate. And while exploring the chip, I discovered that it is probably counterfeit.

Die photograph of a 7805 voltage regulator.

Die photograph of a 7805 voltage regulator.

A voltage regulator takes an unregulated input voltage and converts it to the exact regulated voltage an electronic circuit requires. Voltage regulators are used in almost every electronic circuit, and the popular 7805 has been used everywhere from computers[1] to satellites, from DVD players and video games to Arduinos[2] and robots. Even though it was introduced in 1972 and more advanced regulators[3] are now available, the 7805 is still in use, especially with hobbyists.

The 7805 is a common type of regulator known as a linear regulator. (As its name hints, the 7805 produces 5 volts.) A linear regulator is built around a large transistor that controls the amount of power flowing to the output, acting like a variable resistor. (This transistor is visible in the right half of the die photo above.) A drawback of a linear regulator is that all the "extra" voltage gets converted into heat. If you put 9 volts into a linear regulator and get 5 volts out, the extra 4 volts gets turned into heat in the regulator, so the regulator is only about 56% efficient. (The main competitor to a linear regulator is a switching power supply - a much more efficient, but much more complicated way to produce regulated voltage. Switching power supplies have replaced linear regulators in many applications, such as phone chargers and computer power supplies.)
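
As a rough worked example of those numbers (ignoring the regulator's own small quiescent current):

```python
# With roughly the same current flowing in and out of a linear regulator,
# efficiency is about Vout/Vin and the leftover power becomes heat.
# Back-of-the-envelope only; the quiescent current is ignored.

def linear_regulator(v_in, v_out, i_load):
    efficiency = v_out / v_in
    heat_watts = (v_in - v_out) * i_load
    return efficiency, heat_watts

eff, heat = linear_regulator(v_in=9.0, v_out=5.0, i_load=1.0)
print(f"efficiency ~ {eff:.0%}, heat ~ {heat:.1f} W")   # ~56%, 4.0 W at 1 A
```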

A 7805 voltage regulator in a metal TO-3 package.

A 7805 voltage regulator in a metal TO-3 package. The 7805 is more commonly found in a smaller plastic package.

Linear regulators such as the 7805 became very popular because they are extremely easy to use: just feed the unregulated voltage into one pin, ground the second pin, and get regulated voltage out the third pin[4]. Another feature that made the 7805 popular is that it is almost indestructible - if you short-circuit it, put too much voltage in, or run it too hot, it will shut down before getting damaged, due to internal protection circuits.

The components of the integrated circuit

Like most chips, the 7805 is built from a tiny piece of silicon. To make the chip function, a process called doping treats regions of the silicon with elements such as phosphorus or boron. In the die photo, these regions have a slightly different color, which makes the structure of the chip visible. Phosphorus gives the region excess electrons (i.e. negative), so it is known as N silicon. Boron has the opposite effect, creating positive P silicon. The amount of doping in a silicon chip is surprisingly small, varying from 1 foreign atom for every thousand atoms of silicon down to one foreign atom per billion atoms of silicon. Because silicon is so sensitive to impurities, the original silicon wafer must be an insanely pure crystal, up to 99.999999999% pure - a level known as eleven nines.

On top of the silicon, a thin layer of metal connects different parts of the chip. This metal is clearly visible in the die photo as white traces and regions.[5] A thin, glassy silicon dioxide layer provides insulation between the metal and the silicon, except where rectangular contact holes in the silicon dioxide allow the metal to connect to the silicon. Around the edge of the chip, thin wires connect the metal pads to the chip's external pins - the black blobs in the photo show where the wires were attached.

Transistors inside the IC

Transistors are the key components in the chip. The 7805 uses NPN and PNP bipolar transistors (unlike digital chips, which usually have CMOS transistors). If you've studied electronics, you've probably seen a diagram of an NPN transistor like the one below, showing the collector (C), base (B), and emitter (E) of the transistor. The transistor is illustrated as a sandwich of P silicon in between two symmetric layers of N silicon; the N-P-N layers make an NPN transistor. It turns out that transistors on a chip look nothing like this, and the base often isn't even in the middle!

An NPN transistor and its oversimplified structure.

An NPN transistor and its oversimplified structure.

The photo below shows one of the transistors in the 7805 as it appears on the chip.[6] The different brown and purple colors are regions of silicon that have been doped differently, forming N and P regions. The gray areas are the metal layer of the chip on top of the silicon - these form the wires connecting to the collector, emitter, and base.

Structure of an NPN transistor inside the 7805 voltage regulator.

Structure of an NPN transistor inside the 7805 voltage regulator.

Underneath the photo is a cross-section drawing showing approximately how the transistor is constructed. There's a lot more than just the N-P-N sandwich you see in books, but if you look carefully at the vertical cross section below the 'E', you can find the N-P-N that forms the transistor. The emitter (E) wire is connected to N+ silicon. Below that is a P layer connected to the base contact (B). And below that is an N+ layer connected (indirectly) to the collector (C).[7] The transistor is surrounded by a P+ ring that isolates it from neighboring components.

Resistors inside the IC

Resistors are a key component of analog chips and are formed from strips of silicon doped to have high resistance. The photo below shows two resistors in the 7805 voltage regulator, formed from greenish-purple strips of P silicon. (The gray metal strips connect to the resistors at the square contacts and wire the resistors to other parts of the chip.) The value of the resistor is proportional to its length[8], so the short resistor on the right (850Ω) is smaller than the meandering resistor on the left (4000Ω). Resistors with large values take up an inconveniently large area on the chip - in the top left of the die photo you can see the serpentine path of an 80KΩ resistor.

Two resistors on the 7805 voltage regulator's silicon die.

Two resistors on the 7805 voltage regulator's silicon die.

How the 7805 works

I've colored the following schematic[9] to indicate the main blocks of the 7805 regulator. The heart of the 7805 chip is a large transistor that controls the current between the input and output, and thus controls the output voltage. This transistor (Q16) is red on the diagram below. On the die, it takes up most of the right half of the chip because it needs to handle over 1 amp of current.

Components of the 7805 regulator: bandgap (yellow), error amp (orange), output transistor (red), protection (purple), startup (green).

Components of the 7805 regulator: bandgap (yellow), error amp (orange), output transistor (red), protection (purple), startup (green).

The bandgap reference (yellow) is what keeps the voltage stable. It takes the scaled output voltage as input (Q1 and Q6), and provides an error signal (to Q7) indicating if the voltage is too high or too low. The key feature of the bandgap is that it provides a stable and accurate reference, even as the chip's temperature changes. The next section will discuss the bandgap in detail.

The error signal from the bandgap reference is amplified by the error amplifier (orange). The amplified signal controls the output transistor through large driver Q15. This closes the negative feedback loop that controls the output voltage. The startup circuit (green) provides initial current to the bandgap circuit, so it doesn't get stuck in an off state.[10] The circuits in purple provide protection against overheating (Q13), excessive input voltage (Q19), and excessive output current (Q14). If there is a fault, these circuits reduce the output current or shut down the regulator, protecting it from damage.

The voltage divider (blue) scales down the voltage on the output pin for use by the bandgap reference. It has an interesting implementation that allows different chips in the 78XX family to produce different voltages. (For instance, 12 volts from the 7812 and 24 volts from the 7824.) The image below shows the square contacts between the metal (white) and the resistor (turquoise) that control the values of R20 and R21. For a different regulator, a simple change to the position of the variable contact increases the resistance of R20 and thus the output voltage of the chip.

The feedback voltage divider inside the 7805 voltage regulator consists of two resistors.

The feedback voltage divider inside the 7805 voltage regulator consists of two resistors.
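
Here is a sketch of that relationship, with invented resistor values since the article doesn't give R20 and R21: the divider feeds a fixed fraction of the output back to the bandgap, which holds that node at roughly 3.75 volts (three bandgap voltages, as described below), so the output settles at Vref × (R20 + R21) / R21.

```python
# Feedback divider sketch. Resistor values below are made up for illustration;
# only the ratio matters. 3.75 V is the bandgap node voltage described later.

V_REF = 3.75   # volts at the bandgap input when the output is correct

def output_voltage(r20, r21):
    return V_REF * (r20 + r21) / r21

print(round(output_voltage(r20=1.0, r21=3.0), 2))   # 5.0  - a 7805-style ratio
print(round(output_voltage(r20=2.2, r21=1.0), 2))   # 12.0 - a 7812-style ratio
```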

How a bandgap reference works

The main problem with producing a stable voltage from an IC is that the chip's parameters change as temperature changes: it's no good if your 5 volt phone charger starts producing 3 or 7 volts on a hot day. The trick to building a stable voltage reference is to create one voltage that goes down with temperature and another that goes up with temperature. If you add them together correctly, you get a voltage that is stable with temperature. This circuit is called a "bandgap reference".

To create a voltage that goes down with temperature, you put a constant current through a transistor and look at the voltage between the base and emitter, called VBE. The graph below shows how this voltage drops as the temperature increases. At the left, the line hits the bandgap voltage of silicon, about 1.2 volts; this will be important later.

Vbe vs temperature for a transistor

Vbe vs temperature for a transistor

If you set up a second transistor this way but with a lower current[11], you get the same effect but the voltage VBE curve drops faster. This may not seem helpful since we need a voltage that goes up with temperature. But here's the trick: if you subtract the two VBE voltages, the difference increases as temperature increases, since the lines get farther apart. The difference is called ΔVBE. The graph below shows the VBE curves for two different transistors, and you can see how the difference ΔVBE between the curves increases with temperature, even though both curves decrease with temperature.

Voltages in a bandgap reference: Vbe for two transistors as temperature changes.

Voltages in a bandgap reference: Vbe for two transistors as temperature changes.

The final step to a bandgap reference is to combine VBE and ΔVBE in the right ratio so the result is constant with temperature. It turns out that if the values sum to the bandgap voltage, the drop in VBE and the increase in ΔVBE cancel out. In the graph below, adding 10 copies of ΔVBE is the right ratio; the exact ratio depends on the particular transistors. The important thing to notice in the graph below is that as the temperature changes, VBE+nΔVBE remains constant - the top of the purple ΔVBEs remains at the bandgap voltage.

By adding multiples of ΔVbe to Vbe, the bandgap voltage is reached regardless of temperature.

By adding multiples of ΔVbe to Vbe, the bandgap voltage is reached regardless of temperature.[12]
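
Here's a toy linear model of this cancellation, with made-up slopes rather than real device physics:

```python
# Toy model: each VBE extrapolates to the silicon bandgap voltage, and the
# lower-current transistor's curve drops faster. Their difference rises with
# temperature, so the right multiple of it cancels VBE's fall. Slopes invented.

V_BANDGAP = 1.2           # volts, roughly where the curves extrapolate to
SLOPE_HIGH = -0.0020      # V/K, higher-current transistor (invented)
SLOPE_LOW  = -0.0022      # V/K, lower-current transistor (invented)

def vbe_high(t): return V_BANDGAP + SLOPE_HIGH * t
def vbe_low(t):  return V_BANDGAP + SLOPE_LOW * t

n = -SLOPE_HIGH / (SLOPE_HIGH - SLOPE_LOW)   # 10 copies of dVBE in this model

for t in (250, 300, 350):                    # temperature in kelvin
    dvbe = vbe_high(t) - vbe_low(t)
    print(t, round(vbe_high(t) + n * dvbe, 3))   # stays at 1.2 in this model
```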

How the 7805's bandgap reference works

The 7805's bandgap reference uses the above bandgap principles, but there are several important differences. First, the bandgap voltage in practice turns out to be about 1.25 volts instead of 1.2. Second, the 7805's bandgap creates a larger (and thus more accurate) 2ΔVBE by taking the difference between two high-current VBEs and two low-current VBEs. Finally, 2ΔVBE is scaled and added to three VBEs to form three times the bandgap voltage, or about 3.75V.

The diagram below shows the 7805's bandgap circuit with arrows showing voltage changes (not currents). Starting at ground, the red arrow shows an increase of (large) VBE across Q3, and another (large) VBE across Q2. The green arrows show drops of (small) VBE across Q4 and Q5. The result is that the difference 2ΔVBE ends up across R6.

The next step is very important as it scales up the voltage. The current through R7 will be the same as the current through R6 (ignoring small base currents). But R7 is 16.5 times as large as R6, so by Ohm's law, the voltage across R7 will be 16.5 times as large, i.e. 33ΔVBE.

Finally, we can see the bandgap's voltage by looking at the purple lines. Starting at ground, the voltage goes up by VBE across Q8, another VBE across Q7, then the R7 voltage, and finally a third VBE across Q6. Assuming the chip designers picked the scale factor of 33 correctly, the final voltage will be three bandgap voltages, or 3.75V.[13] (Vin here is the voltage input to the bandgap, not the voltage input to the 7805.)

How the bandgap voltage is generated in the 7805 voltage regulator.

How the bandgap voltage is generated in the 7805 voltage regulator.
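
As a rough sanity check of the arithmetic, assuming a VBE drop of about 0.6 volts (a round number, not a measured value):

```python
# Rough check of the voltage sum described above.
# Assumptions: VBE ~ 0.6 V, bandgap ~ 1.25 V.

vbe, bandgap = 0.6, 1.25
# Vin = VBE(Q8) + VBE(Q7) + V(R7) + VBE(Q6), and V(R7) = 16.5 * 2*dVBE = 33*dVBE.
dvbe = (3 * bandgap - 3 * vbe) / 33        # dVBE implied by the target voltage
print(round(3 * vbe + 33 * dvbe, 2))       # 3.75 - three bandgap voltages
print(round(dvbe * 1000, 1))               # ~59 mV per dVBE in this rough model
```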

A traditional bandgap circuit generates a stable reference voltage, but discussions of bandgaps usually ignore a big issue: in devices such as the 7805 or the TL431, the bandgap circuit does not generate a stable reference voltage. Instead, the 7805's bandgap works "backwards". The 7805's scaled output voltage provides the input voltage (Vin) to the bandgap reference, and the bandgap provides an error signal as output. The 7805's bandgap circuit removes the feedback loop that exists inside a traditional bandgap reference. Instead, the entire chip becomes the feedback loop.

In more detail, if the output voltage is correct (5V), then the voltage divider provides 3.75V at Vin, and the VBE and ΔVBE voltages are as described above. If the output voltage rises or falls slightly, this change propagates through Q6 and R7, causing the voltage at the base of Q7 to rise or fall accordingly. This change is amplified by Q7 and Q8, generating the error output.[14] The error output, in turn, decreases or increases the current through the output transistor, and this negative feedback loop adjusts the output voltage until it is correct.
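
Here's a toy model of that loop, with an arbitrary gain and step, just to show the settling behavior:

```python
# Toy feedback loop, purely illustrative: the divider senses the output, the
# "backwards" bandgap produces an error signal, and the pass transistor nudges
# the output until the error balances at the regulation point.

V_SET = 5.0      # output voltage at which the bandgap's error signal balances
GAIN = 0.5       # loop gain per step (made up)

v_out = 7.0      # start away from regulation
for _ in range(20):
    error = v_out - V_SET        # amplified by Q7/Q8 in the real chip
    v_out -= GAIN * error        # output transistor conducts less (or more)
print(round(v_out, 3))           # converges to 5.0
```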

Interactive chip viewer

The image and schematic[9] below are an interactive exploration of the 7805. Click a component to see its location on the die and in the schematic highlighted. The box below will give an explanation of the component. For transistors, the emitter, base, and collector will be indicated on the die.

Why I think this chip is counterfeit

The outside of the package has the ST Microelectronics logo, but for several reasons I think the chip is counterfeit and manufactured by someone else. First, on the die itself (below) there is no ST logo, no mask copyright, and no manufacturer information at all. (I have no explanation for why the die is labeled 2805 and not 7805, or what P414 means.) In addition, the circuit on the die is totally different from the internal circuit in the ST Microelectronics 7805 datasheet. The metal of the package looks grainy and low quality. Finally, I bought the part off eBay, not from a reputable supplier, so it could have come from anywhere. For these reasons, I conclude that the part I got is counterfeit and not a genuine ST Microelectronics LM7805. From what I hear, there's a lot of semiconductor counterfeiting happening so I'm not surprised to get a counterfeit part. (But see a dissenting opinion.)

Label on the die of a 7805 voltage regulator.

Label on the die of a 7805 voltage regulator.

7805 history, and a look at some other designs

I had assumed that all 7805 chips were pretty much the same. But one surprise from studying datasheets is that different manufacturers use totally different internal circuitry for the same 7805 chip and the name "7805" doesn't mean much more than "some sort of 5 volt regulator."

To explain this, I'll start with a brief history of voltage regulators. Simple IC voltage regulators got their start way back in 1968 when Fairchild introduced the µA723 voltage regulator, which used a temperature-compensated Zener diode to provide an adjustable voltage. In 1969 analog design genius Robert Widlar[15] developed the National LM109 5-volt regulator, which was much simpler to use. It was followed in 1972 by Fairchild's 7800 series of voltage regulators, ranging from 5 volts to 24 volts. In 1973 National came out with an improved regulator series, the LM340-XX.

From this history, you'd expect that there's a LM109 design, a 7805 design, and a LM340 design. However, it turns out that the part numbers are really just marketing, and have little to do with what's inside the chip. Some 7805s are closer to the LM109 than to other 7805s, and some LM340s are closer to 7805s than to other LM340s.

For instance, the Fairchild µA109 uses the common Fairchild 7800 series design. On the other hand, the National LM7805 is very different from the Fairchild 7805, but is identical to the National LM340, even sharing the same datasheet. This design is very close to the original National LM109, so in effect National sold the same design under three different names.[16] Thus, it looks like companies reuse the same voltage regulator design, changing little more than the part number between devices. I suspect manufacturers are constrained by patents[17], so they use the part numbers they want on the devices they can make.

How a different, more popular 7805 design works

It turns out that the 7805 design I reverse-engineered above is fairly rare, and most 7805 chips use a different design, shown below.[16] While the overall architecture of this design is similar to the LM109-derived 7805 chip I examined, most of the pieces have substantial changes. The current mirror[18], the startup circuit, the bandgap reference, and the protection circuitry are all different.

Internal schematic of the Signetics µA7805 regulator from the datasheet.

Internal schematic of the Signetics µA7805 regulator from the datasheet.

Since this design is so popular, I'll give a brief explanation of how its bandgap circuit works.[19] In the figure below, there's a large VBE (red arrow) across high-current transistor Q1, and a small VBE (green arrow) across low-current transistor Q2. Thus, ΔVBE appears across R3, generating a current through R3, Q2, and R2. Since R2 has 20 times the resistance of R3, 20ΔVBE appears across R2, by Ohm's law.

Now, to find the temperature-compensated stable voltage for this circuit, follow the blue arrows up from ground. (As before, the arrows do not indicate current flow, and Vin is the input to the bandgap not the chip.) Going through Q3, Q4, R2, Q5 and Q6, the voltages sum to 4VBE+20ΔVBE. Since there are four VBEs, the circuit must be designed for four times the bandgap voltage, or approximately 5V. Thus, this circuit's stable point is 5V. At this voltage, the error amplifying transistors (Q4/Q3) will be in the active region and will respond to any variation away from it.[20]

How the bandgap voltage is generated in the Signetics 7805 regulator.

How the bandgap voltage is generated in the Signetics 7805 regulator.
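
The same quick check works for this design, again assuming a VBE of about 0.6 volts and a 1.25 volt bandgap (assumptions, not datasheet values):

```python
# Four VBE drops plus 20 copies of dVBE should land at four bandgap voltages,
# the 5 V set point. Same rough assumptions as the earlier check.

vbe, bandgap = 0.6, 1.25
dvbe = (4 * bandgap - 4 * vbe) / 20     # dVBE implied by the 20:1 resistor ratio
print(round(4 * vbe + 20 * dvbe, 2))    # 5.0 volts
```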

How I looked at the 7805 die, and how you can too

Usually getting the die out of an IC requires concentrated acid to dissolve the epoxy package. But some ICs, such as the 7805, are available in metal cans which can be easily opened with a hacksaw. I used a metallurgical microscope for my die photos, but even a basic middle-school microscope shows you the metal layer at low magnification. If you're at all interested in IC structure, or want to show kids what ICs look like inside, you should get an IC in a metal can, saw it open yourself, and take a look. (But first read the warning about beryllium inside some chips.) Many different ICs in metal cans are available for under $5 on eBay; search for "TO-99 IC". I find older chips such as the 7805 are better for this than modern chips: the simpler circuits and larger features make it easier to see the internals.

Inside a 7805 voltage regulator. The tiny silicon die is visible in the middle of the TO-5 package.

Inside a 7805 voltage regulator. The tiny silicon die is visible in the middle of the TO-5 package.

The photo above shows the 7805 regulator after removing the top with a hacksaw. The metal package is almost entirely empty inside - the silicon die is very small compared to the space available. The metal acts as an effective heat sink to cool the chip under high load. Even without magnification, the large output transistor is visible at the right side of the die. The thin wires between the pins and die are visible, including the two separate wires to the output pin.

Conclusion

I hope this article has given you a better understanding of how a voltage regulator works and what's inside a silicon chip. Perhaps it has even inspired you to saw open some chips of your own to explore the tiny world on a silicon chip for yourself. And while you sit at your computer, think of the many voltage regulators around you quietly keeping your electronics working smoothly, whether made by their supposed manufacturer or not.

Notes and references

[1] Computers usually get most of their power from switching power supplies for efficiency, but linear regulators still have their place. Older ATX power supplies used the 7805 for the 5V standby power, while others used the related 7905 and 7912 regulators for -5V and -12V. Modern computers still use linear regulators in surprising numbers. For instance, the MacBook Pro (A1278) uses a low-dropout regulator to generate 1.8 volts, a switching controller with 3.3 and 5V linear regulators inside, a main switching controller with a 5V regulator inside, a low-noise 4.6V regulator for audio, and another regulator to generate 3.3V for the keyboard.

[2] Earlier Arduinos such as the Arduino USB, NG and Severino were powered through a 7805 regulator. Recent Arduino models, however, use a switching step-down converter and an ultra-low-dropout 3.3V regulator. This regulator uses the same principles as the 7805, but is much more advanced.

[3] A big advantage of more modern voltage regulators is that they don't require as large an input voltage. The 7805 requires at least two extra volts input (i.e. 7 volts in to produce 5 volts out) - this is the dropout voltage. Newer low-dropout (LDO) regulators can require as little as 0.1 extra volts. Modern regulators (such as the TPS796xx) also have much less noise in the output. Despite this, the 7805 is still popular, especially with hobbyists. Adafruit has a nice comparison of regulators.

[4] Depending on the application, you'd probably want to add input and output capacitors to the 7805 regulator to filter out transients due to fluctuations in the input voltage or output load.

[5] While the 7805 chip has a single layer of metal over the silicon to interconnect the circuitry, modern CPUs use many more layers of metal due to their complexity. For example, Haswell uses 11 layers while IBM's POWER8 uses an astounding 15 metal layers. Needless to say, I'm not going to figure out how those chips work with my microscope.

[6] The 7805 uses a wide variety of transistor layouts, as you can see from the labeled die photo. Several transistors in the bandgap use two emitters for one transistor (e.g. Q2, Q3, Q4, Q5) to improve matching between transistors; the PNP current mirror transistors Q11 and Q11-1 also have multiple emitters. Pairs of transistors can share a single base (e.g. Q11 and Q11-1), share a single collector (Q17 and Q18), or share both (Q14 and Q19). Some transistors move the base to the middle (e.g. Q6). To support high current, the output transistors (Q15, Q16) have a totally different, much larger structure.

[7] You might have wondered why there is a distinction between the collector and emitter of a transistor, when the simple picture of a transistor is totally symmetrical. As you can see from the die photo, the collector and emitter are very different in a real transistor. In addition to the very large size difference, the silicon doping is different. The result is a transistor will have poor gain if the collector and emitter are swapped.

[8] The resistance of a resistor in silicon is proportional to its length divided by its width. If you double the length, it's like two resistors in series, so the resistance doubles. If you double the width, it's like two resistors in parallel, so the resistance is cut in half. One convenient consequence is if the chip is scaled down (Moore's law), the resistors keep the same values, since the width and length scale equally.

Silicon resistance is measured with the unusual unit ohms per square (Ω/□). Note that there's no distance unit - it doesn't matter if you have a square millimeter or square inch of material; the resistance is the same because the dimensions cancel out. For the 7805, I estimate 140 ohms/square for the resistors.
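
As an illustration, with invented dimensions and the 140 ohms/square estimate:

```python
# Ohms per square: the resistance is the sheet resistance times the number of
# squares, i.e. length divided by width. Dimensions below are invented.

SHEET_OHMS_PER_SQUARE = 140   # the article's estimate for the 7805's resistors

def resistance(length_um, width_um):
    return SHEET_OHMS_PER_SQUARE * length_um / width_um

print(resistance(length_um=300, width_um=10))   # 4200 ohms - about 30 squares
print(resistance(length_um=60,  width_um=10))   # 840 ohms  - about 6 squares
```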

[9] I looked at dozens of datasheets and the chip I examined almost exactly matches the schematic for the Korean Electronics KIA7805. The National LM340/LM78XX schematic is very similar.

[10] Bandgap circuits usually have two stable voltages - the desired voltage and 0 volts. To keep the bandgap from getting stuck at 0 volts, a startup circuit will "push" the bandgap away from 0 volts so it will settle at the desired voltage. The startup circuit is discussed in Widlar's application note AN-42 for the similar LM109 (page 5).

[11] When building a bandgap reference, what really matters for VBE is the current density through the transistors - the current divided by the area of the emitter. Decreasing the current through the transistor decreases the current density. The second way to decrease current density is to use a larger transistor with a larger emitter. Often five or ten identical transistors in parallel will be combined to form this large transistor to ensure the large transistor and the small transistor are exactly matched.

[12] The VBE line for a bandgap reference is only perfectly straight in theory, so the resulting bandgap voltage will vary slightly with temperature. To increase stability, some more complex bandgap references compensate for second-order effects.

[13] Bandgap reference references: How to make a bandgap voltage reference in one easy lesson by Paul Brokaw, inventor of the Brokaw bandgap reference. A presentation on the bandgap reference is here. The Design of Band-Gap Reference Circuits: Trials and Tribulations by analog chip design legend Bob Pease discusses real-world bandgap designs.

[14] You might wonder how the error output knows what voltage to switch at. For a Darlington pair (Q7/Q8) to be active, the base voltage must go above 2VBE (Wikipedia). The bandgap reference was constructed assuming that at the reference voltage, there will be VBE drops across Q7 and Q8. Thus, it's not a coincidence that Darlington pair Q7/Q8 is right in the active region (2VBE) at the bandgap voltage, making the error output very sensitive to any moves away from the reference voltage. If the output voltage rises or falls, the voltage at the base of Q7 rises or falls accordingly, and the transistors greatly amplify this change. Also note that an increase in output voltage causes a decrease in the error output, yielding negative feedback for the whole chip.

[15] By all reports, Robert Widlar was an amazing analog engineer, as well as an alcoholic crazy guy. Widlar invented key analog IC circuits such as the Widlar current source as well as groundbreaking ICs such as the µA702 and µA723. In 1970 he sold his stock options for a million dollars (about 6 million adjusted for inflation) and retired to Mexico at 33. Some entertaining stories about him are here, on Wikipedia, and pictures of his sheep.

[16] Most 7805 datasheets show the same internal schematic. Some chips using the common design are Fairchild 7800 series, Hi-Sincerity H78XX, FCI LM7800, MCC MC7805, Microelectronics ML7800, Motorola MCT7800, uPC7800H, JRC NJM7800, TI uA7800, Signetics uA7800, and ST L7805. Other chips use variants of the common design: AS78XXA, UTC LM78XX, L78L05 and Motorola MC7800.

The LM109-based design of the 7805 that I looked at is very different from the common design and appears to be fairly rare; it is used by National LM340/LM7800 and KEC KIA7805AF. There are a few differences to note between this design and the original National LM109. In order to support multiple output voltages, the 7805 design uses a resistor divider and a different circuit feeding the bandgap reference. This probably also motivated the removal of a couple transistors from the bandgap circuit so its voltage is one VBE drop lower. The startup circuit is also slightly changed.

[17] Widlar's patent on the bandgap reference is 3617859. A later patent with a bandgap reference very similar to the LM109's is 4249122.

[18] A current mirror is a very useful way of connecting transistors so the current through the second transistor matches the current through the first transistor. For more information about current mirrors, you can check Wikipedia or any analog IC book such as chapter 3 of Designing Analog Chips.

[19] Several sources give an explanation of the common 7805 design that is plausible but wrong. The faulty explanation is that Zener D1 provides the reference voltage. It feeds into a comparator built from Q13 and Q10 (or Q6) as a differential pair and Q1, Q7, and Q2 forming a current mirror active load. The most obvious problem with this is Q13, Q6, R1, and R2 are all tied together which would short out the two sides of the supposed differential pair / current mirror.

Ironically, the design of the 7905 (the negative-voltage version of the 7805) is similar to the erroneous 7805 explanation. The 7905 uses a Zener diode to provide the reference voltage. A comparator with a current mirror active load generates the error signal by comparing the reference voltage with the feedback voltage. Meanwhile another current mirror ensures a constant (probably temperature-compensated) current flows through the Zener diode. I had expected the 79XX chips would be mirror-images of the 78XX chips, but the internal design turns out to be fundamentally different. This explains why the block diagrams in 7905 datasheets show a comparator and 7805 datasheets just show an "error amplifier" box.

[20] In the common 7805 design, I believe the purpose of Q7 and R10 is to pull the same current from Q1's base that Q4 and R14 pull from Q2's base, to keep both sides balanced. Because R1 is 1kΩ and R2+R3 is 21kΩ, 21 times the current should flow through Q1 as through Q2.