Ken Shirriff's blog: September 2014

Mining Bitcoin with pencil and paper: 0.67 hashes per day

This article is now available in Japanese: 紙と鉛筆でビットコインをマイニング：1日に0.67ハッシュ and Russian: Майним Bitcoin с помощью бумаги и ручки.

I decided to see how practical it would be to mine Bitcoin with pencil and paper. It turns out that the SHA-256 algorithm used for mining is pretty simple and can in fact be done by hand. Not surprisingly, the process is extremely slow compared to hardware mining and is entirely impractical. But performing the algorithm manually is a good way to understand exactly how it works.

A pencil-and-paper round of SHA-256

The mining process

Bitcoin mining is a key part of the security of the Bitcoin system. The idea is that Bitcoin miners group a bunch of Bitcoin transactions into a block, then repeatedly perform a cryptographic operation called hashing zillions of times until someone finds a special extremely rare hash value. At this point, the block has been mined and becomes part of the Bitcoin block chain. The hashing task doesn't accomplish anything useful in itself, but because finding a successful block is so difficult, it ensures that no individual has the resources to take over the Bitcoin system. For more details on mining, see my Bitcoin mining article.

A cryptographic hash function takes a block of input data and creates a smaller, unpredictable output. The hash function is designed so there's no "short cut" to get the desired output - you just have to keep hashing blocks until you find one by brute force that works. For Bitcoin, the hash function is a function called SHA-256. To provide additional security, Bitcoin applies the SHA-256 function twice, a process known as double-SHA-256.

In Bitcoin, a successful hash is one that starts with enough zeros.[1] Just as it is rare to find a phone number or license plate ending in multiple zeros, it is rare to find a hash starting with multiple zeros. But Bitcoin is exponentially harder. Currently, a successful hash must start with approximately 17 zeros, so only one out of 1.4x10²⁰ hashes will be successful. In other words, finding a successful hash is harder than finding a particular grain of sand out of all the grains of sand on Earth.

The following diagram shows a block in the Bitcoin blockchain along with its hash. The yellow bytes are hashed to generate the block hash. In this case, the resulting hash starts with enough zeros so mining was successful. However, the hash will almost always be unsuccessful. In that case, the miner changes the nonce value or other block contents and tries again.
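As a rough sketch of the try-a-nonce-and-hash-again loop described above, here is a toy miner in Python. It is not the article's method or real mining code: the 76-byte prefix is made up, the target is deliberately easy (about 1 in 4096 hashes succeeds), and the names `double_sha256`, `mine`, and `easy_target` are just illustrative. The hash is read as a little-endian integer, which is the byte reversal mentioned later in the article.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as Bitcoin does."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces until the hash, read as a little-endian integer, is below target."""
    for nonce in range(max_nonce):
        # The 4-byte nonce is the last field of the 80-byte block header.
        header = header_prefix + struct.pack("<I", nonce)
        value = int.from_bytes(double_sha256(header), "little")
        if value < target:
            return nonce, value
    return None

# Toy inputs (assumptions): an all-zero 76-byte prefix and a target that
# only requires the top 12 bits of the hash to be zero.
prefix = bytes(76)
easy_target = 1 << (256 - 12)
result = mine(prefix, easy_target)
if result is not None:
    nonce, value = result
    print(nonce, f"{value:064x}")
```

With a real difficulty target the loop is the same, but the expected number of tries is astronomically larger.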

Structure of a Bitcoin block

The SHA-256 hash algorithm used by Bitcoin

The SHA-256 hash algorithm takes input blocks of 512 bits (i.e. 64 bytes), combines the data cryptographically, and generates a 256-bit (32 byte) output. The SHA-256 algorithm consists of a relatively simple round repeated 64 times. The diagram below shows one round, which takes eight 4-byte inputs, A through H, performs a few operations, and generates new values of A through H.

One round of the SHA-256 algorithm showing the 8 input blocks A-H, the processing steps, and the new blocks. Diagram created by kockmeyer, CC BY-SA 3.0.

The blue boxes mix up the values in non-linear ways that are hard to analyze cryptographically. Since the algorithm uses several different functions, discovering an attack is harder. (If you could figure out a mathematical shortcut to generate successful hashes, you could take over Bitcoin mining.)

The Ma majority box looks at the bits of A, B, and C. For each position, if the majority of the bits are 0, it outputs 0. Otherwise it outputs 1. That is, for each position in A, B, and C, look at the number of 1 bits. If it is zero or one, output 0. If it is two or three, output 1.

The Σ0 box rotates the bits of A to form three rotated versions, and then sums them together modulo 2. In other words, if the number of 1 bits is odd, the sum is 1; otherwise, it is 0. The three values in the sum are A rotated right by 2 bits, 13 bits, and 22 bits.

The Ch "choose" box chooses output bits based on the value of input E. If a bit of E is 1, the output bit is the corresponding bit of F. If a bit of E is 0, the output bit is the corresponding bit of G. In this way, the bits of F and G are shuffled together based on the value of E.

The next box Σ1 rotates and sums the bits of E, similar to Σ0 except the shifts are 6, 11, and 25 bits.

The red boxes perform 32-bit addition, generating new values for A and E. The input W_t is based on the input data, slightly processed. (This is where the input block gets fed into the algorithm.) The input K_t is a constant defined for each round.[2]

As can be seen from the diagram above, only A and E are changed in a round. The other values pass through unchanged, with the old A value becoming the new B value, the old B value becoming the new C value and so forth. Although each round of SHA-256 doesn't change the data much, after 64 rounds the input data will be completely scrambled.[3]
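The round described above is small enough to write out in a few lines of Python. This is a minimal sketch following the boxes in the diagram; the function names are my own, and `w_t` and `k_t` are simply passed in rather than derived from a real block.

```python
MASK = 0xFFFFFFFF  # keep every value to 32 bits

def rotr(x, n):
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def maj(a, b, c):
    """Majority: each output bit is 1 if at least two input bits are 1."""
    return (a & b) ^ (a & c) ^ (b & c)

def ch(e, f, g):
    """Choose: take the bit from f where e is 1, from g where e is 0."""
    return (e & f) ^ (~e & MASK & g)

def sigma0(a):
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)

def sigma1(e):
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)

def sha256_round(state, w_t, k_t):
    """One round: only A and E get new values, the rest shift down."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + sigma1(e) + ch(e, f, g) + k_t + w_t) & MASK
    t2 = (sigma0(a) + maj(a, b, c)) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)
```

Calling `sha256_round` 64 times with the right `w_t` and `k_t` values processes one 512-bit block.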

Manual mining

The video below shows how the SHA-256 hashing steps described above can be performed with pencil and paper. I perform the first round of hashing to mine a block. Completing this round took me 16 minutes, 45 seconds.

To explain what's on the paper: I've written each block A through H in hex on a separate row and put the binary value below. The maj operation appears below C, and the shifts and Σ0 appear above row A. Likewise, the choose operation appears below G, and the shifts and Σ1 above E. In the lower right, a bunch of terms are added together, corresponding to the first three red sum boxes. In the upper right, this sum is used to generate the new A value, and in the middle right, this sum is used to generate the new E value. These steps all correspond to the diagram and discussion above.

I also manually performed another hash round, the last round to finish hashing the Bitcoin block. In the image below, the hash result is highlighted in yellow. The zeroes in this hash show that it is a successful hash. Note that the zeroes are at the end of the hash. The reason is that Bitcoin inconveniently reverses all the bytes generated by SHA-256.[4]

Last pencil-and-paper round of SHA-256, showing a successfully-mined Bitcoin block.

What this means for mining hardware

Each step of SHA-256 is very easy to implement in digital logic - simple Boolean operations and 32-bit addition. (If you've studied electronics, you can probably visualize the circuits already.) For this reason, custom ASIC chips can implement the SHA-256 algorithm very efficiently in hardware, putting hundreds of rounds on a chip in parallel. The image below shows a mining chip that runs at 2-3 billion hashes/second; Zeptobars has more photos.

The silicon die inside a Bitfury ASIC chip. This chip mines Bitcoin at 2-3 Ghash/second. Image from Zeptobars. (CC BY 3.0)

In contrast, Litecoin, Dogecoin, and similar altcoins use the scrypt hash algorithm, which is intentionally designed to be difficult to implement in hardware. It stores 1024 different hash values into memory, and then combines them in unpredictable ways to get the final result. As a result, much more circuitry and memory is required for scrypt than for SHA-256 hashes. You can see the impact by looking at mining hardware, which is thousands of times slower for scrypt (Litecoin, etc) than for SHA-256 (Bitcoin).

Conclusion

The SHA-256 algorithm is surprisingly simple, easy enough to do by hand. (The elliptic curve algorithm for signing Bitcoin transactions would be very painful to do by hand since it has lots of multiplication of 32-byte integers.) Doing one round of SHA-256 by hand took me 16 minutes, 45 seconds. At this rate, hashing a full Bitcoin block (128 rounds)[3] would take 1.49 days, for a hash rate of 0.67 hashes per day (although I would probably get faster with practice). In comparison, current Bitcoin mining hardware does several terahashes per second, about a quintillion times faster than my manual hashing. Needless to say, manual Bitcoin mining is not at all practical.[5]
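The hash-rate arithmetic above can be checked with a few lines of Python. The variable names are just for illustration; the inputs are the figures from the text.

```python
minutes_per_round = 16 + 45 / 60   # 16 minutes, 45 seconds
rounds_per_hash = 128              # using the shortcut described in note [3]

days_per_hash = minutes_per_round * rounds_per_hash / (60 * 24)
hashes_per_day = 1 / days_per_hash
print(f"{days_per_hash:.2f} days per hash, {hashes_per_day:.2f} hashes per day")
```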

A Reddit reader asked about my energy consumption. There's not much physical exertion, so assuming a resting metabolic rate of 1500kcal/day, manual hashing works out to almost 10 megajoules/hash. A typical energy consumption for mining hardware is 1000 megahashes/joule. So I'm less energy efficient by a factor of 10^16, or 10 quadrillion. The next question is the energy cost. A cheap source of food energy is donuts at $0.23 for 200 kcalories. Electricity here is $0.15/kilowatt-hour, which is cheaper by a factor of 6.7 - closer than I expected. Thus my energy cost per hash is about 67 quadrillion times that of mining hardware. It's clear I'm not going to make my fortune off manual mining, and I haven't even included the cost of all the paper and pencils I'll need.
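Here is the same energy comparison as a short Python calculation, using the figures from the paragraph above plus the standard conversions of 4184 joules per kcal and 3.6 megajoules per kilowatt-hour. Without the intermediate rounding used in the text, the final ratio comes out a little lower (roughly 6x10^16), but it is the same order of magnitude.

```python
KCAL_TO_J = 4184    # joules per kilocalorie
KWH_TO_J = 3.6e6    # joules per kilowatt-hour

hashes_per_day = 0.67
human_j_per_hash = 1500 * KCAL_TO_J / hashes_per_day   # resting metabolism
hardware_j_per_hash = 1 / 1000e6                       # 1000 megahashes per joule
efficiency_ratio = human_j_per_hash / hardware_j_per_hash

donut_cost_per_j = 0.23 / (200 * KCAL_TO_J)   # dollars per joule of donut
elec_cost_per_j = 0.15 / KWH_TO_J             # dollars per joule of electricity
price_ratio = donut_cost_per_j / elec_cost_per_j
cost_ratio = efficiency_ratio * price_ratio

print(f"{human_j_per_hash / 1e6:.1f} MJ/hash")
print(f"efficiency ratio {efficiency_ratio:.1e}, price ratio {price_ratio:.1f}")
print(f"cost ratio {cost_ratio:.1e}")
```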

2017 edit: My Bitcoin mining on paper system is part of the book The Objects That Power the Global Economy, so take a look.

Follow me on Twitter to find out about my latest blog posts.

Notes

[1] It's not exactly the number of zeros at the start of the hash that matters. To be precise, the hash must be less than a particular value that depends on the current Bitcoin difficulty level.

[2] The source of the constants used in SHA-256 is interesting. The NSA designed the SHA-256 algorithm and picked the values for these constants, so how do you know they didn't pick special values that let them break the hash? To avoid suspicion, the initial hash values come from the square roots of the first 8 primes, and the K_t values come from the cube roots of the first 64 primes. Since these constants come from a simple formula, you can trust that the NSA didn't do anything shady (at least with the constants).

[3] Unfortunately the SHA-256 hash works on a block of 512 bits, but the Bitcoin block header is more than 512 bits. Thus, a second set of 64 SHA-256 hash rounds is required on the second half of the Bitcoin block. Next, Bitcoin uses double-SHA-256, so a second application of SHA-256 (64 rounds) is done to the result. Adding this up, hashing an arbitrary Bitcoin block takes 192 rounds in total. However there is a shortcut. Mining involves hashing the same block over and over, just changing the nonce which appears in the second half of the block. Thus, mining can reuse the result of hashing the first 512 bits, and hashing a Bitcoin block typically only requires 128 rounds.

[4] Obviously I didn't just have incredible good fortune to end up with a successful hash. I started the hashing process with a block that had already been successfully mined. In particular I used the one displayed earlier in this article, #286819.

[5] Another problem with manual mining is new blocks are mined about every 10 minutes, so even if I did succeed in mining a block, it would be totally obsolete (orphaned) by the time I finished.
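The claim in note [2] is easy to check. The sketch below regenerates the constants by taking the first 32 bits of the fractional part of each root, which is how the SHA-256 standard defines them. It uses integer arithmetic to avoid floating-point rounding; the helper names (`first_primes`, `icbrt`) are my own.

```python
import math

MASK = 0xFFFFFFFF

def first_primes(n):
    """Return the first n primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n):
    """Integer cube root: the largest x with x**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

primes = first_primes(64)
# floor(sqrt(p) * 2**32) == isqrt(p << 64); masking keeps the fractional bits.
H = [math.isqrt(p << 64) & MASK for p in primes[:8]]
# floor(cbrt(p) * 2**32) == icbrt(p << 96).
K = [icbrt(p << 96) & MASK for p in primes]

print(" ".join(f"{h:08x}" for h in H))
print(f"{K[0]:08x} ... {K[63]:08x}")
```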

Why the Z-80's data pins are scrambled

If you look closely at the datasheet for a Z-80 chip, you'll notice the data pins are in a random-looking order. The address pins (A) are nicely arranged in order counterclockwise from 0 to 15, but the data pins (D) are all shuffled around.[1] After studying the internals of the chip, I have a hypothesis to explain this.

Pinout of the Z-80, from the Zilog Data Book.

I have been reverse-engineering the Z-80 processor using images and data from the Visual 6502 team. The image below is a photograph of the Z-80 die. Around the outside of the chip are the pads that connect to the external pins. (The die photo is rotated 180° compared to the datasheet pinout, if you try to match up the pins.) At the right are the 8 data pins for the Z-80's 8-bit data bus in a strange order.

The 8-bit data bus in the Z-80 is used for communication among the different parts of the chip. But instead of a single data bus, the Z-80 has a complex data bus that is split into 3 segments. The first segment of the data bus (in red) connects the data pins to the instruction decode logic. The first segment is also connected to the second segment (green). The green data bus provides access to the lower byte of registers and is also connected to the third segment of the data bus (orange). The orange data bus is connected to the high byte of registers and also to the ALU (Arithmetic Logic Unit)[2]. Note that because the green segment splits off from the red segment, only half of the red bus (4 bits) goes down to the lower part of the chip. [There was an extra segment in an earlier version of this article.]

The Z-80's silicon die, showing the data and address pins, data buses and other internal components.

The motivation behind splitting the data bus is to allow the chip to perform activities in parallel. For instance an instruction can be read from the data pins into the instruction logic at the same time that data is being copied between the ALU and registers. The partitioned data bus is described briefly in the Z-80 oral history[3], but doesn't appear in architecture diagrams.

The complex structure of the data buses is closely connected to the ordering of the data pins. But before explaining the data pin layout, a few more features of the Z-80 need to be discussed.

How the Z-80 processes instructions

To execute an instruction, the Z-80 loads the instruction from memory through the data pins and feeds it into the instruction decode logic via the red segment of the data bus. First, the instruction is stored from the data bus into the instruction register, which is a simple latch that holds the instruction while it is being executed. This feeds the instruction into the PLA (Programmable Logic Array), which decodes the instruction into approximately 98 different categories (details). The instruction logic below the PLA combines these signals with timing signals and determines exactly what should happen at what step. This logic generates the control signals that control the operation of the register file, ALU, and other parts of the chip.

Since the Z-80 is an 8-bit processor, instruction op codes are 8 bits long. Many of the instructions have the bits arranged as follows:

ggiiirrr

In that arrangement, the two gg bits select a group of instructions (e.g. load or arithmetic), the next three iii bits select the particular instruction, and the last three rrr bits select the register to use. There are many exceptions to this format, but it provides an underlying structure. (This instruction structure was inherited from the 8080 microprocessor, since the Z-80 was designed to be backwards compatible with it.)

Bit instructions in the Z-80

One feature the Z-80 has that goes beyond the 8080 is instructions to set, clear, or test a single bit in a register.[4] These instructions fit the pattern described above, with the top two bits in the instruction selecting the test, clear, or set function, the next three bits in the instruction selecting which bit in the byte to operate on, and the final three bits selecting the register. That is, bits 5, 4, and 3 of the instruction select which of the eight bits in the register to operate on.
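To make the field layout concrete, here is a small Python sketch that splits the second byte of a bit instruction (the byte after the CB prefix, see note [4]) into its three fields. The register table is the standard Z-80 encoding rather than something taken from this article, and the function name is just for illustration.

```python
# Standard Z-80 register encoding for the low three bits (an assumption
# brought in from the Z-80 instruction set, not from this article).
REGISTERS = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]
OPERATIONS = {0b01: "BIT", 0b10: "RES", 0b11: "SET"}

def decode_bit_instruction(opcode):
    """Split a CB-prefixed opcode into (operation, bit number, register)."""
    group = (opcode >> 6) & 0b11   # top two bits: test, reset, or set
    bit = (opcode >> 3) & 0b111    # instruction bits 5, 4, 3: which data bit
    reg = opcode & 0b111           # bottom three bits: which register
    if group not in OPERATIONS:
        return None                # group 00 is the rotate and shift instructions
    return OPERATIONS[group], bit, REGISTERS[reg]

print(decode_bit_instruction(0x57))
```

For example, 0x57 is 01 010 111 in binary, which decodes as a test of bit 2 in register A.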

The processing of the Z-80's bit operations is unusual compared to other instructions. While most of the instruction execution is controlled by the same instruction decoding logic described above, the bit selection is done by feeding the three instruction bits directly into the ALU, bypassing the instruction decoding logic entirely. That is, there are simple circuits (at the right side of the ALU) to select one of the 8 bits, depending on the instruction that was read in. In the diagram of the chip, you can see the connection between the data bus (red) and the ALU to accomplish this.

The hardware to do the bit selection is fairly straightforward. There are eight 3-input NOR gates, each looking at a different combination of the instruction bits, either inverted or non-inverted. For example, an instruction that operates on bit 2 will have an opcode of the form gg010rrr (since 010 is binary 2). Instruction bit 5 is 0, instruction bit 4 is 1, and instruction bit 3 is 0. (Don't confuse the bits in the instruction with the bit being selected.) In logic, this becomes:

modify_data_bit_2 = (NOT bit5) AND bit4 AND (NOT bit3).

It turns out that NOR gates are easiest to build in silicon (as will be explained below), so the logic in hardware is the equivalent:

modify_data_bit_2 = bit5 NOR (NOT bit4) NOR bit3.

Selection of the other 7 bits is done with similar functions of the instruction bits bit5, bit4, and bit3.
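The eight select functions can be modeled in Python as a quick sanity check. This is only a logic-level sketch (the names `nor` and `select_lines` are mine), with each gate treated as a single 3-input NOR whose inputs are inverted wherever the matching pattern has a 1.

```python
def nor(*inputs):
    """Multi-input NOR: 1 only if every input is 0."""
    return int(not any(inputs))

def select_lines(bit5, bit4, bit3):
    """Return the 8 select outputs for the given instruction bits."""
    lines = []
    for n in range(8):
        want = ((n >> 2) & 1, (n >> 1) & 1, n & 1)
        # XOR with the wanted pattern inverts a signal where the pattern
        # has a 1, so all three NOR inputs are 0 only on an exact match.
        inputs = [b ^ w for b, w in zip((bit5, bit4, bit3), want)]
        lines.append(nor(*inputs))
    return lines

# Instruction bits 010 select data bit 2.
print(select_lines(0, 1, 0))
```

For bit 2 the gate inputs are bit5, NOT bit4, and bit3, matching the equation above, and exactly one of the eight lines is active for any instruction.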

The hardware implementation of bit instructions

For a bit operation, one of the 8 bits will be selected and fed into the ALU. The ALU will then test, clear, or set that bit in the appropriate register. Below is a zoomed-in look at the portion of the die that selects bit 2. This is in the lower right corner of the chip, to the right of the ALU by the D3 pad. The white vertical stripes are metal lines, providing the data lines, control lines, and power and ground. Underneath the metal is the polysilicon layer. Underneath this is the silicon layer, where the transistors are. It's hard to make out the polysilicon and silicon structures in this photo, but at the left you can see the horizontal polysilicon bus lines for ALU bits 5 and 2. These lines provide data flow through the ALU, and are how the selected bit is fed into the ALU.

The circuitry in the Z-80 to handle bit operations on bit 2.

The data bus provides bits 5, 4, and 3 to this part of the chip. (Just to make things confusing, the data on the Z-80's data bus is inverted, which is indicated by a slash.) Underneath this bus is the NOR gate (outlined in blue) that computes the function described earlier: bit5 NOR (NOT bit4) NOR bit3. The inverter to bit3 from the inverted bit3 on the data bus is also visible (outlined in yellow). A buffer (green) strengthens this signal. The "ALU load bit value" control line is activated by the instruction decode logic; this control line allows the selected bit to pass into the ALU only for a bit operation.

A NOR gate is a simple circuit when implemented with MOS transistors, as the schematic below shows. The transistors can be thought of as switches that close if their gate (middle connection) receives a 1 input. In the circuit below, if any of the inputs are 1, the corresponding transistor will connect the output to ground, and the output will be 0. Otherwise, the resistor (which is actually a special depletion-mode transistor) will pull the output high and the output is 1. Thus, the output is the NOR of the three inputs.

Schematic of a 3-input NOR gate in the Z-80.

The diagram below shows how the above NOR gate is actually implemented in silicon. The diagram is a zoomed-in version of the image above, focusing on the NOR gate (blue outline). Instead of a photograph, the diagram shows the different layers in the chip as extracted by the Visual 6502 team: blue is metal, brown is polysilicon, green is silicon, and orange is a connection between layers. A transistor is formed when polysilicon crosses silicon.

Implementation of a 3-input NOR gate in the Z-80 chip.

The "T" symbols indicate the three transistors that are connected to ground (as shown by yellow arrows). The transistors are all connected together in the middle, and the final yellow arrow shows the connection to the output. Finally, the pull-up resistor is at the lower left. The cyan outline matches the outline in the die photo and with difficulty you can find the structures in the photo.

The important thing to notice in the diagram above is that everything is packed together as tightly as possible to get the Z-80 to fit on the available silicon. The layout of the Z-80 was done by hand, with each transistor and connection manually positioned. Every possible trick was used to minimize space - for example, each transistor above is oriented in a different direction. Drafting this layout was an extremely time-consuming task that took Zilog founder Federico Faggin 3 1/2 months of 80-hour weeks[3]. (Yes, the CEO drafted the chip himself!) But you can see from the result that there is very little wasted space in the chip.

The data pins

This article has looked at many different aspects of the Z-80 design, and now it's time to see how they constrain the position of the data pins. First, because the Z-80 splits the data bus into multiple segments, only four data lines run to the lower right corner of the chip. And because the Z-80 was very tight for space, running additional lines would be undesirable. Next, the BIT instructions use instruction bits 3, 4, and 5 to select a particular bit. This was motivated by the instruction structure the Z-80 inherited from the 8080. Finally, the Z-80's ALU requires direct access to instruction bits 3, 4, and 5 to select the particular data bit. Putting these factors together, data pins 3, 4, and 5 are constrained to be in the lower right corner of the chip next to the ALU. This forces the data pins to be out of sequence, and that's why the Z-80 has out-of-order data pins.[5]

Credits: The chip analysis couldn't have been done without the Visual 6502 team especially Chris Smith, Ed Spittles, Pavel Zima, Phil Mainwaring, and Julien Oster.

Notes and references

[1] Even though the Z-80 has out-of-order data pins, it is an improvement over the 8080, where both address and data pins are in a strange order. The 6502, on the other hand, has a nice linear order for its pins.

[2] Unexpectedly, the Z-80's ALU is 4 bits wide. I've written up details here.

[3] The Computer History Museum created an oral history of the Z-80, which is very interesting. A couple parts of it are especially relevant to this article. Page 10 discusses the segmented data bus. Pages 5, 9, and 19 discuss Zilog CEO Federico Faggin laying out the chip over several months. One interesting story is how he ran out of room and had to erase two weeks of work and start over. In the end he completed the layout with only a couple mils of space left.

[4] The Z-80 has multiple operations to set, clear, or test a single bit. These instructions are expressed by two bytes. The first byte is the prefix CB, and the second byte is the specific instruction. The top two bits (gg) of the instruction are 01 for BIT (test bit), 10 for RES (reset bit), and 11 for SET (set bit). For more details, see the Z-80 User Manual, page 240.

[5] Even with pins 3, 4, and 5 out of order, the Z-80 could have used a "semi-linear" sequence such as 0,1,2,6,7,3,4,5. Why didn't the Z-80 do this? My hypothesis is that once some pins were forced out of sequence, the Z-80's designers decided to take advantage of any other micro-optimizations from reordering the pins. For example, pins D0 and D1 have their drivers in order on the chip, but the routing from the drivers to the pins swaps the order to avoid crossing. Pin D7 is probably where it is because its driver lines up well with bit 7 in the PLA. Switching the positions of pins D3 and D4 would make the routing a tiny bit longer.

There are a bunch of good comments on this article at Hacker News.

Reverse engineering a counterfeit 7805 voltage regulator

Update: It turns out my 7805 isn't counterfeit. eclectro did an in-depth search (details on reddit) and found an old 7805 datasheet from Thomson Semiconductors that exactly matches my chip. And Thomson is the T in STMicroelectronics. So that explains how this die ended up with a ST label. More in this thread.

Under a microscope, a silicon chip is a mysterious world with puzzling shapes and meandering lines zigzagging around, as in the magnified image of a 7805 voltage regulator below. But if you study the chip closely, you can identify the transistors, resistors, diodes, and capacitors that make it work and even understand how these components function together. This article explains how the 7805 voltage regulator works, all the way down to how the transistors on the silicon operate. And while exploring the chip, I discovered that it is probably counterfeit.

Die photograph of a 7805 voltage regulator. Click to enlarge.

A voltage regulator takes an unregulated input voltage and converts it to the exact regulated voltage an electronic circuit requires. Voltage regulators are used in almost every electronic circuit, and the popular 7805 has been used everywhere from computers[1] to satellites, from DVD players and video games to Arduinos[2] and robots. Even though it was introduced in 1972 and more advanced regulators[3] are now available, the 7805 is still in use, especially with hobbyists.

The 7805 is a common type of regulator known as a linear regulator. (As its name hints, the 7805 produces 5 volts.) A linear regulator is built around a large transistor that controls the amount of power flowing to the output, acting similar to a variable resistor. (This transistor is visible in the right half of the die photo above.) A drawback of a linear regulator is that all the "extra" voltage gets converted into heat. If you put 9 volts into a linear regulator and get 5 volts out, the extra 4 volts gets turned into heat in the regulator, so the regulator is only about 56% efficient. (The main competitor to a linear regulator is a switching power supply - a much more efficient, but much more complicated way to produce regulated voltage. Switching power supplies have replaced linear regulators in many applications, such as phone chargers and computer power supplies.)
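The efficiency figure above is just the ratio of output voltage to input voltage. Here is a small Python sketch of that arithmetic; the 1 amp load is an assumed value for illustration, and the regulator's own small operating current is ignored.

```python
def linear_regulator(v_in, v_out, current):
    """Return (efficiency, heat in watts) for an idealized linear regulator."""
    efficiency = v_out / v_in             # same current flows in and out
    heat_watts = (v_in - v_out) * current # the dropped voltage becomes heat
    return efficiency, heat_watts

# 9 V in, 5 V out, with an assumed 1 A load.
eff, heat = linear_regulator(9.0, 5.0, 1.0)
print(f"{eff:.0%} efficient, {heat:.1f} W of heat")
```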

A 7805 voltage regulator in a metal TO-3 package. The 7805 is more commonly found in a smaller plastic package.

Linear regulators such as the 7805 became very popular because they are extremely easy to use: just feed the unregulated voltage into one pin, ground the second pin, and get regulated voltage out the third pin[4]. Another feature that made the 7805 popular is it is almost indestructible - if you short-circuit it, put too much voltage in, or run it too hot, it will shut down before getting damaged, due to internal protection circuits.

The components of the integrated circuit

Like most chips, the 7805 is built from a tiny piece of silicon. To make the chip function, a process called doping treats regions of the silicon with elements such as phosphorus or boron. In the die photo, these regions have a slightly different color, which makes the structure of the chip visible. Phosphorus gives the region excess electrons (i.e. negative), so it is known as N silicon. Boron has the opposite effect, creating positive P silicon. The amount of doping in a silicon chip is surprisingly small, varying from 1 foreign atom for every thousand atoms of silicon down to one foreign atom per billion atoms of silicon. Because silicon is so sensitive to impurities, the original silicon wafer must be an insanely pure crystal, up to 99.999999999% pure - a level known as eleven nines.

On top of the silicon, a thin layer of metal connects different parts of the chip. This metal is clearly visible in the die photo as white traces and regions.[5] A thin, glassy silicon dioxide layer provides insulation between the metal and the silicon, except where rectangular contact holes in the silicon dioxide allow the metal to connect to the silicon. Around the edge of the chip, thin wires connect the metal pads to the chip's external pins - the black blobs in the photo show where the wires were attached.

Transistors inside the IC

Transistors are the key components in the chip. The 7805 uses NPN and PNP bipolar transistors (unlike digital chips which usually have CMOS transistors). If you've studied electronics, you've probably seen a diagram of a NPN transistor like the one below, showing the collector (C), base (B), and emitter (E) of the transistor. The transistor is illustrated as a sandwich of P silicon in between two symmetric layers of N silicon; the N-P-N layers make a NPN transistor. It turns out that transistors on a chip look nothing like this, and the base often isn't even in the middle!

An NPN transistor and its oversimplified structure.

The photo below shows one of the transistors in the 7805 as it appears on the chip.[6] The different brown and purple colors are regions of silicon that have been doped differently, forming N and P regions. The gray areas are the metal layer of the chip on top of the silicon - these form the wires connecting to the collector, emitter, and base.

Structure of a NPN transistor inside the 7805 voltage regulator.

Underneath the photo is a cross-section drawing showing approximately how the transistor is constructed. There's a lot more than just the N-P-N sandwich you see in books, but if you look carefully at the vertical cross section below the 'E', you can find the N-P-N that forms the transistor. The emitter (E) wire is connected to N+ silicon. Below that is a P layer connected to the base contact (B). And below that is a N+ layer connected (indirectly) to the collector (C).[7] The transistor is surrounded by a P+ ring that isolates it from neighboring components.

Resistors inside the IC

Resistors are a key component of analog chips and are formed from strips of silicon doped to have high resistance. The photo below shows two resistors in the 7805 voltage regulator, formed from greenish-purple strips of P silicon. (The gray metal strips connect to the resistors at the square contacts and wire the resistors to other parts of the chip.) The value of the resistor is proportional to its length[8], so the short resistor on the right (850Ω) is smaller than the meandering resistor on the left (4000Ω). Resistors with large values take up an inconveniently large area on the chip - in the top left of the die photo you can see the serpentine path of an 80KΩ resistor.

Two resistors on the 7805 voltage regulator's silicon die.

How the 7805 works

I've colored the following schematic[9] to indicate the main blocks of the 7805 regulator. The heart of the 7805 chip is a large transistor that controls the current between the input and output, and thus controls the output voltage. This transistor (Q16) is red on the diagram below. On the die, it takes up most of the right half of the chip because it needs to handle over 1 amp of current.

Components of the 7805 regulator: bandgap (yellow), error amp (orange), output transistor (red), protection (purple), startup (green).

The bandgap reference (yellow) is what keeps the voltage stable. It takes the scaled output voltage as input (Q1 and Q6), and provides an error signal (to Q7) indicating if the voltage is too high or too low. The key feature of the bandgap is it provides a stable and accurate reference, even as the chip's temperature changes. The next section will discuss the bandgap in detail.

The error signal from the bandgap reference is amplified by the error amplifier (orange). The amplified signal controls the output transistor through large driver Q15. This closes the negative feedback loop that controls the output voltage. The startup circuit (green) provides initial current to the bandgap circuit, so it doesn't get stuck in an off state.[10] The circuits in purple provide protection against overheating (Q13), excessive input voltage (Q19), and excessive output current (Q14). If there is a fault, these circuits reduce the output current or shut down the regulator, protecting it from damage.

The voltage divider (blue) scales down the voltage on the output pin for use by the bandgap reference. It has an interesting implementation that allows different chips in the 78XX family to produce different voltages. (For instance 12 volts from the 7812 and 24 volts from the 7824.) The image below shows the square contacts between the metal (white) and the resistor (turquoise) that control the values of R20 and R21. For a different regulator, a simple change to the position of the variable contact increases the resistance of R20 and thus the output voltage of the chip.
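To illustrate why a larger R20 gives a higher output, here is a sketch of the generic feedback-divider relationship in Python. It assumes R20 sits between the output pin and the divider tap and R21 between the tap and ground, and that the feedback loop holds the tap at a fixed voltage. The tap voltage and resistor values below are made up for the example and are not the real 7805 values.

```python
def output_voltage(v_tap, r20, r21):
    """Output voltage that puts v_tap at the divider's midpoint.

    Divider: v_tap = v_out * r21 / (r20 + r21), solved for v_out.
    """
    return v_tap * (r20 + r21) / r21

# Hypothetical values only: a 2.5 V tap voltage and a 5 kilohm R21.
print(output_voltage(2.5, 5000, 5000))    # smaller R20, lower output
print(output_voltage(2.5, 19000, 5000))   # larger R20, higher output
```

Moving the contact to lengthen R20 changes only the ratio, so the rest of the circuit can stay the same across the family.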

The feedback voltage divider inside the 7805 voltage regulator consists of two resistors.

How a bandgap reference works

The main problem with producing a stable voltage from an IC is the chip's parameters change as temperature changes: it's no good if your 5 volt phone charger starts producing 3 or 7 volts on a hot day. The trick to building a stable voltage reference is to create one voltage that goes down with temperature and another that goes up with temperature. If you add them together correctly, you get a voltage that is stable with temperature. This circuit is called a "bandgap reference".

To create a voltage that goes down with temperature, you put a constant current through the transistor and look at the voltage between the base and emitter, called V_BE. The graph below shows how this voltage drops as the temperature increases. At the left, the line hits the bandgap voltage of silicon, about 1.2 volts; this will be important later.

Vbe vs temperature for a transistor

If you set up a second transistor this way but with a lower current[11], you get the same effect but the voltage V_BE curve drops faster. This may not seem helpful since we need a voltage that goes up with temperature. But here's the trick: if you subtract the two V_BE voltages, the difference increases as temperature increases, since the lines get farther apart. The difference is called ΔV_BE. The graph below shows the V_BE curves for two different transistors, and you can see how the difference ΔV_BE between the curves increases with temperature, even though both curves decrease with temperature.

Voltages in a bandgap reference: Vbe for two transistors as temperature changes.

The final step to a bandgap reference is to combine V_BE and ΔV_BE in the right ratio so the result is constant with temperature. It turns out that if the values sum to the bandgap voltage, the drop in V_BE and the increase in ΔV_BE cancel out. In the graph below, adding 10 copies of ΔV_BE is the right ratio; the exact ratio depends on the particular transistors. The important thing to notice in the graph below is that as the temperature changes, V_BE+nΔV_BE remains constant - the top of the purple ΔV_BEs remains at the bandgap voltage.

By adding multiples of ΔVbe to Vbe, the bandgap voltage is reached regardless of temperature.[12]
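
The cancellation can be checked numerically with an idealized model. The sketch below is not taken from any datasheet: it assumes V_BE is a perfectly straight line from 1.2V at absolute zero, picks 0.65V at 300 K and a 10:1 current density ratio as illustrative values, and uses the standard ideal-transistor relation ΔV_BE = (kT/q)·ln(ratio), which the article does not state explicitly.

```python
import math

K_OVER_Q = 8.617333e-5   # Boltzmann constant / electron charge, volts per kelvin
V_GAP = 1.2              # extrapolated V_BE at 0 K (silicon bandgap voltage), per the text
T_REF = 300.0            # reference temperature in kelvin (assumed)
VBE_REF = 0.65           # assumed V_BE at T_REF for the high-current transistor
DENSITY_RATIO = 10.0     # assumed ratio of current densities between the two transistors

def v_be(temp_k):
    """Idealized straight-line V_BE: V_GAP at 0 K, VBE_REF at T_REF."""
    return V_GAP - (V_GAP - VBE_REF) * temp_k / T_REF

def delta_v_be(temp_k, ratio=DENSITY_RATIO):
    """Difference between the two V_BEs: (kT/q) * ln(current density ratio)."""
    return K_OVER_Q * temp_k * math.log(ratio)

# Multiplier n that makes the downward and upward slopes cancel exactly.
N = (V_GAP - VBE_REF) / (T_REF * K_OVER_Q * math.log(DENSITY_RATIO))

def reference(temp_k):
    """V_BE + n * dV_BE, which should stay at V_GAP in this idealized model."""
    return v_be(temp_k) + N * delta_v_be(temp_k)

for t in (250.0, 300.0, 350.0):
    print(f"{t:.0f} K: V_BE={v_be(t):.3f} V  dV_BE={delta_v_be(t) * 1000:.1f} mV  "
          f"sum={reference(t):.3f} V")
```

With these assumed values the multiplier comes out a little over 9, in the same range as the 10 copies in the graph; different transistors or a different current density ratio would shift it.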

How the 7805's bandgap reference works

The 7805's bandgap reference uses the above bandgap principles, but there are several important differences. First, the bandgap voltage in practice turns out to be about 1.25 volts instead of 1.2. Second, the 7805's bandgap creates a larger (and thus more accurate) 2ΔV_BE by taking the difference between two high-current V_BEs and two low-current V_BEs. Finally, 2ΔV_BE is scaled and added to three V_BEs to form three times the bandgap voltage, or about 3.75V.

The diagram below shows the 7805's bandgap circuit with arrows showing voltage changes (not currents). Starting at ground, the red arrow shows an increase of (large) V_BE across Q3, and another (large) V_BE across Q2. The green arrows show drops of (small) V_BE across Q4 and Q5. The result is the difference 2ΔV_BE ends up across R6.

The next step is very important as it scales up the voltage. The current through R7 will be the same as the current through R6 (ignoring small base currents). But R7 is 16.5 times as large as R6, so by Ohm's law, the voltage across R7 will be 16.5 times the 2ΔV_BE across R6, i.e. 33ΔV_BE.

Finally, we can see the bandgap's voltage by looking at the purple lines. Starting at ground, the voltage goes up by V_BE across Q8, another V_BE across Q7, then the R7 voltage, and finally a third V_BE across Q6. Assuming the chip designers picked the scale factor of 33 correctly, the final voltage will be three bandgap voltages, or 3.75V.[13] (Vin here is the voltage input to the bandgap, not the voltage input to the 7805.)

How the bandgap voltage is generated in the 7805 voltage regulator.
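
To make the sum along the purple path concrete, here is a small worked example. The V_BE and ΔV_BE values are assumptions chosen for illustration (the article gives only the resistor ratio and the 3.75V target), so treat the numbers as a plausibility check rather than measurements of the chip.

```python
# Voltage stack in the examined 7805's bandgap, following the purple path.
# V_BE and DELTA_V_BE are assumed illustrative values, not measurements.

V_BE = 0.65          # assumed base-emitter drop, volts
DELTA_V_BE = 0.0545  # assumed difference between high- and low-current V_BEs, volts
R7_OVER_R6 = 16.5    # resistor ratio stated in the text

v_r6 = 2 * DELTA_V_BE              # 2*dV_BE appears across R6
v_r7 = R7_OVER_R6 * v_r6           # same current through R7, so 16.5x the voltage
v_in = V_BE + V_BE + v_r7 + V_BE   # up through Q8, Q7, R7, then Q6

print(f"R6: {v_r6:.3f} V, R7: {v_r7:.3f} V, Vin: {v_in:.2f} V")
```

With a ΔV_BE in the mid-50 millivolt range, three V_BE drops plus 33ΔV_BE land close to 3.75V, which is the sort of balance the designers would have needed to hit.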

A traditional bandgap circuit generates a stable reference voltage, but discussions of bandgaps usually ignore a big issue: in devices such as the 7805 or the TL431, the bandgap circuit does not generate a stable reference voltage. Instead, the 7805's bandgap works "backwards". The 7805's scaled output voltage provides the input voltage (Vin) to the bandgap reference, and the bandgap provides an error signal as output. The 7805's bandgap circuit removes the feedback loop that exists inside a traditional bandgap reference. Instead, the entire chip becomes the feedback loop.

In more detail, if the output voltage is correct (5V), then the voltage divider provides 3.75V at Vin, and the V_BE and ΔV_BE voltages are as described above. If the output voltage rises or falls slightly, this change propagates through Q6 and R7, causing the voltage at the base of Q7 to rise or fall accordingly. This change is amplified by Q7 and Q8, generating the error output.[14] The error output, in turn, decreases or increases the current through the output transistor, and this negative feedback loop adjusts the output voltage until it is correct.
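
The whole-chip feedback loop can be illustrated with a toy model. This is not a circuit simulation: the update rule and the loop gain are invented for the example, and only the 3.75V balance point and the 0.75 divider fraction (3.75V out of 5V) come from the description above.

```python
# Toy model of the whole-chip feedback loop. Not a circuit simulation:
# the loop gain and the update rule are invented for illustration.

V_STABLE = 3.75   # bandgap's balance point at Vin, volts
DIVIDER = 0.75    # fraction of the output that reaches Vin (3.75 V / 5 V)
LOOP_GAIN = 0.5   # hypothetical correction strength per step

def settle(v_out, steps=60):
    """Repeatedly nudge the output in the direction that cancels the error."""
    for _ in range(steps):
        v_in = DIVIDER * v_out        # voltage divider feeds the bandgap
        error = V_STABLE - v_in       # positive when the output is too low
        v_out += LOOP_GAIN * error    # output transistor conducts more or less
    return v_out

print(round(settle(4.0), 3), round(settle(6.5), 3))
```

Whether the output starts too low or too high, each pass shrinks the error, so the model settles at the voltage where the divider tap equals the bandgap's balance point.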

Interactive chip viewer

The image and schematic[9] below are an interactive exploration of the 7805. Click a component to see its location on the die and in the schematic highlighted. The box below will give an explanation of the component. For transistors, the emitter, base, and collector will be indicated on the die.

Why I think this chip is counterfeit

The outside of the package has the ST Microelectronics logo, but for several reasons I think the chip is counterfeit and manufactured by someone else. First, on the die itself (below) there is no ST logo, no mask copyright, and no manufacturer information at all. (I have no explanation for why the die is labeled 2805 and not 7805, or what P414 means.) In addition, the circuit on the die is totally different from the internal circuit in the ST Microelectronics 7805 datasheet. The metal of the package looks grainy and low quality. Finally, I bought the part off eBay, not from a reputable supplier, so it could have come from anywhere. For these reasons, I conclude that the part I got is counterfeit and not a genuine ST Microelectronics LM7805. From what I hear, there's a lot of semiconductor counterfeiting happening so I'm not surprised to get a counterfeit part. (But see a dissenting opinion.)

Label on the die of a 7805 voltage regulator.

7805 history, and a look at some other designs

I had assumed that all 7805 chips were pretty much the same. But one surprise from studying datasheets is that different manufacturers use totally different internal circuitry for the same 7805 chip and the name "7805" doesn't mean much more than "some sort of 5 volt regulator."

To explain this, I'll start with a brief history of voltage regulators. Simple IC voltage regulators got their start way back in 1968 when Fairchild introduced the µA723 voltage regulator, which used a temperature-compensated Zener diode to provide an adjustable voltage. In 1969 analog design genius Robert Widlar[15] developed the National LM109 5-volt regulator, which was much simpler to use. It was followed in 1972 by Fairchild's 7800 series of voltage regulators, ranging from 5 volts to 24 volts. In 1973 National came out with an improved regulator series, the LM340-XX.

From this history, you'd expect that there's a LM109 design, a 7805 design, and a LM340 design. However, it turns out that the part numbers are really just marketing, and have little to do with what's inside the chip. Some 7805s are closer to the LM109 than to other 7805s, and some LM340s are closer to 7805s than to other LM340s.

For instance, the Fairchild µA109 uses the common Fairchild 7800 series design. On the other hand, the National LM7805 is very different from the Fairchild 7805, but is identical to the National LM340, even sharing the same datasheet. This design is very close to the original National LM109, so in effect National sold the same design under three different names.[16] Thus, it looks like companies reuse the same voltage regulator design, changing little more than the part number between devices. I suspect manufacturers are constrained by patents[17], so they use the part numbers they want on the devices they can make.

How a different, more popular 7805 design works

It turns out that the 7805 design I reverse-engineered above is fairly rare, and most 7805 chips use a different design, shown below.[16] While the overall architecture of this design is similar to the LM109-derived 7805 chip I examined, most of the pieces have substantial changes. The current mirror[18], the startup circuit, the bandgap regulator, and the protection circuitry are all different.

Internal schematic of the Signetics µA7805 regulator from the datasheet.

Since this design is so popular, I'll give a brief explanation of how its bandgap circuit works.[19] In the figure below, there's a large V_BE (red arrow) across high-current transistor Q1, and a small V_BE (green arrow) across low-current transistor Q2. Thus, ΔV_BE appears across R3, generating a current through R3, Q2, and R2. Since R2 has 20 times the resistance of R3, 20ΔV_BE appears across R2, by Ohm's law.

Now, to find the temperature-compensated stable voltage for this circuit, follow the blue arrows up from ground. (As before, the arrows do not indicate current flow, and Vin is the input to the bandgap not the chip.) Going through Q3, Q4, R2, Q5 and Q6, the voltages sum to 4V_BE+20ΔV_BE. Since there are four V_BEs, the circuit must be designed for four times the bandgap voltage, or approximately 5V. Thus, this circuit's stable point is 5V. At this voltage, the error amplifying transistors (Q4/Q3) will be in the active region and will respond to any variation away from it.[20]

How the bandgap voltage is generated in the Signetics 7805 regulator.
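
Working the blue-arrow sum backwards gives a feel for the ΔV_BE this design needs. The sketch below assumes a few plausible V_BE values, since the article gives only the resistor ratio and the roughly 5V stable point; the function name is made up for illustration.

```python
# Common 7805 design: Vin = 4*V_BE + 20*dV_BE along the blue path.
# The V_BE values tried here are assumptions; the article gives only the ratios.

N_VBE = 4          # drops across Q3, Q4, Q5 and Q6
R2_OVER_R3 = 20    # resistor ratio stated in the text
TARGET = 5.0       # approximate stable point, volts

def required_delta_v_be(v_be):
    """dV_BE across R3 needed for the stack to total TARGET."""
    return (TARGET - N_VBE * v_be) / R2_OVER_R3

for v_be in (0.60, 0.65, 0.70):
    d = required_delta_v_be(v_be)
    total = N_VBE * v_be + R2_OVER_R3 * d
    print(f"V_BE={v_be:.2f} V -> dV_BE={d * 1000:.0f} mV, stack={total:.2f} V")
```

The result is sensitive to the assumed V_BE, which is one reason the exact multiplier depends on the particular transistors.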

How I looked at the 7805 die, and how you can too

Usually getting the die out of an IC requires concentrated acid to dissolve the epoxy package. But some ICs, such as the 7805, are available in metal cans which can be easily opened with a hacksaw. I used a metallurgical microscope for my die photos, but even a basic middle-school microscope shows you the metal layer at low magnification. If you're at all interested in IC structure, or want to show kids what ICs look like inside, you should get an IC in a metal can, saw it open yourself, and take a look. (But first read the warning about beryllium inside some chips.) Many different ICs in metal cans are available for under $5 on eBay; search for "TO-99 IC". I find older chips such as the 7805 are better for this than modern chips: the simpler circuits and larger features make it easier to see the internals.

Inside a 7805 voltage regulator. The tiny silicon die is visible in the middle of the TO-5 package.

The photo above shows the 7805 regulator after removing the top with a hacksaw. The metal package is almost entirely empty inside - the silicon die is very small compared to the space available. The metal acts as an effective heat sink to cool the chip under high load. Even without magnification, the large output transistor is visible at the right side of the die. The thin wires between the pins and die are visible, including the two separate wires to the output pin.

Conclusion

I hope this article has given you a better understanding of how a voltage regulator works and what's inside a silicon chip. Perhaps it has even inspired you to saw open some chips of your own to explore the tiny world on a silicon chip for yourself. And while you sit at your computer, think of the many voltage regulators around you quietly keeping your electronics working smoothly, whether made by their supposed manufacturer or not.

Notes and references

[1] Computers usually get most of their power from switching power supplies for efficiency, but linear regulators still have their place. Older ATX power supplies used the 7805 for the 5V standby power, while others used the related 7905 and 7912 regulators for -5V and -12V. Modern computers still use linear regulators in surprising numbers. For instance the MacBook Pro (A1278) uses a low-dropout regulator to generate 1.8 volts, a switching controller with 3.3 and 5V linear regulators inside, a main switching controller with a 5V regulator inside, a low-noise 4.6V regulator for audio and another regulator to generate 3.3V for the keyboard.

[2] Earlier Arduinos such as the Arduino USB, NG and Severino were powered through a 7805 regulator. Recent Arduino models, however, use a switching step-down converter and an ultra-low-dropout 3.3V regulator. This regulator uses the same principles as the 7805, but is much more advanced.

[3] A big advantage of more modern voltage regulators is they don't require as large an input voltage. The 7805 requires at least two extra volts input (i.e. 7 volts in to produce 5 volts out) - this is the dropout voltage. Newer low-dropout (LDO) regulators can require as little as 0.1 extra volts. Modern regulators (such as the TPS796xx) also have much less noise in the output. Despite this, the 7805 is still popular, especially with hobbyists. Adafruit has a nice comparison of regulators.

[4] Depending on the application, you'd probably want to add input and output capacitors to the 7805 regulator to filter out transients due to fluctuations in the input voltage or output load.

[5] While the 7805 chip has a single layer of metal over the silicon to interconnect the circuitry, modern CPUs use many more layers of metal due to their complexity. For example, Haswell uses 11 layers while IBM's POWER8 uses an astounding 15 metal layers. Needless to say, I'm not going to figure out how those chips work with my microscope.

[6] The 7805 uses a wide variety of transistor layouts, as you can see from the labeled die photo. Several transistors in the bandgap use two emitters for one transistor (e.g. Q2, Q3, Q4, Q5) to improve matching between transistors; the PNP current mirror transistors Q11 and Q11-1 also have multiple emitters. Pairs of transistors can share a single base (e.g. Q11 and Q11-1), share a single collector (Q17 and Q18), or share both (Q14 and Q19). Some transistors move the base to the middle (e.g. Q6). To support high current, the output transistors (Q15, Q16) have a totally different, much larger structure.

[7] You might have wondered why there is a distinction between the collector and emitter of a transistor, when the simple picture of a transistor is totally symmetrical. As you can see from the die photo, the collector and emitter are very different in a real transistor. In addition to the very large size difference, the silicon doping is different. The result is a transistor will have poor gain if the collector and emitter are swapped.

[8] The resistance of a resistor in silicon is proportional to its length divided by its width. If you double the length, it's like two resistors in series, so the resistance doubles. If you double the width, it's like two resistors in parallel, so the resistance is cut in half. One convenient consequence is if the chip is scaled down (Moore's law), the resistors keep the same values, since the width and length scale equally.

Silicon resistance is measured with the unusual unit ohms per square (Ω/□). Note that there's no distance unit - it doesn't matter if you have a square millimeter or square inch of material; the resistance is the same because the dimensions cancel out. For the 7805, I estimate 140 ohms/square for the resistors.
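
The ohms-per-square rule is easy to try out. The sketch below uses the 140 ohms/square estimate from above; the resistor dimensions are hypothetical and can be in any unit, as long as length and width use the same one.

```python
SHEET_RESISTANCE = 140.0  # ohms per square, the article's estimate for the 7805

def resistor_ohms(length, width, sheet=SHEET_RESISTANCE):
    """Resistance of a rectangular resistor; length and width in the same unit."""
    squares = length / width   # number of squares laid end to end
    return sheet * squares

base = resistor_ohms(100, 10)          # 10 squares
longer = resistor_ohms(200, 10)        # double the length: like two in series
wider = resistor_ohms(100, 20)         # double the width: like two in parallel
shrunk = resistor_ohms(50, 5)          # scale both down: same number of squares

print(base, longer, wider, shrunk)
```

Doubling the length doubles the resistance, doubling the width halves it, and shrinking both by the same factor leaves it unchanged, which is the scaling property noted above.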

[9] I looked at dozens of datasheets and the chip I examined almost exactly matches the schematic for the Korean Electronics KIA7805. The National LM340/LM78XX schematic is very similar.

[10] Bandgap circuits usually have two stable voltages - the desired voltage and 0 volts. To keep the bandgap from getting stuck at 0 volts, a startup circuit will "push" the bandgap away from 0 volts so it will settle at the desired voltage. The startup circuit is discussed in Widlar's application note AN-42 for the similar LM109 (page 5).

[11] When building a bandgap reference, what really matters for V_BE is the current density through the transistors - the current divided by the area of the emitter. Decreasing the current through the transistor decreases the current density. The second way to decrease current density is to use a larger transistor with a larger emitter. Often five or ten identical transistors in parallel will be combined to form this large transistor to ensure the large transistor and the small transistor are exactly matched.

[12] The V_BE line for a bandgap reference is only perfectly straight in theory, so the resulting bandgap voltage will vary slightly with temperature. To increase stability, some more complex bandgap references compensate for second-order effects.

[13] Bandgap reference references: How to make a bandgap voltage reference in one easy lesson by Paul Brokaw, inventor of the Brokaw bandgap reference. A presentation on the bandgap reference is here. The Design of Band-Gap Reference Circuits: Trials and Tribulations by analog chip design legend Bob Pease discusses real-world bandgap designs.

[14] You might wonder how the error output knows what voltage to switch at. For a Darlington pair (Q7/Q8) to be active, the base voltage must go above 2V_BE (Wikipedia). The bandgap reference was constructed assuming that at the reference voltage, there will be V_BE drops across Q7 and Q8. Thus, it's not a coincidence that Darlington pair Q7/Q8 is right in the active region (2V_BE) at the bandgap voltage making the error output very sensitive to any moves away from the reference voltage. If the output voltage rises or falls, the voltage at the base of Q7 rises or falls accordingly, and the transistors greatly amplify this change. Also note that an increase in output voltage causes a decrease in the error output, yielding negative feedback for the whole chip.

[15] By all reports, Robert Widlar was an amazing analog engineer, as well as an alcoholic crazy guy. Widlar invented key analog IC circuits such as the Widlar current source as well as groundbreaking ICs such as the µA702 and µA723. In 1970 he sold his stock options for a million dollars (about 6 million adjusted for inflation) and retired to Mexico at 33. Some entertaining stories about him are here, on Wikipedia, and pictures of his sheep.

[16] Most 7805 datasheets show the same internal schematic. Some chips using the common design are Fairchild 7800 series, Hi-Sincerity H78XX, FCI LM7800, MCC MC7805, Microelectronics ML7800, Motorola MCT7800, uPC7800H, JRC NJM7800, TI uA7800, Signetics uA7800, and ST L7805. Other chips use variants of the common design: AS78XXA, UTC LM78XX, L78L05 and Motorola MC7800.

The LM109-based design of the 7805 that I looked at is very different from the common design and appears to be fairly rare; it is used by National LM340/LM7800 and KEC KIA7805AF. There are a few differences to note between this design and the original National LM109. In order to support multiple output voltages, the 7805 design uses a resistor divider and a different circuit feeding the bandgap reference. This probably also motivated the removal of a couple transistors from the bandgap circuit so its voltage is one V_BE drop lower. The startup circuit is also slightly changed.

[17] Widlar's patent on the bandgap reference is 3617859. A later patent with a bandgap reference very similar to the LM109's is 4249122.

[18] A current mirror is a very useful way of connecting transistors so the current through the second transistor matches the current through the first transistor. For more information about current mirrors, you can check Wikipedia or any analog IC book such as chapter 3 of Designing Analog Chips.

[19] Several sources give an explanation of the common 7805 design that is plausible but wrong. The faulty explanation is that Zener D1 provides the reference voltage. It feeds into a comparator built from Q13 and Q10 (or Q6) as a differential pair and Q1, Q7, and Q2 forming a current mirror active load. The most obvious problem with this is Q13, Q6, R1, and R2 are all tied together which would short out the two sides of the supposed differential pair / current mirror.

Ironically, the design of the 7905 (the negative-voltage version of the 7805) is similar to the erroneous 7805 explanation. The 7905 uses a Zener diode to provide the reference voltage. A comparator with a current mirror active load generates the error signal by comparing the reference voltage with the feedback voltage. Meanwhile another current mirror ensures a constant (probably temperature-compensated) current flows through the Zener diode. I had expected the 79XX chips would be mirror-images of the 78XX chips, but the internal design turns out to be fundamentally different. This explains why the block diagrams in 7905 datasheets show a comparator and 7805 datasheets just show an "error amplifier" box.

[20] In the common 7805 design, I believe the purpose of Q7 and R10 is to pull the same current from Q1's base that Q4 and R14 pull from Q2's base, to keep both sides balanced. Because R1 is 1kΩ and R2+R3 is 21kΩ, 21 times the current should flow through Q1 as through Q2.