The Z-80 has a 4-bit ALU. Here's how it works.

The 8-bit Z-80 processor is famed for use in many early personal computers such the Osborne 1, TRS-80, and Sinclair ZX Spectrum, and it is still used in embedded systems and TI graphing calculators. I had always assumed that the ALU (arithmetic-logic unit) in the Z-80 was 8 bits wide, like just about every other 8-bit processor. But while reverse-engineering the Z-80, I was shocked to discover the ALU is only 4 bits wide! The founders of Zilog mentioned the 4-bit ALU in a very interesting discussion at the Computer History Museum, so it's not exactly a secret, but it's not well-known either.

I have been reverse-engineering the Z-80 processor using images from the Visual 6502 team. The image below shows the overall structure of the Z-80 chip and the location of the ALU. The remainder of this article dives into the details of the ALU: its architecture, how it works, and exactly how it is implemented.

I've created the following block diagram to give an overview of the structure of the Z-80's ALU. Unlike Z-80 block diagrams published elsewhere, this block diagram is based on the actual silicon. The ALU consists of 4 single-bit units that are stacked to form a 4-bit ALU. At the left of the diagram, the register bus provides the ALU's connection to the register file and the rest of the CPU.

The operation of the ALU starts by loading two 8-bit operands from registers into internal latches. The ALU does a computation on the low 4 bits of the operands and stores the result internally in latches. Next the ALU processes the high 4 bits of the operands. Finally, the ALU writes the 8 bits of result (the 4 low bits from the latch, and the 4 high bits just computed) back to the registers. Thus, by doing two computation cycles, the ALU is able to process a full 8 bits of data. ("Full 8 bits" may not sound like much if you're reading this on a 64-bit processor, but it was plenty at the time.)

As the block diagram shows, the ALU has two internal 4-bit buses connected to the 8-bit register bus: the low bus provides access to bits 0, 1, 2, and 3 of registers, while the high bus provides access to bits 4, 5, 6, and 7. The ALU uses latches to store the operands until it can use them. The op1 latches hold the first operand, and the op2 latches hold the second operand. Each operand has 4 bits of low latch and 4 bits of high latch, to store 8 bits.

Multiplexers select which data is used for the computation. The op1 latches are connected to a multiplexer that selects either the low or high four bits. The op2 latches are connected to a multiplexer that selects either the low or high four bits, as well as selecting either the value or the inverted value. The inverted value is used for subtraction, negation, and comparison.

The two operands go to the "alu core", which performs the desired operation: addition, logical AND, logical OR, or logical XOR. The ALU first performs one computation on the low bits, storing the 4-bit result into the result low latch. The ALU then performs a second computation on the high bits, writing the latched low result and the freshly-computed high bits back to the bus. The carry from the first computation is used in the second computation if needed.

The Z-80 provides extensive bit-addressed operations, allowing a single bit in a byte to be set, reset, or tested. In a bit-addressed operation, bits 5, 4, and 3 of the instruction select which of the 8 bits to use. On the far right of the ALU block diagram is the bit select circuit that support these operations. In this circuit, simple logic gates select one of eight bits based on the instruction. The 8-bit result is written to the ALU bus, where it is used for the bit-addressed operation. Thus, decoding this part of an instruction happens right at the ALU, rather than in the regular instruction decode logic.

The Z-80's shift circuitry is interesting. The 6502 and 8085 have an additional ALU operation for shift right, and perform shift left by adding the number to itself. The Z-80 in comparison performs a shift while loading a value into the ALU. While the Z-80 reads a value from the register bus, the shift circuit selects which lines from the register bus to use. The circuit loads the value unchanged, shifted left one bit, or shifted right one bit. Values shifted in to bit 0 and 7 are handled separately, since they depend on the specific instruction.

The block diagram also shows a path from the low bus to the high op2 latch, and from the high bus to the low op1 latch. These are for the 4-bit BCD shifts RRD and RLD, which rotate a 4-bit digit in the accumulator with two digits in memory.

Not shown in the block diagram are the simple circuits to compute parity, test for zero, and check if a 4-bit value is less than 10. These values are used to set the condition flags.

The silicon that implements the ALU

The image above zooms in on the ALU region of the Z-80 chip. The four horizontal "slices" are visible. The organization of each slice approximately matches the block diagram. The register bus is visible on the left, running vertically with the shifter inputs sticking out from the ALU like "fingers" to obtain the desired bits. The data bus is visible on the right, also running vertically. The horizontal ALU low and ALU high lines are visible at the top and bottom of each slice. The yellow arrows show the locations of some ALU components in one of the slices, but the individual circuits of the ALU are not distinguishable at this scale. In a separate article, I zoom in to some individual gates in the ALU and show how they work: Reverse-engineering the Z-80: the silicon for two interesting gates explained.

The ALU's core computation circuit

The silicon that implements one bit of ALU processing

The heart of each bit of the ALU is a circuit that computes the sum, AND, OR, or XOR for two one-bit operands. Zooming in shows the silicon that implements this circuit; at this scale the transistors and connections that make up the gates are visible. Power, ground, and the control lines are the vertical metal stripes. The shiny horizontal bands are polysilicon "wires" which form the connections in the circuit as well as the transistors. I know this looks like mysterious gray lines, but by examining it methodically, you can figure out the underlying circuit. (For details on how to figure out the logic from this silicon, see my article on the Z-80's gates.) The circuit is shown in the schematic below.

The Z-80 ALU circuit that computes one bit

This circuit takes two operands (op1 and op2), and a carry in. It performs an operation (selected by control lines R, S, and V) and generates an internal carry, a carry-out, and the result.

ALU computation logic in detail

The first step is the "carry computation", which is done by one big multi-level gate. It takes the two operand bits (op1 and op2) and the carry in, and computes the (complemented) internal carry that results from adding op1 plus op2 plus carry-in. There are just two ways this sum can cause a carry: if op1 and op2 are both 1 (bottom AND gate); or if there's a carry-in and at least one of the operands is a 1 (top gates). These two possibilities are combined in the NOR gate to yield the (complemented) internal carry. The internal carry is inverted by the NOR gate at the bottom to yield the carry out, which is the carry in for the next bit. There are a couple control lines that complicate carry generation slightly. If S is 1, the internal carry will be forced to 0. If R is 1, the carry out will be forced to 0 (and thus the carry in for the next bit).

The multi-level result computation gate is interesting as it computes the SUM, XOR, AND or OR. It takes some work to step through the different cases, but if anyone wants the details:

  • SUM: If R is 0, S is 0, and V is 0, then the circuit generates the 1's bit of op1 plus op2 plus carry-in, i.e. op1 xor op2 xor carry-in. To see this, the output is 1 if all three of op1, op2, and carry-in are set, or if at least one is set and there's no internal carry (i.e. exactly one is set).
  • XOR: If R is 1, S is 0, and V is 0, then the circuit generates op1 xor op2 To see this, note that this is like the previous case except carry-in is 0 because of R.
  • AND: If R is 0, S is 1, and V is 0, then the circuit generates op1 and op2. To see this, first note the internal carry is forced to 0, so the lower AND gate can never be active. The carry-in is forced to 1, so the result is generated by the upper AND gate.
  • OR: If R is 1, S is 1, and V is 1, then the circuit generates op1 or op2. The internal carry is forced to 0 by S and the the carry-out (carry-in) is forced to 0 by R. Thus, the top AND gate is disabled, and the 3-input OR gate controls the result.

Believe it or not, this is conceptually a lot simpler than the 8085's ALU, which I described in detail earlier. It's harder to understand, though, then the 6502's ALU, which uses simple gates to compute the AND, OR, SUM, and XOR in parallel, and then selects the desired result with pass transistors.

Conclusion

The Z-80's ALU is significantly different from the 6502 or 8085's ALU. The biggest difference is the 6502 and 8085 use 8-bit ALUs, while the Z-80 uses a 4-bit ALU. The Z-80 supports bit-addressed operations, which the 6502 and 8085 do not. The Z-80's BCD support is more advanced than the 8085's decimal adjust, since the Z-80 handles addition and subtraction, while the 8085 only handles addition. But the 6502 has more advanced BCD support with a decimal mode flag and fast, patented BCD logic.

If you've designed an ALU as part of a college class, it's interesting to compare an "academic" ALU with the highly-optimized ALU used in a real chip. It's interesting to see the short-cuts and tradeoffs that real chips use.

I've created a more detailed schematic of the Z-80 ALU that expands on the block diagram and the core schematic above and shows the gates and transistors that make up the ALU.

I hope this exploration into the Z-80 has convinced you that even with a 4-bit ALU, the Z-80 could still do 8-bit operations. You didn't get ripped off on your old TRS-80.

Credits: This couldn't have been done without the Visual 6502 team especially Chris Smith, Ed Spittles, Pavel Zima, Phil Mainwaring, and Julien Oster.

Intel x86 documentation has more pages than the 6502 has transistors

Microprocessors have become immensely more complex thanks to Moore's Law, but one thing that has been lost is the ability to fully understand them. The 6502 microprocessor was simple enough that its instruction set could almost be memorized. But now processors are so complex that understanding their architecture and instruction set even at a superficial level is a huge task. I've been reverse-engineering parts of the 6502, and with some work you can understand the role of each transistor in the 6502. After studying the x86 instruction set, I started wondering which was bigger: the number of transistors in the 6502 or the number of pages of documentation for the x86.

It turns out that Intel's Intel® 64 and IA-32 Architectures Software Developer Manuals (2011) have 4181 pages in total, while the 6502 has 3510 transistors. There are actually more pages of documentation for the x86 than the number of individual transistors in the 6502.

The above photo shows Intel's IA-32 software developer's manuals from 2004 on top of the 6502 chip's schematic. Since then the manuals have expanded to 7 volumes.

The 6502 has 3510 transistors, or 4528, or 6630, or maybe 9000?

As a slight tangent, it's actually hard to define the transistor count of a chip. The 6502 is usually reported as having 3510 transistors. This comes from the Visual 6502 team, which dissolved a 6502 chip in acid, photographed the die (below), traced every transistor in the image, and built a transistor-level simulator that runs 6502 code (which you really should try). Their number is 3510 transistors.

The 6502 processor chip

One complication is the 6502 is built with NMOS logic which builds gates out of active "enhancement" transistors as well as pull-up "depletion" transistors which basically act as resistors. The count of 3510 is just the enhancement transistors. If you include the 2102 1018 depletion transistors, the total transistor count is 5612 4528.

A second complication is that when manufacturers report the transistor count of chips, they often report "potential" transistors. Chips that include a ROM or PLA will have different numbers of transistors depending on the values stored in the ROM. Since marketing doesn't want to publish different transistor numbers depending on the number of 1 bits and 0 bits programmed into the chip, they often count ROM or PLA sites: places that could have transistors, but might not. By my count, the 6502 decode PLA has 21×131=2751 PLA sites, of which 649 actually have transistors. Adding these 2102 "potential" transistors yields a count of 6630 transistors.

Finally, some sources such as Microsoft Encarta and A History of the Personal Computer state the 6502 contains 9000 transistors, but I don't know how they could have come up with that value.

(The number of pages of Intel documentation is also not constant; the latest 2013 Software Developer Manuals have shrunk to 3251 pages.)

Thus, the x86 has more pages of documentation than the 6502 has transistors, but it depends how you count.

Reverse-engineering the Z-80: the silicon for two interesting gates explained

I've been reverse-engineering the Z-80 processor, using images from the Visual 6502 team. One interesting thing about the Z-80's silicon is it uses complex gates with multiple inputs and multiple levels of logic. It also implements an XOR gate with an unusual pass-transistor circuit. I thought it would be interesting to examine these gates at the silicon level and show how they work.

The image above shows the overall organization of the Z-80 chip. I'm going to zoom way in on the ALU and look at the silicon that implements one of the complex gates there: a 5-input, three-level gate. I'll walk through this gate and show how it works at the silicon level. While the silicon look like a jumble of lines, its operation is actually straightforward if you step through it.

Let's begin with an (oversimplified) description of how the chip is constructed. The chip starts with the silicon wafer. Regions are diffused with an element such as boron, yielding conductive diffusion regions. A layer of polysilicon strips is put on top. Finally, a layer of metal "wires" above the polysilicon provides more connections. For our purposes, diffusion regions, polysilicon, and metal can all be consider conductors.

In the image below, the bright vertical bands are metal wires. The slightly darker horizontal bands are polysilicon; the borders are more visible than the regions themselves. In this part of the Z-80, the polysilicon connections run mostly horizontally, and the metal wires run vertically. The large irregular regions outlined in black are doped silicon diffusion regions. The circles are vias between different layers.

Transistors are formed where a polysilicon line crosses a diffusion region. You might expect transistors to be very visible in the image, but a polysilicon line looks the same whether its a conductor or a transistor. So transistors just appear as long skinny regions in the image. The diagram below shows the physical structure of a transistor: the source and drain are connected if the gate is positive.

Structure of an NMOS transistor

Let's dive in and see how this circuit works. There's a lot going on, but the image below has been colored to make it clearer. Only three of the vertical metal lines are relevant. On the left, the yellow metal line ties together parts of the gate. In the middle is the blue ground line, which is critical to the operation of the gate. At the right, the red positive voltage line is used to pull the output high through a resistor. The large diffusion region has been tinted cyan. This region can be thought of as big conductive areas interrupted by transistors. There are 5 pinkish polysilicon input wires, labeled A, B, C, D, E. When they cross the diffusion region they still act as wires, but also form a transistor below in the diffusion region. For instance, input A is connected to two transistors.

With all the pieces labeled, we can figure out the operation of the circuit. If input A is high, the first transistor will conduct and connect the yellow strip to ground (dotted line 1). Likewise, if input B is high, the second transistor will conduct and ground the yellow strip (dotted line 2). C will ground the yellow strip via 3. So the yellow strip will be grounded for A or B or C. This forms a three-input OR gate.

If input D is high, transistor 4 will connect the yellow strip to the output. Likewise, if input E is high, transistor 5 will connect the yellow strip to the output. Thus, the output will be grounded if (A or B or C) and (D or E).

In the upper right, arrow 6/7/8 will ground the output if A and B and C are high and the three associated transistors (6, 7, 8) conduct. This computes A and B and C.

Putting this all together, the output will be grounded if [(A or B or C) and (D or E)] or [A and B and C]. If the output is not grounded, the resistor (actually a depletion transistor) will pull the output high. Thus, the final output is not [(A or B or C) and (D or E)] or [A and B and C].

The diagram below shows the gate logic implemented by this circuit. This rather complex gate is created from just nine transistors. Note that the final AND and NOR gates are "for free" - they are formed by wiring together previous outputs and don't require additional transistors. Another point of interest is that with NMOS, the output will be high unless something pulls it low, which explains why circuits are based on NAND and NOR gates rather than AND and OR gates.

If you want to see more low-level silicon analysis, see my article on the overflow circuit in the 6502 at the silicon level.

What does this gate do?

This gate is a key part of one bit of the Z-80's ALU. The gate generates the (inverted) sum, AND, OR, or XOR of B and C depending on the inputs. Specifically, B and C are the two operand inputs, and A is the carry in. D is a control input and E is an inverted intermediate carry from B plus C plus carry_in. By controlling D and overriding A and E, the operation is selected.

The Z-80's interesting XOR gate

The Z-80 uses an unusual circuit for its XOR gate. XOR is an inconvenient function to implement since it has a worst-case Karnaugh map, making it expensive to implement from simple gates. Instead, the Z-80 uses a combination of inverters and pass transistors, different from regular NMOS logic.

As before, the diagram below shows the power and ground metal lines, a connecting metal line in yellow, the polysilicon in pink, the polysilicon transistor gates in green, and diffusion in cyan. The two inputs are A and B.

Starting with input A: if it is high, transistor 1 will connect A' to ground. Otherwise the pullup resistor (way on the left), will pull A' high. (Note that A' is the whole diffusion region between transistor 1 and transistor 3 up to the resistor.) Thus transistor 1 forms a simple inverter with inverted output A'. Likewise, transistor 2 inverts input B to give inverted B' (in the whole diffusion region between transistors 2 and 4).

Now comes the tricky part. If A' is high, pass transistor 4 will connect B' to the yellow metal. If B' is high, pass transistor 3 will connect A' to the yellow metal. The third pullup resistor will pull the yellow metal high unless something ties it to ground . Working through the combinations, if A' and B' are both high, both A' and B' are connected to the yellow metal, which gets pulled high. If A' is high and B' is low, B' is connected to the yellow metal, pulling it low. Likewise, if A' is low and B' is high, A' pulls the yellow metal low. Finally if A' and B' are low, nothing gets connected to the yellow metal, so the resistor pulls it high.

To summarize, the yellow metal is pulled high if A' and B' are both high or both low. That is, it is the exclusive-nor of A' and B', which is also the exclusive-or of A and B.

Finally, the xnor value controls transistors 5a and 5b which form an inverter. If xnor is high, transistors 5a and 5b conduct and the xor output is connected to ground, and if xnor is low, the pullup resistors pull the xor output high. One unusual feature here is the parallel transistors 5a and 5b with separate pullup resistors. I haven't seen this in the 8085 or 6502; they use a single larger transistor instead of parallel transistors.

The schematic below summarizes the circuit. In case you're wondering, this XOR gate is used to compute the parity flag. All the bits are XORed together to generate the parity flag.

Comparison to other processors

From what I've seen so far, the Z-80 uses considerably more complex gates than the 8085 and the 6502. The 6502 uses mostly simple NAND/NOR gates and only a few two-level gates, not as complex as on the Z-80. The 8085 uses more complex gates, but still less than the Z-80. I don't know if the difference is due to technical limits on the number of gate levels, or the preferences of the designers.

The XOR circuit in the Z-80 is different from the 8085 and 6502. I'm not sure it saves any transistors, but it is unusual. I've seen other pass-transistor implementations of XOR, but none like the Z-80.

Credits: The Visual 6502 team especially Chris Smith, Ed Spittles, Pavel Zima, Phil Mainwaring, and Julien Oster.

9 Hacker News comments I'm tired of seeing

As a long-time reader of Hacker News, I keep seeing some comments they don't really contribute to the conversation. Since the discussions are one of the most interesting parts of the site I offer my suggestions for improving quality.
  • Correlation is not causation: the few readers who don't know this already won't benefit from mentioning it. If there's some specific reason you think a a study is wrong, describe it.
  • "If you're not paying for it, you're the product" - That was insightful the first time, but doesn't need to be posted about every free website.
  • Explaining a company's actions by "the legal duty to maximize shareholder value" - Since this can be used to explain any action by a company, it explains nothing. Not to mention the validity of the statement is controversial.
  • [citation needed] - This isn't Wikipedia, so skip the passive-aggressive comments. If you think something's wrong, explain why.
  • Premature optimization - labeling every optimization with this vaguely Freudian phrase doesn't make you the next Knuth. Calling every abstraction a leaky abstraction isn't useful either.
  • Dunning-Kruger effect - an overused explanation and criticism.
  • Betteridge's law of headlines - this comment doesn't need to appear every time a title ends in a question mark.
  • A link to a logical fallacy, such as ad hominem or more pretentiously tu quoque - this isn't a debate team and you don't score points for this.
  • "Cue the ...", "FTFY", "This.", "+1", "Sigh", "Meh", and other generic internet comments are just annoying.
My readers had a bunch of good suggestions. Here are a few:
  • The plural of anecdote is not data
  • Cargo cult
  • Comments starting with "No." "Wrong." or "False."
  • Just use bootstrap / heroku / nodejs / Haskell / Arduino.
  • "How [or Why] did this make the front page of HN?" followed by http://ycombinator.com/newsguidelines.html
In general if a comment could fit on a bumper sticker or is simply a link to a Wikipedia page or is almost a Hacker News meme, it's probably not useful.

What comments bother you the most?

Check out the long discussion at Hacker News. Thanks for visiting, HN readers!

Amusing note: when I saw the comments below, I almost started deleting them thinking "These are the stupidest comments I've seen in a long time". Then I realized I'd asked for them :-)

Edit: since this is getting a lot of attention, I'll add my "big theory" of Internet discussions.

There are three basic types of online participants: "watercooler", "scientific conference", and "debate team". In "watercooler", the participants are having an entertaining conversation and sharing anecdotes. In "scientific conference", the participants are trying to increase knowledge and solve problems. In "debate team", the participants are trying to prove their point is right.

HN was originally largely in the "scientific conference" mode, with very smart people discussing areas in which they were experts. Now HN has much more "watercooler" flavor, with smart people chatting about random things they often know little about. And certain subjects (e.g. economics, Apple, sexism, piracy) bring out the "debate team" commenters. Any of the three types can carry on happily by themself. However, much of the problem comes when the types of conversation mix. The "watercooler" conversations will annoy the "scientific conference" readers, since half of what they say is wrong. Conversely, the "scientific conference" commenters come across as pedantic when they interrupt a fun conversation with facts and corrections. A conversation between "debate team" and one of the other groups obviously goes nowhere.

Reverse-engineering the 8085's decimal adjust circuitry

In this post I reverse-engineer and describe the simple decimal adjust circuit in the 8085 microprocessor. Binary-coded decimal arithmetic was an important performance feature on early microprocessors. The idea behind BCD is to store two 4-bit decimal numbers in a byte. For instance, the number 42 is represented in BCD as 0100 0010 (0x42) instead of binary 00101010 (0x2a). This continues my reverse engineering series on the 8085's ALU, flag logic, undocumented flags, register file, and instruction set.

The motivation behind BCD is to make working with decimal numbers easier. Programs usually need to input and output numbers in decimal, so if the number is stored in binary it must be converted to decimal for output. Since early microprocessors didn't have division instructions, converting a number from binary to decimal is moderately complex and slow. On the other hand, if a number is stored in BCD, outputting decimal digits is trivial. (Nowadays, the DAA operation is hardly ever used).

Photograph of the 8085 chip showing the location of the ALU, flags, and registers.

One problem with BCD is the 8085's ALU operates on binary numbers, not BCD. To support BCD operations, the 8085 provides a DAA (decimal adjust accumulator) operation that adjusts the result of an addition to correct any overflowing BCD values. For instance, adding 5 + 6 = binary 0000 1011 (hex 0x0b). The value needs to be corrected by adding 6 to yield hex 0x11. Adding 9 + 9 = binary 0001 0010 (hex 0x12) which is a valid BCD number, but the wrong one. Again, adding 6 fixes the value. In general, if the result is ≥ 10 or has a carry, it needs to be decimal adjusted by adding 6. Likewise, the upper 4 BCD bits get corrected by adding 0x60 as necessary. The DAA operation performs this adjustment by adding the appropriate value. (Note that the correction value 6 is the difference between a binary carry at 16 and a decimal carry at 10.)

The DAA operation in the 8085 is implemented by several components: a signal if the lower bits of the accumulator are ≥ 10, a signal if the upper bits are ≥ 10 (including any half carry from the lower bits), and circuits to load the ACT register with the proper correction constant 0x00, 0x06, 0x60, or 0x66. The DAA operation then simply uses the ALU to add the proper correction constant.

The block diagram below shows the relevant parts of the 8085: the ALU, the ACT (accumulator temp) register, the connection to the data bus (dbus), and the various control lines.

The accumulator and ACT (Accumulator Temporary) registers and their control lines in the 8085 microprocessor.

The circuit below implements this logic. If the low-order 4 bits of the ALU are 10 or more, alu_lo_ge_10 is set. The logic to compute this is fairly simple: the 8's place must be set, and either the 4's or 2's. If DAA is active, the low-order bits must be adjusted by 6 if either the low-order bits are ≥ 10 or there was a half-carry (A flag).

Similarly, alu_hi_ge_10 is set if the high-order 4 bits are 10 or more. However, a base-10 overflow from the low order bits will add 1 to the high-order value so a value of 9 will also set alu_hi_ge_10 if there's an overflow from the low-order bits. A decimal adjust is performed by loading 6 into the high-order bits of the ACT register and adding it. A carry out also triggers this decimal adjust.

Schematic of the decimal adjust circuitry in the 8085 microprocessor.

Schematic of the decimal adjust circuitry in the 8085 microprocessor.

The circuits to load the correction value into ACT are controlled by the load_act_x6 signal for the low digit and load_act_6x for the high digit. These circuits are shown in my earlier article Reverse-engineering the 8085's ALU and its hidden registers.

Comparison to the 6502

By reverse-engineering the 8085, we see how the simple decimal adjust circuit in the 8085 works. In comparison, the 6502 handles BCD in a much more efficient but complex way. The 6502 has a decimal mode flag that causes addition and subtraction to automatically do decimal correction, rather than using a separate instruction. This patented technique avoids the performance penalty of using a separate DAA instruction. To correct the result of a subtraction, the 6502 needs to subtract 6 (or equivalently add 10). The 6502 uses a fast adder circuit that does the necessary correction factor addition or subtraction without using the ALU. Finally, the 6502 determines if correction is needed before the original addition/subtraction completes, rather than examining the result of the addition/subtraction, providing an additional speedup.

This information is based on the 8085 reverse-engineering done by the visual 6502 team. This team dissolves chips in acid to remove the packaging and then takes many close-up photographs of the die inside. Pavel Zima converted these photographs into mask layer images, generated a transistor net from the layers, and wrote a transistor-level 8085 simulator.

Reverse-engineering and simulating Sinclair's amazing 1974 calculator with half the ROM of the HP-35

I've reverse-engineered the Sinclair Scientific calculator. The remarkable thing about this calculator is they took a simple 4-function calculator chip and reprogrammed its 320-instruction ROM to be a full scientific calculator. By looking at the chip, I've extracted the original code, reverse-engineered how it works, and written a JavaScript simulator that runs the original code and shows what the calculator is doing internally.

The simulator is at righto.com/sinclair. My earlier TI calculator simulator is at righto.com/ti. (The image above is courtesy of Hackaday.)

I wouldn't have given a nickel for their stock: Visiting Apple in 1976

A guest posting from William Fine:

I saw the "Jobs" movie yesterday and it revived some ancient memories of my dealings with Jobs and Holt in the "old days"! When I returned home, I researched Rod Holt on the Internet and ran across your Power Supply Blog, which I found most interesting. Perhaps you can add my ensuing comments to your blog as you see fit.

In 1973 I started a company in my garage in Cupertino to design and manufacture custom Magnetic Products. It was called Mini-Magnetics Co. Inc. After a few months I was forced out of the garage into a small office complex on Sunnyvale-Saratoga Road, and had about 5/10 employees.

I believe it was around 1975/76 or so, I had a visit from a insulation and wire salesman named Mike Felix. He informed me that I may soon be getting a call from a new /start-up company called Apple Computer located just a few blocks away in Cupertino. He gave them my name when he was asked to recommend a Magnetics manufacturing house.

I promptly forgot about it, as I was already quite busy and I never had to solicit business or even advertise. A week or two went by, and I received a call from a female at Apple who set up a appointment for the next day with a guy named Steve Jobs. She gave me the address and it turned out to be located in a office complex located just behind the "Good Earth" restaurant.

The next day I went over to the location and knocked on the door, and it was opened by Jobs, with Wozniack in the background and a young hippie looking girl at a desk in the corner talking on the phone while eating. That was Apple Computer. They had just moved out of their garage into this new location. It appeared to be a large room with "stuff" scattered hap-hazardly all over, on benches and on the floor. From Jobs' appearance, I was a bit afraid to even shake hands with him, especially after getting a whiff of his body odor!

He immediately took me over to a bench that had a few cardboard boxes on it and showed me some transformer cores, bobbins and spools of wire, and unfolded a hand written diagram of the various magnetic components that he wanted me to wind and assemble for Apple.I took a quick look, and while it was all quite sketchy, looked do-able. He said that he needed them within 10 days and I said ok, since he was furnishing the materials.

I told him that I would call him with a quote after I got back to my office and he said ok and as we were parting he mentioned that if I had any technical questions to get a hold of a guy named Rod Holt and wrote down a phone number where he could be reached.

As I recall, there were about 5-6 magnetic components from simple toroids to a complex switching main power transformer. I believe that the price came to about $10.00 per set,and they wanted 35 sets, so the entire matter would be about $350.00. I called it into Apple the next day and they gave me a Purchase Order number over the phone. When I asked if they would be mailing me a hard copy confirming the order, they had no idea of what I was talking about!

I figured, what the hell, worst case, I would be out $350 bucks if they didn't pay the bill. No big deal.

After I got into examining the sketches I discovered something quite interesting about the power transformer. In all previous designs that I had seen, there was a primary, a base feedback winding and several output windings. What Holt had contrived was a interesting method of assuring excellent coupling of the base winding by using a single strand of wire from a multi-filar bundle that was custom ordered from the wire factory. For example, I think that there was a bundle of 30 strands twisted together, which were all coated in red insulation and one strand of green insulation also twisted together in the bundle, which gave a precise turns ratio together with excellent coupling between the windings.

I am uncertain if that contributed much to improving the efficiency of the switcher, but it seemed clever at the time I discovered it. That transformer, is the one that is shown with the copper foil external shield pictured in your blog. I did speak with Holt once or twice but never met him in person.

The 35 sets of parts were delivered on time and much to my surprise, we were paid within 10 days. I attributed that to the arrival of Mike Markkula onto the scene who had provided some money and organization to Apple.

At the time, after seeing the Apple operation, I wouldn't have given a nickle for a share of their stock if it had been offered! Ha!

I had been involved with power supplies for many years prior to this Apple issue, and can say that switchers were known for a long time, but only became practical with the advent of low loss ferrite core materials and faster transistors as your blog implies.

So, thats the Apple Power Supply story ! Be happy to answer any questions that you may come up with. Regards, wpf

Simulating a TI calculator with crazy 11-bit opcodes

I've built a register-level simulator of a 1974 TI calculator chip that shows what actually happens inside a calculator when you perform operations and shows the calculator source code as it executes. The architecture of the calculator chip is pretty interesting, with 11-bit opcodes, a 9-bit address bus, and 44-bit BCD registers. The chip doesn't support multiplication or division, so these are performed with repeated addition or subtraction.

The simulator is at righto.com/ti.

Reverse-engineering the 8085's ALU and its hidden registers

This article describes how the ALU of the 8085 microprocessor works and how it interacts with the rest of the chip, based on reverse-engineering of the silicon. (This is part 2 of my ALU reverse-engineering; part 1 described the circuit for a single ALU bit.) Along with the accumulator, the ALU uses two undocumented registers - ACT and TMP - and this article describes how they work in detail, as well as how the ALU is controlled.

The arithmetic-logic unit is a key part of the microprocessor, performing operations and comparisons on data. In the 8085, the ALU is also a key part of the data path for moving data. The ALU and associated registers take up a fairly large part of the chip, the upper left of the photomicrograph image below. The control circuitry for the ALU is in the top center of the image. The data bus (dbus) is indicated in blue.

Photograph of the 8085 chip showing the location of the ALU, flags, and registers.

Photograph of the 8085 chip showing the location of the ALU, flags, and registers.

The real architecture of the 8085 ALU

The following architecture diagram shows how the ALU interacts with the rest of the 8085 at the block-diagram level. The data bus (dbus) conneccts the ALU and associated registers with the rest of the 8085 microprocessor. There are also numerous control lines, which are not shown.

The ALU uses two temporary registers that are not directly visible to the programmer. The Accumulator Temporary register (ACT) holds the accumulator value while an ALU operation is performed. This allows the accumulator to be updated with the new value without causing a race condition. The second temporary register (TMP) holds the other argument for the ALU operation. The TMP register typically holds a value from memory or another register.

Architecture of the 8085 ALU as determined from reverse-engineering.

Architecture of the 8085 ALU as determined from reverse-engineering.

The 8085 datasheet has an architecture diagram that is simplified and not quite correct. In particular, the ACT register is omitted and a data path from the data bus to the accumulator is shown, even though that path doesn't exist.

The accumulator and ACT registers

To the programmer, the accumulator is the key register for arithmetic operations. Reverse-engineering, however, shows the accumulator is not connected directly to the ALU, but works closely with the ACT (accumulator temporary) register.

The ACT register has several important functions. First, it holds the input to the ALU. This allows the results from the ALU to be written back to the accumulator without disturbing the input, which would cause instability. Second, the ACT can hold constant values (e.g. for incrementing or decrementing, or decimal adjustment) without affecting the accumulator. Finally, the ACT allows ALU operations that don't use the accumulator.

The accumulator and ACT (Accumulator Temporary) registers and their control lines in the 8085 microprocessor.

The accumulator and ACT (Accumulator Temporary) registers and their control lines in the 8085 microprocessor.

The diagram above shows how the accumulator and ACT registers are connected, and the control lines that affect them. One surprise is that the only way to put a value into the accumulator is through the ALU. This is controlled by the alu_to_a control line. You might expect that if you load a value into the accumulator, it would go directly from the data bus to the accumulator. Instead, the value is OR'd with 0 in the ALU and the result is stored in the accumulator.

The accumulator has two status outputs: a_hi_ge_10, if the four high-order bits are ≥ 10, and a_lo_ge_10, if the four low-order bits are ≥ 10. These outputs are used for decimal arithmetic, and will be explained in another article.

The accumulator value or the ALU result can be written to the databus through the sel_alu_a control (which selects between the ALU result and the accumulator), and the alu/a_to_dbus control line, which enables the superbuffer to write the value to the data bus. (Because the data bus is large and connects many parts of the chip, it requires high-current signals to overcome its capacitance. A "superbuffer" provides this high-current output.)

The ACT register can hold a variety of different values. In a typical arithmetic operation, the accumulator value is loaded into the ACT via the a_to_act control. The ACT can also load a value from the data bus via dbus_to_act. This is used for the ARHL/DAD/DSUB/LDHI/LDSI/RDEL instructions (all of which are undocumented except DAD). These instructions perform arithmetic operations without involving the accumulator, so they require a path into the ALU that bypasses the accumulator.

The control lines allow the ACT register to be loaded with a variety of constants. The 0/fe_to_act control line loads either 0 or 0xfe into the ACT; the value is selected by the sel_0_fe control line. The value 0 has a variety of uses. ORing a value with 0 allows the value to pass through the ALU unchanged. If the carry is set, ADDing to 0 performs an increment. The value 0xfe (signed -2) is used only for the DCR (decrement by 1) instruction. You might think the value 0xff (signed -1) would be more appropriate, but if the carry is set, ADDing 0xfe decrements by 1. I think the motivation is so both increments and decrements have the carry set, and thus can use the same logic to control the carry.

Since the 8085 has a 16-bit increment/decrement circuit, you might wonder why the ALU is also used for increment/decrement. The main reason is that using the ALU allows the condition flags to be set by INR and DCR. In contrast, the 16-bit increment and decrement instructions (INX and DCX) use the incrementer/decrementer, and as a consequence the flags are not updated.

To support BCD, the ACT can be loaded with decimal adjustment values 0x00, 0x06, 0x60, or 0x66. The top and bottom four bits of ACT are loaded with the value 6 with the 6x_to_act and x6_to_act control lines respectively.

It turns out that the decimal adjustment values are easily visible in the silicon. The following image shows the silicon that implements the ACT register. Each of the large pink structures is one bit. The eight bits are arranged with bit 7 on the left and bit 0 on the right. Note that half of the bits have pink loops at the top, in the pattern 0110 0110. These loops pull the associated bit high, and are used to set the high and/or low four bits to 6 (binary 0110).

The ACT register in the 8085. This image shows the silicon that implements the 8-bit register. Each of the large pink structures is one bit.  Bit 7 is on the left and bit 0 on the right.

The ACT register in the 8085. This image shows the silicon that implements the 8-bit register.

Building the 8-bit ALU from single-bit slices

In my previous article on the 8085 ALU I described how each bit of the ALU is implemented. Each bit slice of the ALU takes two inputs and performs a simple operation: or, add, xor and, shift right, complement, or subtract. The ALU has a shift right input and a carry input, and generates a carry output. In addition, each slice of the ALU contributes to the parity and zero calculations. The ALU has five control lines to select the operation.

One bit of the ALU in the 8085 microprocessor.

One bit of the ALU in the 8085 microprocessor

The ALU has seven basic operations: or, add, xor, and, shift right, complement, and subtract. The following table shows the five control lines that select the operation, and the meaning of the carry line for the operation. Note that the meaning of carry in and carry out is different for each operations. For bit operations, the implementation of the ALU circuitry depends on a particular carry in value, even though carry is meaningless for these operations.

Operationselect_neg_in2select_op1select_op2select_shift_rightselect_ncarry_1Carry in/out
or000011
add01000/carry
xor010011
and011010
shift right001110
complement100011
subtract11000borrow

The eight-bit ALU is formed by linking eight single-bit ALUs as shown below. The high-order bit is on the left, and the low-order bit on the right, matching the layout in silicon. The carry, parity, and zero values propagate through each ALU to form the final values on the left. The right shift input is simply the bit from the right, with the exception of the topmost bit which uses a special shift right input. The auxiliary carry is simply the carry out of bit three. The control lines to select the operation are fed into all eight ALU slices. By combining eight of these ALU slices, the whole 8-bit ALU is created. The values from the top bit are used to control the parity, zero, carry, and sign flags (as well as the undocumented K and V flags). Bit 3 generates the half carry flag.

The 8-bit ALU in the 8085 is formed by combining eight 1-bit slices.

The 8-bit ALU in the 8085 is formed by combining eight 1-bit slices.

The control lines

The ALU uses 29 control lines that are generated by a PLA that activates the right control lines based on the opcode and the position in the instruction cycle. For reference, the following table lists the 29 ALU control lines and the instructions that affect them.
Control lineRelevant instructions
ad_latch_dbus, write_dbus_to_alu_tmp, /ad_dbus IN/LDA/LHLD
/ad_dbus ARHL/DAD/DSUB/LDHI/LDSI/RDEL
/alu/a_to_dbus all
/dbus_to_act ARHL/DAD/DSUB/LDHI/LDSI/RDEL
a_to_act ACI/ADC/ADD/ADI/ANA/ANI/CMP/CPI/ORA/ORI/RAL/RAR/RLC/RRC/SBB/SBI/SUB/SUI/XRA/XRI
0/fe_to_act all
sel_alu_a all
alu_to_a ACI/ADC/ADD/ADI/ANA/ANI/CMA/CMC/DAA/DCR/IN/INR/LDA/LDAX/MOV/MVI/ORA/ORI/POP/RAL/RAR/RIM/RLC/RRC/SBB/SBI/SIM/STC/SUB/SUI/XRA/XRI
/daa DAA
sel_0_fe DCR
store_v_flag ACI/ADC/ADD/ADI/ANA/ANI/ARHL/CMP/CPI/DAA/DCR/INR/ORA/ORI/RAL/RAR/RLC/RRC/SBB/SBI/SUB/SUI/XRA/XRI
select_shift_right ARHL/RAR/RRC
arith_to_flags ACI/ADC/ADD/ADI/ANA/ANI/CMP/CPI/DAA/DCR/DSUB/INR/ORA/ORI/SBB/SBI/SUB/SUI/XRA/XRI
bus_to_flags POP PSW
/zero_flag_combine DAD/DSUB
/flags_to_bus ACI/ADC/ADD/ADI/ANA/ANI/ARHL/CALL/CC/CM/CMA/CMC/CMP/CNC/CNZ/CP/CPE/CPI//CPO/CZ/DAA/DAD/DCR/DCX/DI/DSUB/EI/HLT/IN/INR/INX/JC/JK/JM/JMP/JNC/JNK/JNZ/JP/JPE/JPO/JZ/LDA/LDAX/LDHI/LDSI/LHLD/LHLX/LXI/MOV/MVI/NOP/ORA/ORI/OUT/PCHL/POP/PUSH/RAL/RAR/RC/RDEL/RET/RIM/RLC/RM/RNC/RNZ/RP/RPE/RPO/RRC/RST/RSTV/RZ/SBB/SBI/SHLD/SHLX/SIM/SPHL/STA/STAX/STC/SUB/SUI/XCHG/XRA/XRI/XTHL
shift_right_in_select ARHL
xor_carry_in ANA/ANI/ARHL/CMP/CPI/DCR/DSUB/INR/RAR/RRC/SBB/SBI/SUB/SUI
select_op2 ANA/ANI/ARHL/RAR/RRC
/use_latched_carry /rotate_carry LDHI/LDSI/RLC/RRC
/carry_in_0 0 except for ACI/ADC/DAD/DSUB/LDHI/LDSI/RAL/RDEL/RLC/SBB/SBI
select_op1 ACI/ADC/ADD/ADI/ANA/ANI/CMP/CPI/DAA/DAD/DCR/DSUB/INR/LDHI/LDSI/RAL/RDEL/RLC/SBB/SBI/SUB/SUI/XRA/XRI
select_ncarry_1 ACI/ADC/ADD/ADI/CMP/CPI/DAA/DAD/DCR/DSUB/INR/LDHI/LDSI/RAL/RDEL/RLC/SBB/SBI/SUB/SUI
In combination with first control line, write_dbus_to_alu_tmp ADC/ADD/ANA/CMA/CMC/CMP/DAA/DCR/INR/MOV/ORA/RAL/RAR/RIM/RLC/RRC/SBB/SIM/STC/SUB/XRA
select_neg_in2 CMA/CMP/CPI/DSUB/SBB/SBI/SUB/SUI
carry_to_k_flag DCX/INX
store_carry_flag ACI/ADC/ADD/ADI/ANA/ANI/ARHL/CMC/CMP/CPI/DAA/DAD/DSUB/ORA/ORI/RAL/RAR/RDEL/RLC/RRC/SBB/SBI/STC/SUB/SUI/XRA/XRI
xor_carry_result xor for ANA/ANI/CMC/CMP/CPI/DSUB/SBB/SBI/STC/SUB/SUI
/latch_carry use_carry_flag CMC/LDHI/LDSI

Conclusions

By reverse-engineering the 8085, we can see how the ALU actually works at the gate and silicon level. The ALU uses many standard techniques, but there are also some surprises and tricks. There are two registers (ACT and TMP) that are invisible to the programmer. You'd expect a direct path from the data bus to the accumulator, but instead the data passes through the ALU. The increment/decrement logic uses the unexpected constant 0xfe, and there are two totally different ways of performing increment/decrement. Several undocumented instructions perform ALU operations without involving the accumulator at all.

This information builds on the 8085 reverse-engineering done by the visual 6502 team. This team dissolves chips in acid to remove the packaging and then takes many close-up photographs of the die inside. Pavel Zima converted these photographs into mask layer images, generated a transistor net from the layers, and wrote a transistor-level 8085 simulator.

Four Rigol oscilloscope hacks with Python

A Rigol oscilloscope has a USB output, allowing you to control it with a computer and and perform additional processing externally. I was inspired by Cibo Mahto's article Controlling a Rigol oscilloscope using Linux and Python, and came up with some new Python oscilloscope hacks: super-zoomable graphs, generating a spectrogram, analyzing an IR signal, and dumping an oscilloscope trace as a WAV file. The key techniques I illustrate are connecting to the oscilloscope with Windows, accessing a megabyte of data with Long Memory, and performing analysis on the data.

Analyzing the IR signal from a TV remote using an IR sensor and a Rigol DS1052E oscilloscope.

Analyzing the IR signal from a TV remote using an IR sensor and a Rigol DS1052E oscilloscope.

Super-zoomable graphs

One of the nice features of the Rigol is "Long Memory" - instead of downloading the 600-point trace that appears on the screen, you can record and access a high-resolution trace of 1 million points. In this hack, I show how you can display this data with Python, giving you a picture that you can easily zoom into with the mouse.

The following screenshot shows the data collected by hooking the oscilloscope up to an IR sensor. In the above picture, the sensor is the three-pin device below the screen. Since I've developed an IR library for Arduino, my examples focus on IR, but any sort of signal could be used. By enabling Long Memory, we can download not just the data on the screen, but 1 million data points, allowing us to zoom way, way in. The graph below shows what it sent when you press a button on the TV remote - the selected button transmits a code, followed by a periodic repeat signal as long as the button is held down.

The IR signal from a TV remote. The first block is the code, followed by period repeat signals while the button is held down.

The IR signal from a TV remote. The first block is the code, followed by period repeat signals while the button is held down.
But with Long Memory, we can interactively zoom way on the waveform and see the actual structure of the code - long header pulses followed by a sequence of wide and narrow pulses that indicate the particular button. That's not the end of the zooming - we can zoom way in on an edge of a pulse and see the actual rise time of the signal over a few microseconds. You can do some pretty nice zooming when you have a million datapoints to plot.

Zooming in on the first part of the waveform shows the structure of the code - long header pulses followed by wide and narrow pulses. Zooming way, way in on the edge of a pulse shows the rise time of the signal.

To use this script, first enable Long Memory by going to Acquire: MemDepth. Next, set the trigger sweep to Single. Capture the desired waveform on the oscilloscope. Then run the Python script to upload the data to your computer, which will display a plot using matplotlib. To zoom, click the "+" icon at the bottom of the screen. This lets you pan back and forth through the data by holding down the left mouse button. You can zoom in and out by holding the right mouse button down and moving the mouse right or left. The magnifying glass icon lets you select a zoom rectangle with the mouse. You can zoom on your oscilloscope too, of course, but using a mouse and having labeled axes can be much more convenient.

A few things to notice about the code. The first few lines get the list of instruments connected to VISA and open the USB instrument (i.e. your oscilloscope). The timeout and chunk size need to be increased from the defaults to download the large amount of data without timeouts.

Next, ask_for_values gets various scale values from the oscilloscope so the axes can be labeled properly. By setting the mode to RAW we download the full dataset, not just what is visible on the screen. We get the raw data from channel 1 with :WAV:DATA? CHAN1. The first 10 bytes are a header and should be discarded. Next, the raw bytes are converted to numeric values with Mahto's formulas. Finally, matplotlib plots the data.

There are a couple "gotchas" with Long Memory. First, it only works reliably if you capture a single trace by setting the trigger sweep to "single". Second, downloading all this data over USB takes 10 seconds or so, which can be inconveniently slow.

Analyze an IR signal

Once we can download a signal from the oscilloscope, we can do more than just plot it - we can process and analyze it. In this hack, I decode the IR signal and print the corresponding hex value. Since it takes 10 seconds to download the signal, this isn't a practical way of using an IR remote for control. The point is to illustrate how you can perform logic analysis on the oscilloscope trace by using Python.

This code shows how the Python script can wait for the oscilloscope to be triggered and enter the STOP state. It also shows how you can use Python to initialize the oscilloscope to a desired configuration. The oscilloscope gets confused if you send too many commands at once, so I put a short delay between the commands.

Generate a spectrogram

Another experiment I did was using Python libraries to generate a spectrogram of a signal recorded by the oscilloscope. I simply hooked a microphone to the oscilloscope, spoke a few words, and used the script below to analyze the signal. The spectrogram shows low frequencies at the bottom, high frequencies at the top, and time progresses left to right. This is basically a FFT swept through time.

A spectrogram generated by matplotlib using data from a Rigol DS1052E oscilloscope.

A spectrogram generated by matplotlib using data from a Rigol DS1052E oscilloscope.
To use this script, set up the oscilloscope for Long Memory as before, record the signal, and then run the script.

Dump data to a .wav file

You might want to analyze the oscilloscope trace with other tools, such as Audacity. By dumping the oscilloscope data into a WAV file, it can easily be read into other software. Or you can play the data and hear how it sounds.

To use this script, enable Long Memory as described above, capture the signal, and run the script. A file channel1.wav will be created.

How to install the necessary libraries

Before connecting your oscilloscope to your Windows computer, there are several software packages you'll need.
  • I assume you have Python already installed - I'm using 2.7.3.
  • Install NI-VISA Run-Time Engine 5.2. This is National Instruments Virtual Instrument Software Architecture, providing an interface to hardware test equipment.
  • Install PyVISA, the Python interface to VISA.
  • If you want to run the graphical programs, install Numpy and matplotlib.
You can also use Rigol's UltraScope for DS1000E software, but the included NI_VISA 4.3 software doesn't work with pyVisa - I ended up with VI_WARN_CONFIG_NLOADED errors. If you've already installed Ultrascope, you'll probably need to uninstall and reinstall NI_VISA.

If you're using Linux instead of Windows, see Mehta's article.

How to control and program the oscilloscope

Once the software is installed (below), connect the oscilloscope to the computer's USB port. Use the USB port on the back of the oscilloscope, not the flash drive port on the front panel.

Hopefully the code examples above are clear. First, the Python program must get the list of connected instruments from pyVisa and open the USB instrument, which will have a name like USB0::0x1AB1::0x0588::DS1ED141904883. Once the oscilloscope connection is open, you can use scope.write() to send a command to the oscilloscope, scope.ask() to send a command and read a result string, and scope.ask_for_values() to send a command and read a float back from the oscilloscope.

When the oscilloscope is under computer control, the screen shows Rmt and the front panel is non-responsive. The "Force" button will restore local control. Software can release the oscilloscope by sending the corresponding ":KEY:FORCE" command.

Error handling in pyVisa is minimal. If you send a bad command, it will hang and eventually timeout with VisaIOError: VI_ERROR_TMO: Timeout expired before operation completed.

The API to the oscilloscope is specified in the DS1000D/E Programming Guide. If you do any Rigol hacking, you'll definitely want to read this. Make sure you use the right programming guide for your oscilloscope model - other models have slightly different commands that seem plausible, but they will timeout if you try them.

Conclusions

Connecting an oscilloscope to a computer opens up many opportunities for processing the measurement data, and Python is a convenient language to do this. The Long Memory mode is especially useful, since it provides extremely detailed data samples.

Reverse-engineering the flag circuits in the 8085 processor

Processors all have status flags to keep track of conditions such as a zero value, a carry, or a negative value. Whenever you write a loop or conditional, these flags ultimately are in control. But how are these flags implemented in the chip's silicon? I've reverse-engineered the flag circuits in the 8085 microprocessor and explain what is really going on.

The photograph below is a highly magnified image of the 8085's silicon, showing the relevant parts of the chip. In the upper-left, the arithmetic logic unit (ALU) performs 8-bit arithmetic operations. The status flag circuitry is below the ALU and the flags are connected to the data bus (indicated in blue). To the right of the ALU, the control PLA decodes the instructions into control lines that control the operations of the ALU and flag circuits.

Photograph of the 8085 chip showing the location of the ALU, flags, and registers.

The 8085 has seven status flags.
  • Bit 7 is the sign flag, indicating a negative two's-complement value, which is simply a byte with the top bit set.
  • Bit 6 is the zero flag, indicating a value that is all zeros.
  • Bit 5 is the undocumented K (or X5) flag, indicating either a carry from the 16-bit incrementer/decrementer or the result of a signed comparison. See my article on the undocumented K and V flags.
  • Bit 4 is the auxiliary carry, indicating a carry out of the 4 low-order bits. This is typically used for BCD (binary-coded decimal) arithmetic.
  • Bit 3 is unused and set to 0. Interestingly, a fairly large transistor drives the data bus line to 0 when reading the flags, so this unused flag bit doesn't come for free.
  • Bit 2 is the parity flag, which is set if the result has an even number of 1 bits.
  • Bit 1 is the undocumented signed overflow flag V (details).
  • Bit 0 is the carry flag.

The image below zooms in on the flag silicon, showing individual transistors. The large transistors labeled with the flag name drive the flag value onto the data bus. From the data bus, the flag values control the results of conditional jumps, calls, and returns. The complex circuits above these transistors compute and store the flag values.

The silicon that implements the flags in the 8085 microprocessor.

The schematic below shows the flag circuit that is implemented in the silicon above.

Schematic of the flag storage in the 8085 microprocessor.

Schematic of the flag storage in the 8085 microprocessor.

Each flag bit has a latch and control lines to write a value to the latch. Most flags are updated by the same arithmetic instructions and controlled by the arith_to_flags control line. The carry flag is affected by additional instructions and has its own control line. The undocumented K and V flags are updated in different circumstances and have their own control lines.

The bus_to_flags control loads the flags from the data bus for the POP PSW instruction, while the flags_to_bus control sends the flag values over the data bus for the PUSH PSW instruction or for conditional branches.

The circuitry to compute most flag values is straightforward. The sign flag is set based on bit 7 of the result. The auxiliary carry flag is set on the carry out of bit 3. The K and V flags are set based on the top two bits (details). The zero flag is normally set from the alu_zero signal that indicates all bits are zero.

The zero flag has support for multi-byte zero: at each step it can AND the existing zero flag with the current ALU zero value, so the zero flag will be set if both bytes are zero. This is only used for the (undocumented) DSUB 16-bit subtract instruction. Strangely, this circuit is also activated for the 16-bit DAD instructions, but the result is not stored in the flag.

If you look at the chip photograph at the top of the article, the flags are arranged in apparently-random order, not in their bit order as you might expect. Presumably the layout used is more efficient. Also notice that the carry flag C is off to the right of the ALU. Because of the complexity of the carry logic, which will be discussed next, the circuitry wouldn't fit under the ALU with the rest of the flag logic.

The carry logic

The schematic below shows the circuit for the carry flag. The logic for carry is more complex than for the other flags because carry is used in a variety of ways.

Schematic of the carry circuitry in the 8085 microprocessor.

Schematic of the carry circuitry in the 8085 microprocessor.

The value stored in the carry flag

The top part of the circuit computes carry_result, the value stored in the carry flag. This value has several different meanings depending on the instruction:
  • For arithmetic operations, the carry flag is loaded with the value generated by the ALU. That is, alu_carry_7 (the high-order carry from bit 7 of the ALU) is used. (See Inside the ALU of the 8085 microprocessor for details on how this is computed.)
  • For DAA (decimal adjust accumulator), the carry flag is set if the high-order digit is >= 10. This value is alu_hi_ge_10, which is selected by the daa control line.
  • For CMC (complement carry), the carry flag value is complemented. To compute this, the previous carry flag value c_flag is selected by use_carry_flag and complemented by the xor_carry_result control line.
  • For ARHL/RAR/RRC (rotate right operations), bit 0 of the rotated value goes into the carry. In the circuit, reg_act_0 (the low-order bit in the undocumented ACT (accumulator temp) register) is selected by the alu_shift_right control line.
The xor_carry_result control inverts the carry value in a few cases. For subtraction and comparison, it flips the carry bit to be the borrow bit. For STC (set carry), the xor_carry_result control forces the carry to 1. For AND operations, it forces the carry to 0.

Generating the carry input signal

The middle part of the circuit selects the appropriate carry_in value that is supplied to the ALU.
  • The first option is to set the carry in to either 0 or 1, by using carry_in_0 and optionally xor_carry_in. This is used for most instructions.
  • The next option is to use the current carry flag value as an input for additions or subtractions (allowing multi-byte arithmetic). For subtraction, this is inverted to convert borrow to carry; the xor_carry_in control does this.
  • The final option uses the carry latch to temporarily hold the carry for the undocumented LDHI and LDSI instructions. These instructions add a constant to a 16-bit register pair, so they need to add the carry from of the low-order sum to the high-order byte. The carry latch temporarily holds the carry, and this value is selected by the use_latched_carry control line. You might wonder why not just use the normal carry flag; the LDHI and LDSI instructions are designed to leave the carry flag unchanged, so they need somewhere else to temporarily store the carry. The surprising conclusion that Intel deliberately included circuitry in the 8085 specifically to support these undocumented instructions, and then decided not to support these instructions. (In contrast, the 6502's unsupported instructions are just random consequences of unsupported opcodes.)

    Generating the shift_right input signal

    Each bit of the ALU has a shift right input. For most of the bits, the input comes from the bit to the left, but the high-order bit uses different inputs depending on the instruction. The bottom circuit in the schematic below generates the shift right input for the ALU. This circuit has two simple options.
    • Normally the carry flag is fed into shift_right_in. For the ARHL and RAR instructions, this causes the carry flag to go into the high-order bit.
    • For the RRC and RLC instructions (rotate A left/right), the rotate_carry control selects bit 0 as the shift right input.

    Conclusions

    By reverse-engineering the 8085, we can see how the flag circuits in the 8085 actually works at the gate and silicon level. One interesting feature is the circuitry to implement undocumented instructions and flags. Another interesting feature is the complexity of the carry flag compared to the other flags.

    This information is based on the 8085 reverse-engineering done by the visual 6502 team. This team dissolves chips in acid to remove the packaging and then takes many close-up photographs of the die inside. Pavel Zima converted these photographs into mask layer images, generated a transistor net from the layers, and wrote a transistor-level 8085 simulator.

    Footnotes on rotate

    I recommend you skip this section, but there are few confusing things about the rotate logic that I wanted to write down.

    For some reason the rotate operations are named very strangely in the 8080 and 8085. RRC is the "rotate accumulator right" instruction and RAR is the "rotate accumulator right through carry" instruction. Based on the abbreviations, the names seem reversed. The left rotates RLC and RAL are similar. The Z-80 processor has a similar RRC instruction, but calls it "rotate right circular", making the abbreviation slightly less nonsensical.

    Bit 0 of ACT is fed into shift_right_in for both RRC and RLC. However, this input is just ignored for RLC since the rotation is the other direction, so I assume this is just a result of the control logic treating RRC and RLC the same.)

    To reduce the control circuitry, the rotate_carry and use_latched_carry control lines are actually the same control line since the instructions that use them don't conflict. In other words, there is just one control line, but it has two distinct functions.

  • Twelve tips for using the Rigol DS1052E Oscilloscope

    In this article I share a few tips I've learned about using the Rigol DS1052E oscilloscope.

    The Rigol DS1052E digital oscilloscope.

    The Rigol DS1052E digital oscilloscope.

    Push the knobs

    The knobs all have convenient actions if you push them: pushing Vertical Position or Horizontal Position centers the trace vertically or horizontally. Pushing Tigger Level sets it to zero. Pushing Scale sets it to fine adjust mode.

    Long Memory

    If you don't use Long Memory, you're wasting most of the capacity of the oscilloscope. Long Memory stores 64 times as much data, so you can really zoom in on the waveform. To enable Long Memory, push the Acquire menu button, then select MemDepth to set Long Mem. There's additional documentation here.

    The Long Memory depth option of the Rigol DS1052E oscilloscope.

    The Long Memory depth option of the Rigol DS1052E oscilloscope.

    Use zoom

    Once you've recorded a waveform, you can pan across it using the horizontal position knob - the waveform window indicator at the very top of the screen shows where you are. In mid-range settings, however, the pan range is fairly limited (about a factor of 5) compared to how deep you can zoom with the horizontal scale knob (about a factor of 1000 with Long Memory). Note: zoom works best with Single triggering; if you use Auto or Norm triggering and hit Run/Stop, sometimes the detailed data isn't in memory and zoom doesn't show more than is on the display.

    Pushing the horizontal scale knob turns on the cool zoom mode, which lets you see the trace and a zoomed-in version at the same time, letting you zoom and pan.

    The zoom feature of the Rigol DS1052E oscilloscope.

    The zoom feature of the Rigol DS1052E oscilloscope.

    Using the menus

    Most of the menu buttons are in the group of 6 at the top. However, there's also a trigger menu button under the trigger knob and a time base menu under the horizontal position knob. This is in addition to the four vertical menu buttons: CH1, CH2, MATH, and REF. Among other things, the time base menu lets you select X-Y mode.

    The menus hide about 1/6 of the display, so close the menu when you're done: push the round Menu On/Off button or push a menu button a second time.

    Don't press Auto

    The Auto button is right next to the Run/Stop button, so you might think it will set the trigger to Auto Sweep. Instead this button sets the controls to seemingly-random values to aotomatically display your traces. This is good if you're totally lost, but more likely to wipe out the settings you want.

    Screenshots

    Some oscilloscopes make screenshots easy, but the Rigol is more complicated. To take a screenshot on the Rigol, plug a USB drive into the front panel, then hit the Storage menu button, select Bit map under Storage, select External, New File, and Save. This will save NewFile0.bmp to your flash drive. (It's much easier to rename the file on your computer than on the oscilloscope.)

    An alternative is to run the slightly clunky UltraScope software on your computer, which gives you access to the oscilloscope via USB. You can download "UltraScope for DS1000E" from the Rigol Software Applications page; although it has a PDF icon, it's actually a Zip file with the software.

    Built-in help

    If you hold down a button or knob for three seconds, the oscilloscope displays a help screen explaining its action. (I was surprised when I discovered this by accident.)

    The built-in help feature of the Rigol DS1052E oscilloscope is triggered by holding down a button or knob.

    The built-in help feature of the Rigol DS1052E oscilloscope is triggered by holding down a button or knob.

    Triggering

    The three trigger sweep modes are Auto, Normal, and Single. Auto will keep displaying traces until you hit Run/Stop. Normal will display a trace every time the trigger condition is satisfied. Single will display a single trace when triggered and then stop. Auto is the way to see a waveform without worrying about triggering. But if you want a nice, stable waveform, set up the trigger and use Normal. Also make sure you're triggering from the right channel - the oscilloscope likes to default to using Channel 2 as the trigger.

    Controlling the channels

    If you've used an oscilloscope with separate controls for each channel, you may expect the knobs near CH1 to control channel 1, and the knobs near CH2 to control channel 2. Instead, if you hit CH1, the knobs control channel 1's scale and position, while if you hit CH2, Math, or Ref, the same knobs control that channel's scale and position. Make sure you're controlling the trace you think you're controlling.

    Use the colored probe rings

    Maybe this is too obvious to mention, but putting matching colored rings on both ends of the oscilloscope probes lets you easily tell which probe goes with which channel.

    Label oscilloscope probes with colored rings that match the trace colors.

    Label oscilloscope probes with colored rings that match the trace colors.

    Cursors

    The cursors are very handy to measure voltages, times between two points, frequency, etc. (The Measure mode provides lots of automated measurements, but often doesn't measure what you want.) Manual mode lets you position two cursors (either vertical or horizontal), and the positions and difference are displayed. Track lets you position a cursor along the waveform, and a voltage cursor automatically tracks the waveform. Both time and voltage values are displayed. Auto mode is the mode you should use with Measure, in order to see what the measurements mean.

    A tracking cursor puts X-Y lines on the waveform and gives measurements.

    A tracking cursor puts X-Y lines on the waveform and gives measurements.

    Finding the manual

    Search for DS1000E (not DS1052E) to find the user's guide and other documentation.

    Conclusions

    I'm glad I bought the Rigol DS1052E - it performs very well for a low-price ($329) oscilloscope. (If money is no object, there's Agilent's $439,000 Infiniium oscilloscope. :-) I hope you find these tips useful. If you have any additional oscilloscope tips, please leave a comment.

    The Mili universal car/wall USB charger, tested in the lab

    I received a Mili universal USB charger for review from Mobile Fun. This interesting charger has some features that make it my current favorite travel charger. It runs off both wall power and car accessory power. It comes with swappable plugs for Europe, UK, US, or Australia, and runs on 120 or 240 volts. It has two USB outputs - I thought this was pointless until I discovered how useful it is in car trips if two people can charge at the same time. In addition, one of the ports provides 10 watts for charging tablets (when plugged into AC). The charger also lights up - red indicates charging, and green indicates the devices are charged.

    The charger has a few disadvantages. It is a bit expensive with a list price of $49. Measuring about 2 3/4 inches by 2 1/4 inches, it's much larger than Apple's super-compact inch-cube charger - although it has much more functionality. Finally, due to the design, it ends up blocking both outlets when you plug it into the wall.

    In the remainder of this article, I test the performance of the charger both in the car and with AC power. To summarize, the power quality is excellent in the car, but has more noise than the average charger when plugged into the wall.

    The Mili charger with adapters for different countries.

    The Mili charger with adapters for different countries.

    The label shows that when connected to AC, the charger is rated as 2.1A for output 1 and 1A for output 2; that is, it is designed to power an iPad from output 1 and a phone from output 2. When plugged in to a car accessory outlet, it is only rated to provide 1 amp, so charging a tablet will be slower. In the measurements below, I find that the charger's power exceeds these ratings when plugged into the wall, which is good, but provides a bit less than the expected one amp when plugged into a car output, which may make charging slower.

    Label from the Mili charger.

    Label from the Mili charger.

    Apple devices can reject "wrong" chargers with the error "Charging is not supported with this accessory"; Apple uses special proprietary voltages on the USB data pins to distinguish different types of chargers (details). I measured these voltages on the Mili charger and verified that it is configured to appear as an Apple 2A charger on ouput 1, and an Apple 1A charger on output 2.

    Cars: a hostile electrical environment

    You might expect to find 12 volts at your car's accessory outlet, but what comes out can be surprisingly noisy and variable. This voltage will have spikes from the ignition system as well as very large transients due to starting, malfunctions, or jump starting. A car charger must handle this hostile voltage input, and make sure the output to your device is smooth.

    Test setup to measure charger performance in a car.

    Test setup to measure charger performance in a car.
    I measured the voltage in my car to see what happens in a real-world environment using the setup illustrated above. The Mili charger is plugged in just to the left of the gear shift. Above it is the USB interface board, which is connected to the oscilloscope on the dash.

    Car voltage drops and rises when the car is started. Car voltage at idle showing ignition spikes.

    Car voltage drops and rises when the car is started (left). Car voltage at idle showing ignition spikes (right).

    The oscilloscope trace (yellow) on the left shows the large voltage fluctuations when I started the car. At the very left, with the ignition off, the battery provides about 12.5 volts. The starter pulls the voltage down to 8.88 volts until the engine starts. The voltage gradually rises over 6 seconds, settling around 14 volts.

    On the right, zooming in shows that while the car is idling, the accessory output has 1/2 volt spikes every 28 milliseconds, due to the ignition firing. Note the voltage on the left is much noisier with the car running than on battery - the line on the left is thin, and the line on the right is thick.

    Performance of the Mili charger in a car

    The Mili charger has a plug that folds out from the side for use in a car. While this makes the charger larger than a dedicated wall charger, having a charger that works both in the car and with AC is more convenient than I expected, especially when traveling.

    The Mili USB charger with car adapter.

    The Mili USB charger with car adapter.
    I looked at the output of the Mili charger while starting the car, to see if the large voltage fluctuations shown above affected the charger's output. The Mili output remained steady, which is good. I also didn't see any of the ignition spikes in the output from the Mili charger. This indicates that the Mili charger does a good job of filtering out noise from the automotive environment.

    I tested the Mili charger with inputs from 0 to 30 volts. 30 volts may seem excessive, but jump-starts often use 24 volts, and car electrical failures can result in a 120 volt "load dump". Fortunately, the Mili survived 30 volts just fine (unlike some other chargers I'm testing). The image below shows that the Mili generates a stable output voltage (horizontal line) for inputs from 7 volts to 30 volts. This is a good thing, showing that the Mili won't overload your phone even if your car is providing too much voltage. As expected, the Mili can't produce the full output voltage if the input voltage is too low (left side of the graph).

    Output voltage (Y axis) of the Mili charger as the input ranges from 0 to 30 volts (X axis).

    Output voltage (Y axis) of the Mili charger as the input ranges from 0 to 30 volts (X axis).

    The oscilloscope displays below show the output and frequency spectrum with 12V DC input and a 5W load. The power quality is very good - the yellow line is thin and has very few spikes. The high frequency spectrum (orange) shows a spike at the switching frequency, but overall the power quality is among the best of chargers I've looked at.

    High frequency spectrum of the Mili charger with 12V input. Low frequency spectrum of the Mili charger on 12V input.

    High frequency spectrum (left) and Low frequency spectrum (right) of the Mili charger on 12V input.

    Next, I measured the voltage the charger can provide under increasing load (details). The horizontal line shows the voltage drops from about 5 volts to 4.5 volts as the load increases. The vertical line shows the charger maxes out around .9 amps with less than the expected 5 volts. This is slightly less than the rated 1 amp the charger is supposed to provide. Both USB outputs provide the same current when plugged into a car outlet.

    Voltage vs Current for the Mili charger with 12V input.

    Voltage vs Current for the Mili charger with 12V input.

    Charger performance with wall input

    I also examined the performance of the Mili charger when plugged into the wall (120V AC). One minor annoyance with using the Mili as a wall charger is that due to the position of the USB ports, both wall outlets are blocked either by the charger or USB cables.

    The Mili charger.

    The Mili charger.

    The images below show the voltage the charger can provide under increasing load (details). When plugged into the wall, the two USB outputs provide different maximum currents, unlike when plugged into a car outlet. Output 1 (the high current output) is on the left, and output 2 (the low current output) is on the right. Output 1 reaches about 2.45A before the voltage starts dropping, well above the 2.1A rating. The line for output 1 gets fairly wide above 1A, showing the voltage is not too stable. The line also slopes downwards to the right, indicating the voltage drops somewhat as the load increases. Output 2 reaches about 1.1A before the light starts flashing and the power drops and climbs (the curved lines). This graph shows strange behavior under overload that I haven't seen in other chargers. The lines are all fairly wide, showing the voltage is

    Voltage vs current for the Mili charger (output 1) with 120V AC input. Voltage vs current for the Mili charger (output 2) with 120V AC input.

    Voltage vs current for the Mili charger (output 1 left, output 2 right) with 120V AC input.
    I looked at the voltage output along with the high frequency and low frequency spectrums (below), to examine the quality of the power outputs. The yellow line is much wider than when plugged into the car outlet, showing a lot more noise in the output. The large orange spike in the middle of the high frequency spectrum shows that a lot of the charger's switching noise is appearing on the output. Compared to other chargers, the power quality is lower than average. On the positive side, the flat low-frequency spectrum shows the charger is very good at eliminating ripple due to the 60 Hz power lines.

    High frequency spectrum of the Mili charger with 120V AC input. Low frequency spectrum of the Mili charger with 120V AC input.

    High frequency (left) and low frequency (right) spectrum of the Mili charger with 120V AC input.

    Conclusions

    The Mili charger is convenient for travel because it has plugs for multiple countries, works as an auto charger, and has dual outputs. The power quality is very good in the car, but not so good with AC power. This charger is my favorite charger now - while I'd like to tear it apart and examine the circuit inside, I like it too much to destroy it. Hopefully if you get one you'll like it too. And if you found this interesting, check out my detailed analysis of a dozen chargers in the lab.

    Thanks to Mili, Mobile Fun, and Mihnea for providing me with the charger and patiently waiting for the review.