Inside the Apple-1's shift-register memory

Apple's first product was the Apple-1 computer, introduced exactly 46 years ago, on April 11, 1976. This early microcomputer used an unusual type of storage for its display: shift-register memory. Instead of storing data in RAM (random-access memory), it was stored in a 1024-position shift register. You put a bit into the shift register and 1024 clock cycles later, the bit pops out the other end. In the early days of random-access memory chips, shift-register memory was cheaper, so many systems used it.1 The downside, of course, is that you had to use bits as they became available, rather than access arbitrary memory locations.2
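The first-in, first-out behavior can be sketched with a toy Python model. This is a simplified software illustration, not a description of the chip's circuitry, and the class `ShiftRegister` and the helper `clocks_until_available` are hypothetical names. It shows both properties mentioned above: a bit reappears exactly 1024 clocks after it goes in, and reaching a particular position means waiting for it to come around.

```python
from collections import deque


class ShiftRegister:
    """Toy model of an N-position serial shift register (no addresses).

    Each clock() pushes one bit in and pops the oldest bit out, so a bit
    written now reappears exactly `length` clocks later.
    """

    def __init__(self, length: int = 1024) -> None:
        self.length = length
        self.bits = deque([0] * length, maxlen=length)

    def clock(self, bit_in: int) -> int:
        bit_out = self.bits[0]    # oldest bit leaves the far end
        self.bits.append(bit_in)  # maxlen discards bits[0] automatically
        return bit_out


def clocks_until_available(current_pos: int, wanted_pos: int, length: int = 1024) -> int:
    """Clocks to wait before the wanted position reaches the output (hypothetical helper)."""
    return (wanted_pos - current_pos) % length
```

Writing a 1 on the first clock and 0s afterward, the 1 comes back out on clock 1024. A display can simply feed each output bit back into the input to keep the data circulating.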

Die of the Signetics 2504 shift register chip.

The photo above shows the chip under the microscope. The underlying silicon is grayish, with white metal wiring on top. The thickest metal wiring provides power to the chip. The chip also has wiring and transistors constructed from a type of silicon called polysilicon; the polysilicon appears red in the photo. Most of the die is occupied by the shift register, arranged in rows that snake back and forth. The squares around the edge of the die are bond pads, where bond wires connect the die to the chip's external pins.3

The Apple-1's display

The Apple-1 displayed 24 lines of 40 characters on a television monitor. Like most computers at the time, the Apple-1 stored characters rather than pixels to reduce memory requirements. A character-generation ROM converted each character into a 5×7 matrix of pixels as it was displayed. To reduce memory even more, the display didn't store full 8-bit bytes but 6-bit characters, supporting upper-case letters, numbers, and some symbols.
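A quick back-of-the-envelope check of these numbers in Python. It only uses figures from the text (24×40 characters, 6 bits per character, 1024-position shift registers) and makes no claim about how the Apple-1 uses the leftover positions.

```python
ROWS, COLS = 24, 40   # characters on the Apple-1 screen
POSITIONS = 1024      # length of each display shift register
BITS_PER_CHAR = 6     # 6-bit character codes

chars_on_screen = ROWS * COLS              # 960 characters
distinct_chars = 2 ** BITS_PER_CHAR        # 64 possible characters
leftover = POSITIONS - chars_on_screen     # positions not needed for text

bits_6 = chars_on_screen * BITS_PER_CHAR   # storage with 6-bit characters
bits_8 = chars_on_screen * 8               # storage if full bytes were kept

print(chars_on_screen, distinct_chars, leftover)  # 960 64 64
print(bits_6, bits_8)                             # 5760 7680
```

Storing 6 bits instead of 8 saves a quarter of the character storage, and a full screen of 960 characters fits within one 1024-position trip around the registers.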

The Apple-1 computer was sold as a circuit board. The user had to supply a keyboard, power supply, display, and case. Photo by Cynde Moya, CC BY-SA 4.0.

The six-bit display characters were held in six 1024-bit shift registers. A seventh shift register tracked the cursor position.4 The diagram below shows the shift registers and the clock driver on the Apple-1 circuit board. These chips are in 8-pin packages, so two chips fit into the space of a regular TTL chip.5
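To make the arrangement concrete, here is a simplified Python model of the seven shift registers clocked in lockstep: each register holds one bit of every character position, and the seventh holds the cursor flag. The bit-to-register assignment (bit i of the character goes into register i), the function name `clock_all`, and its write interface are assumptions for illustration; the real Apple-1 control logic (footnote 4) is more involved.

```python
from collections import deque
from typing import Optional, Tuple

LENGTH = 1024
NUM_REGS = 7  # registers 0-5: character bits, register 6: cursor flag (assumed order)

# All seven registers are clocked together, so position k in each register
# belongs to the same character cell.
regs = [deque([0] * LENGTH, maxlen=LENGTH) for _ in range(NUM_REGS)]


def clock_all(char_code: Optional[int] = None, cursor: int = 0) -> Tuple[int, int]:
    """Shift every register by one position.

    With char_code=None the outputs are fed back to the inputs (refresh);
    otherwise the 6-bit char_code and the cursor flag are shifted in.
    Returns the (character, cursor) pair that just left the registers.
    """
    outs = [reg[0] for reg in regs]
    if char_code is None:
        ins = outs  # recirculate: output goes straight back to input
    else:
        ins = [(char_code >> i) & 1 for i in range(6)] + [cursor & 1]
    for reg, bit in zip(regs, ins):
        reg.append(bit)  # maxlen discards the bit that was just read out
    char_out = sum(bit << i for i, bit in enumerate(outs[:6]))
    return char_out, outs[6]
```

After 1023 refresh clocks, the character written on the first clock comes back out together with its cursor flag, and recirculation puts it back in for the next frame.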

Apple-1 circuit board, showing the 1024-bit shift register chips and the clock driver chip. Original image from Achim Baqué, CC BY-SA 4.0.

The image below shows how the 2504 shift register chips are represented on the Apple-1 schematic. The chips use just 6 of their 8 pins. Each chip has a single input pin for incoming bits and a single output pin for outgoing bits. The remaining pins provide the two clock signals and the ±5 volt power supplies. Unlike RAM chips, these chips do not take an address.

Detail of the Apple-1 schematic showing two of the shift register chips.

PMOS integrated circuits

This shift register chip was created around 1970, an interesting time in the development of MOS integrated circuits. Early integrated circuits used a type of transistor known as bipolar. However, the metal-oxide-semiconductor (MOS) transistor had the potential to make cheaper, high-density integrated circuits. The first commercial MOS integrated circuit was a 20-bit shift register, created in 1964 by a company called General Microelectronics.

The diagram below shows the structure of a MOS transistor. At the bottom is the silicon substrate, with two regions doped with impurities to form p-type silicon. These two conductive p-type regions are called the transistor's source and drain. The channel between them acts as a switch, turned on by the voltage on the metal gate above. A thin insulating oxide layer separates the metal gate from the underlying silicon. These three layers (metal, oxide, semiconductor) give the MOS transistor its name. In the late 1960s, chips started to use gates made of polysilicon, a special type of silicon that produced better transistors than metal gates. This is the technology used by the 2504 shift register: the "P-MOS silicon gate process".

Structure of a P-type MOSFET.

By the mid-1970s, however, integrated circuits changed in two more ways. First, P-MOS transistors were replaced by N-MOS transistors, which had better performance. Second, the introduction of ion implantation machines allowed transistor characteristics to be adjusted, with "depletion-mode" transistors8 leading to faster, lower-power circuitry. These changes ushered in the age of popular microprocessors such as the Zilog Z80, MOS Technology 6502, and Intel 8085. These had much better performance than earlier PMOS processors such as the Intel 8008.7 The 6502, of course, was the processor in the Apple-1 (and Apple II).

The shift register

Next, I'll look at the details of how the shift register was constructed. The idea of a shift register is that bits are passed from stage to stage, controlled by clock pulses. With 1024 stages, the shift register can hold 1024 bits. Each shift register stage uses two transistors and two inverters as shown below. During the first clock phase, the first transistor turns on, allowing the input bit to pass through it and the first inverter. During the second clock phase, the second transistor turns on, allowing the inverted value to pass through it and the second inverter, producing the output. Thus, a bit takes two clock phases to move through the shift register stage.
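The two-phase handoff can be simulated at the logic level in Python. This is a sketch under simplifying assumptions: voltages are reduced to 0/1, the capacitively held values are modeled as stored variables, and the names `Stage` and `run_chain` are hypothetical. Because the two phases never overlap, each stage's output stays stable while the next stage samples it during phase 1.

```python
from typing import List


class Stage:
    """One dynamic stage: pass transistor + inverter, then pass transistor + inverter."""

    def __init__(self) -> None:
        self.mid = 1  # first inverter's output, held on a gate capacitance
        self.out = 0  # second inverter's output, read by the next stage

    def phase1(self, bit_in: int) -> None:
        # First pass transistor conducts; the first inverter drives the inverted bit.
        self.mid = 1 - bit_in

    def phase2(self) -> None:
        # Second pass transistor conducts; the second inverter restores the polarity.
        self.out = 1 - self.mid


def run_chain(bits: List[int], n_stages: int) -> List[int]:
    """Clock a list of bits through n_stages stages, one full clock cycle per bit."""
    stages = [Stage() for _ in range(n_stages)]
    outputs = []
    for b in bits:
        # Phase 1: every stage samples its predecessor's output, which is stable
        # because no stage updates its output during phase 1.
        inputs = [b] + [s.out for s in stages[:-1]]
        for stage, x in zip(stages, inputs):
            stage.phase1(x)
        # Phase 2: every stage moves its held middle value to its output.
        for stage in stages:
            stage.phase2()
        outputs.append(stages[-1].out)
    return outputs
```

With three stages, a bit fed in on cycle 0 reaches the output of the third stage at the end of cycle 2, so the chain holds one bit per stage, and the two inversions cancel so the bit comes out with its original value.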

In the first clock phase, the input passes through the first transistor. In the second clock phase, the input is held by the gate capacitance and passes to the output.

This circuit is a dynamic shift register, which works due to the circuit's capacitance. When the first transistor turns off, the value remains at the input to the first inverter, held by the capacitance of the circuit. (And likewise for the second transistor.) Because the gate of a MOSFET draws almost no current, the bit value will remain for a few milliseconds before it drains away. (This is the same principle used by DRAM, holding bits through capacitance.) As long as the clock keeps going, the bit gets refreshed by each stage.

Each inverter is implemented using two MOS transistors. The concept is shown on the left, below. A high input turns on the transistor, which pulls the output low. A low input turns off the transistor, allowing the pull-up resistor to pull the output high. Thus, the circuit inverts its input.

Conceptually, the inverter uses the circuit on the left. The implementation uses the circuit on the right.

The circuit is actually implemented with a transistor in place of the resistor, as shown on the right, because transistors are more compact than resistors. A high signal on the upper transistor's gate turns it on, causing the pull-up current to flow. In a standard inverter, this transistor would be wired to be always on.9 However, the output of the inverter is only used during one clock phase. To reduce power consumption, the transistor's gate is wired to the clock so it only acts as a pull-up when needed.
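The effect of clocking the pull-up can be captured in a small logic-level Python model. It treats signals as 0/1 in the same logical sense the text uses (ignoring PMOS voltage polarity), and the function name `clocked_load_inverter` is hypothetical. Static current flows only when the pull-down is on while the pull-up is also on, which with a clocked pull-up can happen only during its clock phase; when neither transistor conducts, the output node simply keeps its previous value on its capacitance.

```python
from typing import Tuple


def clocked_load_inverter(inp: int, clock: int, held: int) -> Tuple[int, bool]:
    """Logic-level model of an inverter whose pull-up transistor is gated by a clock.

    Returns (output, current_flows), where current_flows means both the
    pull-up and the pull-down conduct at once (the power-wasting case).
    """
    pull_down_on = bool(inp)   # the input turns on the pull-down transistor
    pull_up_on = bool(clock)   # the pull-up conducts only during its clock phase
    if pull_down_on:
        # The weak pull-up cannot win against the strong pull-down (see note 9).
        return 0, pull_up_on
    if pull_up_on:
        return 1, False
    # Neither transistor conducts: the node floats and keeps its old value.
    return held, False
```

With the input held at 1 for a whole clock cycle, this version draws current only during the phase when its clock is active, whereas an always-on pull-up would draw current for the entire cycle.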

A shift-register stage on the die

The diagram below shows how shift-register stages are physically constructed on the die. The first part of the image shows how the circuitry appears under the microscope, a complicated jumble of silicon, polysilicon, and metal circuitry. In the middle, I've highlighted the doped silicon in green and the polysilicon in red. A transistor gate (yellow) is formed where polysilicon crosses silicon, with the source and drain on either side. (The horizontal metal wiring should be clear without highlighting.) Note the complex, optimized shapes of the polysilicon and the transistors. Finally, a black dot indicates a contact that connects two layers. In the bottom half of the image, bits are shifted to the right, while in the top half, bits are shifted to the left.

One stage of the shift register.

In the lower right, one stage of the shift register is represented by a schematic on top of the underlying circuitry. The stage is implemented with six transistors as described earlier. Note that the pull-up transistors to Vdd are long and skinny, reducing their current. The inverter transistors to Vcc, on the other hand, are wide, so they provide a lot of current. The circuitry in the top half of the image is the same, but rotated 180°. Note that the two rows of shift registers share the clock phase lines and Vdd, making the layout more efficient.6

Topology of the chip

You might expect the chip to consist of 1024 shift-register stages arranged into a chain. However, the chip had an unusual topology that allowed it to operate at double speed: one bit per clock phase instead of one bit per complete clock cycle. It accomplished this with a simple trick: it was really two 512-bit shift registers operating in parallel. The first operated on clock phase 1, phase 2, phase 1, ..., while the second was the opposite: phase 2, phase 1, phase 2, ... The result was that one half would produce bits in phase 1, while the other would produce bits in phase 2. The output circuit merged these together into a single output stream. From the outside, it looked like a 1024-bit shift register that operated twice as fast.
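The interleaving trick can be sketched in Python as two 512-position chains that take turns. This is a behavioral model only: the class name `InterleavedShiftRegister` is hypothetical, and each call to `step` stands for one clock phase rather than one full clock cycle.

```python
from collections import deque


class InterleavedShiftRegister:
    """Two half-length chains clocked on opposite phases, outputs merged."""

    def __init__(self, total_length: int = 1024) -> None:
        half = total_length // 2
        self.chains = [deque([0] * half, maxlen=half) for _ in range(2)]
        self.phase = 0  # 0: the phase 1 chain shifts next, 1: the phase 2 chain shifts next

    def step(self, bit_in: int) -> int:
        """Advance one clock phase; only the chain that owns this phase shifts."""
        chain = self.chains[self.phase]
        bit_out = chain[0]    # merged output comes from whichever chain just shifted
        chain.append(bit_in)  # maxlen drops the bit that was just read out
        self.phase ^= 1       # the other chain handles the next phase
        return bit_out
```

From the outside the merged stream behaves like a single 1024-position register: a bit entered on one phase emerges 1024 phases later, even though each internal chain only shifts on every other phase.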

Another complication is that Signetics produced three 1024-bit shift register chips from the same silicon layout: the 2502 (organized as four 256-bit shift registers), the 2503 (512×2), and the Apple-1's 2504 (1024×1). The different chips were created by changing the metal wiring of the chip during manufacturing, which was much easier than building completely different chips. To support this, the shift register was broken into eight 128-bit segments, shown below. In the 1024×1 chip, two chains of four 128-bit segments ran in parallel (on opposite clock phases) to produce a 1024-bit shift register. The first chain used the light-colored segments A, B, C, D, while the second chain used the dark-colored segments. The segments are connected by the metal wiring along the side of the die. The chip's pads around the edges are labeled; the grayed-out ones are not used in this chip. The large block of circuitry above the output pin combines the two chains into one output.

The chip consists of 8 shift-register chains, each 128 bits long. They are connected in different ways to form different shift register chips.

The other variants of the chip wire the shift-register segments differently and use additional input and output pins. The 512×2 2503 chip used four chains of 256 bits along with two input and output circuits. The 2502B chip used all eight 128-bit chains in parallel to form a 256×4 shift register, with four input and output circuits.10
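The arithmetic behind the three metal-mask options can be checked in a few lines of Python. The tuple layout and the function name `organization` are hypothetical, and I'm assuming that each output circuit in the 2503 and 2502B merges two chains on opposite clock phases, as the 2504 does; the text only states the totals.

```python
SEGMENT_BITS = 128
NUM_SEGMENTS = 8

# (output circuits, segments chained in each chain, chains merged per output)
# Two chains per output is an assumption for the 2503 and 2502B.
CONFIGS = {
    "2504 (1024x1)": (1, 4, 2),
    "2503 (512x2)": (2, 2, 2),
    "2502B (256x4)": (4, 1, 2),
}


def organization(outputs: int, segs_per_chain: int, chains_per_output: int):
    """Return (bits per output, number of outputs) for one wiring option."""
    if outputs * segs_per_chain * chains_per_output != NUM_SEGMENTS:
        raise ValueError("each of the eight segments must be used exactly once")
    depth = segs_per_chain * chains_per_output * SEGMENT_BITS
    return depth, outputs


for name, cfg in CONFIGS.items():
    depth, width = organization(*cfg)
    print(f"{name}: {depth} x {width} = {depth * width} bits")
```

Every option uses all eight 128-bit segments and stores 1024 bits in total; only the split between depth and width changes.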

The image below shows one of the unconnected outputs: the red polysilicon wire isn't connected at one end. With a small change to the metal layer, the metal wiring between two segments can be broken and the segment wired to this output instead. The other changes between chip versions are similar.

The polysilicon wire in the middle is disconnected.

The clock driver

I'll wrap up with a brief mention of the clock driver chip that drives the shift registers. Shift-register memory chips required clock pulses with high current and unusual voltages due to the PMOS circuitry: from +5 volts to -11 volts. These pulses were provided by a special chip, the DS0025 Two-Phase MOS Clock Driver. The die photo below shows this chip. The die is dominated by four power transistors that produced 1.5 amp pulses. I wrote a blog post about the clock driver chip if you want more details.

Die photo of the DS0025 clock driver chip.

Conclusion

The Apple-1 is now a collector's item, with boards selling for hundreds of thousands of dollars. However, when it was introduced in 1976, it wasn't a particularly important computer, with about 200 sold at the price of $666.66. The Apple II, which came out a year later in 1977, was a much more influential computer, selling millions to become one of the archetypical home computers of that era. The Apple II used RAM chips for all its storage, illustrating that shift-register memory had rapidly become obsolete.

The shift-register chip illustrates the amazing decline in memory prices, as reflected by Moore's Law. This 1-kilobit shift register cost about $60 (in current dollars), while a 16-gigabit DRAM chip now costs about $6. Thus, memory is about 160 million times cheaper per bit now, an amazing drop.
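The per-bit arithmetic can be checked with Python's `fractions` module, using the two rounded prices from the paragraph above. The result depends on whether kilo and giga are read as binary or decimal prefixes: binary gives roughly 168 million, decimal gives exactly 160 million, so "about 160 million" holds either way.

```python
from fractions import Fraction

OLD_PRICE, NEW_PRICE = 60, 6  # dollars (old price already in current dollars)


def cheaper_by(old_bits: int, new_bits: int) -> Fraction:
    """How many times cheaper one bit is on the new chip than on the old one."""
    return Fraction(OLD_PRICE, old_bits) / Fraction(NEW_PRICE, new_bits)


binary = cheaper_by(2**10, 16 * 2**30)   # 1 Kibit vs 16 Gibit
decimal = cheaper_by(10**3, 16 * 10**9)  # 1 kbit vs 16 Gbit
print(f"{float(binary):,.0f}")   # 167,772,160
print(f"{float(decimal):,.0f}")  # 160,000,000
```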

I announce my latest blog posts on Twitter, so follow me @kenshirriff. I also have an RSS feed. Thanks to @TubeTimeUS for supplying the chip. I've written about the Intel 1405 shift register memory if you want to know more about this type of storage.

Notes and references

  1. The first reference to the Signetics 2504 that I could find was in 1970, when each chip cost $11.05 in quantities of 100 (about $60 in current dollars). Looking in an old Byte magazine from 1976, a 1-kilobit shift register chip cost $9 ($34 in current dollars), while a 4-kilobit DRAM chip cost $20 ($75 in current dollars). Thus, it appears that even by the time the Apple-1 was released, DRAMs had become cheaper than shift registers.

    The Apple-1 used 4-kilobit RAM for data and program storage. It's possible, though, to build a computer that uses shift-register storage for its main memory. The Datapoint 2200 is one example. If memory is accessed sequentially, shift-register storage is efficient since the bits are provided sequentially. However, if you access memory out of sequence, the processor has to wait while the memory cycles around, until the desired bits become available. In a way, shift-register memory is a throwback to very early computers such as EDSAC (1949), which used mercury delay lines for main storage. 

  2. The behavior of shift-register memory was a good match for video circuitry, since characters are displayed on the screen in a fixed, repeating order (left to right and top to bottom). The IBM 2260 video display terminal (1965) used a technique similar to shift registers: it stored data in a sonic delay line, sending torsional pulses through a 50-foot nickel wire. But unlike the Apple-1, this delay line stored pixels, not characters. For more about this system, see my blog post

  3. The die was encased in an epoxy package. To expose the die, Eric (@TubeTimeUS) tediously sanded through the plastic package until the die was visible. There are a few scratches on the die from this process, especially in the upper left. 

  4. The display circuitry has some additional complexity. Characters can't be taken directly from the display shift register: since each character is made up of eight scan lines, a line of characters must be processed eight times. To handle this, a second shift register (six 40-bit registers) buffers a line of characters and feeds each character into a display ROM. Another 1024-bit shift register keeps track of the cursor position. For more details, see stackexchange

  5. The Apple-1 display has a lot of similarity with the popular TV Typewriter, a hobbyist video terminal kit from 1973. The TV Typewriter used shift-register memory for its 32×16 display, but had a complex 5-board design. Wozniak's design for the Apple-1 was much simpler. 

  6. The schematic of the chip is shown below. Notice the upper and lower shift registers, which run on opposite clock phases. Apart from the 6-transistor shift-register stages, the only circuitry is the output stage that merges the two results and drives the output pin.

    Schematic of the chip. From the 1972 databook.

  7. Another major improvement in integrated circuits was the introduction of CMOS, which used NMOS and PMOS transistors together, with much lower power consumption. By the 1980s, processors such as the Intel 80386 (1985) and Motorola 68030 (1987) used CMOS. CMOS is still used in modern integrated circuits. 

  8. In the mid-1970s, ion implantation technology allowed the creation of depletion-mode transistors. These transistors could be used as pull-up elements, called depletion loads. Since depletion-load transistors could operate faster and with less current, they rapidly became a standard part of MOS integrated circuits, until replaced by CMOS in the 1980s. The Zilog Z80 and Intel 2102 SRAM were two early chips that used depletion loads. 

  9. You might think that the inverter circuit will result in a short circuit between Vdd and Vcc when both the input and the clock are high. However, the pull-up transistor is designed to produce a weak current, so the other transistor can still pull the output low. This current results in relatively high power consumption for PMOS or NMOS circuitry, a problem that is fixed by CMOS. 

  10. The 256×4 2502B chip required a 16-pin package, rather than the 8-pin package of other chips, due to the additional input and output pins. 
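As a quick check of footnote 1's conclusion, here are the 1976 Byte magazine prices converted to cost per bit in Python, assuming 1 kilobit = 1024 bits (the ratio comes out the same with 1000).

```python
from fractions import Fraction

# 1976 prices from footnote 1 (nominal dollars).
shift_register_per_bit = Fraction(9, 1 * 1024)  # $9 for a 1-kilobit shift register
dram_per_bit = Fraction(20, 4 * 1024)           # $20 for a 4-kilobit DRAM

ratio = shift_register_per_bit / dram_per_bit
print(f"shift register: {float(shift_register_per_bit) * 100:.3f} cents/bit")
print(f"DRAM:           {float(dram_per_bit) * 100:.3f} cents/bit")
print(f"shift register costs {float(ratio):.1f}x as much per bit")
```

At these prices a DRAM bit cost a little over half as much as a shift-register bit, consistent with the footnote.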

12 comments:

Brian Willoughby said...

Thanks again, Ken, for covering the Apple-1 video circuit.
Would you be interested in writing an article describing how the circuit is used by the 6502?
Are these seven 1024-bit shift registers basically cycling all the time to generate video, output to input, with the 6502 only able to wait for the cursor to make changes?
What is the fastest that the 6502 could update characters in the shift register?
How, exactly, does the "scroll" feature work?
Is there a writable VBL signal in addition to the cursor bit?
Does the display glitch whenever a character is written? ... or it is smooth?
Similarly for scroll: does that glitch during operation, or is it smooth?
If the Living Computer Museum in Seattle were open, I could probably answer some of these questions by watching their Apple-1 in action...
Thanks!

Brian Willoughby said...

I tried to leave a comment on your earlier Apple-1 article, but it got waylaid by moderation, so I'll just express my sentiments here.

Thank you for covering the Apple-1 video circuits and chips. I had previously assumed that the Apple-1 video was a simpler version of the Apple II video with less DRAM and fewer modes, much like the Apple II video evolved from 4-color hi-res to 6-color hi-res. Your articles made me realize that the video circuits are quite different. Sure, both 1 and II have 24 lines of 40 characters each, and the Apple II started with only upper case, even though the circuits were capable of more symbols. It's fascinating to see an earlier design of Wozniak's.

Brian Willoughby said...

Related to my questions about the Apple-1 shift register access, has anyone tested the limits of the design?

i.e. Is it possible to write two cursor markers into the memory?
That would surely make things non-deterministic when adding text after "the cursor" since the relative timing of the display and code would determine where the character is added. I'm just wondering whether any of the circuitry prevents multiple cursor markers in the 7th bit.

Ken Shirriff said...

Brian: I don't know a whole lot about the Apple-1's display circuit, since I'm mostly interested in the shift-register chip :-) The main thing is that the display shift registers go very fast, and the current line being displayed is copied into much slower shift registers. So updates can happen much faster than the scan-line rate. And there shouldn't be glitches since you're not modifying the line being displayed.

The scroll feature delays the shift register timing by one line rather than explicitly moving the bits, so it's very simple.

I haven't looked at how the cursor marker gets stored and updated but I assume you can't create multiple cursors.

Chris Espinosa said...

The cursor marker is stored as one 1 bit in a pool of 0 bits in the shift register at C11b. It's clocked through the flip-flop at C13, and gated with the output of the 555 at D13 into the 2519 shift register as bits 5 (inverted) and 6. The 2519 shifts through the 40 characters on the line for 7 scan lines (+1 blanking line) into the 2513 generator, whose output bits O1-O5 are shifted out into the video signal. When the 555 oscillates, it changes the character in the cursor cell from $20 (space) to $40 (@-sign), which is the blinking cursor.

The Monitor ROM does a tight loop testing the high bit of the PIA at location $D012. When it goes positive, it writes one byte to the same location, which is latched into the shift registers. It also clears the cursor bit, which causes flip-flops 2 and 3 at C13 to re-set the cursor bit on the next character clock; this advances the cursor one position.

Chris Espinosa said...

I have seen an Apple 1 get into a state where there are two cursor bits. In that case, generated text is output at two places on the screen, alternating between each, so one part would say HLOWRD and the other EL OL. This can happen if you write a character with the high bit set to $D012 without checking to see whether the cursor is there.

The Clear Screen pin on the keyboard connector (pin 12), if pulled high, strobes both shift registers for as long as it's held down, filling the shift registers with 0s. The logic at C8 and C12 sets the cursor bit to 1 when MEM 0 comes around (signal /WC2). That fixes up the multiple-cursors state.

Brian Willoughby said...

Thanks, Chris!

LOCAL FAVORITE said...

Came here for general shift register education (I'm more interested in the really early Eventide digital delays, when they employed hundreds of these Signetics ICs to delay digitized audio) but you'll have to excuse me for a moment while I pick my jaw up off the floor...

*The* Chris Espinosa? Holiest of guacamolies, how rad is that?? Much respect, for your incredible career and for just casually dropping Apple Arcana knowledgebombs like the above.

Internet. Wow. Sorry, just a bit starstruck.

Jeff Moffatt said...

Hi. I second everything that Chris said. I was the custodian of a pile of partially built Computer Conversers back in my Call Computer days. The Converser was practically the same circuit, except that in place of the PIA it had a UART. (And of course no CPU or RAM or ROM). The max baud rate was 300, or 30 characters per second, so there was plenty of time for the shift registers to come around to the cursor for the next character to be inserted.

Curt J. Sampson said...

@Brian Willoughby:

The operation of the output to the video display is fairly clear from the ECHO routine in the Apple 1 ROM and the schematics in the Apple-1 Operation Manual.

The 6821 handles both keyboard input (Port A) and output to the display (Port B). Port B is configured to use PB7 as an input and PB0 through PB6 as outputs. PB7 will read high when the shift sequence is at the cursor location; you can then write the character to Port B which will send it over PB0 through PB6 to be inserted at that point in the shift sequence. This is very simply done in code: just execute BIT $D012 followed by BMI back to that BIT instruction to loop until the video system is ready and then STA $D012 to send the character. (That code starts at $FFEF in the ROM, labeled ECHO in the listing.)

The 6502 can update once per frame, as the cursor location comes around, so 60 times per second. The only control character available is CR to move to the start of the next line (scrolling if the cursor was on the bottom line); there's no way to move the cursor (beyond printing a character) or even to clear the screen via software (there is a switch provided that you can close to clear the screen manually).

The scroll feature is a bit complex; it's done all in hardware in the terminal section of the board. On the far right of the terminal section sheet you can see how some gates are used to detect when RD7 through RD1 are 0001101 = $0D = ASCII CR; you can try to follow along from there to see how the scroll is done.

EriknocTDW said...

In the ROM code listed in the Apple-1 Operation Manual, it looks like storing an $8D (CR) to DSP ($D012) does not actually write that value to the screen but starts a new line, which is not surprising. So does that mean that it's impossible to store any values in the $80-9F (control character) range to the screen? What about Inverse and Flashing characters (available in low ASCII on all Apple IIs), or $E0-FF (duplicate of Normal special characters on Apple II and unmodified II+)? Are there any other Apple 1 books that elaborate more regarding its available screen codes?

Apple II and unmodified II+ displays:
$00-1F Inverse uppercase
$20-3F Inverse special
$40-5F Flashing uppercase
$60-7F Flashing special
$80-9F Normal uppercase (mirrors $C0-DF / control characters)
$A0-BF Normal special
$C0-DF Normal uppercase
$E0-FF Normal special (mirrors $A0-BF / lowercase not yet supported)

Curt J. Sampson said...

EriknocTDW: It's possible to send control characters other than CR to the display circuitry, but they don't do anything useful. (I don't recall what they display, perhaps nothing or perhaps just duplicates of the available 64 glyphs in the character ROM.) The only things you can do on this display are print one of the 64 characters from the ROM (ASCII sticks 2-punctuation, 3-numbers, 4,5-upper case letters; the 2513 data sheet is linked in the post) or a CR. CR is the only cursor movement available. It's the dumbest form of a "glass TTY" terminal, basically.