
Looking inside a vintage Soviet TTL logic integrated circuit

This blog post examines a 1980s chip used in a Soyuz space clock. The microscope photo below shows the tiny silicon die inside the package, with a nice, geometric layout. The silicon appears pinkish or purplish in this photo, while the metal wiring layer on top is white. Around the edge of the chip, the bond wires (black) connect pads on the chip to the chip's pins. The tiny structures on the chip are resistors and transistors.

Die photo of the Soviet 134ЛА8 (134LA8) NAND gate integrated circuit. (Click any photo for a larger image.)

The chip is used in the clock shown below. We recently obtained this digital clock that flew on a Soyuz space mission.1 The clock displays the time on the upper LED digits and provides a stopwatch on the lower LEDs. Its alarm feature activates an external circuit at a preset time. I expected that this clock would have a single clock chip inside, but the clock is surprisingly complicated, with over 100 integrated circuits on ten circuit boards. (See my previous blog post for more information about the clock.)

Space clock from Soyuz with the cover removed.

The clock's circuit boards can be opened like a book to reveal the integrated circuits and other components, thanks to the flexible wiring harnesses that connect the boards. The integrated circuits are mostly 14-pin "flat packs" in metal packages, surface-mounted on the printed circuit boards. I wanted to know more about these integrated circuits, so I opened one up,2 took photos, and reverse-engineered the chip's circuitry.

The wiring bundles are arranged so the boards can swing apart. The quartz crystal that controls the clock's timing is visible in the upper center. The clock's power supply is on the boards at the right, with multiple round inductors.

Soviet integrated circuits

The clock is built from TTL integrated circuits, a type of digital logic that was popular in the 1970s through the 1990s because it was reliable, inexpensive, and easy to use. (If you've done hobbyist digital electronics, you probably know the 7400-series of TTL chips.) A basic TTL chip contained just a few logic gates, such as 4 NAND gates or 6 inverters, while a more complex TTL chip implemented a functional unit such as a 4-bit counter. Eventually, TTL lost out to CMOS chips (the chips in modern computers), which use much less power and are much denser.

The photo below shows a chip with its metal lid removed. The tiny silicon die is visible in the middle, with bond wires connecting the die to the pins. This integrated circuit is very small; the ceramic package is 9.5mm×6.5mm, considerably smaller than a fingernail. To open up a chip like this, I normally put it in a vise and then tap the seam with a chisel. However, in this case, the chip decapped itself—while I was looking for a hammer, the top suddenly popped off due to the pressure from the vise.

The integrated circuit with its metal lid removed, showing the tiny silicon die inside.

The chip I'm examining has the Cyrillic part number 134ЛА8 (134LA8)6. It implements four open-collector NAND gates, as shown below.4 The NAND gate is a standard logic gate, outputting a 0 if both inputs are 1, and otherwise outputting a 1. An open-collector output is slightly different from a standard output. It will pull the output pin low for a 0, but for a 1 it just leaves the output floating ("high impedance").5 An external pull-up resistor is required to pull the output high for a 1. The clock uses three of these chips: one in the quartz crystal oscillator circuit, and the others functioning as inverters elsewhere in the clock.3
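The open-collector behavior described above is easy to model in a few lines of code. Below is a minimal Python sketch (my own illustration, not anything from the chip's documentation): a gate output is either 0 (pulled low) or floating (`None`), and the external pull-up resistor resolves a floating line to 1. Tying several outputs to one pull-up also shows the wired-AND effect discussed in the footnotes.

```python
def oc_nand(a, b):
    """Open-collector NAND: pulls the output low for 0, floats (None) for 1."""
    return 0 if (a and b) else None

def bus_value(*outputs):
    """A pull-up resistor reads the shared line: any low output wins,
    otherwise the resistor pulls the line up to 1."""
    return 0 if 0 in outputs else 1

# A single gate with its pull-up behaves as an ordinary NAND.
assert [bus_value(oc_nand(a, b))
        for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [1, 1, 1, 0]

# Tying two outputs to one pull-up gives a wired-AND of the gates.
assert bus_value(oc_nand(1, 1), oc_nand(0, 0)) == 0
```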

Logic diagram of the Soviet 134ЛА8 (134LA8) integrated circuit, with pin numbers.

The Soviet Union lagged about 9 years behind the US in integrated circuit development.7 The lag would have been much larger, except the Soviet Union copied many Western integrated circuits. As a result, most of the Soviet TTL chips have Western equivalents.4 However, the 134ЛА8 chip that I examined differs from Western chips8 in two unusual ways. First, to reduce the number of external resistors, this chip includes two pull-up resistors on the chip that can be wired up as desired. Second, the chip shares two NAND gate inputs, which frees up the two pins used by the resistors. Thus, even though the Soviet Union copied many integrated circuits, its engineers also creatively designed chips of their own.

Integrated circuit components

Under the microscope, the transistors and resistors of the integrated circuit are visible. The silicon die appears in shades of pink, purple, and green, depending on how different regions of the chip have been "doped". By doping the silicon with impurities, the silicon takes on different semiconductor properties, making N-type and P-type silicon. On top of the silicon, the white lines are metal traces that wire together the components on the silicon layer.

The photo below shows how a resistor appears on the silicon die. A resistor is formed by doping silicon to create a high-resistance path, the reddish line below. The longer the path, the higher the resistance, so the resistors typically zig-zag back and forth to achieve the desired resistance. The resistor is connected to the metal layer at both ends, while another metal trace crosses over the middle of the resistor.

A resistor on the integrated circuit die.

This chip, like other TTL chips, uses bipolar NPN transistors. These transistors have N-type silicon for the emitter, P-type silicon for the base, and N-type silicon for the collector. On the IC, the transistors are constructed by doping the silicon to form layers with different properties. At the bottom of the stack, the collector forms the bulk of the transistor, doped to form N-type silicon (the large green area below). On top of the collector, a thin region of P-type silicon forms the base; this is the reddish region in the middle. Finally, a small square N-type emitter is formed on top of the base. These layers form the N-P-N structure of the transistor. Note that the metal wiring to the collector and base is off to the side, away from the main body of the transistor.

An input transistor on the integrated circuit die. The transistor is surrounded by an isolation ring (dark color) to separate it from the other transistors.

TTL circuits typically used transistors with multiple emitters, one for each input, and this can be seen above. A multiple-emitter transistor may seem strange, but it is straightforward to build one on an integrated circuit. The transistor above has two emitters wired up. Close examination shows there are four emitters, but the two lower unused emitters are shorted to the base.

The output transistors on the chip produce the external signal from the chip, so they must support much higher current than the other transistors. As a result, they are much larger than the other transistors. As before, the transistor has a large N-type collector region (green), with the base on top (pink) and the emitter on top of the base. The output transistor has long contacts between the metal layer and the silicon, rather than the small square contacts of the previous transistor. The emitter (wired in a "U" shape) is also much larger. These changes allow more current to flow through the transistor. In the photo below, the transistor on the left has no metal layer, so its silicon features are more visible.9 The transistor on the right shows the metal wiring.

Two output transistors on the integrated circuit die. The one on the left is unused, while the one on the right is wired into the circuit by the metal layer.

How a TTL NAND gate works

The schematic below shows one of the open-collector NAND gates in the chip. In this paragraph, I'll give a brief explanation of the circuit; you can skip this if you want.10 To understand the circuit, first assume that an input is 0. The current through resistor R1 and the base of transistor Q1 will flow out through the transistor's emitter and the low input. Transistor Q2 will be off, so R3 pulls Q3's base low, turning Q3 off. Thus, the output will float (i.e. open-collector 1 output). On the other hand, suppose both inputs are 1. Now the current through R1 can't pass through an input so it will flow out the collector of Q1 (i.e. backward) and into Q2's base, turning on Q2. Q2 will pull Q3's base high, turning on Q3 and pulling the output low. Thus, the circuit implements a NAND gate, outputting 0 if both inputs are high. Note that Q1 isn't acting like a normal transistor, but instead is "current-steering", directing the current from R1 in one direction or the other.

Schematic of one gate in the integrated circuit. This is an open-collector TTL NAND gate.

The diagram below shows the components for one of the NAND gates, labeled to match the schematic. (The three other NAND gates on the chip are similar.) The wiring of the gate is simple compared to most integrated circuits; you can follow the metal traces (white) and match up the wiring with the schematic. Note the winding path from the ground pad to Q3. Q1 is a two-emitter transistor while Q3 is a large output transistor. Two unused transistors are below Q2.

The die, showing the components in a gate. Components are labeled (blue) for one of the NAND gates, while pins are labeled in red. The pull-up resistors are above and below the Vcc wire.

Conclusion

This Soviet chip from 1984 is simple enough that the circuitry can be easily traced out, illustrating how a TTL NAND gate is constructed. The downside of simple chips, however, is that the Soyuz clock required over 100 chips to implement basic clock functionality. Even at the time, single chips implemented wristwatches and alarm clocks. Now, modern chips can contain billions of transistors, providing an extraordinary amount of functionality, but making the chip impossible to understand visually.

My previous blog post discussed the clock's circuitry in detail and I plan to write more about the clock, so follow me @kenshirriff (or on RSS) for details. Until then, you can watch CuriousMarc's video showing the disassembly of the space clock:

Notes and References

  1. CuriousMarc obtained the clock from an auction and it was advertised as flown to space, but we don't know which mission it was flown on. The date codes on the components inside the clock are mostly from 1983, with one from 1984, so the clock was probably manufactured in 1984. The Russian name for the clock is "Бортовые Часы Космические" (Onboard Space Clock), which is abbreviated as "БЧК". 

  2. Don't worry; I didn't destroy any of the chips in the clock. We bought duplicate chips on eBay for reverse-engineering. I was surprised that most of these 1980s-era chips are not too hard to obtain. 

  3. I don't see any obvious reason why the 134ЛА8 chip was used instead of an inverter chip. Surprisingly, even though the 7404 hex inverter chip was extremely common in US designs, the clock doesn't use any inverter chips at all. 

  4. For more information on Russian integrated circuits, including the ones used in the clock, see the databook Интегральные микросхемы и их зарубежные аналоги (Integrated circuits and their foreign counterparts). (The title makes it explicit that they were copying foreign chips.) Be warned that the databook's description of the 134ЛА8 has a few typos. 

  5. One reason to use open-collector gates is to get an AND gate "for free". Connecting outputs together produces a wired-AND; if any output is a 0, the tied-together output is a 0. (Tying together NAND gates is equivalent to AND-OR-INVERT logic.)

    Open-collector outputs can also be used on a bus, where multiple devices or boards can write signals to a bus line (as in the Xerox Alto) without electrical conflict. This use is obsolete, though; tri-state outputs provide much better performance. 

  6. One nice thing about Russian ICs is that the part numbers are assigned according to a rational system, unlike the essentially random numbering of American integrated circuits. Two letters in the part number indicate the function of the chip, such as a logic gate, counter, flip flop, or decoder. For example, consider the label "Δ134 ЛA8A". The series number, 134, indicates the chip is a low-power TTL chip. The "Л" (L) indicates a logic chip (Логические), with "A" indicating the NAND gate subcategory. Finally, "8" indicates a specific type of NAND chip in the ЛA category. As with American chips, the "0684" date code on the chip indicates that it was made in the 6th week of 1984. 

  7. Two CIA reports (1974 and 1986) provide information on the lag between Soviet IC technology and Western technology. "Microcomputing in the Soviet Union and Eastern Europe", ABACUS, 1985, discusses how the Soviet Union copied American microprocessors, especially Intel ones. 

  8. The 7400 series includes several quad open-collector NAND gate chips, such as the 7401, 7403, 7426, 7438, and 7439. These are all different from the Soviet chip. A die photo of the 74S01 is here; I think the Soviet chip has a much nicer layout. 

  9. The integrated circuit has a few unused transistors. In addition, the input transistors have 4 emitters, but only two of them are used. This is probably so the same silicon die can be used to manufacture multiple integrated circuits by changing the metal layer. For instance, the 4-emitter transistors could be used for 3- or 4-input NAND gates. Alternatively, the unused transistors could be used to create a hex inverter chip. 

  10. For a detailed explanation of how TTL gates work, see this page. 
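The positional part-numbering scheme described in note 6 is regular enough to decode mechanically. Here is a toy Python decoder as an illustration; the lookup tables contain only the codes mentioned in the note, while the real system defines many more series and function codes.

```python
# Toy decoder for the positional Soviet IC numbering scheme from note 6.
# The tables hold only the codes mentioned in the article.
SERIES = {"134": "low-power TTL"}
FUNCTION = {"ЛА": "logic: NAND gates"}  # "Л" = logic, "А" = NAND subcategory

def decode(part):
    return {"series": SERIES.get(part[:3], "?"),
            "function": FUNCTION.get(part[3:5], "?"),
            "type": part[5:]}

def decode_date(code):
    # Date codes work like American ones: "0684" is the 6th week of 1984.
    return f"week {int(code[:2])} of 19{code[2:]}"

assert decode("134ЛА8") == {"series": "low-power TTL",
                            "function": "logic: NAND gates",
                            "type": "8"}
assert decode_date("0684") == "week 6 of 1984"
```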

The core memory inside a Saturn V rocket's computer

The Launch Vehicle Digital Computer (LVDC) had a key role in the Apollo Moon mission, guiding and controlling the Saturn V rocket. Like most computers of the era, it used core memory, storing data in tiny magnetic cores. In this article, I take a close look at an LVDC core memory module from Steve Jurvetson's collection. This memory module was technologically advanced for the mid-1960s, using surface-mount components, hybrid modules, and flexible connectors that made it an order of magnitude smaller and lighter than mainframe core memories.2 Even so, this memory stored just 4096 words of 26 bits.1

A core memory module from the LVDC. This module stored 4K words of 26 data bits and 2 parity bits. It weighs 2.3 kg (5.1 pounds) and measures about 14 cm×14 cm×16 cm (5½"×5½"×6"). Click on any photo for a larger version.

The race to the Moon started on May 25, 1961 when President Kennedy stated that America would land a man on the Moon before the end of the decade. This mission required the three-stage Saturn V rocket, the most powerful rocket ever built. The Saturn V was guided and controlled by the Launch Vehicle Digital Computer3 (below), from liftoff into Earth orbit, and then on a trajectory towards the Moon. (The Apollo spacecraft separated from the Saturn V rocket at that point, ending the LVDC's role.)

The LVDC mounted in a support frame. The round connectors are visible on the front side of the computer. There are 8 electrical connectors and two connectors for liquid cooling. Photo courtesy of IBM.

The LVDC was just one of several computers onboard the Apollo mission. The LVDC was connected to the Flight Control Computer, a 100-pound analog computer. The Apollo Guidance Computer (AGC) guided the spacecraft to the Moon's surface. The Command Module contained one AGC while the Lunar Module contained a second AGC7 along with the Abort Guidance System, an emergency backup computer.

Multiple computers were onboard an Apollo mission. The Launch Vehicle Digital Computer (LVDC) is the one discussed in this blog post.

Unit Logic Devices (ULD)

The LVDC was built with an interesting hybrid technology called ULD (Unit Logic Devices). Although they superficially resembled integrated circuits, ULD modules contained multiple components. They used simple silicon dies, each implementing just one transistor or two diodes. These dies, along with thick-film printed resistors, were mounted on a half-inch-square ceramic wafer to implement a circuit such as a logic gate. These modules were a variant of the SLT (Solid Logic Technology) modules developed for IBM's popular S/360 series of computers. IBM started developing SLT modules in 1961, before integrated circuits were commercially viable, and by 1966 IBM produced over 100 million SLT modules a year.

ULD modules were considerably smaller than SLT modules, as shown in the photo below, making them more suitable for a compact space computer.4 ULD modules used ceramic packages instead of SLT's metal cans, and had metal contacts on the upper surface instead of pins. Clips on the circuit board held the ULD module in place and connected with these contacts.5 The LVDC and associated hardware used more than 50 different types of ULDs.

SLT modules (left) are considerably larger than ULD modules (right). A ULD module is 7.6 mm × 8 mm.

The photo below shows the internal components of a ULD module. On the left, the circuit traces are visible on the ceramic wafer, connected to four tiny square silicon dies. While this looks like a printed circuit board, keep in mind that it is much smaller than a fingernail. On the right, the black rectangles are thick-film resistors printed onto the underside of the wafer.

Top and underside of a ULD showing the silicon dies and resistors. While SLT modules had resistors on the upper surface, ULD modules had resistors underneath, increasing the density but also the cost. From IBM Study Report Figure III-11.

The microscope photo below shows a silicon die from a ULD module that implements two diodes.6 The die is very small; for comparison, grains of sugar are displayed next to the die. The die had three external connections through copper balls soldered to the three circles. The two lower circles were doped (darker regions) to form the anodes of the two diodes, while the upper-right circle was the cathode, connected to the substrate. Note that this die is much less complex than even a basic integrated circuit.

Photo of a two-diode silicon die next to sugar crystals. This photo is a composite of top-lighting to show the die details, with back-lighting to show the sugar.

How core memory works

Core memory was the dominant form of computer storage from the 1950s until it was replaced by semiconductor memory chips in the 1970s. Core memory was built from tiny ferrite rings called cores, storing one bit in each core by magnetizing the core either clockwise or counterclockwise. A core was magnetized by sending a pulse of current through wires threaded through the core. The magnetization could be reversed by sending a pulse in the opposite direction.

To read the value of a core, a current pulse flipped the core to the 0 state. If the core was in the 1 state previously, the changing magnetic field created a voltage in a sense wire threaded through the cores. But if the core was already in the 0 state, the magnetic field wouldn't change and the sense wire wouldn't pick up a voltage. Thus, the value of the bit in the core was read by resetting the core to 0 and testing the sense wire. An important characteristic of core memory was that the process of reading a core destroyed its value, so it needed to be re-written.

Using a separate wire to flip each core would be impractical, but in the 1950s a technique called "coincident-current" was developed that used a grid of wires to select a core. This depended on a special property of cores called hysteresis: a small current has no effect on a core, but a current above a threshold would magnetize the core. This allowed a grid of X and Y lines to select one core from the grid. By energizing one X line and one Y line each with half the necessary current, only the core where both lines crossed would get enough current to flip, leaving the other cores unaffected.
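The coincident-current scheme can be illustrated with a small simulation. This is my own sketch with arbitrary units, not values from any real memory: each energized line contributes half the switching current, and only the core at the crossing point exceeds the hysteresis threshold.

```python
HALF = 0.5        # current on each energized line (arbitrary units)
THRESHOLD = 0.75  # a core flips only if its total current exceeds this

def flipped_cores(x_sel, y_sel, nx=4, ny=4):
    """Return which cores in an nx-by-ny grid flip when one X line
    and one Y line each carry half the switching current."""
    flipped = []
    for x in range(nx):
        for y in range(ny):
            current = (HALF if x == x_sel else 0) + (HALF if y == y_sel else 0)
            if current > THRESHOLD:
                flipped.append((x, y))
    return flipped

# Only the core at the crossing of the two energized lines flips.
assert flipped_cores(2, 1) == [(2, 1)]
```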

Closeup of an IBM 360 Model 50 core plane. The LVDC and Model 50 used the same type of cores, known as 19-32 because their inner diameter was 19 mils and their outer diameter was 32 mils (0.8 mm). While this photo shows three wires through each core, the LVDC used four wires.

The photo below shows one core plane from the LVDC's memory.8 This plane has 128 X wires running vertically and 64 Y wires running horizontally, with a core at each intersection. For reading, a single sense wire runs through all the cores parallel to the Y wires. For writing, a single inhibit wire (explained below) runs through all the cores parallel to the X wires. The sense wires cross over in the middle of the plane; this reduces induced noise because noise from one half of the plane cancels out noise from the other half.

One core plane for the LVDC's memory, holding 8192 bits. Connections to the core plane are made through the pins around the outside. From Smithsonian National Air and Space Museum.

The plane above had 8192 locations, each storing a single bit. To store a word of memory, multiple core planes were stacked together, one plane for each bit in the word. The X and Y select lines were wired to zig-zag through all the core planes, in order to select a bit of the word from each plane. Each plane had a separate sense line for reading, and a separate inhibit line for writing. The LVDC memory used a stack of 14 core planes (below), storing a 13-bit "syllable" along with a parity bit.10

The LVDC core stack consists of 14 core planes. This stack is at the US Space & Rocket Center. Photo from NCAR EOL. I retouched the photo to reduce distortion from the plastic case.

Writing to core memory required additional wires called the inhibit lines. Each plane had one inhibit line threaded through all the cores in the plane. In the write process, a current passed through the X and Y lines, flipping the selected cores (one per plane) to the 1 state, storing all 1's in the word. To write a 0 in a bit position, the plane's inhibit line was energized with half current, opposite to the X line. The currents canceled out, so the core in that plane would not flip to 1 but would remain 0. Thus, the inhibit line inhibited the core from flipping to 1. By activating the appropriate inhibit lines, any desired word could be written to the memory.

To summarize, a core memory plane had four wires through each core: X and Y drive lines, a sense line, and an inhibit line. These planes were stacked to form an array, one plane for each bit in the word. By energizing an X line and a Y line, one core in each plane was selected. The sense line was used to read the contents of the bit, while the inhibit line was used to write a 0 (by inhibiting the writing of a 1).9
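Putting these pieces together, the logical behavior of a plane stack, destructive read with write-back, and writing via inhibit lines, can be sketched in software. This is an abstract model of the behavior described above, with hypothetical dimensions, not a model of the electrical details:

```python
class CoreStack:
    """Abstract model of stacked core planes sharing X/Y select lines."""
    def __init__(self, planes=14, nx=128, ny=64):
        self.planes = [[[0] * ny for _ in range(nx)] for _ in range(planes)]

    def write(self, x, y, bits):
        # The X/Y currents try to flip every selected core to 1; an
        # energized inhibit line cancels half the current on its plane,
        # so that plane's core stays 0.
        for plane, bit in zip(self.planes, bits):
            plane[x][y] = 1 if bit else 0

    def read(self, x, y):
        # Reading resets the selected cores to 0; a pulse on a plane's
        # sense line means its core held a 1. Since the read destroys
        # the word, it is immediately written back.
        bits = [plane[x][y] for plane in self.planes]
        for plane in self.planes:
            plane[x][y] = 0
        self.write(x, y, bits)
        return bits

stack = CoreStack(planes=4, nx=4, ny=4)
stack.write(1, 2, [1, 0, 1, 1])
assert stack.read(1, 2) == [1, 0, 1, 1]  # read, destroyed, then restored
assert stack.read(1, 2) == [1, 0, 1, 1]  # write-back preserved the word
```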

The LVDC core memory module

In this section, I'll explain how the LVDC core memory module was physically constructed. At its center, the core memory module contains the stack of 14 core planes shown earlier. This is surrounded by multiple boards with the circuitry to drive the X and Y select lines and the inhibit lines, read the bits from the sense lines, detect errors, and generate necessary clock signals.11

An exploded view of the memory module showing the key components. An MIB (Multilayer Interconnection Board) is a 12-layer printed circuit board. From Saturn V Guidance Computer Progress Report Fig 2-43.

Memory Y driver panel

A word in core memory is selected by driving the appropriate X and Y lines through the core stack. I'll start by describing the Y driver circuitry and how it generates a signal through one of the 64 Y lines. Instead of having 64 separate driver circuits, the module reduces the amount of circuitry by using 8 "high" drivers and 8 "low" drivers. These are wired up in a "matrix" configuration so each combination of a high driver and a low driver selects a different line. Thus, the 8 high drivers and 8 low drivers select one of the 64 (8×8) Y lines. The footnote12 has more information on the matrix technique.
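The matrix arithmetic is easy to verify in code. Here's a quick illustrative sketch (the numbering convention is my own, not the module's):

```python
# 8 "high" drivers x 8 "low" drivers select one of 64 Y lines,
# so 16 driver circuits replace 64 separate ones.
def y_line(high, low):
    assert 0 <= high < 8 and 0 <= low < 8
    return high * 8 + low

# Every Y line is reachable, and each (high, low) pair is unique.
lines = {y_line(h, l) for h in range(8) for l in range(8)}
assert lines == set(range(64))
```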

The Y driver board (front) drives the Y select lines in the core stack.

The closeup view below shows some of the ULD modules (white) and transistor pairs (golden) that drive the Y select lines. The "EI" module is the heart of the driver; it supplies a constant voltage pulse (E) or sinks a constant current pulse (I) through a select line.14 A select line is driven by activating an EI module in voltage mode at one end of the line and an EI module in current mode at the other end. The result is a pulse with the correct voltage and current to flip the core. It takes a hefty pulse to flip a core; the voltage pulse is fixed at 17 volts, while the current is adjusted from 180 mA to 260 mA depending on the temperature.13

Closeup of the Y driver board showing six ULD modules and six transistor pairs. Each ULD module is labeled with an IBM part number, the module type (e.g. "EI"), and an unknown code.

The board also has error-detector (ED) modules that detect if more than one Y select line is driven at the same time. Implementing this with digital logic would require a complicated set of gates to detect if two or more of the 8 inputs are high. Instead, the ED module uses a simple semi-analog design: it sums the input voltages using a resistor network. If the resulting voltage is above a threshold, the output is triggered.
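The semi-analog detection idea can be sketched numerically. The voltage and threshold values below are illustrative placeholders, not measurements from the ED module:

```python
LINE_V = 1.0      # assumed voltage contributed by one active select line
THRESHOLD = 1.5   # between one and two lines' worth of summed voltage

def error_detected(lines_active):
    """Sum the active line voltages (the resistor network's job) and
    flag an error if more than one line is driven at once."""
    total = sum(LINE_V for active in lines_active if active)
    return total > THRESHOLD

assert not error_detected([1, 0, 0, 0, 0, 0, 0, 0])  # one line: normal
assert error_detected([1, 0, 0, 1, 0, 0, 0, 0])      # two lines: fault
```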

A diode matrix is underneath the driver board, containing 256 diodes and 64 resistors. This matrix converts the 8 high and 8 low pairs of signals from the driver board into connections to the 64 Y lines that pass through the core stack. Flex cables on the top and bottom of the board connect the board to the diode matrix. Two flex cables on the left (not visible in the photo) and two flex cables on the right (one visible) connect the diode matrix to the core stack.15 The flex cable visible on the left connects the Y board to the rest of the computer via the I/O board (described later) while a small flex cable on the lower right connects to the clock board.

Memory X driver panel

The circuitry to drive the X lines is similar to the Y circuitry, except there are 128 X lines compared to 64 Y lines. Because there are twice as many X lines, the module has a second X driver board underneath the one visible below. Although the X and Y boards have the same components, the wiring is different.

This board and the similar one underneath drive the X select lines in the core stack.

The closeup below shows that the board has suffered some component damage. One of the transistors has been dislodged, a ULD module has been broken in half, and the other ULD module is cracked. The wiring is visible inside the broken module as well as one of the tiny silicon dies (on the right). This photo also shows vertical and horizontal circuit board traces on several of the board's 12 layers.

A closeup of the X driver board showing some damaged circuitry.

Underneath the X driver boards is the X diode matrix, containing 288 diodes and 128 resistors. The X diode matrix uses a different topology than the Y diode board to avoid doubling the number of components.16 Like the Y diode board, this board contains components mounted vertically between two printed circuit boards. This technique is called "cordwood" and allows the components to be packed together closely.

Closeup of X diode matrix showing diodes mounted vertically using cordwood construction between two printed circuit boards. The two X driver boards are above the diode board, separated from it by foam. Note how the circuit boards are packed very closely together.

Memory sense amplifiers

The photo below shows the sense amplifier board on top of the module.17 It has 7 channels to read 7 bits from the memory stack; an identical board below processes another 7 bits, for 14 bits in total. The job of the sense amplifier is to detect the small signal (20 millivolts) generated by a flipping core, and turn it into a 1-bit output. Each channel consists of a differential amplifier and buffer, followed by a differential transformer and an output latch. At the left, the 28-conductor flex cable connects to the memory stack, feeding the two ends of each sense wire into the amplifier circuitry, starting with an MSA-1 (Memory Sense Amplifier) module. The discrete components are resistors (brown cylinders), capacitors (red), transformers (black), and transistors (golden). The data bits exit the sense amplifier boards through the flex cable on the right.

The sense amplifier board on top of the memory module. This board amplifies the signals from the sense wires to produce the output bits.


Memory inhibit drivers

The inhibit board is on the underside of the core module and holds the inhibit drivers that are used for writing to memory. There are 14 inhibit lines, one for each plane in the core stack. To write a 0 bit, the corresponding inhibit driver is activated and the current through the inhibit line prevents the core from flipping to a 1. Each line is driven by an ID-1 and ID-2 (Inhibit Driver) module and a pair of transistors. The high-precision 20.8Ω resistors at the top and bottom of the board regulate the inhibit current. The 14-wire flex cable on the right connects the drivers to the 14 inhibit wires in the core stack.

The inhibit board on the bottom of the memory module. This board generates the 14 inhibit signals used during writing.


Memory clock driver

The clock driver is a pair of boards that generate the timing signals for the memory module. Once the computer starts a memory operation, the module's clock driver generates the various timing signals asynchronously. The clock driver boards are on the bottom of the module, sandwiched between the core stack and the inhibit board, so the boards are difficult to see.

The clock driver boards are below the core memory stack but above the inhibit board.


The photo above looks between the clock driver boards; the inhibit board is on the bottom. The blue components are multi-turn potentiometers, presumably to adjust timings or voltages. Resistors and capacitors are also visible on the boards. The schematic shows several MCD (Memory Clock Driver) modules, but I can't see any modules on the boards. I don't know if that is due to the limited visibility, a change in the circuitry, or another board with these modules.

Memory input-output panel

The final board of the memory module is the input-output panel (below), which distributes signals between the boards of the memory module and the remainder of the LVDC computer. At the bottom, the green 98-pin connector plugs into the LVDC's memory chassis, providing signals and power from the computer. (Much of the connector's plastic is broken, exposing the pins.) The distribution board is linked to this connector by two 49-pin flex cables at the bottom (only the front cable is visible). Other flex cables distribute signals to the X-driver board (left), the Y-driver board (right), the sense amplifier board (top), and inhibit board (underneath). The 20 capacitors on the board filter the power supplied to the memory module.

The input-output board is the interface between the memory module and the rest of the computer. The green connector at the bottom plugs into the computer, and these signals are routed through flat cables to other parts of the memory module. This board also has filter capacitors.


Conclusion

The LVDC's core memory module provided compact, reliable storage for the computer. The lower half of the computer (below) was filled by up to 8 core memory modules. This allowed the computer to hold a total of 32 kilowords of 26-bit words, or 16 kilowords in redundant high-reliability "duplex" mode.18

The LVDC held up to eight core memory modules. Photo at US Space & Rocket Center, courtesy of Mark Wells.


The core memory module provides an interesting view of a time when 8K of storage required a 5-pound module. While this core memory was technologically advanced for its time, the hybrid ULD modules were rapidly obsoleted by integrated circuits. Core memory as a whole died out in the 1970s with the advent of semiconductor DRAMs.

The contents of core memory are retained when the power is disconnected, so it's likely that the module still holds the software from when the computer was last used, even decades later. It would be interesting to try to recover this data, but the damaged circuitry poses a problem, so the contents will probably remain locked inside the memory module for decades more.

I announce my latest blog posts on Twitter, so follow me @kenshirriff for future articles. I also have an RSS feed. For an explanation of core memory, see CuriousMarc's video where we wire up a core plane and demonstrate how it works. I've written before about core memory in the IBM 1401, core memory in the Apollo Guidance Computer, and core memory in the IBM S/360. Thanks to Steve Jurvetson for supplying the core array.

Notes and references

  1. A word size of 26 bits may seem bizarre, but in the 1960s computers hadn't yet standardized on bytes and word sizes that were a power of two. Business computers often used 6-bit characters, while aerospace computers typically used whatever word size provided the necessary accuracy. 

  2. It's interesting to compare the size of the LVDC's core memory to IBM's commercial core memories, which I wrote about here. The 128-kilobyte expansion for the IBM S/360 Model 40 computer required an additional cabinet weighing 610 pounds and measuring 62.5"×26"×60". An LVDC core memory module holds 4K words of 26 bits, equivalent to 13 kilobytes. Doing the math, the LVDC has 1/12 the weight and 1/40 the volume per byte. The core stack itself was very similar between the LVDC and the S/360 machines; the difference in weight and volume comes from the surrounding electronics and packaging. 
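The weight part of this comparison can be checked with quick arithmetic, a sketch using the roughly 5-pound module weight mentioned in the conclusion:

```python
# Rough weight-density check of the 1/12 figure.
s360_lb_per_kb = 610 / 128   # IBM S/360 Model 40 expansion: 610 lb per 128 KB
lvdc_lb_per_kb = 5 / 13      # LVDC module: ~5 lb per 13 KB (4K x 26-bit words)

ratio = s360_lb_per_kb / lvdc_lb_per_kb
print(f"LVDC weight per byte is about 1/{ratio:.0f} of the S/360's")  # ~1/12
```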

  3. For more information on the LVDC, see the Virtual AGC project's LVDC page. Also see the interesting SmarterEveryDay video on the LVDC. Fran Blanche did an extensive investigation into an LVDC circuit board. 

  4. The SLT modules in my photograph are mounted on an SMS card, rather than the expected SLT card. SMS cards were IBM's previous generation of circuit cards and normally used discrete germanium transistors. However, even after the introduction of SLT in 1964, IBM needed to support older computers with SMS cards. To reduce costs, they started building old-style SMS cards that used the more modern SLT modules. The point is that SLT modules were usually packed densely on multiple-layer circuit boards, rather than the low-density SMS card in the photo. 

  5. One question is why IBM used SLT modules instead of integrated circuits. The main reason was that integrated circuits were still in their infancy, having been invented in 1959. In 1963, SLT modules had cost and performance advantages over integrated circuits. However, SLT modules were viewed outside IBM as backward compared to integrated circuits. One advantage of SLT modules over integrated circuits was that the resistors in SLT were much more accurate than those in integrated circuits. During manufacturing, the thick-film resistors in SLT modules were carefully sand-blasted to remove resistive film until they had the desired resistance. SLT modules were also cheaper than comparable integrated circuits in the 1960s. By 1969, IBM started using integrated circuits, which they called MST (Monolithic Systems Technology). IBM packaged their integrated circuits in SLT-style metal packages, rather than the industry-standard DIP epoxy packages. Chapter 2 of IBM's 360 and Early 370 Systems discusses the history of SLT modules in great detail. 

  6. Curiously, the ULD modules in the core memory did not contain any sealant inside. In contrast, the ULD modules examined by Fran Blanche were filled with pink silicone inside. 

  7. It's interesting to compare the AGC to the LVDC since they took two very different approaches to computer design and manufacture. Both computers had rectangular metal boxes, magnesium-lithium for the LVDC and magnesium for the AGC. Physically, the LVDC was about twice the size (2.2 cubic feet vs 1.1 cubic feet) even though they were both about 70 pounds. The LVDC used 138 Watts and was liquid-cooled, while the AGC used 55 watts and was cooled by conduction. The LVDC used 26-bit words compared to 15 bits in the AGC. One big architectural difference was that the LVDC was a serial computer, operating on one bit at a time, while the AGC operated on all bits in parallel (like most computers). Another important difference was that the LVDC used triple redundancy for reliability, while the AGC had no hardware fault handling. Both computers used a 2.048 MHz clock, but the LVDC was considerably slower because it was serial: 82 µs for an add operation compared to 23.4 µs for the AGC. The LVDC had up to 8 core memory modules, holding 4K words each. The AGC's core memory was only 2K words. However, the AGC also had 36K words of read-only storage in its hardwired core rope modules. (The LVDC did not use core rope.)

    The two computers were constructed in very different ways. The AGC was built from integrated circuits, while the LVDC used hybrid ULD modules. The AGC's logic gates were RTL (resistor-transistor logic) NOR gates, while the LVDC's were slightly more advanced DTL (diode-transistor logic) AND-OR-INVERT gates. While the AGC used two types of ICs (a dual NOR gate and a sense amplifier), the LVDC used many different types of modules.

    The AGC's circuit boards were encapsulated into rectangular modules, while the LVDC's circuit boards plugged into a backplane in a more standard way. The AGC's backplane was wire-wrapped by machine, while the LVDC's backplane was a 14-layer printed circuit board.

    IBM engaged in political battles, attempting to replace MIT's AGC with the LVDC. IBM argued that the AGC wasn't reliable enough compared to the triple-redundant LVDC. According to MIT, however, the AGC could run a guidance program 10 to 20 times faster than the LVDC, use half the memory, and provide more accuracy (by using double precision). MIT argued that the LVDC wasn't powerful enough to replace the AGC. In the end, the AGC survived the "naysayers" and was used on the Apollo spacecraft, while the LVDC had its role in the Saturn V rocket. The "showdown" is described in more detail here.  

  8. The Smithsonian website states that the core plane is approximately 4"×7"×1", but that can't be right since the entire memory module is less than 7" wide. The Study Report page 3-43 says each plane is 5.5"×3.5"×0.15", which seems accurate. 

  9. The book Memories That Shaped an Industry discusses the history of core memory at IBM. 

  10. The LVDC has 26-bit words, each word consisting of two 13-bit syllables. Its core memory is described as holding 4K words, where each word is 26 data bits and 2 parity bits. However, the core memory is physically constructed to store 8K syllables (13 data bits and 1 parity bit). Thus, two memory accesses are required to read a complete word. An instruction is one 13-bit syllable so an instruction can be read in a single memory cycle. Thus, executing a typical instruction requires three memory accesses: one for the instruction and two for the data. (Keep in mind that reading from core memory erases the data, so a memory access consists of a read followed by a write to restore the data.) 
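The word/syllable split can be sketched in a few lines (the packing order is illustrative; "odd parity" means each stored 14-bit syllable contains an odd number of 1s):

```python
def split_word(word26):
    """Split a 26-bit word into (syllable 1, syllable 0), 13 bits each."""
    assert 0 <= word26 < (1 << 26)
    return (word26 >> 13) & 0x1FFF, word26 & 0x1FFF

def odd_parity_bit(syllable13):
    """Parity bit chosen so the stored 14 bits have an odd number of 1s."""
    return 1 - bin(syllable13).count("1") % 2

syl1, syl0 = split_word(0x2AAAAAA)  # alternating bit pattern as an example
```

Note that `odd_parity_bit(0)` is 1, which is why odd parity catches a word stuck at all 0's (see footnote 18): a correctly stored all-zero syllable must carry a 1 parity bit.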

  11. Much of the memory-related circuitry is in the LVDC's computer logic, not the memory module itself. In particular, the computer's logic contains registers to hold the address and data word and convert between serial and parallel. It also contains circuitry to decode the address into drive lines, as well as to generate and check parity. 

  12. Core memories typically used a "matrix" approach to reduce the number of circuits required to drive the X and Y select lines. The diagram below demonstrates this technique for the vertical lines in a hypothetical 9×5 core array. There are three "high" drivers (A, B and C), and three "low" drivers (1, 2 and 3). If driver B is energized positive and driver 1 is energized negative, current flows through the core line highlighted in red. By selecting a different pair of drivers, a different line is energized. In a large array, this approach significantly reduces the number of line drivers required.

    The "matrix" approach reduces the number of line drivers required.


    When using a matrix approach, each line must have diodes to prevent "sneak paths" through the cores. To see the need for diodes, note that in the example above current could flow from B to 2, up to A and finally down to 1, for instance, incorrectly energizing multiple lines and flipping the wrong cores. By putting diodes on each line, reverse current paths such as 2 to A can be blocked. Also note that writing core memory requires current pulses in the opposite direction from reading. Supporting this requires additional diodes in the opposite direction. 
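The selection scheme for the hypothetical example can be sketched in Python (the driver names follow the diagram; the line-numbering convention is an assumption for illustration):

```python
# Matrix selection: 3 "high" drivers and 3 "low" drivers select one of
# 9 vertical lines; each line corresponds to a unique driver pair.
HIGH = ["A", "B", "C"]
LOW = ["1", "2", "3"]

def drivers_for_line(line):
    """Return the (high, low) driver pair that energizes a given line."""
    return HIGH[line // len(LOW)], LOW[line % len(LOW)]

# With N lines, roughly 2*sqrt(N) drivers suffice instead of N.
print(drivers_for_line(3))  # ('B', '1')
```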

  13. Because the characteristics of ferrite cores change with temperature, the memory module adjusts the current based on temperature, from 260 mA at 10 °C to 180 mA at 70 °C. A sensor in the stack detects the temperature, causing a TCV regulator (Temperature Controlled Voltage) to generate a voltage ranging from 6 V at 10 °C to 4 V at 70 °C. The TCV control voltage is fed into each EI module, causing the current to drop 1.33 mA per °C.  
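As a sketch, the compensation described here is just linear interpolation between the two endpoints:

```python
def drive_current_ma(temp_c):
    """Select-line current vs. stack temperature: 260 mA at 10 degC
    falling linearly to 180 mA at 70 degC (about -1.33 mA per degC)."""
    slope = (180 - 260) / (70 - 10)
    return 260 + slope * (temp_c - 10)

print(drive_current_ma(10))  # 260.0
print(drive_current_ma(70))  # 180.0
```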

  14. It's unclear why the driver boards use EI modules as well as ID-2 (Inhibit Driver) modules, since a separate board implements the inhibit drivers. The earlier schematics show just the EI modules. (See Laboratory Maintenance Instructions for LVDC Vol. II (1965) page 10-164 for the schematics.) The inhibit driver is similar to the current sink in the EI driver, so I suspect the ID-2 module is being used to boost the current.  

  15. For reference, this footnote provides details of the Y driver signal routing. There are 8 high drive signals and 8 low drive signals generating the 64 Y select lines through the core stack. However, the current through the select line needs to go both ways, so cores can be flipped both directions. Thus, the drive signals are in pairs, one from the "E" side (voltage source) of the EI chip and one from the "I" side (current sink). These 32 signals go from the driver board to the diode matrix through two 16-wire flat cables. The diode board is connected to 64 Y select lines, but each line has two ends. These 128 connections are through four 32-wire flat cables, two on the left and two on the right. The two cables connected to the front side of the diode matrix wrap around to the far side of the stack, while the two cables connected to the back side of the diode matrix go to the near side of the stack. Thus, alternating select lines go through the stack in opposite directions. 

  16. The X and Y diode matrices use a different wiring topology. There are 64 Y lines through the core stack. They are matrixed with 8 drivers at one end and 8 at the other end. The Y board has a diode pair (electrically) at each end of the 64 Y lines, so it has 256 diodes and 128 wires to the Y lines. (Because a line needs to be driven in either direction, one diode is required in each direction, making a pair at each end.)

    On the other hand, there are 128 X lines through the core stack, matrixed with 16 drivers at one end and 8 at the other end. To avoid doubling the number of diodes used, the X board only has a diode pair at one end of each of the 128 X lines. At the other end, groups of 8 X lines are tied together directly, forming 16 groups, with one diode pair used for each group. Thus, there are 256 diodes in the matrix, as well as 32 diodes associated with the 16 groups. As far as wires between the diode matrix and the core stack, there are 128 wires for the diode-connected end, and 32 wires corresponding to the grouped end. See Figures 10-42 and 10-43 in the Laboratory Maintenance Instructions for LVDC Vol. II (1965) for schematics.

    The X driver board is connected to other boards and the core stack through multiple flex cables. The cable on the right links the driver board to the rest of the computer via the I/O board. The top edge of the board has a 24-wire flex cable to the diode matrix, with a second 24-wire cable at the bottom. At the bottom, another smaller flex cable receives signals from the timing board underneath the core stack. The flex cables between the diode matrices and the core stack are not visible: there is a 16-wire cable and a 64-wire cable to the stack at the top and similar cables at the bottom.

    There is an important difference between the X and Y wiring. The four flat cables between the X diode matrix and the core planes went vertically, from the top and bottom of the matrix. The flat cables from the Y diode matrix went horizontally, from the sides of the matrix. In this way, the X and Y cables were attached to orthogonal sides of the core planes, connecting to the orthogonal X and Y wires. 
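The diode counts above can be tallied mechanically (a sketch; a "pair" is one diode per current direction):

```python
# Y matrix: 64 lines, with a diode pair (2 diodes) at EACH end of every line.
y_diodes = 64 * 2 * 2          # 256 diodes
y_wires = 64 * 2               # 128 wires to the Y lines

# X matrix: 128 lines with a pair at one end only, plus 16 groups of 8
# lines tied together at the other end, one pair per group.
x_diodes = 128 * 2 + 16 * 2    # 256 + 32 = 288 diodes
x_wires = 128 + 32             # diode-connected end + grouped end

print(y_diodes, x_diodes)      # 256 288
```

This matches the 288 diodes given for the X diode matrix in the main text.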

  17. A special handle was produced to insert, remove, or carry the memory module. Because the memory modules were delicate and mounted with little clearance, this tool was developed to manipulate the module safely. This handle slides over the four shoulder screws on top of the module and latches into place.

    The special carrying handle for the memory module. From Laboratory Maintenance Instructions for LVDC Vol. II page 4-5.


  18. One interesting feature of the LVDC was that memory modules could be mirrored for reliability. In "duplex" mode, each word was stored in two memory modules. If one module had an error, the correct word could be retrieved from the other module. While this provided reliability, it cut the memory capacity in half. Alternatively, memory modules could be used in "simplex" mode, with each word stored once.

    Note that the LVDC's circuitry was triply-redundant to detect and correct errors. However, memory only needed to be doubly redundant because parity indicated which value was incorrect. The LVDC used odd parity. Odd parity had the advantage that parity would catch a word that was stuck all 0's or all 1's. One interesting feature of the simplex and duplex memory modes is that the software could switch between them while running, even setting separate modes for instructions and data. This allowed some words to be stored in simplex mode while more important words were stored in duplex mode. However, it appears that in actual use, the entire memory would be duplexed rather than specific parts. 

A computer built from NOR gates: inside the Apollo Guidance Computer

We recently restored an Apollo Guidance Computer1, the computer that provided guidance, navigation, and control onboard the Apollo flights to the Moon. This historic computer was one of the first to use integrated circuits and its CPU was built entirely from NOR gates.2 In this blog post, I describe the architecture and circuitry of the CPU.

Architecture of the Apollo Guidance Computer

The Apollo Guidance Computer with the two trays separated. The tray on the left holds the logic circuitry built from NOR gates. The tray on the right holds memory and supporting circuitry.


The Apollo Guidance Computer was developed in the 1960s for the Apollo missions to the Moon. In an era when most computers ranged from refrigerator-sized to room-sized, the Apollo Guidance Computer was unusual—small enough to fit onboard the Apollo spacecraft, weighing 70 pounds and under a cubic foot in size.

The AGC is a 15-bit computer. It may seem bizarre to have a word size that isn't a power of two, but in the 1960s before bytes became popular, computers used a wide variety of word sizes. In the case of the AGC, 15 bits provided sufficient accuracy to land on the moon (using double- and triple-precision values as needed), so 16 bits would have increased the size and weight of the computer unnecessarily.4

The Apollo Guidance Computer has a fairly basic architecture, even by 1960s standards. Although it was built in the era of complex, powerful mainframes, the Apollo Guidance Computer had limited performance; it is more similar to an early microprocessor in power and architecture.3 The AGC's strengths were its compact size and extensive real-time I/O capability. (I'll discuss I/O in another article.)5

The architecture diagram below shows the main components of the AGC. The parts I'll focus on are highlighted. The AGC has a small set of registers, along with a simple arithmetic unit that only does addition. It has just 36K words of ROM (fixed memory) and 2K words of RAM (erasable memory). The "write bus" was the main communication path between the components. Instruction decoding and the sequence generator produced the control pulses that directed the AGC.

Block diagram of the Apollo Guidance Computer. From Space Navigation Guidance and Control, R-500, VI-14.


About half of the architecture diagram is taken up by memory, reflecting that in many ways the architecture of the Apollo Guidance Computer was designed around its memory. Like most computers of the 1960s, the AGC used core memory, storing each bit in a tiny ferrite ring (core) threaded onto a grid of wires. (Because a separate physical core was required for every bit, core memory capacity was drastically smaller than modern semiconductor memory.) A property of core memory was that reading a word from memory erased that word, so a value had to be written back to memory after each access. The AGC also had fixed memory (ROM), the famous core ropes used for program storage, where bits were physically woven into the wiring pattern (below). (I've written about the AGC's core memory and core rope memory in detail.)

Detail of core rope memory wiring from an early (Block I) Apollo Guidance Computer. Photo from Raytheon.


NOR gates

The Apollo Guidance Computer was one of the very first computers to use integrated circuits. These early ICs were very limited; the AGC's chips (below)2 contained just six transistors and eight resistors, implementing two 3-input NOR gates.

Die photo of the dual 3-input NOR gate used in the AGC. The ten bond wires around the outside of the die connect to the IC's external pins. Photo by Lisa Young, Smithsonian.


The symbol for a NOR gate is shown below. It is a very simple logic gate: if all inputs are low, the output is high. It might be surprising that NOR gates are sufficient to build a computer, but NOR is a universal gate: you can make any other logic gate out of NOR gates. For instance, wiring the inputs of a NOR gate together forms an inverter. Putting an inverter on the output of a NOR gate produces an OR gate. Putting inverters on the inputs of a NOR gate produces an AND gate.6 More complex circuits, such as flip flops, adders, and counters can be built from these gates.

The NOR gate generates a 1 output if all inputs are 0. If any input is a 1 (or multiple inputs), the NOR gate generates a 0 output.

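These constructions are easy to verify in a few lines of Python (a sketch of the logic, not the AGC circuitry):

```python
def nor(*inputs):
    """Output 1 only if every input is 0."""
    return 0 if any(inputs) else 1

def not_(a):                 # inverter: a NOR with its inputs tied together
    return nor(a, a)

def or_(a, b):               # OR: an inverter on a NOR's output
    return not_(nor(a, b))

def and_(a, b):              # AND: inverters on a NOR's inputs
    return nor(not_(a), not_(b))

print([and_(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
```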

One building block that appears frequently in the AGC is the set-reset latch. This simple circuit is built from two NOR gates and stores one bit of data: the set input stores a 1 bit and the reset input stores a 0 bit. In more detail, a 1 pulse on the set input turns the top NOR gate off and the bottom one on, so the output is a 1. A 1 pulse on the reset input does the opposite so the output is a 0. If both inputs are 0, the latch remembers its previous state, providing storage. The next section will show how the latch circuit is used to build registers.

A set-reset latch built from two NOR gates. If one NOR gate is on, it forces the other one off. The overbar on the top output indicates that it is the complement of the lower output.

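The latch's behavior can be simulated by iterating the two cross-coupled NOR gates until they settle (a sketch; a real latch settles continuously, and some transitions need a second pass here):

```python
def nor(a, b):
    return 0 if (a or b) else 1

def latch_step(set_in, reset_in, q, q_bar):
    """One pass through the cross-coupled pair: each output feeds the other gate."""
    q_bar = nor(set_in, q)       # top gate produces the complemented output
    q = nor(reset_in, q_bar)     # bottom gate produces Q
    return q, q_bar

q, q_bar = 0, 1
q, q_bar = latch_step(1, 0, q, q_bar)  # pulse set: Q becomes 1
q, q_bar = latch_step(0, 0, q, q_bar)  # both inputs 0: the state is held
print(q, q_bar)  # 1 0
```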

The registers

The Apollo Guidance Computer has a small set of registers to store values temporarily outside of core memory. The main register is the accumulator (A), which is used in many arithmetic operations. The AGC also has a program counter register (Z), arithmetic unit registers (X and Y), a buffer register (B), return address register (Q)7, and a few others. For memory accesses, the AGC has a memory address register (S) and a memory buffer register (G) for data. The AGC also has some registers that reside in core memory, such as I/O counters.

The following diagram outlines the register circuitry for the AGC, simplified to a single bit and two registers (Q and Z). Each register bit has a latch (flip-flop), using the circuit described earlier (blue and purple). Data is transmitted both to and from the registers on the write bus (red). To write to a register, the latch is first reset by a clear signal (CQG or CZG, green). A "write service" gate signal (WQG or WZG, orange) then allows the data on the write bus to set the corresponding register latch. To read a register, a "read service" gate signal (RQG or RZG, cyan) passes the latch's output through the write amplifier to the write bus, for use by other parts of the AGC. The complete register circuitry is more complex, with multiple 16-bit registers, but follows this basic structure.

Simplified diagram of AGC register structure, showing one bit of the Q and Z registers. (Source)


The register diagram illustrates three key points. First, the register circuitry is built from NOR gates. Second, data movement through the AGC centers on the write bus. Finally, the register actions (like other AGC actions) depend on specific control signals at the right time; the "control" section of this post will discuss how these signals are generated.
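The clear-then-gate write sequence can be sketched for one register bit (the class and method names are illustrative, not AGC terminology):

```python
class RegisterBit:
    """One register latch on the write bus, as in the diagram above."""
    def __init__(self):
        self.latch = 0

    def clear(self):            # clear signal (e.g. CQG/CZG) resets the latch
        self.latch = 0

    def write(self, bus_bit):   # write gate (e.g. WQG/WZG): the bus can only SET the latch
        if bus_bit:
            self.latch = 1

    def read(self):             # read gate (e.g. RQG/RZG) drives the write bus
        return self.latch

q = RegisterBit()
q.clear()                       # writing is always clear-then-gate,
q.write(1)                      # since the bus cannot reset a latch
print(q.read())  # 1
```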

The arithmetic unit

Most computers have an arithmetic logic unit (ALU) that performs arithmetic and Boolean logic operations. Compared to most computers, the AGC's arithmetic unit is very limited: the only operation it performs is addition of 16-bit values, so it's called an arithmetic unit, not an arithmetic logic unit. (Despite its limited arithmetic unit, the AGC can perform a variety of arithmetic and logic operations including multiplication and division, as explained in the footnote.9)

The schematic below shows one bit of the AGC's arithmetic unit. The full adder (red) computes the sum of two bits and a carry. In particular, the adder sums the X bit, Y bit, and carry-in, generating the sum bit (sent to the write bus) and carry bit. The carry is passed to the next adder, allowing adders to be combined to add longer words.8

Schematic of one bit in the AGC's arithmetic unit. (Based on AGC handbook p214.)


The X register and Y register (purple and green) provide the two inputs to the adder. These are implemented with the NOR-gate latch circuits described earlier. The circuitry in blue writes a value to the X or Y register as specified by the control signals. This circuitry is fairly complex since it allows constants and shifted values to be stored in the registers, but I won't go into the details. Note the "A2X" control signal that gates the A register value into the X register; it will be important in the following discussion.
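As a sketch, the chained adder bits behave like a standard ripple-carry adder (plain binary addition here; the AGC's ones'-complement carry handling is omitted):

```python
def full_adder(x, y, carry_in):
    """One adder bit: sum of the X bit, Y bit, and carry-in."""
    total = x + y + carry_in
    return total & 1, total >> 1            # (sum bit, carry out)

def ripple_add(x_bits, y_bits):
    """Chain adder bits, least-significant first, passing each carry along."""
    carry, out = 0, []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)
        out.append(s)
    return out, carry

# 5 + 3, with bits least-significant first
print(ripple_add([1, 0, 1, 0], [1, 1, 0, 0]))  # ([0, 0, 0, 1], 0)
```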

The photo below shows the physical implementation of the AGC's circuitry. This module implements four bits of the registers and arithmetic unit. The flat-pack ICs are the black rectangles; each module has two boards with 60 chips each, for a total of 240 NOR gates. The arithmetic unit and registers are built from four identical modules, each handling four bits; this is similar to a bit-slice processor.

The arithmetic unit and registers are implemented in four identical modules. Each module implements 4 bits. The modules are installed in slots A8 through A11 of the AGC.


Executing an instruction

This section illustrates the sequence of operations that the AGC performs to execute an instruction. In particular, I'll show how an addition instruction, ADS (add to storage), takes place. This instruction reads a value from memory, adds it to the accumulator (A register), and stores the sum in both the accumulator and memory. This is a single machine instruction, but the AGC performs many steps and many values move back and forth to accomplish it.

Instruction timing is driven by the core memory subsystem. In particular, reading a value from core memory erases the stored value, so a value must be written back after each read. Also, when accessing core memory there is a delay between when the address is set up and when the data is available. The result is that each memory cycle takes 12 time steps to perform first a read and then a write. Each time interval (T1 to T12) takes just under one microsecond, and the full memory cycle takes 11.7µs, called a Memory Cycle Time (MCT).

The erasable core memory module from the Apollo Guidance Computer. This module holds 2 kilowords of memory, with a tiny ferrite core storing each bit. To read memory, high-current pulses flip the magnetization of the cores, erasing the word.


The MCT is the basic time unit for instruction execution. A typical instruction requires two memory cycles: one memory access to fetch the instruction from memory, and one memory access to perform the operation.13 Thus, a typical instruction requires two MCTs (23.4µs), yielding about 43,000 instructions per second. (This is extremely slow compared to modern processors performing billions of instructions per second.)
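The instruction-rate arithmetic, spelled out:

```python
mct_us = 11.7                        # one memory cycle: 12 time steps
typical_instruction_us = 2 * mct_us  # fetch cycle + operation cycle = 23.4 us
instructions_per_second = 1e6 / typical_instruction_us
print(f"{instructions_per_second:,.0f} instructions/second")  # ~42,735
```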

Internally, the Apollo Guidance Computer processes instructions by breaking an instruction into subinstructions, where each subinstruction takes one memory cycle. For example, the ADS instruction consists of two subinstructions: the ADS0 subinstruction (which does the addition) and the STD2 subinstruction (which fetches the next instruction, and is common to most instructions). The diagram below shows the data movement inside the AGC to execute the ADS0 subinstruction. The 12 time steps are indicated left to right.

Operations during the ADS0 (add to storage) subinstruction. Arrows show important data movement. Based on the manual.


The important steps are:
T1: The operand address is copied from the instruction register (B) to the memory address register (S) to start a memory read.
T4: The operand is read from core memory to the memory data register (G).
T5: The operand is copied from (G) to the adder (Y). The accumulator value (A) is copied to the adder (X).
T6: The adder computes the sum (U), which is copied to the memory data register (G).
T8: The program counter (Z) is copied to the memory address register (S) to prepare for fetching the next instruction from core memory.
T10: The sum in the memory data register (G) is written back to core memory.
T11: The sum (U) is copied to the accumulator (A).

Even though this is a simple add instruction, many values are moved around during the 12 time intervals. Each of these actions has a control signal associated with it; for instance, the signal A2X at time T5 causes the accumulator (A) value to be copied to the X register. Copying the G register to the Y register takes two control pulses: RG (read G) and WY (write Y). The next section will explain how the AGC's control unit generates the appropriate control signals for each instruction, focusing on these A2X, RG, and WY control pulses needed by ADS0 at time T5.
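
The steps above can be made concrete with a toy Python sketch of the ADS0 register transfers. This is not the real hardware: word width, overflow handling, and the memory model are all simplified, and `core` is just a dictionary standing in for erasable memory.

```python
def ads0(regs, core):
    """Toy model of the ADS0 (add to storage) subinstruction."""
    regs['S'] = regs['B'] & 0o7777       # T1: operand address from B to S
    addr = regs['S']
    regs['G'] = core[addr]               # T4: destructive read into G
    regs['X'] = regs['A']                # T5: A2X pulse, accumulator to X
    regs['Y'] = regs['G']                # T5: RG + WY pulses, G to Y
    regs['U'] = (regs['X'] + regs['Y']) & 0o77777  # T6: 15-bit sum (overflow ignored)
    regs['G'] = regs['U']                # T6: sum to memory data register
    regs['S'] = regs['Z']                # T8: program counter to S for the next fetch
    core[addr] = regs['G']               # T10: write-back (the core module keeps the
                                         #      address latched from the read cycle)
    regs['A'] = regs['U']                # T11: sum to accumulator

regs = {'A': 5, 'B': 0o123, 'Z': 0o100, 'G': 0, 'S': 0, 'X': 0, 'Y': 0, 'U': 0}
core = {0o123: 7}
ads0(regs, core)
print(regs['A'], core[0o123])   # both become 12
```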

The control unit

As in most computers, the AGC's control unit decodes each instruction and generates the control signals that tell the rest of the processor (the datapath) what to do. The AGC uses a hardwired control unit built from NOR gates to generate the control signals. The AGC does not use microcode; there are no microinstructions and the AGC does not have a control store (which would have taken too much physical space).12

The heart of the AGC's control unit is called the crosspoint generator. Conceptually, the crosspoint generator takes the subinstruction and the time step, and generates the control signals for that combination of subinstruction and time step. (You can think of the crosspoint generator as a grid with subinstructions in one direction and time steps in the other, with control signals assigned to each point where the lines cross.) For instance, going back to the ADS0 subinstruction, at time T5 the crosspoint generator would generate the A2X, RG, and WY control pulses, causing the desired data movement.
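
In software terms, the crosspoint generator behaves like a table indexed by subinstruction and time step. Here is a hypothetical fragment: only the T5 entry comes from the text, and the pulse names shown at T1 and T8 are assumptions for illustration.

```python
# Crosspoint generator as a lookup table: (subinstruction, time step) -> pulses.
CROSSPOINT = {
    ('ADS0', 1): ['RB', 'WS'],         # read B, write S (names assumed)
    ('ADS0', 5): ['A2X', 'RG', 'WY'],  # the pulses described in the text
    ('ADS0', 8): ['RZ', 'WS'],         # read Z, write S (names assumed)
}

def control_pulses(subinstruction, t):
    return CROSSPOINT.get((subinstruction, t), [])

print(control_pulses('ADS0', 5))       # ['A2X', 'RG', 'WY']
```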

The crosspoint generator required a lot of circuitry and was split across three modules; this is module A6. Note the added wires to modify the circuitry. This is an earlier module used for ground testing; modules in flight did not have these wires.


For efficiency, the implementation of the control unit is highly optimized. Instructions with similar behavior are combined and processed together by the crosspoint generator to reduce circuitry. For instance, the AGC has a "Double-precision Add to Storage" instruction (DAS). Since this is roughly similar to performing two single-word adds, the DAS1 subinstruction and ADS0 subinstruction share logic in the crosspoint generator. The schematic below shows the crosspoint generator circuitry for time T5, highlighting the logic for subinstruction ADS0 (using the DAS1 signal). For instance, the 5K signal is generated from the combination of DAS1 and T5.

Crosspoint circuit for signals generated at time T5. With negative inputs, these NOR gates act as AND gates, detecting a particular subinstruction AND T05. From Apollo Lunar Excursion Manual.


But what are the 5K and 5L signals? These are another optimization. Many control pulses often occur together, so instead of generating all the control pulses directly, the crosspoint generates intermediate crosspoint signals. For instance, 5K generates both the A2X and RG control pulses, while 5L generates the WY control pulse. The diagram below shows how the A2X signal is generated: any of 8 different signals (including 5K) generate A2X.15 Similar circuits generate the other control pulses. These optimizations reduced the size of the crosspoint generator, but it was still large, split across three modules in the AGC.
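
This two-stage structure can be sketched as a pair of tables: the crosspoint grid emits intermediate signals, and a second stage ORs them out into control pulses. (A minimal sketch covering only the 5K/5L example from the text.)

```python
# Stage 1: the crosspoint grid emits intermediate signals per time step.
INTERMEDIATE = {('ADS0', 5): ['5K', '5L']}   # logic shared with DAS1, per the text

# Stage 2: each intermediate signal fans out to one or more control pulses.
FANOUT = {'5K': ['A2X', 'RG'], '5L': ['WY']}

def pulses_at(subinstruction, t):
    pulses = []
    for signal in INTERMEDIATE.get((subinstruction, t), []):
        pulses.extend(FANOUT[signal])
    return pulses

print(pulses_at('ADS0', 5))   # ['A2X', 'RG', 'WY']
```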

The A2X control signal is generated from multiple "crosspoint pulses" from the crosspoint generator. The different possibilities are ORed together. From manual, page 4-351.


To summarize, the control unit is responsible for telling the rest of the CPU what to do in order to execute an instruction. Instructions are first decoded into subinstructions. The crosspoint generator creates the proper control pulses for each time interval and subinstruction, telling the AGC's registers, arithmetic unit, and memory what to do.14

Conclusion

This has been a whirlwind tour of the Apollo Guidance Computer's CPU. To keep it manageable, I've focused on the ADS addition instruction and a few of the control pulses (A2X, RG, and WY) that make it operate. Hopefully, this gives you an idea of how a computer can be built from components as primitive as NOR gates.

The most visible part of the architecture is the datapath: arithmetic unit, registers, and the data bus. The AGC's registers are built from simple NOR-gate latches. Even though the AGC's arithmetic unit can only do addition, the computer still manages to perform a full set of operations including multiplication and division and Boolean operations.9

However, the datapath is just part of the computer. The other critical component is the control unit, which tells the data path components what to do. The AGC uses an approach centered around a crosspoint generator, which uses highly-optimized hardwired logic to generate the right control pulses for a particular subinstruction and time interval.

Using these pieces, the Apollo Guidance Computer provided guidance, navigation, and control onboard the Apollo missions, making the Moon landings possible. The AGC also provided a huge boost to the early integrated circuit industry, using 60% of the United States' IC production in 1963. Thus, modern computers owe a lot to the AGC and its simple NOR gate components.

The Apollo Guidance Computer running in Marc's lab, hooked up to a vintage Tektronix scope.


CuriousMarc has a series of AGC videos which you should watch for more information on the restoration project. I announce my latest blog posts on Twitter, so follow me @kenshirriff for future articles. I also have an RSS feed. Thanks to Mike Stewart for supplying images and extensive information.

Notes and references

  1. The AGC restoration team consists of Mike Stewart (creator of FPGA AGC), Carl Claunch, Marc Verdiell (CuriousMarc on YouTube) and myself. The AGC that we're restoring belongs to a private owner who picked it up at a scrapyard in the 1970s after NASA scrapped it. 

  2. In addition to the NOR-gate logic chips, the AGC used a second type of integrated circuit for its memory circuitry, a sense amplifier. (The earlier Block I Apollo Guidance Computer used NOR gate ICs that contained a single NOR gate.) 

  3. How does the AGC stack up to early microprocessors? Architecturally, I'd say it was more advanced than early 8-bit processors like the 6502 (1975) or Z-80 (1976), since the AGC had 15 bits instead of 8, as well as more advanced instructions such as multiplication and division. But I consider the AGC less advanced than the 16-bit Intel 8086 (1978) which has a larger register set, advanced indexing, and instruction queue. Note, though, that the AGC was in a class of its own as far as I/O, with 227 interface circuits connected to the rest of the spacecraft.

    Looking at transistor counts, the Apollo Guidance Computer had about 17,000 transistors in total in its ICs, which puts it between the Z80 microprocessor (8,500 transistors) and the Intel 8086 (29,000 transistors).

    As far as performance, the AGC did a 15-bit add in 23.4μs and a multiply in 46.8μs. The 6502 took about 3.9μs for an 8-bit add (much faster, but a smaller word). Implementing an 8-bit multiply loop on the 6502 might take over 100μs, considerably worse than the AGC. The AGC's processor cycle speed of 1.024 MHz was almost exactly the same as the Apple II's 1.023 MHz clock, but the AGC took 24 cycles for a typical instruction, compared to 4 on the 6502. The big limitation on AGC performance was the 11.7μs memory cycle time, compared to 300 ns for the Apple II's 4116 DRAM chips.  

  4. An AGC instruction fit into a 15-bit word and consisted of a 3-bit opcode and a 12-bit memory address. Unfortunately, both the opcode and memory address were too small, resulting in multiple workarounds that make the architecture kind of clunky.
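
    The nominal word layout is easy to express in code; this sketch covers only the basic format, since the EXTEND and quartercode tricks described below complicate real decoding.

    ```python
    def decode(word):
        """Split a 15-bit AGC instruction word into opcode and address fields."""
        opcode = (word >> 12) & 0o7    # top 3 bits
        address = word & 0o7777        # low 12 bits
        return opcode, address

    print(decode(0o60123))   # (6, 83): opcode 6, address 0o123
    ```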

    The AGC's 15-bit instructions included a 12-bit memory address which could only address 4K words. This was inconvenient since the AGC had 2K words of core RAM and 36K words of core rope ROM. To access this memory with a 12-bit address, the AGC used a complex bank-switching scheme with multiple bank registers. In other words, you could only access RAM in 256-word chunks and ROM in somewhat larger chunks.

    The AGC's instructions had a 3-bit opcode field, which was too small to directly specify the AGC's 34 instructions. The AGC used several tricks to specify more opcodes. First, an EXTEND instruction changed the meaning of the following instruction, allowing twice as many opcodes but wasting a word. Also, some AGC opcodes didn't make sense if performed on a ROM address (such as incrementing), so four different instructions ("quartercode instructions") could share an opcode field. Instructions that act on peripherals only use 9 address bits, freeing up 3 additional bits for opcode use. This allows, for instance, Boolean operations (AND, OR, XOR) to fit into the opcode space, but they can only access peripheral addresses, not main memory addresses.

    The AGC also used some techniques to keep the opcode count small. For example, it had some "magic" memory locations such as the "shift right register". Writing to this address performed a shift; this avoided a separate opcode for "shift right".
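
    The idea of a "magic" location can be sketched as a memory model where the write itself triggers the shift. The address and the plain logical shift here are simplifications for illustration; the real AGC's editing registers treat the 15-bit sign differently.

    ```python
    SR_ADDR = 0o21   # hypothetical address for the shift-right register

    class EditMemory:
        def __init__(self):
            self.cells = {}

        def write(self, addr, value):
            if addr == SR_ADDR:
                value >>= 1          # the write itself performs the shift
            self.cells[addr] = value

        def read(self, addr):
            return self.cells.get(addr, 0)

    mem = EditMemory()
    mem.write(SR_ADDR, 0o14)
    print(mem.read(SR_ADDR))   # 6: the value is stored already shifted right
    ```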

    The AGC also had some instructions that wedged multiple functions into a single instruction. For instance, the "Transfer to Storage" instruction not only transferred a value to storage, but also checked the overflow flag; if there had been an arithmetic overflow, it updated the accumulator and skipped an instruction. Another complex instruction was "Count, Compare, and Skip", which loaded a value from memory, decremented it, and did a four-way branch depending on its value. See AGC instruction set for details. 

  5. For more on the AGC's architecture, see the Virtual AGC and the Ultimate Apollo Guidance Computer Talk. 

  6. The NAND gate also has the same property of being a universal gate. (In modern circuits, NAND gates are usually more popular than NOR gates for technical reasons.) The popular NAND to Tetris course describes how to build up a computer from NAND gates, ending with an implementation of Tetris. This approach starts by building a set of logic gates (NOT, AND, OR, XOR, multiplexer, demultiplexer) from NAND gates. Then larger building blocks (flip flop, adder, incrementer, ALU, register) are built from these gates, and finally a computer is built from these building blocks. 

  7. Modern computers usually have a stack that is used for subroutine calling and returning. However, the AGC (like many other computers of its era) didn't have a stack, but stored the return address in a link register (the AGC's Q register). To use recursion, a programmer would need to implement their own stack. 

  8. A carry-skip circuit improves the performance of the adder. The problem with binary addition is that propagating a carry through all the bits is slow. For example, if you add 111111111111111 + 1, the carry from the low-order bit gets added to the next bit. This generates a carry which propagates to the next bit, and so forth. This "ripple carry" causes the addition to be essentially one bit at a time. To avoid this problem, the AGC uses a carry-skip circuit that looks at groups of four bits. If there is a carry in, and each position has at least one bit set, there is certain to be a carry, so a carry-out is generated immediately. Thus, propagating a carry is approximately three times as fast. (With groups of four bits, you'd expect four times as fast, but the carry-skip circuit has its own overhead.) 
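
    A software model of the scheme over 4-bit groups is sketched below. In software there is no speedup, of course; the point is only to show when the skip path fires.

    ```python
    def carry_skip_add(a, b, width=16, group=4):
        """Add with a carry-skip check per group of `group` bits."""
        mask = (1 << group) - 1
        result, carry = 0, 0
        for shift in range(0, width, group):
            ga, gb = (a >> shift) & mask, (b >> shift) & mask
            # Propagate: every bit position in the group has at least one input set.
            propagates = all(((ga | gb) >> i) & 1 for i in range(group))
            total = ga + gb + carry
            result |= (total & mask) << shift
            if carry and propagates:
                carry = 1                 # skip path: carry-out known immediately
            else:
                carry = total >> group    # otherwise ripple through the group
        return result & ((1 << width) - 1)

    assert carry_skip_add(0x7FFF, 0x0001) == 0x8000
    ```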

  9. You might wonder how the AGC performs a variety of arithmetic and logic operations if the arithmetic unit only supports addition. Subtraction is performed by complementing one value (i.e. flipping the bits) and then adding. Most computers have a complement circuit built into the ALU, but the AGC is different: when the B register is read, it can provide either the value or the complement of the stored value.10 So to subtract a value, the value is stored in the B register and then the complement is read out and added.

    What about Boolean functions? While most computers implement Boolean functions with logic circuitry in the ALU, the Apollo Guidance Computer manages to implement them without extra hardware. The OR operation is implemented through a trick of the register circuitry. By gating two registers onto the write bus at the same time, a 1 from either register will set the bus high, yielding the OR of the two values. AND is performed using the formula A ∧ H = ~(~A ∨ ~H); complementing both arguments, doing an OR, and then complementing the result yields the AND operation. XOR is computed using the formula A ⊕ H = ~(A ∨ ~H) ∨ ~(H ∨ ~A), which uses only complements and ORs. It may seem inefficient to perform so many complement and OR operations, but since the instruction has to take 12 time intervals in any case (due to memory timing), the extra operations don't slow down the instruction.
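
    These identities check out in a few lines of Python, modeling the 15-bit one's complement that the B register provides:

    ```python
    MASK = 0o77777                 # 15-bit word

    def comp(x):
        """One's complement, as read from the complement side of the B register."""
        return ~x & MASK

    def agc_or(a, h):
        return a | h               # two registers gated onto the bus together

    def agc_and(a, h):
        return comp(agc_or(comp(a), comp(h)))          # A ∧ H = ~(~A ∨ ~H)

    def agc_xor(a, h):             # A ⊕ H = ~(A ∨ ~H) ∨ ~(H ∨ ~A)
        return agc_or(comp(agc_or(a, comp(h))), comp(agc_or(h, comp(a))))

    a, h = 0o52525, 0o63146
    assert agc_and(a, h) == a & h
    assert agc_xor(a, h) == a ^ h
    ```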

    Multiplication is performed by repeated additions, subtractions, and shifts using a Radix-4 Booth algorithm that operates two bits at a time. Division is performed by repeated subtractions and shifts.11 Since multiply and divide require multiple steps internally, they are slower than other arithmetic instructions. 

  10. Since a latch has outputs for both a bit and the complement of the bit, it is straightforward to get the complemented value out of a latch. Look near the bottom of the schematic to see the B register's circuitry that provides the complemented value. 

  11. The AGC's division algorithm is a bit unusual. Instead of subtracting the divisor at each step, a negative dividend/remainder is used throughout the division and the divisor is added. (This is essentially the same as subtracting the divisor, except everything is complemented.) See Block II Machine Instructions section 32-158 for details. 

  12. The AGC doesn't use microcode but confusingly some sources say it was microprogrammed. The book "Journey to the Moon" by Eldon Hall (creator of the AGC) says:

    The instruction selection logic and control matrix was a microprogrammed instruction sequence generator, equivalent to a read-only memory implemented in logic. Outputs of the microprogrammed memory were a sequence of control pulses that were logic products of timing pulses, tests of priority activity, instruction code, and memory address.

    This doesn't make sense, since the whole point of microprogramming is to use read-only memory instead of hardwired control logic. (See A brief history of microprogramming, Computer architecture: A quantitative approach section 5.4, or Microprogramming: principles and practices.) Perhaps Hall means that the AGC's control was "inspired" by microprogramming, using a clearly-stated set of sequenced control signals with control hardware separated from the data path (like most modern computers, hardwired or microcoded). (In contrast, in many 1950s computers (like the IBM 1401) each instruction's circuitry generated its own ad hoc control signals.)

    By the way, implementing the AGC in microcode would have required about 8 kilobytes of microcode (79 control pulses for about 70 subinstructions with 12 time periods). This would have been impractical for the AGC, especially when you consider that microcode storage needs to be faster than regular storage.  
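
    The size estimate works out as follows, assuming one bit per control pulse per microword and one microword per subinstruction/time-step pair:

    ```python
    PULSES, SUBINSTRUCTIONS, TIME_STEPS = 79, 70, 12
    bits = PULSES * SUBINSTRUCTIONS * TIME_STEPS
    print(f"{bits} bits = {bits / 8 / 1024:.1f} KiB")   # about 8 KiB
    ```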

  13. While instructions typically used two subinstructions, there were exceptions. Some instructions, such as multiply and divide, required multiple subinstructions because they took many steps. On the other hand, the jump instruction (TC) used a single subinstruction since fetching the next instruction was the only task to do. 

  14. Other processors use different approaches to generate control signals. The 6502 and many other early microprocessors decoded instructions with a Programmable Logic Array (PLA), a ROM-like way of implementing AND-OR logic. The Z-80 used a PLA, followed by logic very similar to the crosspoint generator to generate the right signals for each time step. Many computers use microcode, storing the sequence of control steps explicitly in ROM. Since minimizing the number of chips in the AGC was critical, optimizing the circuitry was more important than using a clean, structured approach.

    Die photo of the 6502 microprocessor. The 6502 used a PLA and random logic for the control logic, which occupies over half the chip. Note the regular, grid-like structure of the PLA. Die photo courtesy of Visual 6502.


  15. Each subinstruction's actions at each time interval are described in the manual. The control pulses are described in detail in the manual. (The full set of control pulses for ADS0 are listed here.)