Strange chip: Teardown of a vintage IBM token ring controller

IBM used some unusual techniques in its integrated circuits, and one of the most visible is packaging them in square metal cans. I've been studying these chips recently, since there's not a lot of information about them. I opened up the large metal chip (1.5" on a side) from the token ring network board below. This chip turned out to be stranger and more interesting than I expected, combining analog circuitry, a custom microprocessor, and complex logic. The internal packaging was also unconventional: instead of the bond wires used by most manufacturers to connect the silicon die, IBM used a "flip-chip" technique, soldering the die upside down onto a ceramic substrate. Instead of pads, the chip had solder balls across its surface, giving it an unexpected layout and appearance. In this blog post, I discuss this chip in detail.

The IBM 4/16 ISA token ring board. Click this photo (or any other) for a larger version.


IBM introduced the token ring network in 1985,1 as a local-area network technology that competed with Ethernet and other network systems. In a token ring network, the computers are wired in a ring, with each computer receiving packets from the previous computer and transmitting them to the next computer in the loop. To give a computer access to the network, a special three-byte token circulates in the ring. When a computer receives the token, it can transmit a network packet to the next computer in the ring. The packet travels around the ring until it comes back to the original computer. That computer discards the packet and sends out the token in its place, giving another computer a chance to transmit data. In comparison, an Ethernet network lets computers transmit at any time; if two transmit at the same time, the collision is detected and they try again a bit later. A token ring network had the advantage of avoiding collisions, making it more deterministic and fair and providing better performance on a congested network.
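The token-passing scheme described above can be sketched in a few lines of Python. This is only a toy model: the function name, the station names, and the rule that a station sends exactly one packet each time it holds the token are assumptions for illustration, not details of IBM's protocol.

```python
def simulate_token_ring(stations, pending, rounds=2):
    """Toy model of token passing.

    stations: station names in ring order.
    pending:  dict mapping a station name to a list of (dest, payload) packets.
    Returns a list of (sender, dest, payload) deliveries in the order they occur.
    """
    deliveries = []
    holder = 0  # index of the station that currently holds the token
    for _ in range(rounds * len(stations)):
        sender = stations[holder]
        queue = pending.get(sender, [])
        if queue:
            dest, payload = queue.pop(0)
            # The packet is passed from station to station around the ring
            # until it arrives back at the sender.
            hop = (holder + 1) % len(stations)
            while hop != holder:
                if stations[hop] == dest:
                    deliveries.append((sender, dest, payload))
                hop = (hop + 1) % len(stations)
            # Back at the sender: the packet is discarded and the token released.
        # The token moves to the next station in the ring.
        holder = (holder + 1) % len(stations)
    return deliveries


example = simulate_token_ring(
    ["A", "B", "C"],
    {"A": [("C", "x"), ("B", "z")], "C": [("A", "y")]},
)
```

Because only the token holder transmits, two stations can never send at the same moment, which is the collision-free behavior the comparison with Ethernet relies on.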

IBM's use of square metal cans goes back to the early 1960s with IBM's SLT modules (Solid Logic Technology). Because IBM didn't think integrated circuits were mature enough at the time, they used small hybrid modules with a few transistors, diodes, and resistors mounted on a ceramic substrate. These half-inch-square SLT modules were packaged in an aluminum can for protection, giving IBM circuit boards a unique appearance. In the late 1960s, IBM moved to integrated circuits2 but they kept the ½" metal cans instead of the rectangular ceramic or epoxy packages used by other manufacturers. As integrated circuits required more pins, IBM increased the package size, leading to the bulky 1.5" package that I examined.

To examine the integrated circuit, I removed it from the board with a hot air gun. In the photo below, you can see the grid of pins underneath the chip. The chip is labeled with the part number 50G6144. The "ESD" suffix indicates an electrostatic-sensitive device that can be damaged by static electricity and requires special handling. The next line, IBM 9352PQ, is a code for the manufacturing site. The final line, 194390074M, shows that the chip was manufactured in 1994 during the 39th week of the year.

The integrated circuit is packaged in a square aluminum can, 1.5" on a side.


Cutting off the aluminum lid reveals the silicon die inside. The chip is mounted upside down as a flip chip, soldered directly to the connections on the ceramic substrate. Thus, you can't see the chip's circuitry, just the underside of the silicon die. IBM called this mounting technology controlled collapse chip connection or C4.3 (In comparison, most manufacturers mounted a silicon die right side up and connected it to the pins with tiny bond wires.) Tiny printed-circuit traces connect the module's 175 pins to the die.

The integrated circuit with the metal lid removed, showing the silicon die on the ceramic substrate.


I removed the die from the substrate with the hot-air gun and then dissolved the solder balls with a mixture of hydrogen peroxide and vinegar. By taking numerous photos with a metallurgical microscope, I created the die photo below. The black circles on the die are the positions of the solder balls, more irregular than you might expect. They are not around the edge of the die (as with bond pads), but overlap the circuitry. The chip is fairly large, about 9×7.9 mm, with features of about 1µm. Note the horizontal rows of circuitry; these are standard cells, which I will discuss below.

Die photo of the chip. Click this (or any other) image for a larger version.


The pattern of solder balls is more visible in Antoine Bercovici's photo below. There are rows three-deep of solder balls along the four sides, as well as rows through the middle of the chip and more in the corners. Roughly speaking, the solder balls around the edges are for signals, while the solder balls in the middle distribute power and ground. Note the tangled metal wiring on top of the chip that connects the solder balls to the underlying circuitry.4

Die photo showing the solder balls and upper metal clearly. Courtesy of Antoine Bercovici.


The photo below shows a closeup of the ceramic substrate that holds the die; compare the pattern to the die above.5 The die was soldered to the rectangular array of contacts in the middle, while the large circles around the edge of the photo are the pins of the chip. Note the dense, complex wiring pattern between the pins and the tiny contacts. The wiring traces are extremely thin (about 30µm), with thicker traces for power and ground. The contacts form a complex pattern: most are in a rectangular array, three deep. However, there are also rows of contacts through the center of the chip, connected alternately to power and ground by the thick traces inside the rectangle, and a few scattered contacts. The contact pattern on the substrate was optimized for the layout of this particular chip. Power distribution was a particular concern.

A closeup of the ceramic substrate showing where the die is mounted.


It's interesting to consider the hierarchy of connections between the coarse 0.1" grid of the chip's pins and the tiny 1µm features on the chip. At the top level, the pin spacing is 0.1" in a 14×14 grid. The solder balls have a spacing of 0.01", so the ceramic substrate reduces the spacing by a factor of 10. The solder balls are connected to the wiring on top of the die, spaced at 0.001", increasing the density by another factor of 10. The top wiring is connected to the underlying wiring on the chip, with a spacing of 0.0001", another factor of 10. Finally, the feature size on the die is about 1µm, another factor of about 2.5.
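The arithmetic behind this hierarchy is easy to check. The short Python sketch below converts each spacing to microns and computes the ratio between successive levels; the spacings are the approximate figures quoted above, and the names are just labels chosen for this example. The last step works out to roughly 2.5, since 0.0001" is about 2.5µm.

```python
MICRONS_PER_INCH = 25400.0

# Approximate spacings from the text, in inches.
LEVELS = [
    ("package pins", 0.1),
    ("solder balls", 0.01),
    ("top-layer wiring", 0.001),
    ("underlying wiring", 0.0001),
]


def hierarchy(levels, feature_um=1.0):
    """Return (name, spacing in microns, ratio to the previous level) rows."""
    rows = [(name, inches * MICRONS_PER_INCH) for name, inches in levels]
    rows.append(("die features", feature_um))
    result = []
    previous = None
    for name, microns in rows:
        ratio = None if previous is None else previous / microns
        result.append((name, microns, ratio))
        previous = microns
    return result


for name, microns, ratio in hierarchy(LEVELS):
    step = "" if ratio is None else f"  ({ratio:.1f}x finer than the level above)"
    print(f"{name:18s} {microns:10.2f} um{step}")
```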

With this type of packaging, you can visualize the die position by looking at the underside of the IC (below). Because the chip is soldered directly to the substrate, there are no pins where the chip is attached. Thus, the spot with no pins indicates the position of the die.

Underside of the package.


Inside the chip

The die photo below shows the chip with most of the metal layers dissolved, making the transistor structure underneath visible. The chip has three main components: a 16-bit microprocessor CPU, an analog front end for the network signals, and 24,000 logic gates for the main functionality. The chip also has some buffer RAM at the left, and I/O drivers in the middle and bottom. (IBM originally implemented the token ring interface with six analog and digital chips. To decrease cost, they put all the functionality onto a single chip, resulting in the combination of analog and digital circuitry.)

The die with major components labeled. The metal layer has been removed to show the circuitry underneath.


The block diagram below shows the complex functionality of the chip. Starting in the upper right, the analog front end circuitry communicates with the ring. The analog front end extracts the clock and data from the network signals. The protocol handler implements the low-level token ring protocol: it decodes data, breaks packets into frames and performs error checking. Network data is moved between on-chip buffers and the external RAM by the shared RAM control. Finally, a custom 16-bit microprocessor implements the data link layer protocols and controls the chip.

Block diagram of the chip, from IBM's paper.


Standard-cell logic

The chip's logic is implemented with a CMOS standard cell library and consists of about 24,000 gates. The idea of standard-cell logic is that each function (such as a NAND gate or latch) has a standard layout. These cells can then be combined by automated design tools to create the desired logic. (This is in contrast to older methodologies, where the designer would lay out each transistor individually, either on paper or using design software.) Standard cells make chip design much easier, since software can do the circuit synthesis, layout, and routing. However, the design isn't as flexible or optimized as a fully-custom circuit.

The standard cell layout is visible on the chip, with the cells arranged in uniform rows, connected by horizontal and vertical wiring. The diagram below magnifies the die to zoom in on five rows of standard-cell logic, and then a single row, to show how small the cells are on the die.

Zooming in on the die shows rows of standard cell logic. Another zoom shows the details of the logic.


The standard cell below implements a 3-input NAND gate, and I'll explain how it is constructed.6 There are 6 PMOS transistors on top and 6 NMOS transistors on the bottom. The transistors are formed from a region of doped silicon at the top and another at the bottom. Vertical lines of polysilicon, a special type of silicon, form the transistor gates. Polysilicon is also used for vertical wiring inside the cell. The chip has three layers of metal: the bottom layer is used for horizontal wiring, the middle layer is used for vertical wiring, and the top layer connects to the solder balls. Horizontal metal wiring connects the transistors inside the cell and connects the cell to other cells. The two thick horizontal metal wires provide power and ground for the cell. The second, vertical metal layer provides vertical wiring across and between cells. This layer also implements the power connections between the solder balls and the horizontal power wiring visible here. The round dots are connections between layers (silicon, polysilicon, or metal). The schematic on the right matches the layout of the cell.

Closeup of a cell that implements a NAND gate.


In the schematic below, I've removed the redundant transistors and rearranged the layout to make the NAND circuit more clear. If all inputs are 1, the NMOS transistors at the bottom turn on, pulling the output low. If any input is 0, a PMOS transistor turns on, pulling the output high. Thus, the circuit implements a NAND gate.

Schematic of the 3-input NAND gate.
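The pull-up and pull-down behavior described above can be modeled in Python as a sanity check. This is a logic-level sketch only, assuming the usual CMOS arrangement of three NMOS transistors in series to ground and three PMOS transistors in parallel to power; the function name is made up for the example.

```python
from itertools import product


def cmos_nand3(a, b, c):
    """Logic-level model of a 3-input CMOS NAND gate."""
    # NMOS transistors in series: the path to ground conducts only if all are on.
    pull_down = bool(a and b and c)
    # PMOS transistors in parallel: a 0 on any input connects the output to power.
    pull_up = (not a) or (not b) or (not c)
    # Exactly one network conducts, so the output is never floating or shorted.
    assert pull_up != pull_down
    return 1 if pull_up else 0


truth_table = {bits: cmos_nand3(*bits) for bits in product((0, 1), repeat=3)}
```

The two networks are complements of each other, which is why the output is always driven to exactly one level.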


To summarize, standard-cell logic provides a convenient, automated way of implementing logic. A small number of standardized cells implement the basic logic functions. These cells are arranged in rows and wired together to create the desired logic. (From the teardown perspective, standard-cell logic is somewhat disappointing, since the high-level structure is not visible; it's just a bunch of uniform cells.)

The logic circuitry includes some static RAM buffers to hold network data. These were custom-implemented (as were the I/O drivers) instead of using standard cells. The photo below shows a block of RAM cells.

One of the RAM buffers on the chip.


Inside the CPU

The chip contains a 16-bit CMOS control microprocessor that was custom-designed by IBM7 and contains about 10,000 gates. This processor handles the network protocol, controls transmit and receive operations, and manages the shared memory. It runs at 5.34 megahertz and performs about 3 MIPS (million instructions per second). The microprocessor runs code from an EPROM on the board. IBM calls this "microcode", but it's unclear if this is microcode in the usual sense or just firmware instructions.

The CPU, with main functional blocks labeled. The metal layer has been removed.


The CPU is built with standard-cell logic (except for the RAM and ALU), but curiously the cell layout is entirely different from the rest of the chip, presumably because it had different designers. The photo below compares the CPU's logic (left) with the other logic (right). The CPU fits 7 rows of logic in the same vertical space that holds 4 rows of the regular logic. On the other hand, the logic on the right appears to be much denser horizontally.

Comparison of the CPU's standard-cell logic (left) with the rest of the chip (right), at the same scale.


One design feature of the CPU that's visible on the die is its use of multiple PLAs (programmable logic arrays) for instruction decode and control. (Looking at the photo, I count nine small PLAs and a large PLA in the corner.) A PLA provides a structured and dense way of implementing logic (typically AND-OR logic). More importantly, PLAs also provided flexibility and the ability to easily change the design. In the PLA below, 12 signals enter at the lower left. The matrix above converts these to 11 signals that pass to the right. The second matrix generates 8 outputs. The contents of the PLA are visible as the pattern in the metal layer. Since the PLA could be modified by changing the chip's metal layer, bug fixes could even be done after the silicon had been etched.

One of the many PLAs in the CPU.
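To make the two-matrix structure concrete, here is a minimal Python sketch of a PLA as an AND plane feeding an OR plane. The contents are hypothetical (a tiny two-input example), not the contents of any PLA on this chip, and the function name and data layout are assumptions chosen for readability.

```python
def make_pla(and_plane, or_plane):
    """Build an evaluator for a PLA.

    and_plane: list of product terms; each term maps an input index to the
               bit value that input must have for the term to be active.
    or_plane:  list of outputs; each output lists the product terms it ORs.
    """
    def evaluate(inputs):
        # First matrix: each product term is the AND of selected inputs.
        terms = [all(inputs[i] == want for i, want in term.items())
                 for term in and_plane]
        # Second matrix: each output is the OR of selected product terms.
        return [int(any(terms[t] for t in output)) for output in or_plane]
    return evaluate


# Hypothetical contents: output 0 is XOR of the inputs, output 1 is AND.
pla = make_pla(
    and_plane=[{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}],
    or_plane=[[0, 1], [2]],
)
```

Changing the logic means editing only the two tables, which mirrors how the hardware PLA could be changed by altering the pattern in one layer rather than redesigning the circuit.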


The CPU contains memory cells for register storage (which they call a 16×16 cache). This RAM design is different from the RAM design in the logic circuitry.

Memory cells in the CPU.


Analog circuitry

The chip contains a block of analog circuitry implemented in CMOS. This circuitry "performs signal conversion and clock recovery functions as well as detecting and compensating for line impairments". This circuitry includes resistors, capacitors, MOS transistors with special properties, and other components.8 The analog block uses a variety of circuits such as op-amps, switched-capacitor amplifiers, voltage references, peak detectors, a charge pump, voltage-controlled-oscillator, and phase-locked loop.

Die photo showing part of the analog circuitry.


One challenge in the design was to minimize "jitter" in the clock signal extracted from the network data. Because each node retransmitted the data, jitter would accumulate as a packet traversed the ring, so each node had to be accurate. They used a variety of techniques to keep noise out of the signal such as providing separate power and ground for the analog circuitry, using differential signals in the circuitry, and keeping logic signals away from the analog circuitry.

The analog circuitry made the chip much more complex to manufacture and test.9 The capacitors and special transistors required special process steps during manufacturing. Manufacturing tolerances were also much tighter since process variations could change the electrical characteristics enough to make the analog circuitry stop performing. Some of the analog circuitry was too sensitive to be tested on the wafer and couldn't be tested until the chip was packaged, making failed chips much more costly. Even so, IBM found it worthwhile to put the analog circuitry on the chip.

Shrinking the chip

IBM originally made the token ring chip in 1988. The chip I examined is a smaller version from 1994. The photo below compares the two chips on a 1 mm grid; the older, larger chip is on the left. Note that both chips have the same microprocessor block (upper left corner) and the same analog block (lower left / upper right corner). The height of the standard cell logic rows is much smaller in the newer chip, probably how they shrunk the logic. The solder balls on the left connect to the underlying circuitry, while the solder balls on the right are routed all over the chip by a third layer of metal.

Comparison of the two chips. Photo courtesy of Antoine Bercovici.


The analog section from the old chip was copied to the new chip unchanged, but the connections to solder balls are very different, showing the change in wiring techniques. In the old chip (left), the solder balls are on top of metal pads that are connected to the circuitry. The layout is similar to integrated circuits that use wire bonding and bond pads. In the new chip (right), the solder ball grid is not anchored to the underlying chip architecture, but follows its own constraints. A new layer of metal connects the solder balls to the pads. The pads remain in their atavistic positions, despite being unused in the new chip.

Comparison of the analog section of the old chip and the new chip. The color of the chips is different due to lighting.


The token ring board

I'll just say a bit about the token ring board that contains this chip. The board is an ISA card from 1994. The IBM chip dominates the board, but there are also numerous other chips, largely 74F-series TTL. There's also a square (and curiously thick) Lattice chip, probably a GAL (Generic Array Logic). A GAL is a programmable logic chip, combining AND/OR logic with flip-flops. A Signetics chip with an IBM label on top is probably a field-programmable logic array (FPLA). Despite all the complexity of the IBM chip, the board requires a lot of programmable logic and simple logic ICs, mostly to interface to the computer's ISA bus. The board has 64 kilobytes of RAM to store network data, two Toshiba TC55329 32K×9 bit static RAM chips. This RAM is accessible both by the network card and by the host PC. The code for the internal microprocessor is contained in an EPROM chip on the board, an AMD 27C1024 chip holding 128 kilobytes as 16-bit words. The EPROM chip has an adhesive label on it with the IBM part number 73G2042, indicating the microcode version.

The token ring board plugs into a PC's ISA slot.


The right side of the board holds the analog circuitry to interface with the network. Five pulse transformers provide electrical isolation between the interface board and the potentially-dangerous voltages of the network. Two bypass relays disconnect the card from the ring when not in use, preserving the ring's connectivity. There are also two transistor arrays along with resistors and capacitors to condition the network signals before passing them to the token ring chip. The card connects to the network via an RJ-45 connector that can be used with unshielded twisted-pair (UTP) cable. It also has a DB-9 connector on the back that can be used with shielded twisted-pair (STP).11

In the 1980s, many different local area networking standards were competing including Ethernet, Token Ring, Datapoint's ARCnet, AppleTalk, Omninet, and Econet. By the early 1990s, Ethernet won due to a combination of factors: much lower cost (about 1/5 the cost of Token Ring), less complexity leading to faster technological improvement (such as 100 Mb/s Ethernet and switched Ethernet), and a wider ecosystem than IBM provided.10 The complexity of the chip reflects the complexity of Token Ring and illustrates that IBM's technological edge in the 1980s was a double-edged sword: although it initially gave Token Ring a large performance advantage, the simpler technology of Ethernet eventually won.12

The IBM logo is in the lower-left corner of the die, along with the mysterious codename "PINEGR SH".


Thanks to Antoine Bercovici for die photos and information. Thanks to my Twitter readers for discussion. I announce my latest blog posts on Twitter, so follow me @kenshirriff. I also have an RSS feed.

Notes and references

  1. IBM's token ring network was inspired by ring network research from the 1970s, such as the Cambridge Ring.

  2. IBM called their integrated circuits MST, Monolithic System Technology. 

  3. The diagram below illustrates the complex construction of a solder ball on the die. Thin layers of aluminum, chromium, copper, and gold are put on the silicon to obtain the necessary properties, followed by a layer of lead-tin solder, which is reflowed to form the balls. The chromium bonds to the oxide layer, while the copper provides solderability and the gold protects the copper from oxidizing.

    Diagram of a solder pad, from this paper.


     

  4. The metal wiring on the top layer of the chip looks like a mess, but there is some structure behind it. The diagram below shows a small section of this wiring, colored to show the structure. The solder balls are shown in yellow. The red and blue traces transmit power and ground from the solder balls across the chip. These traces connect with the vertical strips of metal wiring that transmit power and ground throughout the chip. The other wiring connects the signal solder balls to the I/O drivers, converging in a narrow band in groups of four. Most of the solder balls are positioned with little regard for the underlying circuitry; the top metal layer provides the "glue" between them and the integrated circuit itself. The result is the peculiar metal pattern visible on top of the chip.

    The colored lines show how the top layer of metal wiring connects the solder balls to the chip.


    In most integrated circuits, the I/O drivers are around the edges of the chip next to the bond pads. However, in this chip, most of the I/O drivers stretch in a line across the middle of the chip (indicated above). More I/O drivers are at the bottom of the chip next to the CPU, probably connected to it directly.

    The photo below shows three I/O drivers, side by side. The metal layers have been mostly removed to reveal the silicon underneath. These drivers are fairly complex. The top half contains large drive transistors to provide relatively high-current outputs, along with smaller control transistors. The lower half contains reddish serpentine resistors made out of polysilicon. These resistors help protect the sensitive gates of the input transistors from static discharges. For output pins, these resistors are disconnected. The middle resistor, however, is connected to the input transistor near the bottom.

    Die photo of three I/O drivers.


     

  5. The die is flipped over when soldered to the substrate. This needs to be kept in mind when comparing the die and the substrate. For instance, the two extra power connections for the CPU are in the lower right of the die but the lower left of the substrate. (Just a note to avoid potential confusion.) 

  6. I'm not sure which transistors are NMOS and which are PMOS in the gate. I'm assuming the PMOS are on top and it's a NAND gate, but it could be the other way around, in which case it's a NOR gate. 

  7. The processor is described as using IBM's "universal controller (UC) architecture" but there's very little information about this architecture. Wikipedia claims this architecture consisted of UC0 (8-bit), UC.5 (16-bit), and UC1 (32-bit), with upwards compatibility. An alt.folklore.computers thread and this page provide a bit more information.

  8. The analog circuitry contains small loops of various sizes that I was unable to identify. They are only connected on one end and have nothing underneath, so they don't seem to be inductors. Twitter readers suggested probe points, disconnected circuitry, or reflective delay lines, but their function remains unclear.

    Three of the loops on the die.


     

  9. The designers were very proud of the testability of the chip, writing a paper about the testing methodology, and a second paper about testing the analog circuitry. The chip includes a boundary scan feature (kind of like JTAG) and built-in self-test features, as well as mechanisms to isolate the analog block and the CPU for separate testing. 

  10. Much of the information about this chip comes from A 16-Mbit/s adapter chip for the IBM token-ring local area network. That article describes an earlier version of the chip, so I can't be sure everything is accurate when applied to this chip. (It appears to me that the chips are the same apart from the smaller size of the newer chip.) One source says the two chips are compatible. The older chip has part number 51F1439 while the chip I examined is 50G6144.

    For information on Token Ring, the book The Triumph of Ethernet: Technological Communities and the Battle for the LAN Standard discusses the competition between network protocols in great detail. You might also like Foone's Twitter thread on Token Ring. Interestingly, one of the original "ENIAC Women", Jean Bartik, wrote a 1984 article on Token Rings—"IBM's Token Ring: Have the Pieces Finally Come Together?"—but unfortunately I haven't been able to locate a copy. 

  11. Token Ring cables could be joined using the "IBM Data Connector", a curious type of connector. The connectors are known as hermaphroditic because two connectors can be joined without worrying about male and female ends. The connectors were nicknamed "Boy George" connectors after the androgynous singer, which seems questionable by current standards. (The nickname may also be motivated by the BOGR text on the connector, which I think indicates the black, orange, green, and red wires.)

    IBM Data Connector. Photo from Redgrittybrick, (CC BY-SA 3.0).


     

  12. The book The Innovator's Dilemma describes how a low-end but innovating technology can defeat an advanced, entrenched technology. I haven't investigated Token Ring versus Ethernet enough to be sure this model applies, so consider it a hypothesis. 

Booting the IBM 1401: How a 1959 punch-card computer loads a program

How do you boot a computer from punch cards when the computer has no operating system and no ROM? To make things worse, this computer requires special metadata called "word marks" that can't be represented on a card. In this blog post, I describe the interesting hardware and software techniques used in the vintage IBM 1401 computer to load software from a deck of punch cards. (Among other things, half of each card contains loader code that runs as each card is read.) I go through some IBM 1401 machine code in detail, which illustrates the strangeness of the 1401's architecture and instruction set compared to a modern machine.

The IBM 1401 was an early all-transistorized computer, so early that it didn't use silicon transistors but germanium transistors. It was announced in 1959, and went on to become the best-selling computer of the mid-1960s, with more than 10,000 systems in use. The 1401 leased for $2500 a month (about $20,000 in current dollars), a low price that opened up computing to many companies. Even a medium-sized business could use the 1401 for payroll, accounting, inventory, order processing, and invoicing.

An IBM 1401 mainframe computer at the Computer History Museum. IBM 729 tape drives are at the right.


To understand the 1401's architecture, it helps to understand how punch cards were used in that era. In 1928, IBM developed the 80-column punch card that became the standard for data processing for decades. A punch card held 80 characters, one per column, with the character represented by the holes punched in that column, as shown below. The 6-bit character set was limited to 64 different characters: upper case letters, numbers, and some special characters. Instead of binary, cards used a BCD-based encoding (which later was extended to create EBCDIC).1

Punch card code, from IBM 29 Card Punch Reference Manual.


Despite their limitations, punch cards were extensively used for data processing into the 1970s and beyond. A typical application used one card for each data record, so everything needed to fit into 80 columns2 which were divided up into fixed-length fields. Often, custom cards would be printed that showed the fields for an application, such as the card below designed for accounting.3 Each field has a fixed location. For instance, in the card below, the customer name is from columns 18 to 29 while the invoice amount is in columns 74 through 80.

Example card, from IBM 29 Card Punch Reference Manual.

The IBM 1401 has a peculiar architecture, optimized to support these punch-card applications. The idea is that fixed-length fields would be delimited in memory by word marks, a sort of metadata, and then instructions operated on these arbitrary-length fields. This let you move a 19-character name string with a single instruction. Or you could perform arithmetic on a 50-digit numeric field with a single instruction. Thus, word marks were convenient for fixed-field data, since you didn't need to loop over each character of the field.
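As a rough illustration of how a word mark lets one operation handle a field of any length, here is a simplified Python model of a field move. The class and function names are hypothetical. The sketch assumes that a field is addressed by its rightmost character and processed toward lower addresses until the character carrying the word mark has been copied; the real 1401 move instruction has additional rules that this sketch ignores.

```python
class Memory:
    """Toy memory: each location holds one character plus a word-mark bit."""

    def __init__(self, size):
        self.chars = [" "] * size
        self.wm = [False] * size

    def store(self, addr, text):
        # Store characters only; word marks are left untouched.
        for i, ch in enumerate(text):
            self.chars[addr + i] = ch


def move_field(mem, src, dst):
    """Copy one field from src to dst, where both are rightmost addresses.

    Characters are copied right to left; the copy ends after the character
    whose source location has a word mark. Returns the number copied.
    """
    moved = 0
    while True:
        mem.chars[dst] = mem.chars[src]
        moved += 1
        if mem.wm[src]:
            return moved
        if src == 0 or dst == 0:
            raise ValueError("ran off the start of memory without a word mark")
        src -= 1
        dst -= 1


mem = Memory(200)
mem.store(10, "WILLIAM SHAKESPEARE")   # a 19-character name in locations 10..28
mem.wm[10] = True                      # word mark on the leftmost character
count = move_field(mem, 28, 128)       # one call moves the whole field
```

The caller never states the length; it is implied by where the word mark sits, so the same call works for a 5-character field or a 50-character one.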

To implement word marks, each memory location had 6 bits to hold a character as well as a separate bit to hold the word mark. (These were not bytes, as the IBM 1401 predated the popularity of byte-based computers.) It's important to note that the word marks were independent of the characters. Word marks were set or cleared using different instructions from the ones that acted on characters. Once word marks were configured, they remained unchanged as data records were read into memory.

Word marks were also critical for machine instructions since they indicated the length of the instruction. A machine instruction in the 1401 consisted of one to eight characters. The first character was the op code, potentially followed by addresses and/or a modifier. Each instruction needed to have a word mark set on the op code and a word mark on the next character after the instruction (i.e. the op code of the next instruction). Note that word marks create a problem. The machine instructions of a program are directly represented as characters on a punch card, but a punch card cannot hold the necessary word marks.

Thus, loading a program into the 1401 raised two problems. First was the standard computer bootstrap problem: if there's no program in the machine, what performs the load? But there was a second: word marks are a key component of 1401 machine code, but word marks cannot be represented on punch cards. In the next section, I'll explain in detail how the IBM 1401 solved these problems.

Loading a program

To load a program, a card deck, such as the short one below, is placed into the card reader. Each card has the contents of the card printed at the top, with the holes punched in the columns below. The first two cards are bootstrap cards that initialize the computer's memory, clearing it out and setting necessary word marks. The bulk of the cards hold the machine code of the desired program on the left, and the machine code of the loader on the right. The last card runs the program.

A card deck for my Mandelbrot program.

At the far right of each card, columns 72-75 hold a sequence number (0001 through 0017). If you dropped a card deck, the cards could be put back into order by a card sorter, sorting on the sequence number.8
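A card sorter works on one column per pass, so restoring a dropped deck takes one pass per digit of the sequence number. The sketch below models this in Python as a least-significant-digit-first sort, which works because each pass keeps cards with equal digits in their existing order. The function names and the make_card helper are hypothetical, and the cards are assumed to have a digit punched in every sequence column.

```python
def sorter_pass(deck, column):
    """One pass through the sorter: drop each card into pocket 0-9
    according to the digit in one column (1-based), then restack the
    pockets in order."""
    pockets = [[] for _ in range(10)]
    for card in deck:
        pockets[int(card[column - 1])].append(card)
    return [card for pocket in pockets for card in pocket]

def sort_by_sequence(deck, first_col=72, last_col=75):
    """Sort on the whole sequence number, rightmost column first."""
    for column in range(last_col, first_col - 1, -1):
        deck = sorter_pass(deck, column)
    return deck

def make_card(seq):
    """Hypothetical 80-column card that is blank except for columns 72-75."""
    return " " * 71 + f"{seq:04d}" + " " * 5

shuffled = [make_card(n) for n in (3, 17, 1, 12, 2)]
restored = sort_by_sequence(shuffled)
print([card[71:75] for card in restored])
```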

The load process was started by pressing the "Load" button on the card reader (the orange button near the center of the blue panel). This button caused several actions to take place.4 The first card was read, and its contents were placed in memory addresses 1 through 80. A word mark was set on address 1, and cleared from addresses 2 through 80. Finally, the instruction at address 1 was executed. Remember that these operations were implemented in hardware by boards with discrete transistors; there's no microcode or operating system to help out with these tasks.5

The IBM 1402 card reader/punch. The 1401 computer is in the background (left) and a tape drive is at the right.

Bootstrap card 1

The first card contains the machine code:

,008015,022026,030040/019,001L020100   ,047054,061068,072072⌑08108110220001

The first instruction ,008015 is "Set Word Mark", a critical part of the bootstrap sequence. The comma is the op code and the address arguments are "008" and "015". (Since the 1401 is a decimal computer, not binary, the characters "015" are the same as the address 15.) This instruction sets word marks at the specified addresses, 8 and 15.

Remember that an instruction needs to have a word mark on the opcode and a second word mark on the character following the instruction. The "Load" button put a word mark at address 1, but what about the second word mark? It turns out that the hardware has an exception for the "Set Word Mark" instruction,6 allowing it to execute without the second word mark. (This exception is crucial, since otherwise the first instruction can't execute. Was this carefully planned or a hack to make things work? I don't know.)

The word marks that were set by the first instruction let the next two instructions run. They are also "Set Word Mark" instructions, putting word marks at addresses 22, 26, 30, and 40. Note that each "Set Word Mark" instruction sets two word marks but only "uses up" one, so the code is making progress, preparing word marks for future instructions.

Now we come to /019, where the slash opcode indicates the somewhat curious "Clear Storage" instruction. This instruction starts clearing storage at the specified address (19) and proceeds downwards until the address is a multiple of 100. Thus, in this case it will clear from address 19 down to address 0, erasing both characters and word marks. (These locations contained the instructions we just executed.) A location is erased by storing a blank; this may seem like a strange choice, but keep in mind that an empty punch card column is read as a blank. The next Set Word Mark instruction, ,001, puts a word mark back at location 1.
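The "clear down to a multiple of 100" rule is easy to misread, so here is a minimal Python sketch of it. The lists chars and wm and the function name clear_storage are illustrative, and the 1,000-character memory size is an assumption. The return value models the address left in the B register, one below the last cleared location.

```python
def clear_storage(chars, wm, addr):
    """Blank characters and word marks from addr downwards, stopping
    after clearing an address that is a multiple of 100.
    Returns the next address counting downwards, wrapping past 0 to the
    top of memory."""
    size = len(chars)
    while True:
        chars[addr] = " "
        wm[addr] = False
        done = addr % 100 == 0
        addr = (addr - 1) % size
        if done:
            return addr

size = 1000                 # assumption: a 1,000-character machine
chars = ["X"] * size        # fill memory with a visible dummy character
wm = [True] * size
b_register = clear_storage(chars, wm, 19)   # models /019
print("".join(chars[:24]), b_register)
```

Running this clears addresses 19 through 0 and leaves address 20 onwards untouched.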

At this point, the contents of memory are as shown below. Word marks are indicated by underlined characters, which is how the IBM documentation indicated word marks.

                   40/019,001L020100   ,047054,061068,072072⌑08108110220001

The next instruction is L020100 "Load Characters to a Word Mark". This instruction copies the character at address 20 (i.e. "4") to address 100. The instruction then continues copying downwards (copying the blanks) until it hits a word mark (which is at address 1). To summarize, addresses 20 through 1 are copied to addresses 100 through 81. Locations 81 through 99 received blanks, while address 100 received a "4". This may seem pointless, but the "4" will turn out to be an important indicator shortly. This instruction also illustrates how word marks allow a long field to be copied with a single instruction.
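A minimal Python sketch of this copy follows, under the same illustrative memory model (parallel lists chars and wm). The function name load_to_word_mark is hypothetical. The word mark is copied along with each character, which is consistent with the word mark later found at address 81.

```python
def load_to_word_mark(chars, wm, a, b):
    """Copy characters (and their word marks) from address a to address b,
    stepping both addresses downwards, and stop after copying the source
    character that carries a word mark."""
    while True:
        stop = wm[a]
        chars[b] = chars[a]
        wm[b] = wm[a]
        if stop:
            return
        a -= 1
        b -= 1

# Memory as described after the Clear Storage step: blanks in 1-19,
# "40" in 20-21, and a word mark at address 1.
chars = [" "] * 200
wm = [False] * 200
chars[20], chars[21] = "4", "0"
wm[1] = True

load_to_word_mark(chars, wm, 20, 100)   # models L020100
print(repr("".join(chars[81:101])))
```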

The next three instructions set word marks at addresses 47, 54, 61, 68, and 72. (The boot code needs to go to a lot of effort to ensure that word marks are set up for future instructions.) The next instruction ⌑081081 has IBM's unusual "lozenge" character as the opcode. This instruction clears the word mark at address 81 (which had been copied from address 1). The final instruction on the card, 1022, reads the next card (opcode 1 is "Read") and then jumps to address 22. A lot has taken place to execute one card, but the next card has some remarkably tricky code.

Bootstrap card 2

After reading the second bootstrap card, memory locations 1 through 80 hold the data:

,008047/047046       /000H025B022100  4/061046,054061,068072,00104010400002

Execution of this card starts at address 22 with the Clear Storage instruction /000. Remember how the Clear Storage instruction proceeds downwards until the address is a multiple of 100? In this case, it will clear address 0 and then immediately stop on address 0 (a multiple of 100). However, a register called the B register will hold the next address (counting downwards), which will wrap from 0 to the top address in memory. For simplicity, I'll assume the code is running on a 1401 model with 1,000 characters of memory so the B register will hold the address 999.7

The next instruction H025 is a tricky bit of self-modifying code. It stores the contents of the B address register into locations 23-25, changing the "Clear Storage" instruction that we just executed to /999. Next, the B022100 4 instruction will branch to address 22 if address 100 holds a "4" (which is true because the first card put a "4" there.)

Back at address 22, the Clear Storage instruction was modified to be /999, so it will now clear addresses 999-900. It is followed by H025, which, as before, will store the B register into the Clear Storage instruction. This time it will modify the Clear Storage to start at 899. Finally, the conditional branch loops back to address 22 as before.

The result is that this loop clears memory 100 characters at a time, using self-modifying code to update the position. The loop continues until addresses 100-199 are cleared. The branch instruction then falls through because address 100 now holds a blank rather than a "4". By this point, the loop has cleared all of storage from 100 to the end of memory, erasing characters as well as any word marks.
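The loop can be traced with a short Python sketch. It works at the level of the B register rather than real 1401 machine code: the variable operand stands in for the three digits that H025 overwrites, and the 1,000-character memory is the same simplifying assumption used above. All names are illustrative.

```python
MEMORY_SIZE = 1000   # assumption: the 1,000-character model

def clear_storage(memory, addr):
    """Blank addr and the locations below it, stopping after a multiple
    of 100. Returns the next address counting downwards (wrapping past 0),
    which is what the B register is left holding."""
    while True:
        memory[addr] = " "
        stop = addr % 100 == 0
        addr = (addr - 1) % len(memory)
        if stop:
            return addr

def run_clear_loop(memory):
    operand = 0                     # the card starts out with /000
    operands_used = []
    while True:
        operands_used.append(operand)
        b_register = clear_storage(memory, operand)  # /nnn at address 22
        operand = b_register                         # H025 rewrites the operand
        if memory[100] != "4":                       # B022100 4 branches only on "4"
            return operands_used

memory = ["X"] * MEMORY_SIZE
memory[100] = "4"                   # the flag left behind by the first card
operands_used = run_clear_loop(memory)
print(operands_used)
```

The trace shows the operand stepping through 0, 999, 899, and so on down to 199, at which point the "4" flag itself is erased and the loop ends.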

The next instruction is Clear Storage /061046 which clears storage from address 46 down to 0 and then branches to 61. At address 61, ,001040 sets word marks at addresses 1 and 40. Finally, 1040 reads the next card and starts execution at address 40. As with the first card, columns 1 through 80 of the card are read into memory addresses 1 through 80.

The program cards

The next phase consists of reading the desired program into memory. A typical card is:

3332200999&2200&0000000100000          L029368,343346,351356,36136410400004

The left part of the card (columns 1-29) contains machine code for the program that we want to run. The right part (columns 40-71) contains the loader code that executes card by card, copying the program code into the correct part of memory and setting word marks.

The first loader instruction L029368 copies the program code from the card reader buffer into the desired memory locations. Specifically, it will copy starting from address 29 down to the word mark at address 1. These characters will be copied into addresses 368 down to 340. The next instructions set the word marks in this code, at addresses 343, 346, 351, 356, 361, and 364. This answers the question of how the program in memory gets word marks even though punch cards can't explicitly store word marks. Finally, 1040 reads the next card and starts executing it at address 40.
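To make the card's effect concrete, the Python sketch below applies the loader code from this card to a simulated memory. It is a simplification, not an emulator: it assumes the loader consists of four 7-character instructions in columns 40-67 and understands only the L and comma op codes. The names run_loader_card and load_to_word_mark are hypothetical.

```python
def load_to_word_mark(chars, wm, a, b):
    """Copy characters and word marks from a to b, moving downwards,
    stopping after the source character that carries a word mark."""
    while True:
        stop = wm[a]
        chars[b], wm[b] = chars[a], wm[a]
        if stop:
            return
        a, b = a - 1, b - 1

def run_loader_card(chars, wm, card):
    # Reading a card places columns 1-80 in addresses 1-80.
    for column, ch in enumerate(card.ljust(80), start=1):
        chars[column] = ch
    loader = card[39:67]                       # columns 40-67
    for i in range(0, len(loader), 7):
        op = loader[i:i + 7]
        a, b = int(op[1:4]), int(op[4:7])
        if op[0] == "L":                       # Load Characters to a Word Mark
            load_to_word_mark(chars, wm, a, b)
        elif op[0] == ",":                     # Set Word Mark (two addresses)
            wm[a] = wm[b] = True

card = ("3332200999&2200&0000000100000" + " " * 10 +
        "L029368,343346,351356,361364" + "1040" + "0004")
chars = [" "] * 1000
wm = [False] * 1000
wm[1] = wm[40] = True      # left in place by the second bootstrap card
run_loader_card(chars, wm, card)
marks = [a for a in range(300, 400) if wm[a]]
print("".join(chars[340:369]), marks)
```

Note that the word mark at address 1 is carried along by the copy, so address 340 ends up with a word mark as well.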

The following cards have the same structure: the program on the left and the loader code on the right. Interestingly, the number of characters of program code is variable because the loader code can set at most 6 word marks per card. In the worst case, all the characters need word marks so only 6 characters can be provided by the card. In the best case, 39 characters can fit on the left side of the card, since the loader code starts in column 40.

The run card

The last card has the Clear Storage instruction /333080. This clears memory from address 80 downwards to 0, wiping out the card buffer and the loader code so the program will start with a clean slate. The Clear Storage instruction then jumps to address 333, starting the execution of the program. After all this work, the computer finally runs the program we wanted to run. While the loading process seems very long when written out, the card reader is fast for an electromechanical device, with over 13 cards per second zipping through it.

The program I used in the example is a Mandelbrot fractal generator that I wrote. The photo below shows the results of the program, which took 12 minutes to execute. I discuss the program in detail in this post.

The IBM 1401 mainframe computer (left) at the Computer History Museum printing the Mandelbrot fractal on the 1403 line printer (right).

The bootstrap code I described above is just one of the possible bootstrap sequences. Programmers could write their own bootstrap code, trying to make it as short as possible. I described a longer three-card sequence here. The IBM 1401 could also boot from a magnetic tape using a similar process; pressing the "Tape Load" button on the console loaded a record from tape, just like booting from a card.

Console of the IBM 1401 computer. The "Tape Load" button is in the lower right.

The origins of "bootstrapping"

The term "bootstrap" has an interesting history. It starts with physical boots, which often had boot straps on the top, physical straps to help pull the boots on (as shown below). In the 1800s, the saying "No man can lift himself by his own boot straps" was used as a metaphor for the impossibility of improvement solely through one's own effort. (Pulling on the straps on your boots superficially seems like it should lift you off the ground, but is of course physically impossible.)

Example of a boot strap at the heel of a boot, from patent 41087, not the first boot strap patent.

By the mid-1940s, "bootstrap" was used in electronics to describe a circuit that started itself up through positive feedback, metaphorically pulling itself up by its boot straps. (See usages from 1943, 1944, and 1946). By 1952, analog computers used circuits called "bootstrap integrators".

When a digital computer loaded its program through its own efforts, this took on the name "bootstrap", dating back to the 1950s. (Using a program to load a program seems as paradoxical as lifting yourself up by your bootstraps, but fortunately it works.) A 1954 glossary defined "bootstrap" as "The coded instructions at the beginning of an input tape, together with one or two instructions inserted by switches or buttons into the computer, used to put a routine into the computer." A 1955 computer survey published by the Department of Commerce had a similar definition.

Conclusion

Bootstrapping the IBM 1401 was complicated, and the process has become even more complex in later computers. In the 1960s, computers such as the IBM System/360 had bootstrap microcode stored in read-only storage. This code could load a chain of bootstrap programs, first a 64-byte bootstrap card, which would then load a 4-kilobyte bootstrap program, which could then load the disk operating system. Some early minicomputers and microcomputers lacked ROM and took a step backward, requiring the user to tediously toggle in boot code through switches on the front panel.

Modern computers go through a much more complex bootstrap process. The initial boot code for an x86 system is stored in ROM, and booting happens through the BIOS in older computers or UEFI in more modern systems. The system starts in a primitive state without caches or virtual memory, running a single core in "8086 real mode". The boot code sets up the system and loads a bootloader program, which may then load another bootloader, which loads the kernel, which starts up the computer's various processes. Details are in this presentation.

Studying the 1401's machine code shows many of its unusual characteristics compared to modern computers and the strangeness of its instruction set. Needing to deal with word marks is the most obvious difference, with special instructions to set and erase them. From a modern perspective, it's unusual to see a computer that doesn't use bytes, although that was common back then. The use of decimal arithmetic and decimal addressing also seems strange from the modern perspective. Another curiosity is self-modifying code. Although self-modifying code is discouraged nowadays, it was common on the 1401 (as with other computers of that era).

I announce my latest blog posts on Twitter, so follow me @kenshirriff. I also have an RSS feed.

Notes and references

  1. While punch cards almost always held character data, an optional feature called "column binary" allowed binary data to be punched onto cards, 12 bits in each column. IBM charged $101 a month (in 1960s dollars) for the column binary feature. 

  2. The need to fit all the data into 80 columns was one of the factors that led to the Y2K problem. If you used four columns on a card to hold the year instead of two, you'd need to give up two precious columns somewhere else. 

  3. The punch card below is an example of a card custom-printed for a customer application. This card was used for payroll at the Phoenix Steel Corporation.

    A punch card designed for a steel mill.

  4. The operation of the load key is specified in the 1401 reference manual (p118): "This key is used to start loading instruction cards. Pressing the load key operates the read feed until a card has passed the read station. The I-address register is set to 001, and a word mark is set in address 001. All other word marks in addresses 002 through 080 are removed." The instruction in the first columns is executed, and then continued operation is controlled by the first instruction. 

  5. How does the first word mark get set when you load the first card? I looked at the documentation of the circuitry and found the relevant flip-flop (below). It is set by the load button, sets the first word mark (WM), and then is cleared.

    The flip-flop to set word marks. From the 1401 logic diagrams, figure 81.

    The photo below shows the card that implements this flip-flop. With the 1401, you can actually see the physical transistors that implement each function.

    A flip-flop card, type "CW".

  6. Instructions need to be indicated with word marks with a few specific exceptions. As documented in the 1401 reference manual (p15) "The 4-character unconditional branch instruction, the 7-character set word mark, and clear storage and branch instructions are the only instructions that can be followed by a blank without a word mark. All other instructions must be followed by a word mark." 

  7. The 1401 computer that I used has 16,000 characters of memory (not 16,384 because it's a decimal machine!) so after the Clear Storage instruction, the B register will hold 15,999, pointing to the top of memory. You might wonder how the address 15,999 is represented in three decimal characters. The trick is that a special address code uses the top two bits of the characters to hold the thousands part of the address. The resulting address is 999 with the top two bits of the hundreds and units characters set. As a result, the three-character alphanumeric address I9I represents the address 15,999. 

  8. If you had the misfortune to drop your cards, a card sorter could put them back in order using the sequence numbers. A card sorter rapidly sorted cards into slots based on the digit punched in one column. By running the cards through several times, you could sort on the complete sequence number. I discuss card sorters in great detail here. (A low-tech way to keep cards in order was to draw a diagonal line across the top of the cards; it helped when putting cards back in order manually.)

    An IBM Type 83 card sorter. Cards enter the machine on the right, whiz along the top of the machine, and fall into the appropriate hopper underneath.

    The use of sequence numbers in columns 73-80 goes back to the Fortran language. Fortran was developed for the IBM 704 vacuum tube computer. The 704 was a 36-bit machine. The punch-card reading process used two 36-bit words, so only 72 columns could be read. (These could be any 72 columns of the card, selected by a wiring panel, but typically columns 1-72 were used.) The result was that columns 1-72 were used for code (a restriction still often used), while columns 73-80 were free for sequence numbers. 
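The three-character address encoding from note 7 can be sketched in Python. The weighting used here is an assumption not spelled out in the note: the two zone bits of the hundreds character are taken to count thousands (0 to 3,000), and the two zone bits of the units character to count multiples of 4,000 (0 to 12,000). The character table covers only the digits and letters, so zoned zeros, which use special characters, are left out, and any zone bits on the tens character are ignored.

```python
# Map each character to (zone, digit), where zone is the value of the
# top two bits: 0 = none, 1 = A bit, 2 = B bit, 3 = both.
ZONED = {"0": (0, 0)}
for digit, ch in enumerate("123456789", start=1):
    ZONED[ch] = (0, digit)
for digit, ch in enumerate("/STUVWXYZ", start=1):
    ZONED[ch] = (1, digit)
for digit, ch in enumerate("JKLMNOPQR", start=1):
    ZONED[ch] = (2, digit)
for digit, ch in enumerate("ABCDEFGHI", start=1):
    ZONED[ch] = (3, digit)

def decode_address(addr):
    """Convert a three-character address into a number.
    Assumes the weighting described above; the tens zone is ignored."""
    (h_zone, hundreds), (_, tens), (u_zone, units) = (ZONED[c] for c in addr)
    return 100 * hundreds + 10 * tens + units + 1000 * h_zone + 4000 * u_zone

top = decode_address("I9I")
plain = decode_address("999")
print(top, plain)
```

With this weighting, I9I works out to 999 + 3,000 + 12,000, matching the 15,999 in the note.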

Teardown of a quartz crystal oscillator and the tiny IC inside

The quartz oscillator is an important electronic circuit, providing highly-accurate timing signals at a low cost. A quartz crystal has the special property of piezoelectricity, changing its electrical properties as it vibrates. Since a crystal can be cut to vibrate at a very precise frequency, quartz oscillators are useful for many applications. Quartz oscillators were introduced in the 1920s and provided accurate frequencies for radio stations. Wristwatches were revolutionized in the 1970s by the use of highly-accurate quartz oscillators. Computers use quartz oscillators to generate their clock signals, from ENIAC in the 1940s to modern computers.1

A quartz crystal requires additional circuitry to make it oscillate, and this analog circuitry can be tricky to design. In the 1970s, crystal oscillator modules became popular, combining the quartz crystal, an integrated circuit, and discrete components into a compact, easy-to-use module. Curious about the contents of these modules, I opened one up and reverse-engineered the chip inside. In this blog post, I discuss how the module works and examine the tiny CMOS integrated circuit that runs the oscillator. There's more happening in the module than I expected, so I hope you find it interesting.

The oscillator module

I examined the oscillator module from an IBM PC card.2 The module is packaged in a rectangular 4-pin metal can that protects the circuitry from electrical noise. (It is the "Rasco Plus" rectangular can on the right, not the square IBM integrated circuit.) This module produced a 4.7174 MHz clock signal, as indicated by the text on the package.

The quartz oscillator module is in the lower right, labeled Rasco Plus. 4.7174 MHZ, © Motorola 1987. The square module is an IBM integrated circuit. Click this (or any other image) for a larger version.

I cut open the can to reveal the hybrid circuitry inside. I was expecting a gem-like quartz crystal inside, but found that oscillators use a very thin disk of quartz. (I damaged the crystal while opening the package, so the upper part is missing.) The quartz crystal is visible on the left, with metal electrodes attached to either side of the crystal. The electrodes are attached to small pegs, raising the crystal above the surface so it can oscillate freely.

Inside the oscillator package, showing the components mounted on the ceramic substrate.

On the right side of the module is a tiny CMOS integrated circuit die. It is mounted on the ceramic substrate and connected to the circuitry by tiny golden bond wires. A surface-mount capacitor (3 nF) and a film resistor (10Ω) on the substrate filter out noise from the power pin.

The IC's circuitry

The photo below shows the tiny integrated circuit die under a microscope, with the pads and main functional blocks labeled. The brownish-green regions are the silicon that forms the integrated circuit. A metal layer (yellowish white) wires up the components of the IC. Below the metal, reddish polysilicon implements transistors, but it is mostly obscured by the metal layer. Around the outside of the chip, bond wires are connected to pads, wiring the chip to the rest of the oscillator module. Two pads (select and disable) are left unconnected. The chip was manufactured by Motorola, with a 1986 date. I couldn't find any information on the part number SC380003.

The integrated circuit die with key blocks labeled. "FF" indicates flip-flops. "sel" indicates select pads. "cap" indicates pads connected to the internal capacitors.

The IC has two functions. First, its analog circuitry drives the quartz crystal to produce oscillations. Second, the IC's digital circuitry divides the frequency by 1, 2, 4, or 8, and produces a high-current clock output signal. (The division factor is selected by the two select pins on the IC.)

The oscillator is implemented with a circuit (below) called a Colpitts oscillator, which is more complex than the usual quartz oscillator circuit.43 The basic idea is that the crystal and the two capacitors oscillate at the desired frequency. The oscillations would rapidly die out, however, except for the feedback boost from the drive transistor.

Simplified schematic of the oscillator.

In more detail, as the voltage across the crystal increases, the transistor turns on, feeding current into the capacitors and boosting the voltage across the capacitors (and thus the crystal). But as the voltage across the crystal decreases, the transistor turns off and the current sink (circle with arrow) pulls current out of the capacitors, reducing the voltage across the crystal. Thus, the feedback from the drive transistor strengthens the crystal's oscillations to keep them going.

The bias voltage and current circuits are an important part of this circuit. The bias voltage sets the drive transistor's gate midway between "on" and "off", so the voltage oscillations on the crystal will turn it on and off. The bias current is set midway between the drive transistor's on and off currents so the current flowing in and out of the capacitors balances out.5 (I'm saying "on" and "off" for simplicity; the signal will be a sine wave.)

A large part of the integrated circuit is occupied by five capacitors. One is the upper capacitor in the schematic, three are paralleled to form the lower capacitor in the schematic, and one stabilizes the voltage bias circuit. The die photo below shows one of the capacitors after dissolving the metal layer on top. The red and green region is polysilicon, which forms the upper plate of the capacitor, along with the metal layer. Underneath the polysilicon, the pinkish region is probably silicon nitride, forming the insulating dielectric layer. The doped silicon (not visible underneath) forms the bottom plate of the capacitor.

A capacitor on the die. The large faint square to the left of the capacitor is a pad for connecting a bond wire to the IC. The complex structures on the left are clamp diodes on the pins. The cloverleaf structures on the right are transistors, which will be discussed later.

Curiously, the capacitors aren't connected together on the chip, but are connected to three pads that are wired together by bond wires. Perhaps this provides flexibility; the capacitance in the circuit can be modified by omitting the wire to a capacitor.

The digital circuitry

The right side of the chip contains digital circuitry to divide the crystal's output frequency by 1, 2, 4, or 8. This lets the same crystal provide four different frequencies. The divider is implemented by three flip-flops in series. Each one divides its input pulses by 2. A 4-to-1 multiplexer selects between the original clock pulses, or the output from one of the flip-flops. The choice is made through the wiring to the two select pads on the right side of the die, fixing the ratio at manufacturing time. Four NAND gates (along with inverters) are used to decode these pins and generate four control signals to the multiplexer and flip-flops.
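The divider chain can be modeled in a few lines of Python. This is a behavioral sketch, not a description of the chip's actual timing: each flip-flop is assumed to toggle when its input falls, and the mapping of the two select pads to the four taps is a made-up encoding, since the real one isn't known. All names are illustrative.

```python
def simulate_divider(cycles):
    """Return four sample lists: the clock and the outputs of three
    toggle flip-flops in series. Each clock cycle contributes two
    samples (high, then low)."""
    q = [0, 0, 0]
    taps = [[], [], [], []]
    prev_clk = 0
    for clk in [1, 0] * cycles:
        falling = prev_clk == 1 and clk == 0
        prev_clk = clk
        stage = 0
        while falling and stage < 3:
            old = q[stage]
            q[stage] ^= 1           # toggle this stage
            falling = old == 1      # its output fell, so the next stage toggles
            stage += 1
        for tap, value in zip(taps, [clk] + q):
            tap.append(value)
    return taps

def rising_edges(samples):
    """Count 0-to-1 transitions, treating the level before the first sample as 0."""
    return sum(1 for a, b in zip([0] + samples[:-1], samples) if (a, b) == (0, 1))

def select_output(taps, sel1, sel0):
    """4-to-1 multiplexer; the select encoding is an assumption."""
    return taps[2 * sel1 + sel0]

taps = simulate_divider(16)
counts = [rising_edges(t) for t in taps]
print(counts)
```

Over 16 clock cycles, the four taps produce 16, 8, 4, and 2 pulses, corresponding to division by 1, 2, 4, and 8.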

How CMOS logic is implemented

The chip is built with CMOS logic (complementary MOS), which uses two types of transistors, NMOS and PMOS, working together. The diagram below shows how an NMOS transistor is constructed. The transistor can be considered a switch between the source and drain, controlled by the gate. The source and drain (green) consist of regions of silicon doped with impurities to change its semiconductor properties; these regions are called N+ silicon. The gate consists of a special type of silicon called polysilicon, separated from the underlying silicon by a very thin insulating oxide layer. The NMOS transistor turns on when the gate is pulled high.

Structure of an NMOS transistor. A PMOS transistor has the same structure, but with N-type and P-type silicon reversed.

A PMOS transistor has the opposite construction from NMOS: the source and drain consist of P+ silicon embedded in N silicon. The operation of a PMOS transistor is also opposite from the NMOS transistor: it turns on when the gate is pulled low. Typically PMOS transistors pull the drain (output) high, while NMOS transistors pull the drain low. In CMOS, the transistors act in a complementary fashion, pulling the output high or low as needed.

The diagram below shows how a NAND gate is implemented in CMOS. If an input is 0, the corresponding PMOS transistor (top) will turn on and pull the output high. But if both inputs are 1, the NMOS transistors (bottom) will turn on and pull the output low. Thus, the circuit implements the NAND function.
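The pull-up and pull-down behavior can be checked with a switch-level Python sketch, where each transistor is treated as an ideal switch. The function name cmos_nand is illustrative.

```python
def cmos_nand(a, b):
    """Switch-level model of a two-input CMOS NAND gate."""
    # Two PMOS transistors in parallel: either one conducts when its input is 0.
    pull_up = (a == 0) or (b == 0)
    # Two NMOS transistors in series: both must conduct, so both inputs must be 1.
    pull_down = (a == 1) and (b == 1)
    # Exactly one network conducts, so the output is never floating or shorted.
    assert pull_up != pull_down
    return 1 if pull_up else 0

truth_table = {(a, b): cmos_nand(a, b) for a in (0, 1) for b in (0, 1)}
print(truth_table)
```

The internal assertion captures the "complementary" part of CMOS: for every input combination, exactly one of the two networks is conducting.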

A CMOS NAND gate is implemented with two PMOS transistors (top) and two NMOS transistors (bottom).

The diagram below shows how a NAND gate appears on the die. The transistors have complex, meandering shapes, unlike the rectangular layouts that appear in textbooks. The left side holds the PMOS transistors, while the right side holds the NMOS transistors. The polysilicon that forms the gates is the slightly reddish wiring on top of the silicon. Most of the underlying silicon is doped, making it conductive and slightly darker than the non-conductive undoped silicon along the left and right edges and in the center. For this photo, the metal layer was removed with acid to reveal the silicon and polysilicon underneath; the yellow line illustrates where some of the metal wiring was. The circles are connections between the metal layer and the underlying silicon or polysilicon.

A NAND gate as it appears on the die.

The transistors in the die photo can be matched up with the NAND-gate schematic; look at the transistor gates formed by polysilicon and what they separate. There is a path from the +5 region to the output through the large elongated PMOS transistor on the left, and a second path through the small PMOS transistor near the center, indicating the transistors are in parallel. Each gate is controlled by one of the inputs. On the right, a path from ground to the output connection must go through both of the concentric NMOS transistors, indicating they are in series.

This integrated circuit also uses many circle-gate transistors, an unusual layout technique that allows multiple transistors in parallel at high density. The photo below shows 16 circle-gate transistors. The copper-colored cloverleaf patterns are the transistor gates, implemented with polysilicon. The inside of each "leaf" is the transistor drain, while the outside is the source. The metal layer (removed) wires all the sources, gates, and drains together respectively; the parallel transistors act as one larger transistor. Paralleled transistors are used in the output pin drivers to provide high current for the output. In the bias circuitry, different numbers of transistors are wired together (e.g. 6, 16, or 40) to provide the desired current ratios.

Sixteen circle-gate transistors with four gate connections.

Transmission gate

Another key circuit in the chip is the transmission gate. This acts as a switch, either passing a signal through or blocking it. The schematic below shows how a transmission gate is constructed from two transistors, an NMOS transistor and a PMOS transistor. If the enable line is high, both transistors turn on, passing the input signal to the output. If the enable line is low, both transistors turn off, blocking the input signal. The schematic symbol for a transmission gate is shown on the right.
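In switch-level terms, a transmission gate can be sketched as follows. The sketch assumes the PMOS gate is driven by the complement of the enable line (so that both transistors switch together) and uses None to stand for a blocked, floating output. The function name is illustrative.

```python
def transmission_gate(signal, enable):
    """Pass signal through when enabled; return None (floating) otherwise."""
    nmos_on = enable == 1          # NMOS conducts when its gate is high
    pmos_on = (1 - enable) == 0    # PMOS gets the complement and conducts when low
    return signal if (nmos_on and pmos_on) else None

passed_high = transmission_gate(1, enable=1)
passed_low = transmission_gate(0, enable=1)
blocked = transmission_gate(1, enable=0)
print(passed_high, passed_low, blocked)
```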

A transmission gate is constructed from two transistors. The transistors and their gates are indicated. The schematic symbol is on the right.

Multiplexer

A multiplexer is used to select one of the four clock signals. The diagram below shows how the multiplexer is implemented from transmission gates. The multiplexer takes four inputs: A, B, C, and D. One of the inputs is selected by activating the corresponding select line and its complement. That input is connected through the transmission gate to the output, while the other inputs are blocked. Although a multiplexer can be built with standard logic gates, the implementation with transmission gates is more efficient.
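The multiplexer's behavior can be sketched in Python by wiring four idealized transmission gates to a shared output. This assumes exactly one select line is active at a time; None stands for a blocked input. The names mux4 and transmission_gate are illustrative.

```python
def transmission_gate(signal, enable):
    """Idealized switch: pass the signal when enabled, else float (None)."""
    return signal if enable == 1 else None

def mux4(inputs, select):
    """inputs and select are dicts keyed by "A", "B", "C", "D".
    Exactly one select line must be 1; that input drives the output."""
    driven = [transmission_gate(inputs[name], select[name]) for name in "ABCD"]
    outputs = [value for value in driven if value is not None]
    if len(outputs) != 1:
        raise ValueError("exactly one select line must be active")
    return outputs[0]

inputs = {"A": 0, "B": 1, "C": 0, "D": 1}
chose_b = mux4(inputs, {"A": 0, "B": 1, "C": 0, "D": 0})
chose_c = mux4(inputs, {"A": 0, "B": 0, "C": 1, "D": 0})
print(chose_b, chose_c)
```

The error check reflects a real constraint of this style of multiplexer: with no gate enabled the output floats, and with two enabled the inputs fight each other.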

The 4-to-1 multiplexer is implemented with transmission gates.

The schematic below shows the transistors that make up the multiplexer. Note that inputs B and C have pairs of transistors. I believe the motivation is that a pair of transistors presents half the resistance to the signal. Since inputs B and C are the higher-frequency signals, the pair of transistors allows them to pass through with less distortion and delay.

Schematic of the multiplexer, matching the physical layout on the chip.

The image below shows how the multiplexer is physically implemented on the die. The polysilicon gate wiring is most prominent. The metal layer has been removed; the metal lines ran vertically, connecting corresponding transistor segments. Note that the sources and drains of neighboring transistors are merged into single regions between the gates. The top rectangle holds the NMOS transistors while the lower rectangle holds the PMOS transistors; because PMOS transistors are less efficient, the lower rectangle needs to be larger.

Die photo of the multiplexer.

Flip-flop

The chip contains three flip-flops to divide the clock frequency. The oscillator uses toggle flip-flops, which flip between 0 and 1 each time they receive an input pulse. Since two input pulses result in one output pulse (0→1→0), the flip-flop divides the frequency by 2.

A flip-flop is constructed from transmission gates, inverters, and a NAND gate, as shown in the schematic below. When the input clock is high, the output passes through the inverter and the first transmission gate to point A. When the input clock switches low, the first transmission gate opens (stops conducting), so point A holds its previous value. Meanwhile, the second transmission gate closes (starts conducting), so the signal passes through the second inverter and transmission gate to point B. The NAND gate inverts it again, causing the output to flip from its previous value. A second cycle of the input clock repeats the process, causing the output to return to its initial value. The result is that two cycles of the input clock result in one cycle of the output, so the flip-flop divides the frequency by 2.

Implementation of a toggle flip-flop.
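
The divide-by-2 behavior is easy to check with a behavioral simulation. The sketch below is not a gate-level model. It assumes the output toggles when the input clock falls (my reading of the description above), and it assumes the three flip-flops are chained so each one is clocked by the previous one's output. The class and function names are made up for illustration.

```python
from typing import List


class ToggleFlipFlop:
    """Behavioral toggle flip-flop: the output flips on each falling clock edge."""

    def __init__(self) -> None:
        self.q = 0
        self._prev_clk = 0

    def step(self, clk: int) -> int:
        if self._prev_clk == 1 and clk == 0:  # falling edge detected
            self.q ^= 1
        self._prev_clk = clk
        return self.q


def rising_edges(samples: List[int]) -> int:
    """Count 0 -> 1 transitions, i.e. the number of cycles that start."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a, b) == (0, 1))


def divider_chain(input_clock: List[int], stages: int = 3) -> List[List[int]]:
    """Feed the clock through chained flip-flops and record each output."""
    flops = [ToggleFlipFlop() for _ in range(stages)]
    outputs: List[List[int]] = [[] for _ in range(stages)]
    for clk in input_clock:
        signal = clk
        for flop, out in zip(flops, outputs):
            signal = flop.step(signal)  # each stage clocks the next one
            out.append(signal)
    return outputs


input_clock = [0, 1] * 16 + [0]  # 16 complete input cycles
stage_outputs = divider_chain(input_clock)
cycles_per_stage = [rising_edges(out) for out in stage_outputs]
```

For 16 input cycles, the three stages produce 8, 4, and 2 output cycles, corresponding to dividing the frequency by 2, 4, and 8.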

Each flip-flop has an enable input. If a flip-flop is not needed for the selected output, it is disabled. For instance, if the "divide by 2" mode is selected, only the first flip-flop is used, and the other two are disabled. I assume this is done to reduce power consumption. Note that this is independent from the module's disable pin, which blocks the module output entirely. This disable feature is optional; this particular module does not provide the disable feature and the disable pin is not wired to the IC.

The schematic above shows the inverters and transmission gates as separate structures. However, the flip-flop uses an interesting gate structure that combines the inverter and the transmission gate (left) into a single gate (right). The pair of transistors connected to data in function as an inverter. However, if the clock in is low, both power and ground are blocked so the gate will not affect the output and it will hold its previous voltage. This provides the transmission gate functionality.

Implementation of a combination inverter / transmission gate.
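
The combined gate behaves like an inverter with a hold mode, which can be sketched as a small stateful object. This is a logical model only: the held value really lives as charge on the output node, which the sketch represents as a stored attribute. The class name and the initial value are my own assumptions.

```python
class ClockedInverter:
    """Inverter that drives its output only while the clock input is high."""

    def __init__(self, initial_output: int = 0) -> None:
        # Stands in for the voltage held on the output node.
        self.output = initial_output

    def update(self, data_in: int, clock_in: int) -> int:
        if clock_in == 1:
            # Power and ground are connected: act as a normal inverter.
            self.output = 1 - data_in
        # With the clock low, both paths are blocked and the output is held.
        return self.output


gate = ClockedInverter()
driven = gate.update(data_in=0, clock_in=1)  # inverts the input
held = gate.update(data_in=1, clock_in=0)    # input changed, output held
```

Merging the two functions removes the separate transmission-gate stage from the signal path while giving the same pass-or-hold behavior.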

The photo below shows how one of these gates appears on the die. This photo includes the metal layer on top; the reddish polysilicon gates are visible underneath. The two PMOS transistors are on the left, as concentric loops, while the NMOS transistors are on the right.

One of the combination inverter / transmission gates, as it appears on the die.

Conclusion

While the oscillator module looks simple from the outside, on the inside there's a lot more complexity than you might expect.6 It contains not just a quartz crystal but also discrete components and a tiny integrated circuit. The integrated circuit combines capacitors, analog circuitry to drive the oscillations, and digital circuitry to choose a frequency. By changing the wiring to the integrated circuit during manufacturing, four different frequencies can be selected.

I'll end with the die photo below, which shows the chip after the metal and oxide layers were removed, revealing the silicon and polysilicon underneath. The large pinkish capacitors are the most visible feature in this image, but the transistors can also be seen. (Click the image for a larger version.)

Die photo of the oscillator chip with metal removed to show the polysilicon and silicon underneath.

I announce my latest blog posts on Twitter, so follow me at kenshirriff. I also have an RSS feed.

Notes and references

  1. Modern PCs use quartz crystals, but with a more complex technique to get multi-gigahertz clock frequencies. A PC uses a crystal with a much lower frequency, and multiplies the frequency using a circuit called a phase-locked loop. Computers often used a 14.318 MHz crystal because that frequency was used in old television sets, so crystals with that frequency were common and cheap. 

  2. Why does the board use a 4.7174 MHz crystal, a somewhat unusual frequency? In the 1970s, the IBM 3270 was a very popular CRT terminal. These terminals were connected with coaxial cable and used the Interface Display System Standard protocol with a 2.3587 MHz bit rate. In the late 1980s, IBM produced interface cards to connect an IBM PC to a 3270 network. I obtained the crystal from one of these interface cards (type 56X4927), and the crystal frequency of 4.7174 MHz is exactly twice the 2.3587 MHz bit rate. 

  3. The terminology used for crystal oscillators is confusing, with "Colpitts oscillator" and "Pierce oscillator" used in contradictory ways. I looked into the history of oscillators to try to sort out the naming, and I'll discuss it in this footnote.

    In 1918, Edwin Colpitts, the head researcher at Western Electric, invented an inductor/capacitor oscillator, now known as the Colpitts Oscillator. The idea is that the inductor and capacitors form a "resonant tank", which oscillates at a frequency set by the component values. (You can think of the electricity in the tank as sloshing back and forth between the inductor and the capacitors.) On their own, the oscillations would rapidly die out, so an amplifier is used to boost the oscillations. In the original Colpitts oscillator, the amplifier was a vacuum tube. Later circuits moved to transistors, but it can also be an op-amp or other type of amplifier. (Other circuits, such as the module I examined, ground an end and provide feedback to the middle. In that case, there is no inversion from the capacitors, so a non-inverting amplifier is used.)

    A simplified schematic of a Colpitts oscillator, showing the basic components.

    The key feature of the Colpitts oscillator is the two capacitors, which form a voltage divider. Since the capacitors are grounded in the middle, the two ends will have opposite voltages: when one end goes up, the other goes down. The amplifier takes the signal from one end, amplifies it, and feeds it into the other end. The amplifier inverts the signal and the capacitors provide a second inversion, so the feedback strengthens the original signal (i.e. it has a phase shift of 360°).

    In 1923, George Washington Pierce, a professor of physics at Harvard, replaced the inductor in the Colpitts oscillator with a crystal. The crystal made the oscillator much more accurate (higher Q factor), leading to its heavy use in radio transmission and other applications. Pierce patented his invention and made a lot of money off it from companies such as RCA and AT&T. The patents led to years of litigation, eventually reaching the Supreme Court. (For more information, see this thesis on crystal history.)

    For several decades, the common terminology was that a Pierce oscillator was a Colpitts oscillator that used a crystal. (See Air Force Manual, 1957 and Navy training, 1983 for instance.) The Pierce oscillator often omitted the characteristic voltage-divider capacitors, using the stray capacitance of the vacuum tube instead. But then terminology shifted, with "Colpitts oscillator" and "Pierce oscillator" indicating two different types of crystal oscillator: Colpitts with the capacitors and Pierce without the capacitors. (See, for example, the classic electronics text Horowitz and Hill.)

    Another change in terminology was to describe the Colpitts oscillator, Pierce oscillator, and Clapp oscillator as topologically identical crystal oscillators, just differing in what point in the circuit was considered AC ground (the collector, emitter, or base respectively). (See Frerking's Crystal Oscillator Design and Temperature Compensation (1978, p56) or Maxim's crystal oscillator tutorial.) Alternatively, these oscillators can all be called Colpitts, but common-collector, common-emitter, or common-base (details).

    The point of this history is that oscillator terminology is confusing, with different sources calling oscillators Colpitts or Pierce in contradictory ways. Getting back to the oscillator module I examined, it could be described as a common-drain Colpitts oscillator (analogous to common-collector). It would also be called a Colpitts oscillator using the terminology based on the ground position. Historically, it would be called a Pierce oscillator since it uses a crystal. It's also called a single-pin crystal oscillator since only one pin of the crystal is connected to the circuitry (and the other is grounded). 

  4. The typical quartz oscillator is built using a simple circuit called the Pierce-gate oscillator, where the crystal forms a feedback loop with an inverter. (The two capacitors grounded in the middle make this very similar to the classical Colpitts oscillator.)

    The Pierce oscillator circuit commonly used as a computer clock. Diagram by Omegatron, CC BY-SA 3.0.

    I'm not sure why the module I disassembled uses a more complex oscillator circuit that requires tricky biasing. 

  5. The voltage bias and current bias circuits are moderately complex analog circuits built with a bunch of transistors and a few resistors. I won't describe them in detail, but they use feedback loops to generate the desired fixed voltage and current. 

  6. If you want to learn more about quartz oscillators, there are interesting videos at EEVblog, electronupdate, and WizardTim. Colpitts oscillators are explained in videos at Hackaday