Showing posts with label ibm1401. Show all posts
Showing posts with label ibm1401. Show all posts

Hammer time: fixing the printer on a vintage IBM 1401 mainframe

The Computer History Museum has two operational IBM 1401 computers used for demos but one of the printers stopped working a few weeks ago. This blog post describes how the 1401 restoration team diagnosed and repaired the printer. After a lot of tricky debugging (as well as smoke coming out of the printer) we fixed a broken trace on a circuit board. (This printer repair might sound vaguely familiar because I wrote in September about an entirely different printer failure due to a failed transistor.)

The IBM 1401 business computer was announced in 1959, and went on to become the best-selling computer of the mid-1960s, with more than 10,000 systems in use. A key selling point of the IBM 1401 was its high-speed line printer, the IBM 1403. It printed 10 lines per second with excellent print quality, said to be the best printing until laser printers were introduced in the 1970s.

The IBM 1401 mainframe computer (left) at the Computer History Museum printing the Mandelbrot fractal on the 1403 printer (right).

The IBM 1401 mainframe computer (left) at the Computer History Museum printing the Mandelbrot fractal on the 1403 printer (right).

To print characters, the printer used a chain of type slugs (below) that rotated at high speed in front of the paper, with an inked ribbon between the paper and the chain. Each of the 132 print columns had a hammer and an electromagnet. At the right moment, when the desired character passed the hammer, an electromagnet drove the hammer against the back of the paper, causing the paper and ribbon to hit the type slug, printing the character.1

The type chain from the IBM 1401's printer. The chain has 48 different characters, repeated five times.

The type chain from the IBM 1401's printer. The chain has 48 different characters, repeated five times.

The printer required careful timing to make this process work. The chain spun around at 7.5 feet per second and each hammer had to fire at exactly the right time to print the right character perfectly aligned without smearing. Every 11.1 µs, a print slug lined up with a hammer, and the control circuitry checked if the slug matches the character to be printed. If so, the electromagnet was energized for 1.5 ms, printing the character.

Printing mechanism of the IBM 1401 line printer. From 1401 Reference Manual, p11.

Printing mechanism of the IBM 1401 line printer. From 1401 Reference Manual, p11.

While the printer is usually reliable, a few weeks ago the printer stopped working and displayed a "sync check" error on the console. The computer needs to know the exact position of the chain in order to fire the hammers at the right time. If something goes wrong with this synchronization, the computer stops with "sync check" rather than printing the wrong characters.

When the sync check light on the printer is illuminated, you have a problem.

When the sync check light on the printer is illuminated, you have a problem.

To track the chain position, the computer receives a sequence of pulses from the printer: a pulse when the first hammer is lined up with a type slug2 and a double pulse when the chain is in its "home" position with the first character lined up. The pulses are created by a slotted metallic timing disk inside the printer. A magnetic pickup detects these slots and produces a small 100 millivolt signal.7 This signal is amplified inside the printer by two differential amplifier cards to produce a stronger square wave signal. (This is the only electronic part of the printer. Everything else inside the printer is electromechanical or hydraulic; a high-speed hydraulic motor feeds paper through the printer and drips oil on the floor.)

The computer receives these pulses from the printer and generates a logic signal that increments counters to keep track of the chain's position. The schematic below shows part of the circuitry inside the computer, starting with the sense amplifier signal from the printer at the left. Don't try to understand this circuit; I just want to show the strange schematic symbols that IBM used in the 1960s. The box with "I" is an inverter. The triangle is an AND gate. The semicircle that looks like an AND gate is actually an OR gate. The large box with a "T" is a trigger, IBM's name for a flip-flop. The "SS" box is a "single shot" that creates a 400µs pulse; this detects the double pulse that indicates the chain's "home" position.

Excerpt from the 1401 Intermedia Level Diagrams (ILD) showing the chain detection circuitry.

Excerpt from the 1401 Intermedia Level Diagrams (ILD) showing the chain detection circuitry.

To track down the problem, we removed the printer's side panel to access the two amplifier circuit boards, which are visible below. We probed the boards with an oscilloscope. The first amplifier stage (on the right) looked okay, but the second stage (on the left) had problems. In the photo below, the computer is at the back, mostly hidden by the printer.

We took the side panel off the 1403 printer to reveal the circuit boards. We hooked an oscilloscope up to the front board to test it.

We took the side panel off the 1403 printer to reveal the circuit boards. We hooked an oscilloscope up to the front board to test it.

The trace below shows what should happen. The board receives a differential signal at the bottom, with alternating cyan and pink signals. The difference between these signals (middle) is amplified to produce the clean, uniform pulse train at the top. Note the double pulse in the middle indicating the chain's home position.

Oscilloscope trace from a working printer.

Oscilloscope trace from a working printer.

But when we measured the signal, we saw signals that were entirely garbled. The differential signals at the bottom are a mess, and track each other rather than alternating. The output signal (top) is basically random. With this signal from the printer, the computer couldn't keep track of the chain position and the sync check error resulted.

Oscilloscope trace from the faulty printer.

Oscilloscope trace from the faulty printer.

We swapped the board with the board from the other, working printer, and verified that the board was the problem. The museum has a filing cabinet full of replacement circuit boards, but unfortunately not a replacement for this "WW" amplifier board. Instead, we had to diagnose the problem with this card and repair it. On the board below you can see the diodes (small gray cylinders), capacitors (silver cylinders), resistors (striped cylinders and large tan cylinders), and germanium transistors (round metal cans). The transistors are germanium transistors as the 1401 predated silicon transistors.

The "WW" differential amplifier card used by the printer.

The "WW" differential amplifier card used by the printer.

We suspected a failed transistor, so we used Marc's vintage Hewlett-Packard Tektronix curve tracer (below) to test the transistors. One transistor was much weaker than the others. Since the performance of a differential amplifier depends on having transistors with closely matched characteristics, we searched through a couple dozen transistors to find a matching pair and replaced the transistors. (We later determined that these transistors were not part of the differential pair—they were "emitter follower" buffers, so our effort was wasted.)3

We used a vintage HP transistor curve tracer to test the transistors.

We used a vintage HP transistor curve tracer to test the transistors.

Back at the Computer History Museum, we tested out the repaired board and the printer still didn't work. Even worse, smoke started coming from the back of the printer! I quickly shut off the system as an acrid smell surrounded the printer. I expected to see a blackened transistor on the board, but it was fine. I examined the printer but couldn't find the source of the smoke.

I decided to test the board outside the printer by feeding in a 2kHz test signal, but the measurements didn't make sense. The board seemed to be ignoring one of the inputs, so I tested that input transistor but it was fine. Next I checked the diodes, capacitors and resistors again; all the components tested okay, but the board still mysteriously failed. I started carefully measuring voltages at various points in the circuit but the signals didn't make sense and weren't consistent. Since all the components were fine but the board didn't work, I was starting to losing confidence in electronics. Eventually, I nailed down a signal that randomly jumped between 10 volts and 1 volt. After wiggling all the components, I finally noticed that the voltage jumped if I flexed the board. Finally, I had an answer: a cracked trace on the circuit board between the input and the transistor was making intermittent contact.

The board had a cracked trace in the upper left, connecting the upper gold contact. Carl put a wire jumper across the bad section.

The board had a cracked trace in the upper left, connecting the upper gold contact. Carl put a wire jumper across the bad section.

To fix the board, Carl put a wire bridge across the bad trace6. We put the board in the printer, and the printer mostly worked. However, when the printer tried to print in column 85, the column failed to print and the printer stopped with an error.5 More testing revealed four columns of the printer were failing to print due to hammer problems. Each electromagnetic hammer coil is driven by a 60 volt, 5 amp pulse for 1.5 milliseconds. This is a lot of power (300 watts), so if anything goes wrong, hammer coils can easily burn up.

We swung open one of the computer's "gates" (lower left), revealing the cards that drive the printer.

We swung open one of the computer's "gates" (lower left), revealing the cards that drive the printer.

We looked at the printer driver cards inside the computer. Each card generates pulses for two hammers, so there are 66 of these cards. In the photo below, you can see the two large high-current transistors at the left that generate the pulses. (Note the felt insulators on top of these transistors. Due to their height, the transistors pose a risk of shorting against the bottom of the neighboring card.) Just to the right of these transistors are two colorful purple and yellow fuses. In the event of a fault, these fuses are supposed to burn out and protect the hammer coils. We checked the cards associated with the four bad columns on the printer and found four burnt-out fuses.

The "AEC" Alloy-Hammer Driver Latch card produces high-current pulses to drive the printer hammer coils.

The "AEC" Alloy-Hammer Driver Latch card produces high-current pulses to drive the printer hammer coils.

Why did the fuses blow? The circuit to drive the hammer coils is a bit tricky. Every 11 microseconds, a hammer lines up with a character slug and can be fired. But when a hammer is fired, the coil needs to be activated for about 1.5 ms, a much longer time interval. To accomplish this, the hammer driver latches on when a hammer is fired. Later in the print cycle, the hammer driver is turned off. This process is controlled by the chain position counters, which are driven by the pulses from the chain sensor, the same pulses that were intermittent. Thus, if the computer received enough pulses to start printing a line, but then the pulses dropped out in the middle of the line, hammer drivers could be left in the on state until the fuse blows. This explained the problem that we saw.

After Carl replaced the fuses, the printer worked fine except for two problems. First, characters in column 85 were shifted slightly so the text was slightly crooked. Frank explained that the hammer in this column must be moving a bit too slow, hitting the chain after it had moved past its position. This explained the smoke: in the time it took the fuse to blow, the coil must have overheated and been slightly damaged. We'll look into replacing this coil next week. The second problem was that the printer's Ready light didn't go on. This turned out to be simply a bad light bulb, unrelated to the rest of the problems. In any case, the printer was working well enough for demos so the repair was a success.

Closeup of the type chain (upside down) for an IBM 1403 line printer.

Closeup of the type chain (upside down) for an IBM 1403 line printer.

I announce my latest blog posts on Twitter, so follow me at @kenshirriff for future articles. I also have an RSS feed. The Computer History Museum in Mountain View runs demonstrations of the IBM 1401 on Wednesdays and Saturdays so if you're in the area you should definitely check it out (schedule).

Notes and references

  1. You might expect that the 132 hammers align with 132 type slugs, so the matching hammers all fire at once, but that's not what happens. Instead, the hammers and type slugs are spaced slightly differently, so only one hammer is aligned at a time, and a tiny movement of the chain lines up a different hammer and type slug. (Essentially they form a vernier.) Specifically, every 11.1 microseconds, the chain moves 0.001 inches. This causes a new hammer / type slug alignment. For mechanical reasons, every third hammer lines up in sequence (1, 4, 7, ...) until the end of the line is reached; this is called a "subscan" and takes 555 microseconds. Two more subscans give each hammer in the line an option to fire, forming a print scan of 1.665 milliseconds. If you want more information on how the print chain works, I have an animation here

  2. To be precise, the printer generates a pulse if hammer 1, 2, or 3 lines up with a type slug. This is due to the three "subscans", each using every third hammer. 

  3. I'll explain how the differential amplifier works in this footnote, since most readers may not want this much detail. The computer uses two differential amplifier boards in series, first a WV board and then a WW board. They use similar principles, except the WV uses NPN transistors and the WW uses PNP. The differential output from the WW board is transmitted to the computer where a third differential amplifier (an NT card) converts the signal to a logic output. Each board is a differential amplifier, which takes two inputs and amplifies the difference, essentially an op amp with two outputs.4

    A differential pair circuit.

    A differential pair circuit.

    The basic differential pair circuit for a differential amplifier is shown above. (Op amps contain a similar differential pair.) The resistor at the top sets a fixed current I. If the two inputs are equal, the current will be split, with half going through each transistor and branch resistor. But if one if the inputs is slightly lower, that transistor will conduct more and most of the current will go through that branch. Thus, the difference between the inputs steers the current down one side or the other, yielding an amplified signal across the lower resistors.

    Schematic of the WW amplifier board from the SMS documentation.

    Schematic of the WW amplifier board from the SMS documentation.

    The IBM 1401 documentation provides the schematic above for the board, but it's hard to follow what's happening. (Note the unusual transistor symbol, three boxes with an emitter arrow in or out.) I redrew the main part of the circuit below, so it resembles the simple differential pair. It has the same resistors at top and bottom as the differential pair, but there is an R-C circuit in each branch. To simplify, if there is a DC offset or low-frequency input, the capacitor will charge and counteract this offset. Thus, the amplifier operates as a high-pass amplifier; it cuts out low-frequency noise while amplifying the 1800 Hz sync pulses. The diodes clip the output, yielding a square wave. The differential output goes through emitter-follower buffers (omitted below) so the signal is strong enough to be transmitted through an under-floor cable from the printer to the computer.

    The differential amplifier circuit of the WW card.

    The differential amplifier circuit of the WW card.

     

  4. An op amp with positive and negative outputs is known as a "fully differential op amp". 

  5. The IBM 1403 printer has multiple error checks to avoid printing incorrect data. For a business machine, it would be bad to drop digits in, say, payroll checks or tax records. To detect hammer failures, the printer has 132 wires from the hammers back to the computer, to verify that each hammer fired when it was supposed to. If the computer doesn't get a pulse back from a hammer, the computer stops immediately, as we saw. 

  6. We noticed that there was solder smeared across the broken part of the trace. My suspicion is that the same problem happened a few years ago and was repaired by bridging the broken trace with solder. Eventually the heavy vibrations inside the printer caused a hairline crack in the solder, causing the problem to recur. By bridging the break with wire rather than just solder, we hope we have fixed the problem permanently. We also noticed the transistor connected to the broken trace had been replaced, so they must have tried that first in the previous repair. 

  7. The 1403 printer is documented in IBM 1403 Printer Component Description and 1403 Printers Field Engineering Maintenance Manual. See also this brief article about the 1403 printer in the IEEE Spectrum. For details on how the timing pulses work, see the 1403 Manual of Instruction, page 42. 

Bad relay: Fixing the card reader for a vintage IBM 1401 mainframe

As soon as we finished repairing a printer failure at the Computer History Museum, Murphy's law struck and the card reader started malfunctioning. The printer and card reader were attached to an IBM 1401, a business computer that was announced in 1959 and went on to become the best-selling computer of the mid-1960s. In that era, data records were punched onto 80-column punch cards and then loaded into the computer by the card reader, which read cards at the remarkable speed of 13 cards per second. This blog post describes how we debugged the card reader problem, eventually tracking down and replacing a faulty electromechanical relay inside the card reader.

The IBM 1402 card reader at the Computer History Museum. Cards are loaded into the hopper on the right. The front door of the card reader is open, revealing the relays and other circuitry.

The IBM 1402 card reader at the Computer History Museum. Cards are loaded into the hopper on the right. The front door of the card reader is open, revealing the relays and other circuitry.

The card reader malfunction started happening every time the "Non-Process Run-Out" mechanism (NPRO) was used. During normal use, if the card reader stopped in the middle of processing, unread cards could remain inside the reader. To remove them, the operator would press the "Non-Process Run-Out" switch, which would run the remaining cards through the reader without processing them. Normally, the "Reader Stop" light (below) would illuminate after performing an NPRO. The problem was the "Reader Check" light also came on, indicating an error in the card reader. Since there was no actual error and the light could be cleared simply by pressing the "Check Reset" button, this problem wasn't serious, but we still wanted to fix it.

Control panel of the 1402 card reader showing the "Reader Check" error. The Non-Process Run-Out switch is used to run cards out of the card reader without processing them.

Control panel of the 1402 card reader showing the "Reader Check" error. The Non-Process Run-Out switch is used to run cards out of the card reader without processing them.

To track down the problem, the 1401 restoration team started by probing the card reader error circuitry inside the computer with an oscilloscope. (The card reader itself is essentially electromechanical; the logic circuitry is all inside the computer.) The photo below shows the 1401 computer with a swing-out gate opened to access the circuitry. Finding a circuit inside the IBM 1401 is made possible by the binders full of documentation showing the location and wiring of all the computers circuitry. This documentation was computer generated (originally by a vacuum tube IBM 704 or 705 mainframe), so the diagrams were called Automated Logic Diagrams (ALD).

The IBM 1401 with gate 01B4 opened. The yellow wire-wrapped wiring connects the circuit boards that are plugged into the gate. The computer's console is visible on the front of the computer.

The IBM 1401 with gate 01B4 opened. The yellow wire-wrapped wiring connects the circuit boards that are plugged into the gate. The computer's console is visible on the front of the computer.

The reader check error condition is stored in a latch circuit; the latch is set when an error signal comes in, and cleared when you press the reset button. To find this circuit we turned to the ALD page that documented the card reader's error checking circuitry. The diagram below is a small part of this ALD page, showing the read check latch of interest: "READ CHK LAT". Each basic circuit (such as a logic gate) is drawn on the ALD as a box, with lines showing how they are connected. Inside the box, cryptic text indicates what the circuit does and where it is inside the computer. For example, the latch (lower two boxes) is constructed from a circuit card of type CQZV (an inverter) and a CHWW card (a NAND gate). The text inside the box also specifies the location of each card (slots D21 and D24 in gate 01B4), allowing us to find the cards in the computer. Note that the RD REL CHK signal comes from ALD page 56.70.21.2; this will be important later.

The read check latch circuit, excerpted from the ALD 36.14.11.2.

The read check latch circuit, excerpted from the ALD 36.14.11.2.

The schematic below shows the latch redrawn with modern symbols. If the RESET line goes low, it will force the inverted output high. Otherwise, the output will cycle around through the gates, latching the value. If any OR input is high (indicating an error), it will force the latch low. Note the use of wired-OR—instead of using an OR gate, signals are simply wired together so if any signal is high it will pull the line high. Because transistors were expensive when the IBM 1401 was built, IBM used tricks like wired-OR to reduce the transistor count. Unfortunately, the wired-OR made it harder to determine which error input was triggering the latch because the signals were all tied together.

The read check latch circuit redrawn with modern symbols.

The read check latch circuit redrawn with modern symbols.

Once we located the circuit cards for the latch, we used an oscilloscope to verify that the latch itself was operating properly. Next, we needed to determine why it was receiving an error input. After disconnecting wires to get around the wired-OR, we found that the error was not coming from the Read/punch error or the Feed error signal. The RD REL CHK was the obvious suspect; this signal was part of the optional Read Punch Release feature1. However, the team insisted that Read Punch Release wasn't installed in our 1401. The source of this signal was ALD 56.70.21.2 and
our documentation didn't include ALD section 56, confirming that this feature wasn't present in our system.

Additional oscilloscope tracing showed a lot of noise on some of the signals from the card reader. This wasn't unexpected since the card reader is built from electromechanical parts: relays, cam switches, brushes, motors, solenoids and other components that generate noise and voltage spikes. I considered the possibility that a noise spike was triggering the latch, but the noise wasn't reaching that circuit.

The plug charts show the type of card in each position in the computer, and the function assigned to it. This is part of the plug chart for gate 01B4.

The plug charts show the type of card in each position in the computer, and the function assigned to it. This is part of the plug chart for gate 01B4.

At this point, I was at a dead end, so I took another look at the RD REL CHK signal to see if maybe it did exist. The 1401's documentation includes "plug charts," diagrams that show what circuit card is plugged into each position in the computer. I looked at the plug chart for the card reader circuitry in gate 01B4 (swing-out gate, not logic gate). The plug chart (above) showed cards assigned to the mysterious ALD 56.70.21.2, such as the cards in slots A15-A17 and B15. (The plug chart also had numerous pencil updates, crossing out cards and adding new ones, which didn't give me a lot of confidence in its accuracy.) I looked inside the computer and found that these cards, the Read Punch Release cards generating RD REL CHK, were indeed installed in the computer. So somehow our computer did have this feature.

The gate in the 1401 holding the card reader circuitry. Note the cards in positions A15 and B15.

The gate in the 1401 holding the card reader circuitry. Note the cards in positions A15 and B15.

The problem was that even though these cards were present in the system, we didn't have the ALDs that included them, leaving us in the dark for debugging. I checked the second 1401 at the Computer History Museum; although it too had the cards for Read Punch Release, its documentation binders also mysteriously lacked the section 56 ALDs. Fortunately, back in 2006, the Australian Computer Museum Society sent us scans of the ALDs for their 1401 computer. I took a look and found that the Australian scans included the mysterious section 56. There was no guarantee that their 1401 had the same wiring as ours (since the design changed over time), but this was all I had to track down RD REL CHK.

Simplified excerpt of ALD 56.70.21.2 from an Australian 1401 computer.

Simplified excerpt of ALD 56.70.21.2 from an Australian 1401 computer.

According to the Australian ALD above, RD REL CHK was generated by the CGVV card in slot A17 (upper right box above). The oscilloscope trace below confirmed that this card was generating the RD REL CHK signal (yellow), but its input (RD BR IMP CB) (cyan) looked bad. Notice that the yellow line jumps up suddenly (as you'd expect from a logic signal), but the cyan line takes a long time to drop from the high level to the low level. Perhaps a weak transistor in a circuit was pulling the signal down slowly, or some other component had failed.

Oscilloscope trace showing the "circuit breaker" signal from the card reader (cyan) and the READ REL CHK error signal (yellow).

Oscilloscope trace showing the "circuit breaker" signal from the card reader (cyan) and the READ REL CHK error signal (yellow).

We looked into the RD BR IMP CB signal, short for "ReaD BRush IMPulse Circuit Breakers". In IBM terminology, a "circuit breaker" is a cam-operated switch, not a modern circuit breaker that trips when overloaded. The read brush circuit breakers generate timing pulses when the read brushes that detect holes in the punch card are aligned with a row of holes, telling the computer to read the hole pattern.

The NGXX integrator card contains resistor-capacitor filters. Unlike most cards, this one doesn't have any transistors. Photo courtesy of Randall Neff.

The NGXX integrator card contains resistor-capacitor filters. Unlike most cards, this one doesn't have any transistors. Photo courtesy of Randall Neff.

We looked up yet another ALD page to find the source of the strangely slow RD BR IMP CB signal. That signal originated in the card reader and then passed through an NGXX integrator card (above). Earlier I mentioned that the signals from the card reader were full of noise. This isn't a big problem inside the card reader since brief noise spikes won't affect relays. But once signals reach the computer, the noise must be eliminated. This is done in the 1401 by putting the signal through a resistor-capacitor low-pass filter, which IBM calls an "integrator". That card eliminates noise by making the signal change very slowly. In other words, although the signal on the oscilloscope looked strange, it was the expected behavior and not a problem. But why was there any signal there at all?

Part of IBM 1402 card reader schematic showing the cams (circles) that generate the CB read pulses and the relay that blocks the pulses during NPRO.

Part of IBM 1402 card reader schematic showing the cams (circles) that generate the CB read pulses and the relay that blocks the pulses during NPRO.

After some discussion, the team hypothesized that the pulses on RD BR IMP CB shouldn't be getting to the 1401 at all doing a Non-Process Run-Out, since cards aren't being read. The schematic4 for the card reader (above) shows the complex arrangement of cams and microswitches that generates the pulses. During an NPRO, relay #4 will be energized, opening the "READ STOP 4-4" relay contacts. This will stop the BRUSH IMP CB pulses from reaching the 1401.53 In other words, relay #4 should have blocked the pulses that we were seeing.

Frank King replacing a bad relay in the 1402 card reader. The relays are next to his right shoulder.

Frank King replacing a bad relay in the 1402 card reader. The relays are next to his right shoulder.

The card reader contains rows of relays; the reader's hardware is a generation older than the 1401 and it implements its basic control functions with relays rather than logic gates. Frank pulled out relay #4 and inspected it.6 The relay (below) has 6 sets of contacts, activated by an electromagnet coil (yellow) and held in position by a second coil. Springs help move the contacts to the correct positions. One of the springs appeared to be weak, preventing the relay from functioning properly.7 Frank put in a replacement relay and found that the card reader now performed Non-Process Run-Outs without any errors. We loaded a program from cards just to make sure the card reader still performed its main task, and that worked too. We had fixed the problem, just in time for lunch.

The faulty relay from the IBM 1402 card reader.

The faulty relay from the IBM 1402 card reader.

Conclusions

It is still a mystery why section 56 of the ALDs was missing from our documentation. As for the presence of the Read Punch Release feature on our 1401, that feature turns out to be standard on 1401 systems like the ones at the museum.8 I think the belief that our 1401 didn't include this feature resulted from confusion with the Punch Feed Read feature, which we don't have. (That feature allowed a card to be read and then punched with additional data as it passed through the card reader.)

The team that fixed this problem included Frank King, Alexey Toptygin, Ron Williams and Bill Flora. My previous blog post about fixing the 1402 card reader is here, tracking down an elusive problem with a misaligned cam.

I announce my latest blog posts on Twitter, so follow me at @kenshirriff for future articles. I also have an RSS feed. The Computer History Museum in Mountain View runs demonstrations of the IBM 1401 on Wednesdays and Saturdays so if you're in the area you should definitely check it out (schedule).

Notes and references

  1. Read Punch Release was a feature to allow the CPU to operate while reading a card. IBM's large 7000 series mainframes used "data channels," which were high-performance I/O connections using DMA and controlled by separate I/O processors. With a data channel, the CPU could process data while the channel performed I/O. But the 1401 was a much simpler machine, designed to replace electromechanical accounting machines that would read a card, process the card, and print out results. On the 1401, when the CPU executed an instruction to read a card, the CPU would wait while the card moved through the card reader and passed under the brushes to be read. The mechanical cycle to read a card took 75 ms (corresponding to 800 cards per minute), of which only 10 ms was available for the CPU to perform computation and the rest was wasted (from the CPU's perspective). The Read Punch Release feature was a workaround for this. The programmer could issue an SRF (Start Read Feed) instruction, which would cause a card to start moving through the card reader. The program had 21 ms to perform computation and execute the read instruction before the card reached the reading station. (If the program executed the read instruction too late, the computer wouldn't be able to read the card and would halt with an error.) This provided extra computation time in each card read cycle. IBM charged a monthly fee for additional features; Read Punch Release was relatively inexpensive at $25 per month (equivalent to about $200 today). 

  2. The Read Punch Release feature also provided a similar instruction for punching a card, allowing an extra 37 ms of computation while punching a card. See page 16 of the IBM 1402 Card Read-Punch Manual for details on card timing and the read punch release operation. 

  3. The card reader schematic shows that six separate cams were required to generate the RD BR IMP CB signal. The problem is that cards are read at high speed, so rows on the card are read just 3.75 ms apart. Cams and microswitches are too slow to generate pulses at this rate. To get around this, pulses for odd rows and even rows are generated separately. In addition, one set of switches closes for the start of a pulse and a second set opens for the end of a pulse. Needless to say, it is a pain to adjust all these cams so the pulses have the right timing and duration. If this timing is off, cards won't read correctly.

    To improve reliability and reduce maintenance, IBM eventually replaced these cams with a "solar cell" (i.e. a photo-cell), slotted disk, and light. The light passing through the slotted disk triggered a pulse from the photo-cell. Our ALDs had some penciled-in modifications suggesting that our 1401 was originally configured to work with a solar cell card reader and then modified to work with the older circuit breaker card reader. 

  4. The schematic for the 1402 card reader is here. The read brush impulse CB signal is generated on pdf page 8. This document also includes instructions on how to upgrade from the "circuit breaker" circuit to the "solar cell" circuit, a change that is indicated as taking 1.0 to 1.5 hours for the hardware installation, 1.5 to 2.0 hours for miscellaneous electrical changes, and 2.3 to 3.7 hours to wire up the new circuit. (See pdf pages 47-53.) 

  5. The relays involved in an NPRO operation are documented in 1402 Card Read-Punch Customer Engineering Manual of Instruction, page 4-3 or pdf page 31. 

  6. The relay is a "permissive make" relay, a type of relay that IBM designed to be twice as fast as regular relays. For a detailed discussion of IBM's relays, see Commutation and Control. The permissive make relay is discussed on page 59 (pdf page 18). 

  7. Stan Paddock on the 1401 team built a relay tester that we could use to check the bad relay. Unfortunately, the 1401 workshop at the Computer History Museum is closed due to construction so we couldn't access the tester (or the collection of spare relays in the workshop). Fortunately, we had a spare relay that wasn't in the workshop for some reason. 

  8. The IBM Sales Manual lists the various 1401 features and their prices (pdf page 50). It states the Read Punch Release feature is standard on the 1401 Model C, the "729 Tape/Card System". 

The printer that wouldn't print: Fixing an IBM 1401 mainframe from the 1960s

The Computer History Museum has two operational IBM 1401 computers used for demos, but a few weeks ago one computer suddenly couldn't print anything. I helped track down the problem, but it was more tricky than we expected; along the way we had to investigate the printer error checking circuits, the print buffer, and even low level core memory signals. This blog post discusses our investigation and how we traced the problem to a failed germanium transistor.

The IBM 1401 mainframe computer (left) at the Computer History Museum printing the Mandelbrot fractal on the 1403 printer (right).

The IBM 1401 mainframe computer (left) at the Computer History Museum printing the Mandelbrot fractal on the 1403 printer (right).

The IBM 1401 computer was announced in 1959, and went on to become the best-selling computer of the mid-1960s, with more than 10,000 systems in use. The 1401 leased for $2500 a month (about $20,000 in current dollars), a low price that let even medium-sized businesses use the 1401 for payroll, accounting, invoicing, and many other tasks. The IBM 1401 computer was constructed from small circuit boards (called SMS cards) plugged into units called "gates"—these are gates in the sense of something that swings open, not logic gates. The photo below shows the 1401 with one of the gates open, revealing dozens of brown SMS cards plugged into the gate.

The IBM 1401 computer, with one of the gates opened, showing the dozens of circuit boards (SMS cards) in each gate. 
The fan on the front of the gate keeps the cards cool.

The IBM 1401 computer, with one of the gates opened, showing the dozens of circuit boards (SMS cards) in each gate. The fan on the front of the gate keeps the cards cool.

One key selling point of the IBM 1401 was its high-speed line printer (the IBM 1403), which could hammer out 10 lines per second. (IBM claimed this was four times as fast as competing printers, but others dispute this.) The 1403 printer had excellent print quality, said to be the best printing until laser printers were introduced in the 1970s.1 IBM claims that "Even today, it remains the standard of quality for high-speed impact printing."

Closeup of the type chain (upside down) for an IBM 1403 line printer.

Closeup of the type chain (upside down) for an IBM 1403 line printer.

The 1403 printer used a chain of type slugs (above) that rotated at high speed above the paper, with an inked ribbon between the paper and the chain. Each of the 132 print columns had a hammer and an electromagnet. At the right moment, when the desired character passed the hammer, the electromagnet drove the hammer against the back of the paper, causing the paper and ribbon to hit the type slug, printing the character.2

Printing mechanism of the IBM 1401 line printer. From 1401 Reference Manual, p11.

Printing mechanism of the IBM 1401 line printer. From 1401 Reference Manual, p11.

Unfortunately, the printer at the Computer History Museum recently had a problem: whenever a line was printed, the computer would halt due to a "print check" error. Fortunately the museum has a team of volunteers to help keep the system running; people helping with this printer problem included Ron Williams, Frank King, Marc Verdiell, Carl Claunch, Michael Marineau, Robert Garner and Alexey Toptygin. By the time I arrived to help, Ron had written a simple test program that repeatedly attempted to print a line; he toggled the program into the computer by hand, and he disabled the error check. The printer printed the characters properly, so we suspected the problem was in the error reporting circuitry inside the computer. Our strategy was to find the error signal and then trace it back through the computer to determine why it was being generated.

We started by examining the latch circuit that holds the print check error condition and sends it to the rest of the computer. To find the circuit, we consulted the documentation: binders of cryptic computer-generated wiring diagrams, called Automated Logic Diagrams (ALD). A small piece of an ALD is shown below showing the print check latch (PR CHK LAT). Each box on the ALD corresponds to a circuit on an SMS board and the lines show how the boards are wired together. Deciphering the text inside the box on the right indicates a board of type 2JMX implementing a "2+AO" function, which in modern terms is AND-OR-Invert. The text in each box also indicates the location of the card: its gate (physical swing-out gate, not logic gate), gate 01A6 in this case, and the card's position in the gate (F10). Thus, to check the output (labeled H) of the latch with the oscilloscope, we swung out gate 01A6, found card F10, and hooked the oscilloscope to pin H. We found pin H went low (error) when pins F and G went high, which was the proper behavior for the latch. Pin G (PR CK SAMPLE) was essentially a clock to sample the error state, while pin F was the error signal itself. Our next task was to determine what was triggering the error signal on pin F.

Excerpt of an Automated Logic Diagram (ALD) for the IBM 1401, showing the print check latch (PRT CHK LAT). This page is denoted 36.37.21.2.

Excerpt of an Automated Logic Diagram (ALD) for the IBM 1401, showing the print check latch (PRT CHK LAT). This page is denoted 36.37.21.2.

The documentation also includes logic diagrams that show the circuitry at a logical level, which is slightly easier to understand than the physical connections on the ALD diagrams. The logic diagram below shows the printer error circuitry. At the right, the print check error signal (PRT CHK ERROR) comes out of the latch (PR CHK LAT) that holds the error signal. (This is the same latch as in the ALD diagram above, and you can match up the signal names.) To the left of the latch, several different error conditions are detected and combined to form the error signal fed into the latch. (Note that IBM's logic symbols didn't match standard symbols. The semicircle is an OR gate, not an AND gate. The triangle is an AND gate. An "I" in a box is an inverter.)

Logic diagram of the error checking logic for the IBM 1401/1403. From Instructional Logic Diagrams page 77 "Print Buffer Controls".

Logic diagram of the error checking logic for the IBM 1401/1403. From Instructional Logic Diagrams page 77 "Print Buffer Controls".

Several different conditions can trigger a print check error3 and we thought the "hammer fire" check was a likely candidate. Recall that the printer uses 132 hammers, one per column, to print a line of characters. To make sure the hammers are operating correctly, the computer has two special planes in core memory. (The 1401 contains 4,000 characters of core memory4; each bit of memory is a tiny ferrite ring that is magnetized one way to store a "1" and the other way for a "0". A grid of 4000 cores forms a plane, storing a 1-bit slice of memory. Multiple planes are stacked up to form the storage unit.) Each time the computer decides to fire a hammer, it records this in core memory in the "equal check" plane. When a hammer actually fires, the current pulse from the electromagnet stores a bit in the "hammer-fire" plane.5 Each print scan cycle, the computer compares the two core planes to see if a hammer was fired when it wasn't supposed to, or if a hammer failed to fire when it should have; a mismatch triggers the "hammer fire" check error.

Closeup of the hammer electromagnets in the IBM 1403 printer. An electromagnet (when energized through its pair of wires) pulls a metal armature, which drives the hammer, paper and ribbon against the type slug. There are 132 hammers, one for each column, arranged in two rows of 66.

Closeup of the hammer electromagnets in the IBM 1403 printer. An electromagnet (when energized through its pair of wires) pulls a metal armature, which drives the hammer, paper and ribbon against the type slug. There are 132 hammers, one for each column, arranged in two rows of 66.

After some difficulty6, we determined that the problem wasn't the hammer fire check, but a different check: "print line complete" (PLC). This check ensures that for each line, either exactly one character was printed in each column or the column was blank. This check uses a third special core plane, the "print line complete" plane. Each time a character is printed in a column, the corresponding bit is set. (For a blank or unprintable character, a separate circuit sets the column's bit.) At the end of the line (during scan 49), the print line complete cores are checked; if any core is zero, the printer failed to print that column and an error is reported. (You can see the PLC CHECK signal and the logic that generates it on the earlier logic diagram.)

Oscilloscope probing (below) showed that the PLC CHECK (yellow) was triggered because the system thought a second character was being printed in the same column. The cyan signal is the (inverted) PLC bit from core (PR LINE COMP LATCH); each low pulse indicates a character has been printed in that column. The pink pulse (PRINT COMPARE) indicates a new character is being printed. The problem is that the cyan and pink signals go low at the same time, indicating both an existing character and a new character in the column. This generates the extra blue pulse (PLC CHECK), which triggers the yellow pulse (PRINT CHK ERROR from the latch). (This circuit can be seen in the earlier logic diagram, labeled "Trying to print position twice".)

Oscilloscope trace from debugging the IBM 1401's printer.

Oscilloscope trace from debugging the IBM 1401's printer.

Several things could cause the system to think two characters were being printed in the column. Looking at the printer's output we saw that it printed just the expected character on the paper, so the circuit to print a character seemed to be working correctly (PRINT COMPARE, the single pink pulse above), We tested the blank / unprintable circuit and it was detecting blank and non-blank columns correctly. So the most likely problem was reading a 1 from core memory (the cyan line above, PR LINE COMP LATCH) when it should be a 0. But was the problem the wrong value going in to core, or the wrong value coming out?

The logic diagram below shows the circuit that writes to the Print Line Compare core memory. At the right, PR LINE COMP INH is the (inverted) signal written to core.8 On scan 49 (the error-checking print cycle after printing all 48 characters), this line is set high, clearing the memory. If a character is being printed, the PRINT COMPARE EQUAL signal will set the core. At the left, logic gates detect a blank or unprintable character. And if a 1 bit was already in core (PR LINE COMP LATCH), the 1 bit is rewritten to core.

Logic diagram of the print line complete logic for the IBM 1401/1403. From Instructional Logic Diagrams page 77 "Print Buffer Controls".

Logic diagram of the print line complete logic for the IBM 1401/1403. From Instructional Logic Diagrams page 77 "Print Buffer Controls".

We detected that this circuit was writing erroneous 1 bits to core because it was reading erroneous 1 bits from core. But that put us in a circle, not knowing if the initial problem was the read or the write. To resolve this, we triggered the oscilloscope on print scan 49, which is when the PLC bits get cleared, and then looked at the next print scan, which reads the cleared bits back. We saw 0's being written (i.e. PR LINE COMP INH high), but unexpectedly saw 1's coming back (PR LINE COMP LATCH). So we knew something was going wrong at a low level in the core memory.

I should mention that in the base 1401 system, the printer check bits were stored in the main core memory module, but our system used a separate "print storage" core memory for improved performance. The performance issue is due to how the printer uses core memory: each time a hammer lines up with a type slug, the computer reads the corresponding character from core memory and fires the hammer if the character in storage matches the character under the hammer. Since core memory is constantly in use while printing a line, the computer can't do any computation while printing. The solution was the print storage feature: an additional 132-address core memory that functioned as a print buffer.7 With print storage, a line to be printed was first rapidly copied from the main core memory to the print storage core memory. Then the computer could continue doing computation using the main core memory while the print circuitry read from the print storage core memory. Each option on the IBM 1401 had a monthly charge; IBM charged an extra $386 a month for the print storage feature.

This print storage gate has the circuitry to drive the printer buffer core memory. The core memory unit in the upper right has bundles of yellow wires attached.

This print storage gate has the circuitry to drive the printer buffer core memory. The core memory unit in the upper right has bundles of yellow wires attached.

The photo above shows the gate that implements the print storage feature. The core memory module is the block on the upper right with yellow wires attached. (Individual cores can be seen in the photo below.) Core memory requires a lot of supporting circuitry. To select an address, driver cards generate X and Y signals. To write a core, the inhibit signal is combined with the clock by a gate, and then a driver card amplifies the signal and sends it through the inhibit line that passes through all the cores in the plane.8 When a core is read, it induces a pulse on a sense wire. This pulse is amplified by a sense amplifier card, and then the bit is stored in a latch. The numerous SMS cards in the print storage gate provided these support functions.

The cores inside the print buffer. The wiring is not the usual core memory grid because each printer hammer is wired directly to a hammer check core. The image quality is bad because of the plastic cover over the cores.

The cores inside the print buffer. The wiring is not the usual core memory grid because each printer hammer is wired directly to a hammer check core. The image quality is bad because of the plastic cover over the cores.

We probed the sense amplifier and latch cards on the reading side of the core memory and they seemed to be operating correctly, so we moved to the writing side. The HN inhibit driver card seemed a candidate for failure since it operates at high current, but we swapped the card with a replacement and the printer still failed. Next, I tried looking at the input to that card, but found there was no signal on that line, which seemed very suspicious.

Oscilloscope of the bad "CHWW" NAND gate card: pink (3) and blue (4) are inputs, cyan (2) is the output, stuck high.

Oscilloscope of the bad "CHWW" NAND gate card: pink (3) and blue (4) are inputs, cyan (2) is the output, stuck high.

The missing signal was generated by a card of type CHWW, a NAND gate that combines the inhibit signal with the clock before sending it to the driver card. I hooked up the oscilloscope to the inputs and output of the NAND gate, yielding the trace above. This trace was the smoking gun: the output (cyan 2) remained high even when the two inputs (pink 3 and blue 4) went high. This showed that the NAND gate had failed and its output was stuck high. This explained everything: with this output stuck high, only 1's would be written to the PLC core plane. Then, when a character was printed, the print circuitry would read the 1 from core, think a character had already been printed in this column, the PLC check would fail, and the print check error would be triggered.

The printer successfully operating, printing out powers of 2.

The printer successfully operating, printing out powers of 2.

We swapped this card with a spare, and the printer started printing without any errors (above). This proved that we had finally traced the problem; it was a simple NAND gate in the depths of the printer buffer core memory circuit. The failed card is shown below. It implements three NAND gates (details) using diode-transistor logic (which IBM calls CDTL—Complemented Transistor Diode Logic). Each two-input gate uses one germanium transistor (circular metal can) and two diodes (striped glass components on the right). Pull up resistors (striped) and inductors (beige) on the left complete the circuits.

The failed CHWW card from the IBM 1401. This card implements three NAND gates. The lower left transistor failed, and has been replaced.

The failed CHWW card from the IBM 1401. This card implements three NAND gates. The lower left transistor failed, and has been replaced.

I tested the card with a signal generator and found that while two of the three NAND gates worked, the other was stuck at a high output, confirming what we saw inside the 1401. Next I tested the transistors using the diode test mode on a multimeter. The good transistors had voltage drops of 0.23V. (This may seem low, but remember that these are germanium transistors not silicon transistors.) In comparison, the bad transistor had a Vbe drop of 0.95V, much higher. Finally, we removed the transistors and checked them on a vintage Tektronix 577 curve tracer. We thought the bad transistor might just be too weak to operate the gate, but it was entirely dead—totally flatlined on the curve tracer.

We opened up the transistor on a lathe and looked inside. The transistor is an IBM 083 NPN germanium alloy transistor (germanium was used before silicon transistors). The transistor consists of a tiny germanium die (the shiny metallic square below), forming the base. Two wires are attached for the emitter and collector, connected to dots of tin alloy, a larger dot on the front for the collector and a smaller dot on the back for the emitter. Under the microscope, it looked like there was some corrosion on the alloy dots and the emitter wire didn't look solidly connected, so we suspect that is the root cause of the failure.

Inside a failed IBM 083 germanium transistor. The silver-colored square in the middle is the germanium die, wired to the base pin. The dot in the middle is tin alloy, forming the collector, with a wire to the collector pin on the left. A smaller dot on the other side of the germanium die forms the emitter, wired to the pin on the right.

Inside a failed IBM 083 germanium transistor. The silver-colored square in the middle is the germanium die, wired to the base pin. The dot in the middle is tin alloy, forming the collector, with a wire to the collector pin on the left. A smaller dot on the other side of the germanium die forms the emitter, wired to the pin on the right.

Conclusions

This was a harder problem to diagnose than most of the IBM 1401 issues. But we managed to track down the problem, replace the bad card, and get the printer back in operation. One nice thing about the IBM 1401 compared to modern systems is that it's not a black box—you can look inside all the circuitry, down to the individual transistors. In this case, we were able to find the bad transistor that was causing the system failure, and even determine that it was probably corrosion that killed the transistor.

I announce my latest blog posts on Twitter, so follow me at @kenshirriff for future articles. I also have an RSS feed. The Computer History Museum in Mountain View runs demonstrations of the IBM 1401 on Wednesdays and Saturdays so if you're in the area you should definitely check it out (schedule).

Notes and references

  1. One reason for the IBM 1403's high quality printing was its use of a type chain instead of typebars or a drum. Many earlier line printers used rows of typebars or a rotating drum of characters. Any timing imprecision would change the vertical positioning of characters, yielding ugly wavy text. The 1403, on the other hand, used a horizontally rotating chain of characters so misalignment caused a hardly-noticeable change in the spacing between characters. 

  2. You might expect that the 132 hammers align with 132 type slugs, so the matching hammers all fire at once, but that's not what happens. Instead, the hammers and type slugs are spaced slightly differently, so only one hammer is aligned at a time, and a tiny movement of the chain lines up a different hammer and type slug. (Essentially they form a vernier.) Specifically, every 11.1 microseconds, the chain moves 0.001 inches. This causes a new hammer / type slug alignment. For mechanical reasons, every third hammer lines up in sequence (1, 4, 7, ...) until the end of the line is reached; this is called a "subscan" and takes 555 microseconds. Two more subscans give each hammer in the line an option to fire, forming a print scan of 1.665 milliseconds. 48 print scans give each hammer a chance to print each character, and then the 49th print scan is used for error checking. (For more details of this timing, see Manual of Instruction, page 37.)

    The mechanism of scans and subscans may seem excessively complicated. But what it accomplishes is matching up the fast "electronic world" with the slower "mechanical world." Specifically, every 11.1 microseconds, a hammer and type slug line up. The computer reads the character in that column from core, compares it to the character on the type slug, and if they match, it fires the hammer. The important thing here is that a core memory cycle matches the time between hammer alignments, making it possible to read the character from core for each hammer alignment. If you want more information on how the print chain works, I have an animation here.

    One subtlety is that a hammer takes 1.52 milliseconds to impact (Manual of Instruction, p32). Thus, it's not really the case that the hammer fires when it lines up with the type, but when it will be lined up 1.52 milliseconds in the future. 

  3. It may seem excessive that the 1401 had multiple checks to ensure that the printer was operating properly. But for a business computer, print errors could be catastrophic: imagine if a day's payroll checks had a digit printed wrong or tax forms were printed incorrectly. IBM's scientific computers had much less error checking than the business computers, on the assumption that scientists would notice problems. 

  4. The 1401 stores 4,000 characters in core memory, not 4096, because it is a decimal machine (i.e. BCD), with decimal addresses. Its memory can be expanded to 16,000 characters with a dishwasher-sized memory expansion unit; I wrote about repairing this unit here. I wrote more about the 1401's core memory here

  5. Recording each hammer fire in core memory isn't done by the computer writing to core memory. Instead, each hammer is physically wired directly to a particular core; 132 wires from the hammer electromagnets to the cores. When a hammer fires, the current pulse from the hammer's electromagnet goes through a wire wrapped through the corresponding core, magnetizing that core. (You can see these wires in the earlier picture of the cores.) 

  6. It was tricky to determine which signal was triggering the error input F, due to the 1401's use of wired-OR. Because transistors were expensive when the IBM 1401 was built, IBM used many tricks to reduce the transistor count. One trick is the wired-OR—instead of using an OR gate, signals are simply wired together so if any signal is high it will pull the line high. Thus, We couldn't simply probe the signals feeding into pin F because they were all wired together. Instead, we needed to disconnect cards so we could test one signal at a time. 

  7. The print storage core memory has 12 core planes; that is, it stores 12 bits at each location. Like a regular core location, it uses 6 bits to store each BCDIC character, as well as a bit for the word mark (metadata indicating field locations), and a parity bit. In addition, the print storage has four planes for error detection: a hammer fire sense plane (recording the hammers that fired), equal check plane (recording the hammers that should fire), print line complete plane (recording columns with a character printed), and an error check plane (indicating the column that triggered an error). 

  8. The process to write to core memory may seem backwards, using a high signal on the inhibit line to write a 0. This is due to how cores function. The key that makes cores work is that they require a high current pulse to flip the core's magnetic state; a pulse with half the current has no effect on the core. Cores are arranged in a grid, with X and Y address lines that are pulsed to select a core. Multiple planes are stacked, one for each bit. Each line is pulsed with half the necessary current, so only the core where both lines cross has enough current to flip to the 1 state. Each plane has an inhibit line that passes through all the cores in the plane. To write a 1 to a plane, the inhibit line gets no current, causing the addressed core to flip to 1 as described. To write a 0 to a plane, the inhibit line gets half current in the opposite direction. The result is that none of the cores get enough current to flip, and the addressed core remains in the 0 state. Thus, by setting each plane's inhibit line appropriately, the desired 0's and 1's can be written to the address in the core stack. 

  9. For information on how the print checks work, see Instruction Logic, page 98. The 1403 printer is documented in IBM 1403 Printer Component Description, 1403 Printers Field Engineering Maintenance Manual and 1403 Printers Field Engineering Manual of Instruction. See also this brief article about the 1403 printer in the IEEE Spectrum. For a detailed description of the IBM 1401, see IBM 1401: a modern theory of operation

Repairing the card reader for a 1960s mainframe: cams, relays and a clutch

I recently helped repair the card reader for the Computer History Museum's vintage IBM 1401 mainframe. In the process, I learned a lot about the archaic but interesting electromechanical systems used in the card reader. Most of the card reader is mechanical, with belts, gears, and clutches controlling the movement of cards through the mechanism. The reader has a small amount of logic, but instead of transistorized circuits, the logic is implemented with electromechanical relays.1 Timing signals are generated by spinning electromechanical cams that generate pulses at the proper rotation angles. This post explains how these different pieces work together, and how a subtle timing problem caused the card reader to fail.

The IBM 1402 card reader/punch. The 1401 computer is in the background (left) and a tape drive is at the right.

The IBM 1402 card reader/punch. The 1401 computer is in the background (left) and a tape drive is at the right.

The IBM 1401 was a popular business computer of the early 1960s, used for applications such as payroll or inventory. Data records were stored on 80-column punch cards, which were read into the computer by the card reader and printed on the high-speed line printer. The IBM 1401's card reader (above) could read 800 cards per minute—over 13 cards per second—which is a remarkable speed for a mechanical device. Cards whizzed through the card reader, 80 metal brushes read the holes in each column, and then the cards were stacked in the hoppers in the middle of the reader.3 If anything went wrong, from a misread card to a card jam, the card reader would slam to a halt (hopefully before bent cards started piling up) and an error light would illuminate on the card reader's control panel. At this point, the operator would fix the problem and processing could continue.2

80-column punch cards. Each column of the card holds a character, represented by the holes punched in the column. THE text at the top of the card shows what is in each column. Black paper behind the first card makes the holes more visible.

80-column punch cards. Each column of the card holds a character, represented by the holes punched in the column. THE text at the top of the card shows what is in each column. Black paper behind the first card makes the holes more visible.

The photo below shows the IBM card reader/punch with the front doors opened up. The right half is the reader and the left half is the punch. (The punch records data on cards; I will ignore it in this article.) The blue panel has the operator controls and indicators. Cards enter through the feeder in the upper right, travel right-to-left through the brushes (behind the blue panel) and fall into the stacker slots (center). To the right of the stacker is the service panel with the dynamic timer (circle with knob); this will be relevant later. The relays, resistors, capacitors and diodes that make up the reader's circuitry are below the service panel. Converting the hole pattern to characters was done by the 1401 computer, not the reader, so this circuitry is fairly basic.3 The garbage can on the lower left collects "chips", the bits punched out of cards by the punch, and triggers a warning light when it is full.

With the front doors of the card reader/punch removed, the circuitry is visible. The left half is the punch and the right half is the reader. The *Reader Stop* light is illuminated on the panel, indicating a malfunction.

With the front doors of the card reader/punch removed, the circuitry is visible. The left half is the punch and the right half is the reader. The *Reader Stop* light is illuminated on the panel, indicating a malfunction.

A few months ago a problem with the card reader kept showing up under certain circumstances: the card reader would halt with a "Reader Stop" error, indicating a problem with card feeding and travel.2 Given the age of the card reader at the Computer History Museum, it's not surprising that Reader Stop problems arise. However, in this case, cards were feeding fine and there was nothing visibly wrong. Eventually the 1401 maintenance team determined that the problem occurred if you do multiple cycles of reading a card, printing a line on the printer, reading a card, printing a line, and so on. The first three cards read fine, but the fourth card read always failed with a Reader Stop. But during normal processing, hundreds of cards could be read without a problem. What was different about repeatedly reading and printing?

The card reader can read 800 cards per minute, but the line printer can only print 600 lines per minute. Thus, if you continuously read and print, the printer will eventually fall behind. The consequence is that the card reader will occasionally be blocked while the printer finishes a line. In particular, the system can read and print three cards in a row, but there is a delay before reading the fourth card.4 This is normal behavior for the system, but for some reason, this delay was triggering the Reader Stop error.

The question at this point of the investigation was why the delayed read triggered a Reader Stop. The logic diagram below shows the many different things can cause a Reader Stop.5 (This logic was implemented in the reader by relay circuits of Byzantine complexity.) Usually a card jam or card feed problem is responsible for a stop, but there weren't any card movement problem. And the card levers (microswitches) that detect card movement were operating correctly. Eventually, the team discovered that removing relay R-10 (Read Clutch Impulse near the bottom of the diagram) stopped the Reader Check from happening. Removing this relay prevented the clutch status from being checked8 so it prevented the error from being reported but didn't fix the underlying problems. It did, however, indicate that the problem was with the clutch.

Diagram showing the logic conditions that yield a Reader Stop error in the card reader. From the manual.

Diagram showing the logic conditions that yield a Reader Stop error in the card reader. From the manual.

At this point, I'll explain how the clutch works in the card reader. The card reader is powered by a 1/4 horsepower electric motor. This motor powers a variety of shafts, belts, rollers and cams inside the card reader. However, the reader needs a mechanism to rapidly start and stop card reading under computer control. This is accomplished by a clutch that powers the card feed rollers. With the clutch engaged, cards will be fed through the card reader. With the clutch disengaged, cards will not move (although other parts of the reader will keep running). The GIF below shows the clutch in action: it is disengaged for one cycle and engaged for one cycle. Although the silver pulley (driven by the motor) turns continuously, the black gear is started and stopped by the clutch, which is just behind the black gear.

The clutch in the 1402 card reader. The clutch is disengaged for one cycle and engaged for one cycle.

The clutch in the 1402 card reader. The clutch is disengaged for one cycle and engaged for one cycle.

The diagram below shows the main parts of the clutch. The ratchet (green) continuously rotates, powered by the motor. When the coil (orange) pulls the latch (purple), the dog (blue) will fall into the ratchet (green), engaging the clutch. This will cause the outer wheel to rotate in sync with the ratchet, feeding a card. After one cycle, the clutch can be disengaged by the latch, or can remain clutched for another cycle. Above, you can see the latch move just before the clutch engages. The dogs and ratchet are hidden behind the black gear. For more details, see the footnotes.67

The card reader's clutch mechanism. From the 1402 manual.

The card reader's clutch mechanism. From the 1402 manual.

The clutch requires careful timing. If the coil pulls the latch too late, the dog won't engage with the ratchet until the next cycle and no card will be fed. And if the latch is released too late, the clutch won't disengage until the next cycle, feeding an extra card. Either way, this would cause a serious problem if you are, for instance, printing payroll checks. To catch this, the card reader includes a clutch check circuit that ensures that a clutch happens only when requested.8 This circuit is what triggered the Reader Stop issue.

Timing in the card reader is controlled by the angular position of the main drive shaft; there isn't a clock signal. A 360° rotation of the drive shaft corresponds to one card read cycle.9 The mechanical and electrical components must be precisely adjusted so they activate at the correct rotational angles. For instance, a card's first row of holes reaches the brushes at 8° and card lever #1 is triggered at 227°. And the clutch is engaged or disengaged at exactly 315°. Electrical signals are generated by rotating cams that open and close microswitches at the appropriate degree angles. For example the read clutch is controlled by cam RC6 that closes at 272° and opens at 342°. The photo below shows a cam on a drive shaft, with the microswitch above it; when the cam rotates to a high point, the switch closes.

One of the cams and microswitches in the card reader.

One of the cams and microswitches in the card reader.

To examine timings, the card reader has an interesting maintenance feature called the Dynamic Timer, which is sort of a mechanical oscilloscope that visually shows the angle timing of signals. The dynamic timer consists of a neon bulb that spins behind a plastic disk labeled with the angles from 0° to 359°. If a signal is selected with a control panel switch, the bulb lights up when the signal is active.1 Because the neon bulb is spun by the main driveshaft, its position is synchronized to the rotation angle of the card reader's mechanisms. Thus, the neon bulb traces out sectors of a circle, indicating when the signal is active. For example, the image below shows the read brush signal, which is activated as each card row passes under the brushes. Since there are 12 rows on a punch card, the signal is activated 12 times in a card cycle. By reading the angle labels on the clear plastic dial, the signal timings can be checked for correctness. In addition, the knob in the middle of the dynamic timer can be rotated manually to move the card reader's cycle to any desired angle.

The dynamic timer of the 1402 card reader. The 12 signals correspond to the 12 rows on the card, indicating when the row passes through the brushes.

The dynamic timer of the 1402 card reader. The 12 signals correspond to the 12 rows on the card, indicating when the row passes through the brushes.

We measured the timing of read clutch control cam RC6 and found it was off by about 5 degrees. The manual called the timing of this cam "critical" saying it needed to be within 2° earlier or 0° later from the correct angle.10 Since the cam was well outside the allowed range, we tried to adjust it. The first problem was that the cam was frozen to the shaft; apparently after decades of storage in a a damp garage it had rusted to the shaft. Carl eventually hit the cam hard enough (yet carefully!) with a hammer to loosen it from the shaft so it could be adjusted.

The photo below shows how we manually adjusted cam timings. The dynamic timing wheel (left) can be turned by hand until the cam closes, and then the angle can be read from the dial. The adjustment process is rather tedious, requiring trial and error. First, the duration that contact is closed is corrected by moving the contact closer or farther from the cam. After each adjustment, the duration must be manually checked by slowly rotating the dynamic timing wheel 360°. Once the duration is correct, the next step is to rotate the cam a few degrees at a time until it closes and opens at the right angle. Again, the timing must be carefully checked after each adjustment until trial-and-error yields the right cam setting.

By manually rotating the dynamic timer wheel, the cam timing (center) can be checked. The light (upper right) shows when the cam has closed. This photo shows cam RL10 being adjusted.

By manually rotating the dynamic timer wheel, the cam timing (center) can be checked. The light (upper right) shows when the cam has closed. This photo shows cam RL10 being adjusted.

Once cam RC6 had the proper timing, we tested the card reader. Surprisingly, it failed just as before. We checked the timing of RC6 with the dynamic timer, and strangely, the timer still showed that it closed 5° too late, even though we had measured it as closing correctly. After puzzling over the schematics, I found that cam RC6 was powered through cam RL10, so a bad timing on cam RL10 would cause the dynamic timer to show the wrong time for RC6.

We continued the next week and measurement of RL10 showed it was closing 5° too late. So we repeated the cam adjustment process with RL10. Carl loosened RL10's setscrews, whacked it with a hammer, and was surprised when it spun around on the shaft—apparently it hadn't rusted onto the shaft the way RC6 had. Since the timing of RL10 was now thoroughly off, it took some extra effort to find the right position for it. One complication is that RL cams only rotate while the clutch is tripped, so we had to manually trip the clutch for each test. (This video shows how the clutch can be tripped manually.) After a couple hours of adjustment, the cam had the right duration and then the right start angle.

We powered up the card reader and this time it ran all the tests successfully! We had found the elusive problem. Apparently cam RL10 was open a bit too much, so the signal to read a card after a pause in reading wasn't quite able to engage the clutch. Then the clutch check circuitry would detect that the clutch had failed to engage, triggering the Read Stop.

Conclusion

Tracking down and fixing the mysterious Reader Stop errors on the card reader was tricky and more time-consuming than most problems. But it provided an interesting look back in time at the obsolete technology used inside the card reader: relay logic, spinning cams and a mechanical clutch. Debugging the card reader is a different experience from tracking down software problems or even normal hardware problems. Just understanding the schematics with their obsolete symbols can be a challenge. Hopefully what we learned in this repair will help us the next time the card reader malfunctions.

Many members of the 1401 restoration team were involved in fixing this problem, including Carl, Frank, Alexy, Ron, George, Glenn, Jim and Grant. The Computer History Museum in Mountain View runs demonstrations of the IBM 1401 on Wednesdays and Saturdays so check it out if you're in the area; the demo schedule is here.

Follow me on Twitter or RSS to find out about my latest blog posts.

Notes and references

  1. While IBM's computers were supposed to be transistorized by the time of the 1401, two vacuum tubes are hidden deep inside the card reader. These tubes power the neon bulbs for the dynamic timer used for diagnostics. The photo below shows the tubes and the transformer that powers them.

    The 1402 card reader contains two vacuum tubes to drive the neon bulbs in the dynamic timer.

    The 1402 card reader contains two vacuum tubes to drive the neon bulbs in the dynamic timer.

  2. The 1402 card reader reports three types of errors. A Reader Stop indicates a jam or card feed problem. (This is the error we encountered.) A Validity error indicates an invalid character caused by a bad hole pattern. A Reader Check indicates that the card was read incorrectly. An interesting technique was used to detect a bad read. The first set of brushes was used to count the number of holes in each column. The second set of brushes performed the actual read of the characters. If the number of holes in a column was different between the first and second reads, the system knew that the card had been read incorrectly. (This was typically caused by a brush that was slightly out of alignment or making bad contact.) You might wonder why the holes were counted, instead of just checking the characters. Counting holes required fewer bits of expensive core memory. 

  3. The card reader itself didn't convert the hole pattern into characters. Instead, there were 80 parallel wires directly from the 80 hole-sensing brushes to the computer, which did the hole-to-character conversion. The reader had one set of brushes to count holes (for the error check) and a second set to actually read the characters. The punch had a set of brushes to check the punched holes and optionally a set of brushes to read before punching. Thus, there could be up to four sets of brushes in the reader/punch. In addition, the punch had 80 wires from the computer to the 80 hole punch coils. As a result, two massive 200-wire cables connected the card reader and the computer. 

  4. The 1401 computer has a core-memory print buffer that allows computation to continue while printing a line. However, a complex interlock circuit will block the computer if it tries to print a line while the buffer is still in use. (This buffering and locking is all done in hardware; there is no operating system.) The print buffer was an optional feature for the computer, renting for an extra $386 per month. 

  5. The Reader Stop diagram shown earlier is somewhat confusing, but I'll give an overview. As cards pass through the card reader, they trip microswitches (called Card Levers or CLC). The logic tests if a microswitch isn't triggered, indicating if a card got jammed before reaching the microswitch. For example, the second triangle triggers an error if there are cards in the hopper, but no card reached card lever 1 after a feed signal. (Note that the logic symbols are backwards compared to modern usage; IBM used a semicircle for OR, and a triangle for AND. Thus, each set of inputs into a triangle shows a set of conditions that will trigger a reader stop.)

    A label like R-6 indicates that the signal comes from relay 6. Relay R-13 is the Reader Stop Control relay; it gets triggered if there was a card at microswitch 1 or 2 during angle 158°-188°. During this time, the card should have already passed the microswitch, so if it is still closed, there must be a jam. "CLC 1 DELAY" indicates that there was a card at CLC 1 in the previous cycle. If there is no card at CLC 2 this cycle, the card must have been jammed along the way. 

  6. Several documents on the 1402 card reader/punch are available on bitsavers. The IBM 1402 Card Read-Punch Manual is an operator's manual for the card reader. The Customer Engineering Manual of Instruction explains the implementation of the card reader for the service engineer, and is probably the best guide to how it operates. The Customer Engineering Instruction Reference Manual explains the internals of the card reader, with a focus on maintenance. The Wiring Diagram provides schematics of the card reader. The Service Index provides a summary of information for servicing. 

  7. The operation of the clutch mechanism is a bit tricky. In the diagram below, the ratchet (green) is constantly spinning. The remaining parts of the clutch are connected together and rotate only when the clutch is engaged. The key idea is that when the intermediate arm (middle) is latched, the control studs push the dogs up and out of the ratchet disengaging the clutch. (This happens because the intermediate arm can pivot slightly relative to the drive arm (left), so the dog pivot and control stud will move relative to each other, swinging the dogs upwards.) When the intermediate arm is unlatched, the dogs are no longer raised and they will engage with one of the ratchet's teeth. At this point, the mechanism is clutched and everything rotates in synchronization, until the intermediate arm hits the latch again, disengaging the dogs from the ratchet. The video of a simpler ratchet clutch may clarify the dog and ratchet mechanism.

    Diagram of the card reader's ratchet clutch, from the service manual.

    Diagram of the card reader's ratchet clutch, from the service manual.

  8. In case anyone (most likely me, in the future) needs to understand the clutch check circuit, I'll give an explanation. The reader sends a "Proc Feed" signal to the computer each time a card could be fed. If the computer wants to read a card, it sends a "Read Clutch" signal back. This signal is gated by cams RL10 and RC6 and cam RC5 and does two things. It triggers the read clutch magnet, the coil that pulls the clutch latch. It also energizes the clutch check relay R10. (See the schematic for details.) A bit later in the cycle, the clutch status is checked using the circuit below. If the clutch check relay is activated but the clutch did not latch, cam RL10 (not shown) will not be rotating and will remain closed. RL10 will trigger the Read Stop relay R4 through RC7 and R10's normally-open contacts. On the other hand, if the clutch check relay is not activated, but the clutch is latched, cam RL2 will be rotating and will trigger the Read Stop relay through the Read Stop relay R4's normally-closed contacts.

    The clutch check circuit in the card reader uses cams and relays to trigger a Read Stop if the clutch fails to latch. From the 1402 schematic.

    The clutch check circuit in the card reader uses cams and relays to trigger a Read Stop if the clutch fails to latch. From the 1402 schematic.

  9. Although one card can be read every 360° cycle of the reader, it takes three cycles to process a particular card. During the first cycle, the card is moved out of the hopper into the reader and triggers card lever 1. During the second cycle, the card passes through the first set of brushes (which counts the holes in each column), and triggers card lever 2. During the third cycle, the card passes through the second set of brushes (which actually reads the card), and is ejected into a hopper. Thus, there are typically three cards in flight in the reader at any time. (This is one reason the reader has so much circuitry to detect jams quickly, to prevent multiple cards from piling up in a crumpled heap.) 

  10. Many cams (including RC6) actually have three lobes, so they open and close three times every 360°, spaced 120° apart. This is due to an optional feature called Early Read (which cost an extra $10 per month). On the basic card reader, if the computer requests a card too late in the cycle, it has to wait for up to an entire revolution (75ms) until the clutch can latch, slowing down the system. With Early Read, there are three clutch points per revolution, cutting down the wasted time to at most 25ms. (One complication: the clutch is geared to half the speed of the regular shaft and turns 180° per cycle. This is why there are 6 teeth on the clutch ratchet.)