
Hammer time: fixing the printer on a vintage IBM 1401 mainframe

The Computer History Museum has two operational IBM 1401 computers used for demos but one of the printers stopped working a few weeks ago. This blog post describes how the 1401 restoration team diagnosed and repaired the printer. After a lot of tricky debugging (as well as smoke coming out of the printer) we fixed a broken trace on a circuit board. (This printer repair might sound vaguely familiar because I wrote in September about an entirely different printer failure due to a failed transistor.)

The IBM 1401 business computer was announced in 1959, and went on to become the best-selling computer of the mid-1960s, with more than 10,000 systems in use. A key selling point of the IBM 1401 was its high-speed line printer, the IBM 1403. It printed 10 lines per second with excellent print quality, said to be the best printing until laser printers were introduced in the 1970s.

The IBM 1401 mainframe computer (left) at the Computer History Museum printing the Mandelbrot fractal on the 1403 printer (right).


To print characters, the printer used a chain of type slugs (below) that rotated at high speed in front of the paper, with an inked ribbon between the paper and the chain. Each of the 132 print columns had a hammer and an electromagnet. At the right moment, when the desired character passed the hammer, an electromagnet drove the hammer against the back of the paper, causing the paper and ribbon to hit the type slug, printing the character.1

The type chain from the IBM 1401's printer. The chain has 48 different characters, repeated five times.


The printer required careful timing to make this process work. The chain spun around at 7.5 feet per second and each hammer had to fire at exactly the right time to print the right character perfectly aligned without smearing. Every 11.1 µs, a print slug lined up with a hammer, and the control circuitry checked whether the slug matched the character to be printed. If so, the electromagnet was energized for 1.5 ms, printing the character.
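As a rough sanity check, the timing figures above can be plugged into a few lines of Python. This is only a sketch using the numbers quoted in this post; the function name is made up for illustration. It shows that 7.5 feet per second and an 11.1 µs alignment interval agree with the 0.001 inch chain step mentioned in note 1.

```python
# Figures quoted in the post; nothing here comes from IBM documentation directly.
CHAIN_SPEED_FT_PER_S = 7.5
ALIGN_INTERVAL_US = 11.1
HAMMER_PULSE_MS = 1.5


def chain_travel_inches(duration_us: float) -> float:
    """Distance the chain moves in the given time, in inches."""
    inches_per_second = CHAIN_SPEED_FT_PER_S * 12
    return inches_per_second * duration_us * 1e-6


# Chain movement between two successive hammer/slug alignments.
step = chain_travel_inches(ALIGN_INTERVAL_US)
# Chain movement while one hammer coil is energized.
during_pulse = chain_travel_inches(HAMMER_PULSE_MS * 1000)
# Number of alignment intervals that pass during one coil pulse.
alignments_per_pulse = HAMMER_PULSE_MS * 1000 / ALIGN_INTERVAL_US

print(f"chain moves {step:.6f} in between alignments")
print(f"chain moves {during_pulse:.3f} in while a coil is energized")
print(f"about {alignments_per_pulse:.0f} alignment intervals per coil pulse")
```

The coil pulse lasts far longer than a single alignment interval, which is why the hammer drivers have to latch on rather than follow the 11.1 µs timing directly.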

Printing mechanism of the IBM 1401 line printer. From 1401 Reference Manual, p11.


While the printer is usually reliable, a few weeks ago the printer stopped working and displayed a "sync check" error on the console. The computer needs to know the exact position of the chain in order to fire the hammers at the right time. If something goes wrong with this synchronization, the computer stops with "sync check" rather than printing the wrong characters.

When the sync check light on the printer is illuminated, you have a problem.


To track the chain position, the computer receives a sequence of pulses from the printer: a pulse when the first hammer is lined up with a type slug2 and a double pulse when the chain is in its "home" position with the first character lined up. The pulses are created by a slotted metallic timing disk inside the printer. A magnetic pickup detects these slots and produces a small 100 millivolt signal.7 This signal is amplified inside the printer by two differential amplifier cards to produce a stronger square wave signal. (This is the only electronic part of the printer. Everything else inside the printer is electromechanical or hydraulic; a high-speed hydraulic motor feeds paper through the printer and drips oil on the floor.)

The computer receives these pulses from the printer and generates a logic signal that increments counters to keep track of the chain's position. The schematic below shows part of the circuitry inside the computer, starting with the sense amplifier signal from the printer at the left. Don't try to understand this circuit; I just want to show the strange schematic symbols that IBM used in the 1960s. The box with "I" is an inverter. The triangle is an AND gate. The semicircle that looks like an AND gate is actually an OR gate. The large box with a "T" is a trigger, IBM's name for a flip-flop. The "SS" box is a "single shot" that creates a 400µs pulse; this detects the double pulse that indicates the chain's "home" position.
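The idea of counting pulses and spotting the double pulse can be sketched in software. This is a loose model, not the 1401's actual logic: it assumes a pulse that arrives within the 400 µs single-shot window of the previous pulse is the second half of the home marker, and that regular pulses arrive once per 555 µs subscan (see notes 1 and 2). The function name and the 200 µs spacing of the double pulse are hypothetical.

```python
HOME_WINDOW_US = 400.0  # single-shot duration quoted in the post


def track_chain(pulse_times_us):
    """Return a (time, count, is_home) tuple for each incoming pulse.

    Assumption: a pulse arriving while the single shot started by the
    previous pulse is still active marks the home position and resets
    the position counter. Any other pulse advances the counter.
    """
    events = []
    count = 0
    last = None
    for t in pulse_times_us:
        is_home = last is not None and (t - last) < HOME_WINDOW_US
        if is_home:
            count = 0
        else:
            count += 1
        events.append((t, count, is_home))
        last = t
    return events


# Hypothetical pulse train: regular pulses 555 us apart, with an extra
# pulse 200 us after the third one standing in for the home marker.
pulses = [0.0, 555.0, 1110.0, 1310.0, 1865.0, 2420.0]
events = track_chain(pulses)
for t, count, is_home in events:
    print(f"{t:7.1f} us  count={count}  home={is_home}")
```

If the pulse train is garbled, a tracker like this either misses the home marker or sees spurious ones, which is the kind of inconsistency the sync check is there to catch.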

Excerpt from the 1401 Intermediate Level Diagrams (ILD) showing the chain detection circuitry.


To track down the problem, we removed the printer's side panel to access the two amplifier circuit boards, which are visible below. We probed the boards with an oscilloscope. The first amplifier stage (on the right) looked okay, but the second stage (on the left) had problems. In the photo below, the computer is at the back, mostly hidden by the printer.

We took the side panel off the 1403 printer to reveal the circuit boards. We hooked an oscilloscope up to the front board to test it.


The trace below shows what should happen. The board receives a differential signal at the bottom, with alternating cyan and pink signals. The difference between these signals (middle) is amplified to produce the clean, uniform pulse train at the top. Note the double pulse in the middle indicating the chain's home position.

Oscilloscope trace from a working printer.


But when we measured the signal, we saw signals that were entirely garbled. The differential signals at the bottom are a mess, and track each other rather than alternating. The output signal (top) is basically random. With this signal from the printer, the computer couldn't keep track of the chain position and the sync check error resulted.

Oscilloscope trace from the faulty printer.


We swapped the board with the board from the other, working printer, and verified that the board was the problem. The museum has a filing cabinet full of replacement circuit boards, but unfortunately not a replacement for this "WW" amplifier board. Instead, we had to diagnose the problem with this card and repair it. On the board below you can see the diodes (small gray cylinders), capacitors (silver cylinders), resistors (striped cylinders and large tan cylinders), and transistors (round metal cans). The transistors are germanium because the 1401 predated the widespread use of silicon transistors.

The "WW" differential amplifier card used by the printer.


We suspected a failed transistor, so we used Marc's vintage Hewlett-Packard curve tracer (below) to test the transistors. One transistor was much weaker than the others. Since the performance of a differential amplifier depends on having transistors with closely matched characteristics, we searched through a couple dozen transistors to find a matching pair and replaced the transistors. (We later determined that these transistors were not part of the differential pair; they were "emitter follower" buffers, so our effort was wasted.)3

We used a vintage HP transistor curve tracer to test the transistors.


Back at the Computer History Museum, we tested out the repaired board and the printer still didn't work. Even worse, smoke started coming from the back of the printer! I quickly shut off the system as an acrid smell surrounded the printer. I expected to see a blackened transistor on the board, but it was fine. I examined the printer but couldn't find the source of the smoke.

I decided to test the board outside the printer by feeding in a 2 kHz test signal, but the measurements didn't make sense. The board seemed to be ignoring one of the inputs, so I tested that input transistor but it was fine. Next I checked the diodes, capacitors and resistors again; all the components tested okay, but the board still mysteriously failed. I started carefully measuring voltages at various points in the circuit but the signals didn't make sense and weren't consistent. Since all the components were fine but the board didn't work, I was starting to lose confidence in electronics. Eventually, I nailed down a signal that randomly jumped between 10 volts and 1 volt. After wiggling all the components, I finally noticed that the voltage jumped if I flexed the board. Finally, I had an answer: a cracked trace on the circuit board between the input and the transistor was making intermittent contact.

The board had a cracked trace in the upper left, connecting the upper gold contact. Carl put a wire jumper across the bad section.


To fix the board, Carl put a wire bridge across the bad trace6. We put the board in the printer, and the printer mostly worked. However, when the printer tried to print in column 85, the column failed to print and the printer stopped with an error.5 More testing revealed four columns of the printer were failing to print due to hammer problems. Each electromagnetic hammer coil is driven by a 60 volt, 5 amp pulse for 1.5 milliseconds. This is a lot of power (300 watts), so if anything goes wrong, hammer coils can easily burn up.

We swung open one of the computer's "gates" (lower left), revealing the cards that drive the printer.


We looked at the printer driver cards inside the computer. Each card generates pulses for two hammers, so there are 66 of these cards. In the photo below, you can see the two large high-current transistors at the left that generate the pulses. (Note the felt insulators on top of these transistors. Due to their height, the transistors pose a risk of shorting against the bottom of the neighboring card.) Just to the right of these transistors are two colorful purple and yellow fuses. In the event of a fault, these fuses are supposed to burn out and protect the hammer coils. We checked the cards associated with the four bad columns on the printer and found four burnt-out fuses.

The "AEC" Alloy-Hammer Driver Latch card produces high-current pulses to drive the printer hammer coils.


Why did the fuses blow? The circuit to drive the hammer coils is a bit tricky. Every 11 microseconds, a hammer lines up with a character slug and can be fired. But when a hammer is fired, the coil needs to be activated for about 1.5 ms, a much longer time interval. To accomplish this, the hammer driver latches on when a hammer is fired. Later in the print cycle, the hammer driver is turned off. This process is controlled by the chain position counters, which are driven by the pulses from the chain sensor, the same pulses that were intermittent. Thus, if the computer received enough pulses to start printing a line, but then the pulses dropped out in the middle of the line, hammer drivers could be left in the on state until the fuse blows. This explained the problem that we saw.
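To make the failure mode concrete, here is a small Python sketch of a latched hammer driver whose turn-off depends on the chain sync pulses. It is a simplification under stated assumptions, not the 1401's real control logic: the rule that the driver is released by the third sync pulse after firing (`OFF_AFTER_PULSES`) is hypothetical, as are the function names and the 100 ms power-off time. The 60 volt, 5 amp figures are the ones quoted earlier in this post.

```python
COIL_VOLTS = 60.0
COIL_AMPS = 5.0
OFF_AFTER_PULSES = 3  # hypothetical: driver released three sync pulses after firing


def coil_on_time_us(fire_time_us, pulse_times_us, power_off_us):
    """Time the coil stays energized, in microseconds.

    The driver latches on at fire_time_us and is released by the
    OFF_AFTER_PULSES-th sync pulse that follows. If the pulses drop out
    first, the coil stays on until power_off_us (a blown fuse, say).
    """
    later = [t for t in pulse_times_us if t > fire_time_us]
    if len(later) >= OFF_AFTER_PULSES:
        return later[OFF_AFTER_PULSES - 1] - fire_time_us
    return power_off_us - fire_time_us


def coil_energy_joules(on_time_us):
    """Energy delivered to the coil at 60 V and 5 A for the given time."""
    return COIL_VOLTS * COIL_AMPS * on_time_us * 1e-6


healthy = [i * 555.0 for i in range(20)]  # steady sync pulses
dropout = healthy[:4]                     # sensor signal disappears mid-line

normal = coil_on_time_us(1000.0, healthy, power_off_us=100_000.0)
stuck = coil_on_time_us(1000.0, dropout, power_off_us=100_000.0)

print(f"normal: {normal:.0f} us on, {coil_energy_joules(normal):.2f} J")
print(f"stuck:  {stuck:.0f} us on, {coil_energy_joules(stuck):.2f} J")
```

With steady pulses the coil is on for about a millisecond. When the pulses stop, nothing releases the latch, and the coil absorbs far more energy until something else cuts the current.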

After Carl replaced the fuses, the printer worked fine except for two problems. First, characters in column 85 were shifted slightly so the text was slightly crooked. Frank explained that the hammer in this column must be moving a bit too slowly, hitting the chain after it had moved past its position. This explained the smoke: in the time it took the fuse to blow, the coil must have overheated and been slightly damaged. We'll look into replacing this coil next week. The second problem was that the printer's Ready light didn't go on. This turned out to be simply a bad light bulb, unrelated to the rest of the problems. In any case, the printer was working well enough for demos so the repair was a success.

Closeup of the type chain (upside down) for an IBM 1403 line printer.


I announce my latest blog posts on Twitter, so follow me at @kenshirriff for future articles. I also have an RSS feed. The Computer History Museum in Mountain View runs demonstrations of the IBM 1401 on Wednesdays and Saturdays so if you're in the area you should definitely check it out (schedule).

Notes and references

  1. You might expect that the 132 hammers align with 132 type slugs, so the matching hammers all fire at once, but that's not what happens. Instead, the hammers and type slugs are spaced slightly differently, so only one hammer is aligned at a time, and a tiny movement of the chain lines up a different hammer and type slug. (Essentially they form a vernier.) Specifically, every 11.1 microseconds, the chain moves 0.001 inches. This causes a new hammer / type slug alignment. For mechanical reasons, every third hammer lines up in sequence (1, 4, 7, ...) until the end of the line is reached; this is called a "subscan" and takes 555 microseconds. Two more subscans give each hammer in the line an option to fire, forming a print scan of 1.665 milliseconds. If you want more information on how the print chain works, I have an animation here

  2. To be precise, the printer generates a pulse if hammer 1, 2, or 3 lines up with a type slug. This is due to the three "subscans", each using every third hammer. 

  3. I'll explain how the differential amplifier works in this footnote, since most readers may not want this much detail. The printer uses two differential amplifier boards in series, first a WV board and then a WW board. They use similar principles, except the WV uses NPN transistors and the WW uses PNP. The differential output from the WW board is transmitted to the computer where a third differential amplifier (an NT card) converts the signal to a logic output. Each board is a differential amplifier, which takes two inputs and amplifies the difference, essentially an op amp with two outputs.4

    A differential pair circuit.


    The basic differential pair circuit for a differential amplifier is shown above. (Op amps contain a similar differential pair.) The resistor at the top sets a fixed current I. If the two inputs are equal, the current will be split, with half going through each transistor and branch resistor. But if one of the inputs is slightly lower, that transistor will conduct more and most of the current will go through that branch. Thus, the difference between the inputs steers the current down one side or the other, yielding an amplified signal across the lower resistors.

    Schematic of the WW amplifier board from the SMS documentation.


    The IBM 1401 documentation provides the schematic above for the board, but it's hard to follow what's happening. (Note the unusual transistor symbol, three boxes with an emitter arrow in or out.) I redrew the main part of the circuit below, so it resembles the simple differential pair. It has the same resistors at top and bottom as the differential pair, but there is an R-C circuit in each branch. To simplify, if there is a DC offset or low-frequency input, the capacitor will charge and counteract this offset. Thus, the amplifier operates as a high-pass amplifier; it cuts out low-frequency noise while amplifying the 1800 Hz sync pulses. The diodes clip the output, yielding a square wave. The differential output goes through emitter-follower buffers (omitted below) so the signal is strong enough to be transmitted through an under-floor cable from the printer to the computer.

    The differential amplifier circuit of the WW card.


     

  4. An op amp with positive and negative outputs is known as a "fully differential op amp". 

  5. The IBM 1403 printer has multiple error checks to avoid printing incorrect data. For a business machine, it would be bad to drop digits in, say, payroll checks or tax records. To detect hammer failures, the printer has 132 wires from the hammers back to the computer, to verify that each hammer fired when it was supposed to. If the computer doesn't get a pulse back from a hammer, the computer stops immediately, as we saw. 

  6. We noticed that there was solder smeared across the broken part of the trace. My suspicion is that the same problem happened a few years ago and was repaired by bridging the broken trace with solder. Eventually the heavy vibrations inside the printer caused a hairline crack in the solder, causing the problem to recur. By bridging the break with wire rather than just solder, we hope we have fixed the problem permanently. We also noticed the transistor connected to the broken trace had been replaced, so they must have tried that first in the previous repair. 

  7. The 1403 printer is documented in IBM 1403 Printer Component Description and 1403 Printers Field Engineering Maintenance Manual. See also this brief article about the 1403 printer in the IEEE Spectrum. For details on how the timing pulses work, see the 1403 Manual of Instruction, page 42. 
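The current steering described in note 3 can be illustrated with the textbook ideal-transistor model of a differential pair. This is a sketch under assumptions, not an analysis of the WW card: it uses the standard exponential relation with a room-temperature thermal voltage, ignores the R-C networks and clipping diodes, and the 2 mA tail current and function name are made up. Following the PNP behavior described in the note, the side with the lower input takes more of the current.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q near room temperature, in volts


def steer_current(v_left, v_right, total_current):
    """Split a fixed current between the two branches of an ideal PNP pair.

    The branch whose input is lower conducts more, following the
    exponential transistor law. Returns (left_current, right_current).
    """
    dv = v_left - v_right
    i_left = total_current / (1.0 + math.exp(dv / THERMAL_VOLTAGE))
    return i_left, total_current - i_left


TAIL_CURRENT = 2e-3  # hypothetical 2 mA set by the top resistor

for dv_mv in (0, -25, -100):
    left, right = steer_current(dv_mv / 1000.0, 0.0, TAIL_CURRENT)
    print(f"left input {dv_mv:5d} mV: left {left * 1e3:.2f} mA, right {right * 1e3:.2f} mA")
```

Even a 100 mV difference, about the size of the pickup signal, is enough in this idealized model to push nearly all the current down one branch.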

Bad relay: Fixing the card reader for a vintage IBM 1401 mainframe

As soon as we finished repairing a printer failure at the Computer History Museum, Murphy's law struck and the card reader started malfunctioning. The printer and card reader were attached to an IBM 1401, a business computer that was announced in 1959 and went on to become the best-selling computer of the mid-1960s. In that era, data records were punched onto 80-column punch cards and then loaded into the computer by the card reader, which read cards at the remarkable speed of 13 cards per second. This blog post describes how we debugged the card reader problem, eventually tracking down and replacing a faulty electromechanical relay inside the card reader.

The IBM 1402 card reader at the Computer History Museum. Cards are loaded into the hopper on the right. The front door of the card reader is open, revealing the relays and other circuitry.


The card reader malfunction started happening every time the "Non-Process Run-Out" mechanism (NPRO) was used. During normal use, if the card reader stopped in the middle of processing, unread cards could remain inside the reader. To remove them, the operator would press the "Non-Process Run-Out" switch, which would run the remaining cards through the reader without processing them. Normally, the "Reader Stop" light (below) would illuminate after performing an NPRO. The problem was the "Reader Check" light also came on, indicating an error in the card reader. Since there was no actual error and the light could be cleared simply by pressing the "Check Reset" button, this problem wasn't serious, but we still wanted to fix it.

Control panel of the 1402 card reader showing the "Reader Check" error. The Non-Process Run-Out switch is used to run cards out of the card reader without processing them.


To track down the problem, the 1401 restoration team started by probing the card reader error circuitry inside the computer with an oscilloscope. (The card reader itself is essentially electromechanical; the logic circuitry is all inside the computer.) The photo below shows the 1401 computer with a swing-out gate opened to access the circuitry. Finding a circuit inside the IBM 1401 is made possible by the binders full of documentation showing the location and wiring of all the computer's circuitry. This documentation was computer generated (originally by a vacuum tube IBM 704 or 705 mainframe), so the diagrams were called Automated Logic Diagrams (ALD).

The IBM 1401 with gate 01B4 opened. The yellow wire-wrapped wiring connects the circuit boards that are plugged into the gate. The computer's console is visible on the front of the computer.


The reader check error condition is stored in a latch circuit; the latch is set when an error signal comes in, and cleared when you press the reset button. To find this circuit we turned to the ALD page that documented the card reader's error checking circuitry. The diagram below is a small part of this ALD page, showing the read check latch of interest: "READ CHK LAT". Each basic circuit (such as a logic gate) is drawn on the ALD as a box, with lines showing how they are connected. Inside the box, cryptic text indicates what the circuit does and where it is inside the computer. For example, the latch (lower two boxes) is constructed from a circuit card of type CQZV (an inverter) and a CHWW card (a NAND gate). The text inside the box also specifies the location of each card (slots D21 and D24 in gate 01B4), allowing us to find the cards in the computer. Note that the RD REL CHK signal comes from ALD page 56.70.21.2; this will be important later.

The read check latch circuit, excerpted from the ALD 36.14.11.2.


The schematic below shows the latch redrawn with modern symbols. If the RESET line goes low, it will force the inverted output high. Otherwise, the output will cycle around through the gates, latching the value. If any OR input is high (indicating an error), it will force the latch low. Note the use of wired-OR—instead of using an OR gate, signals are simply wired together so if any signal is high it will pull the line high. Because transistors were expensive when the IBM 1401 was built, IBM used tricks like wired-OR to reduce the transistor count. Unfortunately, the wired-OR made it harder to determine which error input was triggering the latch because the signals were all tied together.
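The latch behavior described above can be modeled in a few lines of Python. This is a sketch based on my reading of the description, not a transcription of the ALD: it assumes the NAND gate takes the active-low RESET line and the wired-OR line as inputs, and that the inverter feeds the NAND output back onto that wired-OR line. The class and attribute names are invented for illustration.

```python
class ReadCheckLatch:
    """Toy model of a latch built from one NAND gate and one inverter."""

    def __init__(self):
        # Inverted output: True (high) means no error is latched.
        self.out_n = True

    def update(self, reset_n=True, error_inputs=()):
        """Apply the inputs and let the feedback loop settle."""
        for _ in range(4):
            # Wired-OR: the inverter's output and every error signal share a line.
            line = (not self.out_n) or any(error_inputs)
            # NAND of the active-low reset and the wired-OR line.
            new_out_n = not (reset_n and line)
            if new_out_n == self.out_n:
                break
            self.out_n = new_out_n
        return self.out_n

    @property
    def error_latched(self):
        return not self.out_n


latch = ReadCheckLatch()
latch.update(error_inputs=[False, True, False])  # one error input pulses high
print(latch.error_latched)
latch.update()                                   # error goes away, latch holds
print(latch.error_latched)
latch.update(reset_n=False)                      # operator presses Check Reset
print(latch.error_latched)
```

The model also shows why the wired-OR made debugging harder: inside `update`, the individual error inputs collapse into a single `line` value, so the latch itself carries no record of which input set it.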

The read check latch circuit redrawn with modern symbols.


Once we located the circuit cards for the latch, we used an oscilloscope to verify that the latch itself was operating properly. Next, we needed to determine why it was receiving an error input. After disconnecting wires to get around the wired-OR, we found that the error was not coming from the Read/punch error or the Feed error signal. The RD REL CHK was the obvious suspect; this signal was part of the optional Read Punch Release feature1. However, the team insisted that Read Punch Release wasn't installed in our 1401. The source of this signal was ALD 56.70.21.2 and our documentation didn't include ALD section 56, seemingly confirming that this feature wasn't present in our system.

Additional oscilloscope tracing showed a lot of noise on some of the signals from the card reader. This wasn't unexpected since the card reader is built from electromechanical parts: relays, cam switches, brushes, motors, solenoids and other components that generate noise and voltage spikes. I considered the possibility that a noise spike was triggering the latch, but the noise wasn't reaching that circuit.

The plug charts show the type of card in each position in the computer, and the function assigned to it. This is part of the plug chart for gate 01B4.


At this point, I was at a dead end, so I took another look at the RD REL CHK signal to see if maybe it did exist. The 1401's documentation includes "plug charts," diagrams that show what circuit card is plugged into each position in the computer. I looked at the plug chart for the card reader circuitry in gate 01B4 (swing-out gate, not logic gate). The plug chart (above) showed cards assigned to the mysterious ALD 56.70.21.2, such as the cards in slots A15-A17 and B15. (The plug chart also had numerous pencil updates, crossing out cards and adding new ones, which didn't give me a lot of confidence in its accuracy.) I looked inside the computer and found that these cards, the Read Punch Release cards generating RD REL CHK, were indeed installed in the computer. So somehow our computer did have this feature.

The gate in the 1401 holding the card reader circuitry. Note the cards in positions A15 and B15.


The problem was that even though these cards were present in the system, we didn't have the ALDs that included them, leaving us in the dark for debugging. I checked the second 1401 at the Computer History Museum; although it too had the cards for Read Punch Release, its documentation binders also mysteriously lacked the section 56 ALDs. Fortunately, back in 2006, the Australian Computer Museum Society sent us scans of the ALDs for their 1401 computer. I took a look and found that the Australian scans included the mysterious section 56. There was no guarantee that their 1401 had the same wiring as ours (since the design changed over time), but this was all I had to track down RD REL CHK.

Simplified excerpt of ALD 56.70.21.2 from an Australian 1401 computer.


According to the Australian ALD above, RD REL CHK was generated by the CGVV card in slot A17 (upper right box above). The oscilloscope trace below confirmed that this card was generating the RD REL CHK signal (yellow), but its input (RD BR IMP CB) (cyan) looked bad. Notice that the yellow line jumps up suddenly (as you'd expect from a logic signal), but the cyan line takes a long time to drop from the high level to the low level. Perhaps a weak transistor in a circuit was pulling the signal down slowly, or some other component had failed.

Oscilloscope trace showing the "circuit breaker" signal from the card reader (cyan) and the RD REL CHK error signal (yellow).


We looked into the RD BR IMP CB signal, short for "ReaD BRush IMPulse Circuit Breakers". In IBM terminology, a "circuit breaker" is a cam-operated switch, not a modern circuit breaker that trips when overloaded. The read brush circuit breakers generate timing pulses when the read brushes that detect holes in the punch card are aligned with a row of holes, telling the computer to read the hole pattern.

The NGXX integrator card contains resistor-capacitor filters. Unlike most cards, this one doesn't have any transistors. Photo courtesy of Randall Neff.


We looked up yet another ALD page to find the source of the strangely slow RD BR IMP CB signal. That signal originated in the card reader and then passed through an NGXX integrator card (above). Earlier I mentioned that the signals from the card reader were full of noise. This isn't a big problem inside the card reader since brief noise spikes won't affect relays. But once signals reach the computer, the noise must be eliminated. This is done in the 1401 by putting the signal through a resistor-capacitor low-pass filter, which IBM calls an "integrator". That card eliminates noise by making the signal change very slowly. In other words, although the signal on the oscilloscope looked strange, it was the expected behavior and not a problem. But why was there any signal there at all?
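The effect of an "integrator" can be seen with a simple discrete-time model of a resistor-capacitor low-pass filter. This is a generic first-order filter sketch, not a model of the NGXX card: the 500 µs time constant, the 1 µs time step, and the function name are assumptions chosen only to show the behavior.

```python
def rc_filter(samples, dt_us, tau_us, start=0.0):
    """First-order RC low-pass filter, stepped once per input sample.

    samples: input voltage at each time step
    dt_us:   time step in microseconds
    tau_us:  R*C time constant in microseconds (hypothetical value)
    """
    alpha = dt_us / (tau_us + dt_us)
    y = start
    out = []
    for x in samples:
        y += alpha * (x - y)  # output moves a small step toward the input
        out.append(y)
    return out


# A 10 us noise spike versus a 2 ms signal pulse, both with amplitude 1.0.
spike = [1.0] * 10 + [0.0] * 990
pulse = [1.0] * 2000

spike_peak = max(rc_filter(spike, dt_us=1.0, tau_us=500.0))
pulse_final = rc_filter(pulse, dt_us=1.0, tau_us=500.0)[-1]

print(f"peak output from the noise spike: {spike_peak:.3f}")
print(f"output at the end of the long pulse: {pulse_final:.3f}")
```

The brief spike barely moves the output, while the long pulse gets through with a slow edge, much like the sluggish cyan trace on the oscilloscope.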

Part of IBM 1402 card reader schematic showing the cams (circles) that generate the CB read pulses and the relay that blocks the pulses during NPRO.


After some discussion, the team hypothesized that the pulses on RD BR IMP CB shouldn't be getting to the 1401 at all during a Non-Process Run-Out, since cards aren't being read. The schematic4 for the card reader (above) shows the complex arrangement of cams and microswitches that generates the pulses. During an NPRO, relay #4 will be energized, opening the "READ STOP 4-4" relay contacts. This will stop the BRUSH IMP CB pulses from reaching the 1401.5,3 In other words, relay #4 should have blocked the pulses that we were seeing.
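The hypothesis boils down to a simple gating rule, sketched here in Python. This is a deliberately crude model of the team's reasoning rather than of the 1402's wiring: the function names and the `contacts_work` flag are invented, and the idea that any pulse arriving during an NPRO raises the check is an assumption drawn from the symptoms described above.

```python
def pulses_reaching_computer(cb_pulses, npro_active, contacts_work=True):
    """Return the brush impulse CB pulses that get through to the 1401.

    During an NPRO, relay #4 is energized and its contacts should open,
    blocking the pulses. contacts_work=False models a relay whose
    contacts fail to open.
    """
    contacts_open = npro_active and contacts_work
    return [] if contacts_open else list(cb_pulses)


def reader_check_after_npro(cb_pulses, contacts_work=True):
    """Assumed rule: any pulse that arrives during an NPRO raises the check."""
    arrived = pulses_reaching_computer(cb_pulses, npro_active=True,
                                       contacts_work=contacts_work)
    return len(arrived) > 0


pulse_times_ms = [0.0, 3.75, 7.5]  # hypothetical pulse times
print(reader_check_after_npro(pulse_times_ms, contacts_work=True))
print(reader_check_after_npro(pulse_times_ms, contacts_work=False))
```

In this model a healthy relay gives a clean NPRO, while a relay that doesn't open its contacts lets the pulses through and produces the spurious Reader Check.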

Frank King replacing a bad relay in the 1402 card reader. The relays are next to his right shoulder.


The card reader contains rows of relays; the reader's hardware is a generation older than the 1401 and it implements its basic control functions with relays rather than logic gates. Frank pulled out relay #4 and inspected it.6 The relay (below) has 6 sets of contacts, activated by an electromagnet coil (yellow) and held in position by a second coil. Springs help move the contacts to the correct positions. One of the springs appeared to be weak, preventing the relay from functioning properly.7 Frank put in a replacement relay and found that the card reader now performed Non-Process Run-Outs without any errors. We loaded a program from cards just to make sure the card reader still performed its main task, and that worked too. We had fixed the problem, just in time for lunch.

The faulty relay from the IBM 1402 card reader.


Conclusions

It is still a mystery why section 56 of the ALDs was missing from our documentation. As for the presence of the Read Punch Release feature on our 1401, that feature turns out to be standard on 1401 systems like the ones at the museum.8 I think the belief that our 1401 didn't include this feature resulted from confusion with the Punch Feed Read feature, which we don't have. (That feature allowed a card to be read and then punched with additional data as it passed through the card reader.)

The team that fixed this problem included Frank King, Alexey Toptygin, Ron Williams and Bill Flora. My previous blog post about fixing the 1402 card reader is here, tracking down an elusive problem with a misaligned cam.


Notes and references

  1. Read Punch Release was a feature to allow the CPU to operate while reading a card. IBM's large 7000 series mainframes used "data channels," which were high-performance I/O connections using DMA and controlled by separate I/O processors. With a data channel, the CPU could process data while the channel performed I/O. But the 1401 was a much simpler machine, designed to replace electromechanical accounting machines that would read a card, process the card, and print out results. On the 1401, when the CPU executed an instruction to read a card, the CPU would wait while the card moved through the card reader and passed under the brushes to be read. The mechanical cycle to read a card took 75 ms (corresponding to 800 cards per minute), of which only 10 ms was available for the CPU to perform computation and the rest was wasted (from the CPU's perspective). The Read Punch Release feature was a workaround for this. The programmer could issue an SRF (Start Read Feed) instruction, which would cause a card to start moving through the card reader. The program had 21 ms to perform computation and execute the read instruction before the card reached the reading station. (If the program executed the read instruction too late, the computer wouldn't be able to read the card and would halt with an error.) This provided extra computation time in each card read cycle. IBM charged a monthly fee for additional features; Read Punch Release was relatively inexpensive at $25 per month (equivalent to about $200 today). 

  2. The Read Punch Release feature also provided a similar instruction for punching a card, allowing an extra 37 ms of computation while punching a card. See page 16 of the IBM 1402 Card Read-Punch Manual for details on card timing and the read punch release operation. 

  3. The card reader schematic shows that six separate cams were required to generate the RD BR IMP CB signal. The problem is that cards are read at high speed, so rows on the card are read just 3.75 ms apart. Cams and microswitches are too slow to generate pulses at this rate. To get around this, pulses for odd rows and even rows are generated separately. In addition, one set of switches closes for the start of a pulse and a second set opens for the end of a pulse. Needless to say, it is a pain to adjust all these cams so the pulses have the right timing and duration. If this timing is off, cards won't read correctly.

    To improve reliability and reduce maintenance, IBM eventually replaced these cams with a "solar cell" (i.e. a photo-cell), slotted disk, and light. The light passing through the slotted disk triggered a pulse from the photo-cell. Our ALDs had some penciled-in modifications suggesting that our 1401 was originally configured to work with a solar cell card reader and then modified to work with the older circuit breaker card reader. 

  4. The schematic for the 1402 card reader is here. The read brush impulse CB signal is generated on pdf page 8. This document also includes instructions on how to upgrade from the "circuit breaker" circuit to the "solar cell" circuit, a change that is indicated as taking 1.0 to 1.5 hours for the hardware installation, 1.5 to 2.0 hours for miscellaneous electrical changes, and 2.3 to 3.7 hours to wire up the new circuit. (See pdf pages 47-53.) 

  5. The relays involved in an NPRO operation are documented in 1402 Card Read-Punch Customer Engineering Manual of Instruction, page 4-3 or pdf page 31. 

  6. The relay is a "permissive make" relay, a type of relay that IBM designed to be twice as fast as regular relays. For a detailed discussion of IBM's relays, see Commutation and Control. The permissive make relay is discussed on page 59 (pdf page 18). 

  7. Stan Paddock on the 1401 team built a relay tester that we could use to check the bad relay. Unfortunately, the 1401 workshop at the Computer History Museum is closed due to construction so we couldn't access the tester (or the collection of spare relays in the workshop). Fortunately, we had a spare relay that wasn't in the workshop for some reason. 

  8. The IBM Sales Manual lists the various 1401 features and their prices (pdf page 50). It states the Read Punch Release feature is standard on the 1401 Model C, the "729 Tape/Card System". 

The printer that wouldn't print: Fixing an IBM 1401 mainframe from the 1960s

The Computer History Museum has two operational IBM 1401 computers used for demos, but a few weeks ago one computer suddenly couldn't print anything. I helped track down the problem, but it was more tricky than we expected; along the way we had to investigate the printer error checking circuits, the print buffer, and even low level core memory signals. This blog post discusses our investigation and how we traced the problem to a failed germanium transistor.

The IBM 1401 mainframe computer (left) at the Computer History Museum printing the Mandelbrot fractal on the 1403 printer (right).

The IBM 1401 computer was announced in 1959, and went on to become the best-selling computer of the mid-1960s, with more than 10,000 systems in use. The 1401 leased for $2500 a month (about $20,000 in current dollars), a low price that let even medium-sized businesses use the 1401 for payroll, accounting, invoicing, and many other tasks. The IBM 1401 computer was constructed from small circuit boards (called SMS cards) plugged into units called "gates"—these are gates in the sense of something that swings open, not logic gates. The photo below shows the 1401 with one of the gates open, revealing dozens of brown SMS cards plugged into the gate.

The IBM 1401 computer, with one of the gates opened, showing the dozens of circuit boards (SMS cards) in each gate. The fan on the front of the gate keeps the cards cool.

One key selling point of the IBM 1401 was its high-speed line printer (the IBM 1403), which could hammer out 10 lines per second. (IBM claimed this was four times as fast as competing printers, but others dispute this.) The 1403 printer had excellent print quality, said to be the best printing until laser printers were introduced in the 1970s.1 IBM claims that "Even today, it remains the standard of quality for high-speed impact printing."

Closeup of the type chain (upside down) for an IBM 1403 line printer.

The 1403 printer used a chain of type slugs (above) that rotated at high speed above the paper, with an inked ribbon between the paper and the chain. Each of the 132 print columns had a hammer and an electromagnet. At the right moment, when the desired character passed the hammer, the electromagnet drove the hammer against the back of the paper, causing the paper and ribbon to hit the type slug, printing the character.2

Printing mechanism of the IBM 1401 line printer. From 1401 Reference Manual, p11.

Unfortunately, the printer at the Computer History Museum recently had a problem: whenever a line was printed, the computer would halt due to a "print check" error. Fortunately the museum has a team of volunteers to help keep the system running; people helping with this printer problem included Ron Williams, Frank King, Marc Verdiell, Carl Claunch, Michael Marineau, Robert Garner and Alexey Toptygin. By the time I arrived to help, Ron had written a simple test program that repeatedly attempted to print a line; he toggled the program into the computer by hand, and he disabled the error check. The printer printed the characters properly, so we suspected the problem was in the error reporting circuitry inside the computer. Our strategy was to find the error signal and then trace it back through the computer to determine why it was being generated.

We started by examining the latch circuit that holds the print check error condition and sends it to the rest of the computer. To find the circuit, we consulted the documentation: binders of cryptic computer-generated wiring diagrams, called Automated Logic Diagrams (ALD). A small piece of an ALD, shown below, contains the print check latch (PR CHK LAT). Each box on the ALD corresponds to a circuit on an SMS board and the lines show how the boards are wired together. Once deciphered, the text inside the box on the right indicates a board of type 2JMX implementing a "2+AO" function, which in modern terms is AND-OR-Invert. The text in each box also indicates the location of the card: its gate (physical swing-out gate, not logic gate), gate 01A6 in this case, and the card's position in the gate (F10). Thus, to check the output (labeled H) of the latch with the oscilloscope, we swung out gate 01A6, found card F10, and hooked the oscilloscope to pin H. We found that pin H went low (error) when pins F and G went high, which was the proper behavior for the latch. Pin G (PR CK SAMPLE) was essentially a clock to sample the error state, while pin F was the error signal itself. Our next task was to determine what was triggering the error signal on pin F.

Excerpt of an Automated Logic Diagram (ALD) for the IBM 1401, showing the print check latch (PRT CHK LAT). This page is denoted 36.37.21.2.

The documentation also includes logic diagrams that show the circuitry at a logical level, which is slightly easier to understand than the physical connections on the ALD diagrams. The logic diagram below shows the printer error circuitry. At the right, the print check error signal (PRT CHK ERROR) comes out of the latch (PR CHK LAT) that holds the error signal. (This is the same latch as in the ALD diagram above, and you can match up the signal names.) To the left of the latch, several different error conditions are detected and combined to form the error signal fed into the latch. (Note that IBM's logic symbols didn't match standard symbols. The semicircle is an OR gate, not an AND gate. The triangle is an AND gate. An "I" in a box is an inverter.)

Logic diagram of the error checking logic for the IBM 1401/1403. From Instructional Logic Diagrams page 77 "Print Buffer Controls".

Several different conditions can trigger a print check error3 and we thought the "hammer fire" check was a likely candidate. Recall that the printer uses 132 hammers, one per column, to print a line of characters. To make sure the hammers are operating correctly, the computer has two special planes in core memory. (The 1401 contains 4,000 characters of core memory4; each bit of memory is a tiny ferrite ring that is magnetized one way to store a "1" and the other way for a "0". A grid of 4000 cores forms a plane, storing a 1-bit slice of memory. Multiple planes are stacked up to form the storage unit.) Each time the computer decides to fire a hammer, it records this in core memory in the "equal check" plane. When a hammer actually fires, the current pulse from the electromagnet stores a bit in the "hammer-fire" plane.5 Each print scan cycle, the computer compares the two core planes to see if a hammer was fired when it wasn't supposed to, or if a hammer failed to fire when it should have; a mismatch triggers the "hammer fire" check error.
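
The comparison behind the hammer fire check can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the 1401's actual circuitry: the function name `hammer_fire_check` and the list-of-bits representation of a core plane are assumptions made for the example.

```python
COLUMNS = 132  # one hammer per print column


def hammer_fire_check(equal_check, hammer_fire):
    """Return the columns where the two planes disagree.

    equal_check[col] is 1 if the computer decided to fire that hammer;
    hammer_fire[col] is 1 if the hammer's current pulse was recorded.
    Both lists are hypothetical stand-ins for the two core planes.
    """
    if len(equal_check) != len(hammer_fire):
        raise ValueError("planes must cover the same columns")
    return [col for col, (wanted, fired) in enumerate(zip(equal_check, hammer_fire))
            if wanted != fired]


equal_check = [0] * COLUMNS
hammer_fire = [0] * COLUMNS
for col in (0, 5, 9):          # hammers the computer decided to fire
    equal_check[col] = 1
    hammer_fire[col] = 1
hammer_fire[9] = 0             # a hammer that failed to fire
hammer_fire[20] = 1            # a hammer that fired when it should not have

print(hammer_fire_check(equal_check, hammer_fire))  # prints [9, 20]
```

Any non-empty result corresponds to a mismatch, which on the real machine raises the "hammer fire" check error.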

Closeup of the hammer electromagnets in the IBM 1403 printer. An electromagnet (when energized through its pair of wires) pulls a metal armature, which drives the hammer, paper and ribbon against the type slug. There are 132 hammers, one for each column, arranged in two rows of 66.

After some difficulty6, we determined that the problem wasn't the hammer fire check, but a different check: "print line complete" (PLC). This check ensures that for each line, either exactly one character was printed in each column or the column was blank. This check uses a third special core plane, the "print line complete" plane. Each time a character is printed in a column, the corresponding bit is set. (For a blank or unprintable character, a separate circuit sets the column's bit.) At the end of the line (during scan 49), the print line complete cores are checked; if any core is zero, the printer failed to print that column and an error is reported. (You can see the PLC CHECK signal and the logic that generates it on the earlier logic diagram.)
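
As a rough model of the print line complete check, the Python sketch below tracks one bit per column. The class and method names are hypothetical, and the real check is implemented in logic gates and a core plane rather than software; the sketch only mirrors the rules described above (set the bit on print or blank, flag a second print in the same column, flag any zero bit at scan 49).

```python
class PrintLineComplete:
    """Toy model of the PLC plane: one bit per print column."""

    def __init__(self, columns=132):
        self.plane = [0] * columns
        self.error = False

    def mark_blank(self, col):
        # A blank or unprintable character sets the bit without printing.
        self.plane[col] = 1

    def print_char(self, col):
        # Printing into a column whose bit is already set is an error
        # ("trying to print position twice").
        if self.plane[col]:
            self.error = True
        self.plane[col] = 1

    def end_of_line(self):
        # Scan 49: every column must have been printed or marked blank.
        if not all(self.plane):
            self.error = True
        result = self.error
        self.plane = [0] * len(self.plane)  # cleared for the next line
        self.error = False
        return result


good = PrintLineComplete(columns=4)
good.print_char(0); good.print_char(1); good.mark_blank(2); good.print_char(3)
print(good.end_of_line())    # prints False: every column accounted for

missing = PrintLineComplete(columns=4)
missing.print_char(0)
print(missing.end_of_line()) # prints True: columns 1-3 never printed
```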

Oscilloscope probing (below) showed that the print check error (yellow) was triggered because the system thought a second character was being printed in the same column. The cyan signal is the (inverted) PLC bit from core (PR LINE COMP LATCH); each low pulse indicates a character has been printed in that column. The pink pulse (PRINT COMPARE) indicates a new character is being printed. The problem is that the cyan and pink signals go low at the same time, indicating both an existing character and a new character in the column. This generates the extra blue pulse (PLC CHECK), which triggers the yellow pulse (PRINT CHK ERROR from the latch). (This circuit can be seen in the earlier logic diagram, labeled "Trying to print position twice".)

Oscilloscope trace from debugging the IBM 1401's printer.

Several things could cause the system to think two characters were being printed in the column. Looking at the printer's output, we saw that it printed just the expected character on the paper, so the circuit to print a character seemed to be working correctly (PRINT COMPARE, the single pink pulse above). We tested the blank / unprintable circuit and it was detecting blank and non-blank columns correctly. So the most likely problem was reading a 1 from core memory (the cyan line above, PR LINE COMP LATCH) when it should be a 0. But was the problem the wrong value going in to core, or the wrong value coming out?

The logic diagram below shows the circuit that writes to the Print Line Complete core memory. At the right, PR LINE COMP INH is the (inverted) signal written to core.8 On scan 49 (the error-checking print cycle after printing all 48 characters), this line is set high, clearing the memory. If a character is being printed, the PRINT COMPARE EQUAL signal will set the core. At the left, logic gates detect a blank or unprintable character. And if a 1 bit was already in core (PR LINE COMP LATCH), the 1 bit is rewritten to core.

Logic diagram of the print line complete logic for the IBM 1401/1403. From Instructional Logic Diagrams page 77 "Print Buffer Controls".

We detected that this circuit was writing erroneous 1 bits to core because it was reading erroneous 1 bits from core. But that put us in a circle, not knowing if the initial problem was the read or the write. To resolve this, we triggered the oscilloscope on print scan 49, which is when the PLC bits get cleared, and then looked at the next print scan, which reads the cleared bits back. We saw 0's being written (i.e. PR LINE COMP INH high), but unexpectedly saw 1's coming back (PR LINE COMP LATCH). So we knew something was going wrong at a low level in the core memory.

I should mention that in the base 1401 system, the printer check bits were stored in the main core memory module, but our system used a separate "print storage" core memory for improved performance. The performance issue is due to how the printer uses core memory: each time a hammer lines up with a type slug, the computer reads the corresponding character from core memory and fires the hammer if the character in storage matches the character under the hammer. Since core memory is constantly in use while printing a line, the computer can't do any computation while printing. The solution was the print storage feature: an additional 132-address core memory that functioned as a print buffer.7 With print storage, a line to be printed was first rapidly copied from the main core memory to the print storage core memory. Then the computer could continue doing computation using the main core memory while the print circuitry read from the print storage core memory. Each option on the IBM 1401 had a monthly charge; IBM charged an extra $386 a month for the print storage feature.

This print storage gate has the circuitry to drive the printer buffer core memory. The core memory unit in the upper right has bundles of yellow wires attached.

The photo above shows the gate that implements the print storage feature. The core memory module is the block on the upper right with yellow wires attached. (Individual cores can be seen in the photo below.) Core memory requires a lot of supporting circuitry. To select an address, driver cards generate X and Y signals. To write a core, the inhibit signal is combined with the clock by a gate, and then a driver card amplifies the signal and sends it through the inhibit line that passes through all the cores in the plane.8 When a core is read, it induces a pulse on a sense wire. This pulse is amplified by a sense amplifier card, and then the bit is stored in a latch. The numerous SMS cards in the print storage gate provided these support functions.
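
To make the role of the inhibit line concrete, here is a small Python sketch of a coincident-current write to one core plane. It is a simplified model under stated assumptions: currents are counted in "half-select" units, a core flips to 1 only when it sees two units, and the addressed core is assumed to start at 0 (in real core memory the preceding read leaves it there). The names `write_plane` and `blank_plane` are made up for the example.

```python
HALF = 1            # one half-select unit of current
FLIP_THRESHOLD = 2  # a core flips only with a full (two-unit) current


def blank_plane(size=4):
    """A small square plane of cores, all in the 0 state."""
    return [[0] * size for _ in range(size)]


def write_plane(plane, x, y, inhibit):
    """Apply one write cycle: pulse column x and row y with half current each.

    If inhibit is True, the plane's inhibit line carries half current in the
    opposite direction through every core, so no core reaches the threshold.
    """
    for row_idx, row in enumerate(plane):
        for col_idx in range(len(row)):
            current = 0
            if col_idx == x:
                current += HALF
            if row_idx == y:
                current += HALF
            if inhibit:
                current -= HALF
            if current >= FLIP_THRESHOLD:
                row[col_idx] = 1
    return plane


ones = write_plane(blank_plane(), x=2, y=1, inhibit=False)
zeros = write_plane(blank_plane(), x=2, y=1, inhibit=True)
print(ones[1][2], zeros[1][2])  # prints 1 0
```

Only the core at the intersection of the selected lines flips, and driving the inhibit line keeps even that core at 0, which is why a high inhibit signal corresponds to writing a 0.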

The cores inside the print buffer. The wiring is not the usual core memory grid because each printer hammer is wired directly to a hammer check core. The image quality is bad because of the plastic cover over the cores.

We probed the sense amplifier and latch cards on the reading side of the core memory and they seemed to be operating correctly, so we moved to the writing side. The HN inhibit driver card seemed a candidate for failure since it operates at high current, but we swapped the card with a replacement and the printer still failed. Next, I tried looking at the input to that card, but found there was no signal on that line, which seemed very suspicious.

Oscilloscope of the bad "CHWW" NAND gate card: pink (3) and blue (4) are inputs, cyan (2) is the output, stuck high.

The missing signal was generated by a card of type CHWW, a NAND gate that combines the inhibit signal with the clock before sending it to the driver card. I hooked up the oscilloscope to the inputs and output of the NAND gate, yielding the trace above. This trace was the smoking gun: the output (cyan 2) remained high even when the two inputs (pink 3 and blue 4) went high. This showed that the NAND gate had failed and its output was stuck high. This explained everything: with this output stuck high, only 1's would be written to the PLC core plane. Then, when a character was printed, the print circuitry would read the 1 from core, think a character had already been printed in this column, the PLC check would fail, and the print check error would be triggered.
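
The failure chain can be replayed with a toy Python model. Everything here is a sketch: the function names are hypothetical, and the assumption that the driver sends inhibit current when the NAND output is low is inferred from the behavior described above (output stuck high means only 1's get written), not taken from the schematics.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def bit_written(inhibit, clock=1, stuck_high=False):
    """Return the bit that ends up in the PLC core for one write cycle."""
    gate_out = 1 if stuck_high else nand(inhibit, clock)
    # Assumption: the driver card pulses the inhibit line when the gate
    # output is low; with no inhibit current the addressed core flips to 1.
    inhibit_current = (gate_out == 0)
    return 0 if inhibit_current else 1


def plc_check_on_first_print(stuck_high):
    """Clear the PLC bit on scan 49, then print one character in that column."""
    plc_bit = bit_written(inhibit=1, stuck_high=stuck_high)  # clearing write
    # If the bit reads back as 1, the logic believes the column was
    # already printed and raises the PLC check.
    return plc_bit == 1


print(plc_check_on_first_print(stuck_high=False))  # prints False
print(plc_check_on_first_print(stuck_high=True))   # prints True
```

With a healthy gate the clearing write stores a 0 and the first character prints cleanly; with the output stuck high the clear never takes effect, so the very first character in each column looks like a double print.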

The printer successfully operating, printing out powers of 2.

We swapped this card with a spare, and the printer started printing without any errors (above). This proved that we had finally traced the problem; it was a simple NAND gate in the depths of the printer buffer core memory circuit. The failed card is shown below. It implements three NAND gates (details) using diode-transistor logic (which IBM calls CTDL, for Complemented Transistor Diode Logic). Each two-input gate uses one germanium transistor (circular metal can) and two diodes (striped glass components on the right). Pull-up resistors (striped) and inductors (beige) on the left complete the circuits.

The failed CHWW card from the IBM 1401. This card implements three NAND gates. The lower left transistor failed, and has been replaced.

I tested the card with a signal generator and found that while two of the three NAND gates worked, the third was stuck at a high output, confirming what we saw inside the 1401. Next I tested the transistors using the diode test mode on a multimeter. The good transistors had voltage drops of 0.23V. (This may seem low, but remember that these are germanium transistors, not silicon transistors.) In comparison, the bad transistor had a Vbe drop of 0.95V, much higher. Finally, we removed the transistors and checked them on a vintage Tektronix 577 curve tracer. We thought the bad transistor might just be too weak to operate the gate, but it was entirely dead, totally flatlined on the curve tracer.

We opened up the transistor on a lathe and looked inside. The transistor is an IBM 083 NPN germanium alloy transistor (germanium transistors came into use before silicon transistors). The transistor consists of a tiny germanium die (the shiny metallic square below), forming the base. Two wires are attached for the emitter and collector, connected to dots of tin alloy, a larger dot on the front for the collector and a smaller dot on the back for the emitter. Under the microscope, it looked like there was some corrosion on the alloy dots and the emitter wire didn't look solidly connected, so we suspect that is the root cause of the failure.

Inside a failed IBM 083 germanium transistor. The silver-colored square in the middle is the germanium die, wired to the base pin. The dot in the middle is tin alloy, forming the collector, with a wire to the collector pin on the left. A smaller dot on the other side of the germanium die forms the emitter, wired to the pin on the right.

Conclusions

This was a harder problem to diagnose than most of the IBM 1401 issues. But we managed to track down the problem, replace the bad card, and get the printer back in operation. One nice thing about the IBM 1401 compared to modern systems is that it's not a black box—you can look inside all the circuitry, down to the individual transistors. In this case, we were able to find the bad transistor that was causing the system failure, and even determine that it was probably corrosion that killed the transistor.

I announce my latest blog posts on Twitter, so follow me at @kenshirriff for future articles. I also have an RSS feed. The Computer History Museum in Mountain View runs demonstrations of the IBM 1401 on Wednesdays and Saturdays so if you're in the area you should definitely check it out (schedule).

Notes and references

  1. One reason for the IBM 1403's high quality printing was its use of a type chain instead of typebars or a drum. Many earlier line printers used rows of typebars or a rotating drum of characters. Any timing imprecision would change the vertical positioning of characters, yielding ugly wavy text. The 1403, on the other hand, used a horizontally rotating chain of characters so misalignment caused a hardly-noticeable change in the spacing between characters. 

  2. You might expect that the 132 hammers align with 132 type slugs, so the matching hammers all fire at once, but that's not what happens. Instead, the hammers and type slugs are spaced slightly differently, so only one hammer is aligned at a time, and a tiny movement of the chain lines up a different hammer and type slug. (Essentially they form a vernier.) Specifically, every 11.1 microseconds, the chain moves 0.001 inches. This causes a new hammer / type slug alignment. For mechanical reasons, every third hammer lines up in sequence (1, 4, 7, ...) until the end of the line is reached; this is called a "subscan" and takes 555 microseconds. Two more subscans give each hammer in the line an option to fire, forming a print scan of 1.665 milliseconds. 48 print scans give each hammer a chance to print each character, and then the 49th print scan is used for error checking. (For more details of this timing, see Manual of Instruction, page 37.)

    The mechanism of scans and subscans may seem excessively complicated. But what it accomplishes is matching up the fast "electronic world" with the slower "mechanical world." Specifically, every 11.1 microseconds, a hammer and type slug line up. The computer reads the character in that column from core, compares it to the character on the type slug, and if they match, it fires the hammer. The important thing here is that a core memory cycle matches the time between hammer alignments, making it possible to read the character from core for each hammer alignment. If you want more information on how the print chain works, I have an animation here.

    One subtlety is that a hammer takes 1.52 milliseconds to impact (Manual of Instruction, p32). Thus, it's not really the case that the hammer fires when it lines up with the type, but when it will be lined up 1.52 milliseconds in the future. 

  3. It may seem excessive that the 1401 had multiple checks to ensure that the printer was operating properly. But for a business computer, print errors could be catastrophic: imagine if a day's payroll checks had a digit printed wrong or tax forms were printed incorrectly. IBM's scientific computers had much less error checking than the business computers, on the assumption that scientists would notice problems. 

  4. The 1401 stores 4,000 characters in core memory, not 4096, because it is a decimal machine (i.e. BCD), with decimal addresses. Its memory can be expanded to 16,000 characters with a dishwasher-sized memory expansion unit; I wrote about repairing this unit here. I wrote more about the 1401's core memory here.

  5. Recording each hammer fire in core memory isn't done by the computer writing to core memory. Instead, each hammer is physically wired directly to a particular core; 132 wires from the hammer electromagnets to the cores. When a hammer fires, the current pulse from the hammer's electromagnet goes through a wire wrapped through the corresponding core, magnetizing that core. (You can see these wires in the earlier picture of the cores.) 

  6. It was tricky to determine which signal was triggering the error input F, due to the 1401's use of wired-OR. Because transistors were expensive when the IBM 1401 was built, IBM used many tricks to reduce the transistor count. One trick is the wired-OR: instead of using an OR gate, signals are simply wired together so if any signal is high it will pull the line high. Thus, we couldn't simply probe the signals feeding into pin F because they were all wired together. Instead, we needed to disconnect cards so we could test one signal at a time. 

  7. The print storage core memory has 12 core planes; that is, it stores 12 bits at each location. Like a regular core location, it uses 6 bits to store each BCDIC character, as well as a bit for the word mark (metadata indicating field locations), and a parity bit. In addition, the print storage has four planes for error detection: a hammer fire sense plane (recording the hammers that fired), equal check plane (recording the hammers that should fire), print line complete plane (recording columns with a character printed), and an error check plane (indicating the column that triggered an error). 

  8. The process to write to core memory may seem backwards, using a high signal on the inhibit line to write a 0. This is due to how cores function. The key that makes cores work is that they require a high current pulse to flip the core's magnetic state; a pulse with half the current has no effect on the core. Cores are arranged in a grid, with X and Y address lines that are pulsed to select a core. Multiple planes are stacked, one for each bit. Each line is pulsed with half the necessary current, so only the core where both lines cross has enough current to flip to the 1 state. Each plane has an inhibit line that passes through all the cores in the plane. To write a 1 to a plane, the inhibit line gets no current, causing the addressed core to flip to 1 as described. To write a 0 to a plane, the inhibit line gets half current in the opposite direction. The result is that none of the cores get enough current to flip, and the addressed core remains in the 0 state. Thus, by setting each plane's inhibit line appropriately, the desired 0's and 1's can be written to the address in the core stack. 

  9. For information on how the print checks work, see Instruction Logic, page 98. The 1403 printer is documented in IBM 1403 Printer Component Description, 1403 Printers Field Engineering Maintenance Manual and 1403 Printers Field Engineering Manual of Instruction. See also this brief article about the 1403 printer in the IEEE Spectrum. For a detailed description of the IBM 1401, see IBM 1401: a modern theory of operation.
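
The timing figures in note 2 can be cross-checked with a little arithmetic. The Python sketch below uses only numbers quoted in the notes (11.1 microseconds per alignment, 0.001 inches of chain travel per alignment, 555 microseconds per subscan, three subscans per print scan, 49 print scans per line); the variable names are just for illustration, and the result ignores paper advance and any other overhead between lines.

```python
ALIGNMENT_US = 11.1        # time between hammer / type slug alignments
CHAIN_STEP_IN = 0.001      # chain travel per alignment, in inches
SUBSCAN_US = 555           # duration of one subscan
SUBSCANS_PER_SCAN = 3      # every third hammer, three passes
SCANS_PER_LINE = 49        # 48 character scans plus one error-check scan

# Chain speed implied by the step size and alignment interval.
chain_speed_ft_s = CHAIN_STEP_IN / (ALIGNMENT_US * 1e-6) / 12

# Number of 11.1-microsecond alignment slots that fit in one subscan.
slots_per_subscan = SUBSCAN_US / ALIGNMENT_US

# Duration of one print scan and of all the scans for one line.
print_scan_ms = SUBSCANS_PER_SCAN * SUBSCAN_US / 1000
line_scan_ms = SCANS_PER_LINE * print_scan_ms

print(f"chain speed: {chain_speed_ft_s:.2f} ft/s")
print(f"alignment slots per subscan: {slots_per_subscan:.0f}")
print(f"print scan: {print_scan_ms:.3f} ms")
print(f"49 print scans: {line_scan_ms:.3f} ms")
```

The chain speed works out to roughly 7.5 feet per second, and the 49 scans take a bit under 82 ms, which fits within the 100 ms per line implied by 10 lines per second.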