Overview of the ARM1 chip
The ARM1 chip is built from functional blocks, each with a different purpose. Registers store data, the ALU (arithmetic-logic unit) performs simple arithmetic, instruction decoders determine how to handle each instruction, and so forth. Compared to most processors, the layout of the chip is simple, with each functional block clearly visible. (In comparison, the layout of chips such as the 6502 or Z-80 is highly hand-optimized to avoid any wasted space. In these chips, the functional blocks are squished together, making it harder to pick out the pieces.)
The diagram below shows the most important functional blocks of the ARM chip.[2] The actual processing happens in the bottom half of the chip, which implements the data path. The chip operates on 32 bits at a time, so it is structured as 32 horizontal layers: bit 31 at the top, down to bit 0 at the bottom. Several data buses run horizontally to connect different sections of the chip. The large register file, with 25 registers, stands out in the image. The Program Counter (register 15) is on the left of the register file and register 0 is on the right.[3]
Computation takes place in the ALU (arithmetic-logic unit), which is to the right of the registers. The ALU performs 16 different operations (add, add with carry, subtract, logical AND, logical OR, etc.). It takes two 32-bit inputs and produces a 32-bit output. The ALU is described in detail here.[4] To the right of the ALU is the 32-bit barrel shifter. This large component performs a binary shift or rotate operation on its input, and is described in more detail below. At the left is the address circuitry, which provides an address to memory through the address pins. At the right, the data circuitry reads and writes data values to memory.
Above the datapath circuitry is the control circuitry. The control lines run vertically from the control section to the data path circuits below. These signals select registers, tell the ALU what operation to perform, and so forth. The instruction decode circuitry processes each instruction and generates the necessary control signals. The register decode block processes the register select bits in an instruction and generates the control signals to select the desired registers.[5]
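To make the register-select idea concrete, here is a rough Python sketch of how the register fields of one instruction could turn into select lines. It is an illustration, not the chip's actual logic: the field positions follow the standard ARM data-processing instruction layout, the helper names (bits, decode_data_processing, select_lines) are made up for this example, and the 16-line one-hot decoder ignores the mapping onto the 25 physical registers.

```python
def bits(word, high, low):
    """Return bits high..low (inclusive) of word as an integer."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)


def decode_data_processing(word):
    """Pull out the ALU operation and register fields of a data-processing instruction."""
    return {
        "alu_op": bits(word, 24, 21),  # four operation bits, one of 16 ALU operations
        "rn": bits(word, 19, 16),      # first input register
        "rd": bits(word, 15, 12),      # output register
        "rm": bits(word, 3, 0),        # second input register
    }


def select_lines(register_id, count=16):
    """One-hot decode: exactly one select line is active for a 4-bit register id."""
    return [1 if i == register_id else 0 for i in range(count)]


# Example word: the 'ror r0, r1, r2' instruction from the demonstration program.
fields = decode_data_processing(0xE1A00271)
read_a = select_lines(fields["rn"])  # first read port
read_b = select_lines(fields["rm"])  # second read port
write = select_lines(fields["rd"])   # write port
print(fields)
```

The three calls to select_lines correspond to the three sets of select signals: two registers read as inputs and one register written as output.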
The pins
The squares around the outside of the image above are the pads that connect the processor to the outside world. The photo below shows the 84-pin package for the ARM1 processor chip. The gold-plated pins are wired to the pads on the silicon chip inside the package.
 
Most of the pads are used for the address and data lines to memory. The chip has 26 address lines, allowing it to access 64MB of memory, and has 32 data lines, allowing it to read or write 32 bits at a time. The address lines are in the lower left and the data lines are in the lower right. As the simulator runs, you can see the address pins step through memory and the data pins read data from memory. The right hand side of the simulator shows the address and data values in hex, e.g. "A:00000020 D:e1a00271". If you know hex, you can easily match these values to the pin states.
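If you'd rather not convert hex by hand, a few lines of Python can do the matching. This is just a sketch: the function names are invented for the example, and the only inputs taken from the article are the 26 address lines, the 32 data lines, and the "A:... D:..." display format.

```python
ADDRESS_LINES = 26
DATA_LINES = 32


def addressable_bytes(address_lines):
    """Number of distinct byte addresses reachable with this many address lines."""
    return 2 ** address_lines


def parse_pins(display):
    """Parse a simulator-style string such as 'A:00000020 D:e1a00271'."""
    fields = dict(part.split(":") for part in display.split())
    return int(fields["A"], 16), int(fields["D"], 16)


def pin_states(value, width):
    """Binary string with one character per pin, most significant bit first."""
    return format(value, "0{}b".format(width))


address, data = parse_pins("A:00000020 D:e1a00271")
print(addressable_bytes(ADDRESS_LINES) // 2**20, "MB")  # prints "64 MB"
print("address pins:", pin_states(address, ADDRESS_LINES))
print("data pins:   ", pin_states(data, DATA_LINES))
```

Each character of the printed strings corresponds to one pin: a 1 is a pin driven high, a 0 a pin driven low.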
Each corner of the chip has a power pin (+) and a ground pin (-), providing 5 volts to run the chip. Various control signals are at the top of the chip. In the simulator, it is easy to spot the two clock signals that step the chip through its operations (below). The phase 1 and phase 2 clocks alternate, providing a tick-tock rhythm to the chip. In the simulator, the clock runs at a couple of cycles per second, while the real chip has an 8MHz clock, more than a million times faster. Finally, note below the manufacturer's name "ACORN" on the chip in place of pin 82.
 
History of the ARM chip
The ARM1 was designed in 1985 by engineers Sophie Wilson (formerly Roger Wilson) and Steve Furber of Acorn Computers. The chip was originally named the Acorn RISC Machine and intended as a coprocessor for the BBC Micro home/educational computer to improve its performance. Only a few hundred ARM1 processors were fabricated, so you might expect ARM to be a forgotten microprocessor, a historical footnote of the 1980s. However, the original ARM1 chip led to the amazingly successful ARM architecture with more than 50 billion ARM chips produced. What happened?
In the early 1980s, academic research suggested that instead of making processor instruction sets more complex, designers would get better performance from a processor that was simple but fast: the Reduced Instruction Set Computer or RISC.[6] The Berkeley and Stanford research papers on RISC inspired the ARM designers to choose a RISC design. In addition, given the small size of the design team at Acorn, a simple RISC chip was a practical choice.[7]
The simplicity of a RISC design is clear when comparing the ARM1 and Intel's 80386, which came out the same year: the ARM1 had about 25,000 transistors versus 275,000 in the 386.[8] The photos below show the two chips at the same scale; the ARM1 is 50 mm² compared to 104 mm² for the 386. (Twenty years later, an ARM7TDMI core was 0.1 mm²; magnified at the same scale it would be the size of this square, vividly illustrating Moore's law.)
Because of the ARM1's small transistor count, the chip used very little power: about 1/10 Watt, compared to nearly 2 Watts for the 386. The combination of high performance and low power consumption made later versions of the ARM chip very popular for embedded systems. Apple chose the ARM processor for its ill-fated Newton handheld system, and in 1990, Acorn Computers, Apple, and chip manufacturer VLSI Technology formed the company Advanced RISC Machines to continue ARM development.[9]
In the years since then, ARM has become the world's most-used instruction set with more than 50 billion ARM processors manufactured. The majority of mobile devices use an ARM processor; for instance, the Apple A8 processor inside iPhone 6 uses the 64-bit ARMv8-A. Despite its humble beginnings, the ARM1 made IEEE Spectrum's list of 25 microchips that shook the world and PC World's 11 most influential microprocessors of all time.
Looking at the low-level construction of the ARM1 chip
Getting back to the chip itself, the ARM1 chip is constructed from five layers. If you zoom in on the chip in the simulator, you can see the components of the chip, built from these layers. As seen below, the simulator uses a different color for each layer, and highlights circuits that are turned on. The bottom layer is the silicon that makes up the transistors of the chip. During manufacturing, regions of the silicon are modified (doped) by applying different impurities. Silicon can be doped positive to form a PMOS transistor (blue) or doped negative for an NMOS transistor (red). Undoped silicon is basically an insulator (black).
Polysilicon wires (green) are deposited on top of the silicon. When polysilicon crosses doped silicon, it forms the gate of a transistor (yellow). Finally, two layers of metal (gray) are on top of the polysilicon and provide wiring.[10] Black squares are contacts that form connections between the different layers.
For our purposes, a MOS transistor can be thought of as a switch, controlled by the gate. When it is on (closed), the source and drain silicon regions are connected. When it is off (open), the source and drain are disconnected. The diagram below shows the three-dimensional structure of a MOS transistor.
Like most modern processors, the ARM1 was built using CMOS technology, which uses two types of transistors: NMOS and PMOS. NMOS transistors turn on when the gate is high, and pull their output towards ground. PMOS transistors turn on when the gate is low, and pull their output towards +5 volts.
Understanding the register file
The register file is a key component of the ARM1, storing information inside the chip. (As a RISC chip, the ARM1 makes heavy use of its registers.) The register file consists of 25 registers, each holding 32 bits. This section describes step-by-step how the register file is built out of individual transistors.
The diagram below shows two transistors forming an inverter. If the input is high (as below), the NMOS transistor (red) turns on, connecting ground to the output so the output is low. If the input is low, the PMOS transistor (blue) turns on, connecting power to the output so the output is high. Thus, the output is the opposite of the input, making an inverter.
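The switch view of the two transistors translates almost directly into code. The following Python sketch is a logic-level model only (no voltages or timing), and the function names are invented for illustration.

```python
def nmos_conducts(gate_high):
    """An NMOS transistor turns on when its gate is high."""
    return gate_high


def pmos_conducts(gate_high):
    """A PMOS transistor turns on when its gate is low."""
    return not gate_high


def cmos_inverter(input_high):
    """Output of an inverter built from one NMOS and one PMOS transistor."""
    pull_down = nmos_conducts(input_high)  # connects the output to ground
    pull_up = pmos_conducts(input_high)    # connects the output to +5 volts
    # Exactly one of the two transistors conducts for any input.
    assert pull_up != pull_down
    return pull_up  # the output is high only when the PMOS transistor is on


for value in (False, True):
    print("input", int(value), "-> output", int(cmos_inverter(value)))
```

Because exactly one transistor conducts at a time, the output is always driven either to power or to ground, never left floating and never shorted.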
Combining two inverters into a loop forms a simple storage circuit. If the first inverter outputs 1, the second inverter outputs 0, causing the first inverter to output 1, and the circuit is stable. Likewise, if the first inverter outputs 0, the second outputs 1, and the circuit is again stable. Thus, the circuit will remain in either state indefinitely, "remembering" one bit until forced into a different state.
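You can convince yourself that the loop is stable with a tiny simulation. This Python sketch treats each inverter as an ideal logic gate and sends the value around the loop a few times; the names inverter and settle are made up for the example.

```python
def inverter(x):
    """Ideal logic inverter on a 0/1 value."""
    return 1 - x


def settle(first_output, trips=4):
    """Propagate a value around the two-inverter loop several times.

    first_output is the current output of the first inverter.
    Returns (first inverter output, second inverter output).
    """
    a = first_output
    b = inverter(a)
    for _ in range(trips):
        b = inverter(a)  # second inverter sees the first inverter's output
        a = inverter(b)  # first inverter sees the second inverter's output
    return a, b


print(settle(1))  # (1, 0): the stored 1 is unchanged
print(settle(0))  # (0, 1): the stored 0 is unchanged
```

Whichever value you start with comes back unchanged after each trip around the loop, which is what lets the pair of inverters hold one bit.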
To make this circuit into a useful register cell, read and write bus lines are added, along with select lines to connect the cell to the bus lines. When the write select line is activated, the pass transistor connects the write bus to the inverter, allowing a new value to overwrite the current bit. Likewise, pass transistors connect the bit to a read bus when activated by the corresponding select line, allowing the stored value to be read out.
To create the register file, the register cell above is repeated 32 times vertically for each bit, and 25 times horizontally to form each register. Each bit has three horizontal bus lines — the write bus and the two read buses — so there are 32 triples of bus lines. Each register has three vertical control lines — the write select line and two read select lines — so there are 25 triples of control lines. By activating the desired control lines, two registers can be read and one register can be written at a time.[11] When the simulator is running, you can see the vertical control lines activated to select registers, and you can see the data bits flowing on the horizontal bus lines.
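As a behavioral summary, the register file acts like an array with two read ports and one write port. The Python sketch below models only that behavior; the class and method names are hypothetical, and the choice that both reads see the old contents before the write lands is an assumption of this model, not something stated above.

```python
class RegisterFile:
    """Behavioral model: 25 registers of 32 bits, two reads and one write per cycle."""

    def __init__(self, num_registers=25, width=32):
        self.mask = (1 << width) - 1
        self.cells = [0] * num_registers

    def cycle(self, read_a, read_b, write=None, value=0):
        """Activate two read select lines and, optionally, one write select line."""
        a = self.cells[read_a]  # value placed on the first read bus
        b = self.cells[read_b]  # value placed on the second read bus
        if write is not None:
            # Assumption: the write takes effect after both reads.
            self.cells[write] = value & self.mask
        return a, b


rf = RegisterFile()
rf.cycle(0, 0, write=1, value=15)         # write 15 into register 1
print(rf.cycle(1, 0, write=2, value=99))  # read registers 1 and 0 while writing register 2
```

Indexing by physical register number (0 to 24) keeps the model simple; it says nothing about how the 16 architectural registers map onto the 25 physical ones.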
By looking at a memory cell in the simulator, you can see which inverter is on and determine if the bit is a 0 or a 1. The diagram below shows a few register bits. If the upper inverter input is active, the bit is 0; if the lower inverter input is active, the bit is 1. (Look at the green lines above or below the bit values.) Thus, you can read register values right out of the simulator if you look closely.
The barrel shifter
The barrel shifter, which performs binary shifts, is another interesting component of the ARM1. Most instructions use the barrel shifter, allowing a binary argument to be shifted left, shifted right, or rotated by any amount (0 to 31 bits). While running the simulator, you can see diagonal lines jumping back and forth in the barrel shifter.
The diagram below shows the structure of the barrel shifter. Bits flow into the shifter vertically with bit 0 on the left and bit 31 on the right. Output bits leave the shifter horizontally with bit 0 on the bottom and bit 31 on top. The diagonal lines visible in the barrel shifter show where the vertical lines are connected to the horizontal lines, generating a shifted output. Different positions of the diagonals result in different shifts. The upper diagonal line shifts bits to the left, and the lower diagonal line shifts bits to the right. For a rotation, both diagonals are active; it may not be immediately obvious, but in a rotation part of the word is shifted left and part is shifted right.
Zooming in on the barrel shifter shows exactly how it works. It contains a 32 by 32 crossbar grid of transistors, each connecting one vertical line to one horizontal line. The transistor gates are connected by diagonal control lines; transistors along the active diagonal connect the appropriate vertical and horizontal lines. Thus, by activating the appropriate diagonals, the output lines are connected to the input lines, shifted by the desired amounts. Since the chip's input lines all run horizontally, there are 32 connections between input lines and the corresponding vertical bit lines.
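The crossbar idea can be modeled in a few lines of Python. In this sketch a diagonal is identified by an offset d, and the transistor joining input bit i to output bit j is on when j - i equals d. That numbering, and the function names, are conventions chosen for the example rather than anything taken from the chip's layout.

```python
WIDTH = 32


def crossbar(value, active_diagonals):
    """Route input bits to output bits through the transistors on the active diagonals."""
    out = 0
    for d in active_diagonals:
        for i in range(WIDTH):
            j = i + d  # the output line this transistor connects to input line i
            if 0 <= j < WIDTH and (value >> i) & 1:
                out |= 1 << j
    return out


def shift_left(value, n):
    return crossbar(value, [n])


def shift_right(value, n):
    return crossbar(value, [-n])


def rotate_right(value, n):
    """A rotation uses two diagonals: a right shift by n and a left shift by 32 - n."""
    n %= WIDTH
    if n == 0:
        return crossbar(value, [0])
    return crossbar(value, [-n, WIDTH - n])


print(format(rotate_right(0x0000000F, 1), "08x"))  # 80000007
```

In the rotation, the bits that fall off the right-hand end are exactly the ones the second diagonal carries around to the top of the word, which is why both diagonals light up.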
The demonstration program
When you run the simulator, it executes a short hardcoded program that performs shifts of increasing amounts. You don't need to understand the code, but if you're curious it is:
0000  E1A0100F mov     r1, pc        @ Some setup
0004  E3A0200C mov     r2, #12
0008  E1B0F002 movs    pc, r2
000C  E1A00000 nop
0010  E1A00000 nop
0014  E3A02001 mov     r2, #1        @ Load register r2 with 1
0018  E3A0100F mov     r1, #15       @ Load r1 with value to shift
001C  E59F300C ldr     r3, pointer
    loop:
0020  E1A00271 ror     r0, r1, r2    @ Rotate r1 by r2 bits, store in r0
0024  E2822001 add     r2, r2, #1    @ Add 1 to r2
0028  E4830004 str     r0, [r3], #4  @ Write result to memory
002C  EAFFFFFB b       loop          @ Branch to loop
Inside the loop, register r1 (0x000f) is rotated to the right by r2 bit positions and the result is stored in register r0.
Then r2 is incremented and the shift result written to memory.
As the simulator runs, watch r2 increment and r0 step through the successive rotations of its four set bits. The A and D values show the address and data pins as instructions are read from memory.
The changing shift values are clearly visible in the barrel shifter, as the diagonal line shifts position. If you zoom in on the register file, you can read out the values of the registers, as described earlier.
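If you want to know which values to expect in r0, the loop is easy to mimic in Python. This is a sketch of the loop's arithmetic only, not of the chip: the starting address passed as pointer is a made-up placeholder (the listing doesn't show the real value), and ror32 is a helper written for this example.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def run_demo(iterations, pointer=0x1000):
    """Mimic the demonstration loop; pointer is a hypothetical start address."""
    r1, r2, r3 = 15, 1, pointer
    memory = {}
    for _ in range(iterations):
        r0 = ror32(r1, r2)  # ror r0, r1, r2
        r2 += 1             # add r2, r2, #1
        memory[r3] = r0     # str r0, [r3], #4 (store, then advance the pointer)
        r3 += 4
    return memory


for address, value in run_demo(4).items():
    print(format(address, "08x"), format(value, "08x"))
```

The first four values stored are 80000007, c0000003, e0000001 and f0000000: the four set bits wrap around from the bottom of the word to the top.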
Conclusion
The ARM1 processor led to the amazingly successful ARM processor architecture that powers your smart phone. The simple RISC architecture of the ARM1 makes the circuitry of the processor easy to understand, at least compared to a chip such as the 386.[12] The ARM1 simulator provides a fascinating look at what happens inside a processor, and hopefully this article has helped explain what you see in the simulator.
P.S. If you want to read more about ARM1 internals, see Dave Mugridge's series of posts:
Inside the armv1 Register Bank
Inside the armv1 Register Bank - register selection
Inside the armv1 Read Bus
Inside the ALU of the armv1 - the first ARM microprocessor
Notes and references
[1] I should make it clear that I am not part of the Visual 6502 team that built the ARM1 simulator. More information on the simulator is in the Visual 6502 team's blog post The Visual ARM1.
[2] The block diagram below shows the components of the chip in more detail. See the ARM Evaluation System manual for an explanation of each part.
[3] You may have noticed that the ARM architecture describes 16 registers, but the chip has 25 physical registers. There are 9 "extra" registers because there are extra copies of some registers for use while handling interrupts.
Another interesting thing about the register file is the PC register is missing a few bits. Since the ARM1 uses 26-bit addresses, the top 6 bits are not used. Because all instructions are aligned on a 32-bit boundary, the bottom two address bits in the PC are always zero. These 8 bits are not only unused, they are omitted from the chip entirely.
[4] The ALU doesn't support multiplication (added in ARM 2) or division (added in ARMv7).
[5] A bit more detail on the decode circuitry. Instruction decoding is done through three separate PLAs. The ALU decode PLA generates control signals for the ALU based on the four operation bits in the instruction. The shift decode PLA generates control signals for the barrel shifter. The instruction decode PLA performs the overall decoding of the instruction. The register decode block consists of three layers. Each layer takes a 4-bit register id and activates the corresponding register. There are three layers because ARM operations use two registers for inputs and a third register for output.
[6] In a RISC computer, the instruction set is restricted to the most-used instructions, which are optimized for high performance and can typically execute in a single clock cycle. Instructions are a fixed size, simplifying the instruction decoding logic. A RISC processor requires much less circuitry for control and instruction decoding, leaving more space on the chip for registers. Most instructions operate on registers, and only load and store instructions access memory. For more information on RISC vs CISC, see RISC architecture.
[7] For details on the history of the ARM1, see Conversation with Steve Furber: The designer of the ARM chip shares lessons on energy-efficient computing.
[8] The 386 and the ARM1 instruction sets are different in many interesting ways. The 386 has instructions from 1 byte to 15 bytes, while all ARM1 instructions are 32 bits long. The 386 has 15 registers, all with special purposes, while the ARM1 has 25 registers, mostly general-purpose. 386 instructions can usually operate on memory, while ARM1 instructions operate on registers except for load and store. The 386 has about 140 different instructions, compared to a couple dozen in the ARM1 (depending on how you count). Take a look at the 386 opcode map to see how complex decoding a 386 instruction is. ARM1 instructions fall into 5 categories and can be simply decoded. (I'm not criticizing the 386's architecture, just pointing out the major architectural differences.)
See the Intel 80386 Programmer's Reference Manual and 80386 Hardware Reference Manual for more details on the 386 architecture.
[9] Interestingly the ARM company doesn't manufacture chips. Instead, the ARM intellectual property is licensed to hundreds of different companies that build chips that use the ARM architecture. See The ARM Diaries: How ARM's business model works for information on how ARM makes money from licensing the chip to other companies.
[10] The first metal layer in the chip runs largely top-to-bottom, while the second metal layer runs predominantly horizontally. Having two layers of metal makes the layout much simpler than single-layer processors such as the 6502 or Z-80.
[11] In the register file, alternating bits are mirrored to simplify the layout. This allows neighboring bits to share power and ground lines. The ARM1's register file is triple-ported, so two registers can be read and one register written at the same time. This is in contrast to chips such as the 6502 or Z-80, which can only access registers one at a time.
[12] For more information on the ARM1 internals, the book VLSI RISC Architecture and Organization by ARM chip designer Steve Furber has a hundred pages of information on the ARM chip internals. An interesting slide deck is A Brief History of ARM by Lee Smith, ARM Fellow.
18 comments:
Interesting read - as always.
Unfortunately the link in 2 to the ARM Eval System manual does not work for me (404) and the bitsavers page lists it as "no copies known to be online". Fortunately I found another source (although it could be a mirror of the first and disappear too): http://bitsavers.informatik.uni-stuttgart.de/pdf/acorn/ARM_Evaluation_Sys_Aug86.pdf
I'm curious about the clocking scheme. Apparently two-phase clocks are common in ICs from this era. What's the reason for this though? The first thing that came to my mind was to overcome pad speed limitations, although this could be solved by a XOR combining both phi1 and 2 signals inside the chip - which is not the case here.
Sven: two-phase clocks are used internally because the circuits are level-triggered, not edge-triggered. So you need one clock phase to let data flow from A to B, and then another to let data flow from B to A. (Hopefully that makes sense.) The timing of the phases is tricky - they can't overlap or else you get a cycle. Some chips (e.g. 6502) generate the two phases internally, but others take the easy way of using two clock pins and having the clock logic externally.
Also, thanks for mentioning the broken link - apparently bitsavers doesn't support https.
I just wanted to thank you for all the articles you write. I make my living on the software side of things (web development), but coming of age in the 80s, I always had an interest in how the hardware works -- practically a necessity in the 8-bit computing world (I cut my teeth on a Radio Shack Color Computer). I can still remember how there were these mysterious PEEK and POKE commands, and for the longest time, I wondered how/why they worked, and how people figured out what values to poke into which addresses. Then I started learning assembly language. disassembling ROMs, and learning more about the hardware beyond the CPU. And one day, I guess I saw an extended memory map of the computer, and how some of the addresses corresponded to hardware functions, and things finally *snapped* into place. I started college as a EE, but quickly changed to CompSci. But I've always kept an interest in what's going on in the hardware underneath, even down to the silicon. So all of the time you spend on your tear-downs and explanations is greatly appreciated -- I always finish up knowing a little bit more than I did before!
>> "Since the ARM uses 26-bit addresses"
Perhaps it would be better stated as "Since early ARM processors used 26-bit addresses".
It might be helpful to readers to note that ARM processors moved to full 32-bit addressing in around 1991 with the release of ARM6, ARM600, ARM610 (The Apple Newton was released with an ARM610 with full 32-bit addressing in 1993).
ARM has also released a 64-bit architecture named AArch64 as part of the ARMv8-A architecture, with the first volume shipments being in the Apple iPhone 5S released in 2013.
I like to think that the design was a judicious mix of custom and standard cell layout.
If a chip looks good it probably is good.
---Jamie
Very interesting, but I really wonder how it's possible to write this article without even stating Acorn successfully used its ARM processors in the Archimedes (the world fastest computer under €1500 at the time) and RISC PC range, and also manufactured the A4 laptop.
You name the U.S. Apple Newton, though. Strange.
Is it possible to get the markings from under the sticker on the chip?
I have some early ARM (either 1 or 2) in PLCC I would like to compare it to (mine are VL2333 from 1987)
Thanks!
RISC systems are very much alive and active. All of the IBM POWER (Performance Optimized With enhanced RISC) servers running AIX and Linux (including WATSON which won "Jeopardy!"), the PowerPC (made by Apple, IBM, and Motorola and used in the Apple PowerBooks and PowerMacs), the IBM RISC processor in the Mars Pathfinder spacecraft (the first on Mars!), and the Xbox 360 processor made by IBM are just a few examples of the large-scale presence that RISC processors have in the home, business, and scientific computing world.
Hi John,
VL2333 is an early ARM2 (c1987) - ARM1 (c1985) was marked VC2588
Just in case it helps you identify any other old Acorn chips you may have - here are some other part numbers:
Early prototypes:
VC2588 - ARM1 (Autumn)
VC2304 - MEMC (Anna)
VC2303 - VIDC (Arabella)
VC2311 - IOC (Albion)
VL2333 - ARM2
Volume production:
VL86C010 - ARM2
VL86C020 - ARM3
VL86C110 - MEMC
VL86C210 - Unallocated?
VL86C310 - VIDC
VL86C410 - IOC
VY86C610 - ARM610
VY86C710 - ARM710
Cheers,
--John
If you want to buy a piece of history!
acorn a4 laptop
http://www.ebay.co.uk/itm/182075342752?ssPageName=STRK:MESELX:IT&_trksid=p3984.m1555.l2649
Xavier: Well, one of them is from a company that's now one of the most valuable in the world, and the other is from a company that hasn't existed since last century. The former also happened to be a revolutionary (if flawed) tablet computer from a company that's still making tablet computers with the same CPU family.
It's not surprising one of these is more interesting to people than the other. That's not unique to this blog. The Apple Newton also has a lengthy Wikipedia article, while the A4 laptop got 3 sentences at the bottom of the Acorn Archimedes page.
Apple isn't mentioned because it's in the US. It's mentioned because it made ARM successful.
"Despite its humble beginnings, the ARM1 made..." it into a pay-walled site and a link-rotted site. (end of funny pseudo-quote)
I could not find any good replacements for the sites (images were gone, yada), but I found a site which appears to be a shameless rip-off of the pcworld page, with text subtly changed to prevent detection.
Original text, but no images: https://www.pcworld.com/article/525281/influential-processors.html
Plagiarized text (even erroneous in some places) but with images: https://gigazine.net/
When searching for alternatives, don't visit gigazine!
Where is a full transistors schematic of this processor ?
The site is good, but I would like to see the full transistor-level schematic of this ARM1 processor, which is not here.
Hey Ken, excellent article, and a great read. I realise I'm a bit late to the party, but would just like to point out a minor inaccuracy to your note 3.
The program counter (PC) in early ARM chips was R15, and was, for the most part, a general purpose register the same as other registers. As you point out, the top 6 bits and the bottom 2 are not used as part of the program counter, however those bits _are_ present on R15, and used as a processor status register. The top 6 bits were used for processor status flags, and the bottom two bits were used to indicate the processor mode (user, interrupt request, fast interrupt request, and supervisor). The processor mode would dictate which 15 registers from the file of 25 physical registers are in use.
As R15 was essentially a general purpose register this meant that it could be manipulated like any other register, allowing those status flags to be directly read/set/cleared.
This changed with the ARM6 series of CPUs (the series that came immediately after ARM3) - that introduced a full 32 bit address bus and moved the status register out of R15.
Just tried to post a comment, which appears to have gone walkabout, so in the event that it does appear sorry if this is kinda a duplicate.
Firstly thanks Ken for the great post - very informative and a great read.
I just wanted to make one quick comment/correction about your note 3. In that you state that the PC in the register file is missing a few bits, and those bits are omitted. That's not exactly accurate...
R15 on an ARM 1 (and ARM2 and ARM3) is where you'll find the PC, but that is also combined with the processor status register. As the early ARM chips as you point out only supported a 26 bit address bus, it's the upper 6 bits that get used for status flags. (Those upper bits are kinda special, as they get used for conditional instruction execution.) The bottom two bits get used to indicate the processor mode (User, IRQ, FIQ, Supervisor) which control how the 25 physical registers are mapped to the 16 active registers. So all 32 bits are actually present on R15.
For the most part, all registers on an ARM1 are general purpose, which means that the flag bits in R15 can be manipulated directly. Handy if you wanted to return from a subroutine and restore flags at the same time, for instance.
This all changed with the ARMv3 architecture, introduced with the ARM610 chip. That chip got a 32 bit address bus, so the PSR had to move out of R15. Additional instructions got added for PSR manipulation.
Steve: if you look at the chip schematic, the status bits aren’t there. They are part of architectural r15, but they aren’t part of the physical register.