Ken Shirriff's blog: August 2016

PRU tips: Understanding the BeagleBone's built-in microcontrollers

The BeagleBone Black is an inexpensive, credit-card sized computer that has two built-in microcontrollers called PRUs. While the PRUs provide the real-time processing capability lacking in Linux, using these processors has a learning curve. In this article, I show how to run a simple program on the PRU, and then dive into the libraries and device drivers to show what is happening behind the scenes. Warning: this post uses the 3.8.13-bone79 kernel; many things have changed since then.

The BeagleBone uses the Sitara AM3358, an ARM processor chip running at 1 GHz—this is the thumbnail-sized chip in the center of the board below. If you want to perform real-time operations, the BeagleBone's ARM processor won't work well since Linux isn't a real-time operating system. However, the Sitara chip contains two 32-bit microcontrollers, called Programmable Realtime Units or PRUs. (It's almost fractal, having processors inside the processor.) By using a PRU, you can achieve fast, deterministic, real-time control of I/O pins and devices.

The BeagleBone computer is tiny. For some reason it was designed to fit inside an Altoids mint tin. The square thumbnail-sized chip in the center is the AM3358 Sitara processor chip. This chip contains an ARM processor as well as two 32-bit microcontrollers called PRUs.

The nice thing about using the PRU microcontrollers on the BeagleBone is that you get both real-time Arduino-style control, and "real computer" features such as a web server, WiFi, Ethernet, and multiple languages. The main processor on the BeagleBone can communicate with the PRUs, giving you the best of both worlds. The downside of the PRUs is there's a significant learning curve to use them since they have their own instruction set and run outside the familiar Linux world. Hopefully this article will help with the learning curve.

I wrote an article a couple weeks ago on The BeagleBone's I/O pins. That article discussed the pins controlled by the ARM processor, while this article focuses on the PRU microcontroller. If you think you've seen the present article before, they cover two different things, but I won't blame you for getting deja vu!

Running a "blink" program on the PRU

To motivate the discussion, I'll use a simple program that uses the PRU to flash an LED. This example is based on PRU GPIO example

Blinking an LED using the BeagleBone's PRU microcontroller.

The easiest way to compile and assemble PRU code is to do it on the BeagleBone, since the necessary tools are installed by default (at least if you get the Adafruit BeagleBone). Perform the following steps on the BeagleBone.

Connect an LED to BeagleBone header P8_11 through a 1K resistor.
Download the assembly code file blink.p:
Download the host file to load and run the PRU code, loader.c.
Download the device tree file, /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dts.

Compile and install the device tree file to enable the PRU:

# dtc -O dtb -I dts -o /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dtbo -b 0 -@ PRU-GPIO-EXAMPLE-00A0.dts
# echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots
# cat /sys/devices/bone_capemgr.?/slots

Assemble blink.p and compile the loader:

# pasm -b blink.p
# gcc -o loader loader.c -lprussdrv

Run the loader to execute the PRU binary:
```
# ./loader blink.bin
```

If all goes well, the LED should blink 10 times.[1]

Documentation

The most important document that describes the Sitara chip is the 5041-page Technical Reference Manual (TRM for short). This article references the TRM where appropriate, if you want more information. Information on the PRU is inconveniently split between the TRM and the AM335x PRU-ICSS Reference Guide. For specifics on the AM3358 chip used in the BeagleBone, see the 253 page datasheet. Texas Instruments' has the PRU wiki with more information.

If you're looking to use the BeagleBone and/or PRU I highly recommend the detailed and informative book Exploring BeagleBone. Helpful web pages on the PRU include BeagleBone Black PRU: Hello World, Working with the PRU and BeagleBone PRU GPIO example.

The assembly code

If you're familiar with assembly code, the PRU's instruction set should be fairly straightforward. The instruction set is documented on the wiki and in the PRU reference manual (section 5).[2]

The demonstration blink code uses register R1 to count the number of blinks (10) and register R0 to provide the delay between flashes. The delay timing illustrates an important feature of the PRU: it is entirely deterministic. Most instructions takes 5 nanoseconds (at 200 MHz), although reads are slower. There is no pipelining, branch delays, memory paging, interrupts, scheduling, or anything else that interferes with instruction execution. This makes PRU execution predictable and suitable for real-time processing. (In contrast, the main ARM processor has all the scheduling and timing issues of Linux, making it unsuitable for real-time processing.)

Since the delay loop is two instructions long and executes 0xa00000 times (an arbitrary value that gave a visible delay), we can compute that each delay is 104.858 milliseconds, regardless of what else the processor is doing. (This is shown in the oscilloscope trace below.) A similar loop on the ARM processor would have variable timing, depending on system load.

LED output from the BeagleBone PRU demo, showing the 104ms oscillations.

The I/O pin

How do we know that pin P8_11 is controlled by bit 15 of R30? We're using one of the PRU output pins called "pr1_pru0_pru_r30_15", which is a PRU 0 output pin controlled by R30 bit 15. (Note that the PRU GPIO pins are separate from the "regular" GPIO pins.)[3] The BeagleBone header chart shows that this PRU I/O pin is connected to BeagleBone header pin P8_11.[4] To tell the BeagleBone we want to use this pin with the PRU, the device tree configuration is updated as discussed below.

Communication between the main processor and the PRU

The PRU microcontrollers are part of the processor chip, yet operate independently. In the blink demo, the main processor runs a loader program that downloads the PRU code to the microcontroller, signals the PRU to start executing, and then waits for the PRU to finish. But what happens behind the scenes to make this work? The PRU Linux Application Library (PRUSSDRV) provides the API between code running under Linux on the BeagleBone and the PRU.[5]

The prussdrv_exec_program() API call loads a binary program (in our case, the blink program) into the PRU so it can be executed. To understand how this works requires a bit of background on how memory is set up.

Each PRU microcontroller has 8K (yes, just 8K) of instruction memory, starting at address 0. Each PRU also has 8K of data memory, starting at address 0 (see TRM 4.3).[6] Since the main processor can't access all these memories at address 0, they are mapped into different parts of the main processor's memory as defined by the "global memory map".[7] For instance, PRU0's instruction memory is accessed by the host starting at physical address 0x4a334000. Thus, to load the binary code into the PRU, the API code copies the PRU binary file to that address.[8]

Then, to start execution, the API code starts the PRU running by setting the enable bit in the PRU's control register, which the host can access at a specific physical address (see TRM 4.5.1.1).

To summarize, the loader program running on the main processor loads the executable file into the PRU by writing it to special memory addresses. It then starts up the PRU by writing to another special memory address that is the PRU's control register.

The Userspace I/O (UIO) driver

At this point, you may wonder how the loader program can access to physical memory and low-level chip registers - usually is not possible from a user-level program. This is accomplished through UIO, the Linux Userspace I/O driver (details). The idea behind a UIO driver is that many devices (the PRU in our case) are controlled through a set of registers. Instead of writing a complex kernel device driver to control these registers, simply expose the registers to user code, and then user-level code can access the registers to control the device. The advantage of UIO is that user-level code is easier to write and debug than kernel code. The disadvantage is there's no control over device usage—buggy or malicious user-level code can easily mess up the device operation.[9]

For the PRU, the uio_pruss driver provides UIO access. This driver exposes the physical memory associated with the PRU as a device /dev/uio0. The user-level PRUSSDRV API library code does a mmap on this device to map the address space into its own memory. This gives the loader code access to the PRU memory through addresses in its own address space. The uio_pruss driver exposes information on the mapping to the user-level code through the directory /sys/class/uio/uio0/maps.

Another photo of the BeagleBone Black single-board computer inside an Altoids mint tin. At this point in the article, you may need a break from the text.

Device Tree

How does the uio_pruss driver know the address of the PRU? This is configured through the device tree, a complicated set of configuration files used to configure the BeagleBone hardware. I won't go into the whole explanation of device trees, but just the parts relevant to our story.[10] The pruss entry in the device tree is defined in am33xx.dtsi. Important excerpts are:

pruss: pruss@4a300000 {
 compatible = "ti,pruss-v2";
 reg = <0x4a300000 0x080000>;
 status = "disabled";
 interrupts = <20 21 22 23 24 25 26 27>;
};

The address 4a300000 in the device tree corresponds to the PRU's instruction/data/control space in the processor's address space. (See TRM Table 2-4 and section 4.3.2.) This address is provided to the uio_pruss driver so it knows what memory to access.

The "ti,pruss-v2" compatible line causes the matching uio_pruss driver to be loaded. Note that the status is "disabled", so the PRU is disabled by default until enabled in a device tree overlay.

The "interrupts" line specifies the eight ARM interrupts that are handled by the driver; these will be discussed in the Interrupts section below.

Device tree overlay

To enable the PRU and the GPIO output pin, we needed to load the PRU-GPIO-EXAMPLE-00A0.dts device tree overlay file.[11] This file is discussed in detail at credentiality, so I'll just give some highlights.

As described earlier, each header pin has eight different functions, so we need to tell the pin multiplexer driver which of the eight modes to use for the pin P8_11. Looking at the P8 header diagram, pin P8_11 has internal address 0x34 and has the PRU output function in mode 6. This is expressed in the overlay file:

exclusive-use = "P8.11", "pru0";
...
    target = <&am33xx_pinmux>;
...
       pinctrl-single,pins = <
         0x34 0x06

The above entry extends the am33xx_pinmux device tree description and is passed to the pinctrl-single driver, which configures the pins as desired by updating the appropriate chip control registers.

The device tree file also overrides the previous pruss entry, changing the status to "okay", which enables the uio_pruss device driver:

    target = <&pruss>;
    __overlay__ {
      status = "okay";

The cape manager

This device tree overlay was loaded with the command "echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots". What does this actually do?

The idea of the cape manager is if you plug a board (called a cape) onto the BeagleBone, the cape manager reads an EEPROM on the board, figures out what resources the board needs, and automatically loads the appropriate device tree fragments (from /lib/firmware) to configure the board (source, details, documentation). The cape manager is configured in the device tree file am335x-bone-common.dtsi.

Examples of BeagleBone capes. Image from beaglebone.org, CC BY-SA 3.0.

In our case, we aren't installing a new board, but just want to manually install a new device tree file. The cape manager exposes a bunch of files in /sys/devices/bone_capemgr.*. Sending a name to the file "slots" causes the cape manager to load the corresponding device tree file. This enables the PRU and the desired output pin.

Interrupts

Interrupts can be used to communicate between the PRU code and the host code. At the end of the PRU blink code, it sends an interrupt to the host by writing 35 to register R31. This wakes up the loader program, which is waiting with the API call "prussdrv_pru_wait_event(PRU_EVTOUT_0)". This process seems like it should be straightforward, but it's surprisingly complex. In this section, I'll explain how PRU interrupts work.

I'll start with the host (loader program) side. The PRU can send eight different events to the host program. The prussdrv_pru_wait_event(N) call waits for event N (0 through 7) by reading from /dev/uioN. This read will block until an interrupt happens, and then return. (This seems like a strange way to receive interrupts, but using the file system fits the Unix philosophy.)[12]

If you look in the file system, there are 8 uio files: /dev/uio0 through /dev/uio7. You might expect /dev/uio1 provides access to PRU1, but that's not the case. Instead, there are 8 event channels between the PRUs and the main processor, allowing 8 different types of interrupts. Each one of these has a separate /dev/uioN file; reading from the file waits for the corresponding event, labeled PRU_EVTOUT0 through PRU_EVTOUT7.

Jumping now to the PRU side, the interrupt is triggered by a write to R31. Register R31 is a special register - writing to it sends an event to the interrupt system.[13] The blink code generates PRU internal event 3 (with the crazy name pr1_pru_mst_intr[3]_intr_req), which is mapped to system event 19.

The following diagram shows how an event (right) gets mapped to a channel, then a host, and finally generates an interrupt (left). In our case, system event 19 from the PRU is given the label PRU0_ARM_INTERRUPT. This interrupt is mapped to channel 2, then host 2 which goes to ARM interrupt PRU_EVTOUT0, which is ARM interrupt 20 (see TRM 6.3). The device tree configured the pruss_uio driver to handle events 20 through 27, so the driver receives the interrupt and unblocks /dev/uio0, informing the host process of the PRU event.

Interrupt handling on the BeagleBone between the PRU microcontrollers and the ARM processor. System events (right) are mapped to channels, then hosts, finally generating interrupts (left).

How does the above complex mapping get set up? You might guess the device tree, but it's done by the PRUSSDRV user-level library code. The interrupt configuration is defined in PRUSS_INTC_INITDATA, a struct that defines the various mappings shown above. This struct is passed to prussdrv_pruintc_init(), which writes to the appropriate PRU interrupt registers (mapped into memory).[14]

Next steps

A real application would use a host program much more powerful than the example loader. For instance, the host program could run a web server and actively communicate with the PRU. Or the host program could do computation on data received from the PRU. Such a system uses the ARM processor to get all the advantages of Linux, and the PRU microcontrollers to perform real-time activities impossible in Linux.

It's easier to program the PRU in C using an IDE. That's a topic for another post, but for now you can look here, here, here or here for information on using the C compiler.

Conclusion

The PRU microcontrollers give the BeagleBone real-time, deterministic processing. Combining the BeagleBone's full Linux system with the PRU microcontrollers yields a very powerful system. The Linux side can provide powerful processing, network connectivity, web servers, and so forth, while the PRUs can interface with the external world. If you're using an Arduino but want more (Ethernet, web connectivity, USB, WiFi), the BeagleBone is an alternative to consider.

There is a significant learning curve to using the PRU microcontrollers, however. Much of what happens may seem like magic, hidden behind APIs and device tree files. Hopefully this article has filled in the gaps, explaining what happens behind the scene.

Notes and references

[1] If you get the error "prussdrv_open() failed" the problem is the PRU didn't get enabled. Make sure you loaded the device tree file into the cape manager.

[2] One of the instructions in the PRUs instruction set is LBBO (load byte burst), which reads a block of memory into the register file. That sounded to me like an obscure instruction I'd never need, but it turns out that this instruction is used for any memory reads. So don't make my mistake of ignoring this instruction!

[3] To understand the pin name "pr1_pru0_pru_r30_15": pru0 indicates the pin is controlled by PRU 0 (rather than PRU 1), r30 indicates the pin is an output pin, controlled by the output register R30 (as opposed to the input register R31), and 15 indicates I/O pin 15, which is controlled by bit 15 of the register. A mnemonic to remember that R30 is the output register and R31 is the input register: 0 looks like O for output, 1 looks like I for input.

[4] Most PRU pins conflict with HDMI or another subsystem, so you need to disable HDMI to use them. To avoid this problem, I picked one of the few non-conflicting PRU pins. Disabling HDMI is described here and here. See this chart for a list of available PRU pins and potential conflicts.

[5] The PRU library is documented in PRU Linux Application Loader API Guide. The library is often installed on the BeagleBone by default but can also be downloaded in the am335x_pru_package along with PRU documentation and some coding examples. Library source code is here.

[6] The PRU, like many microcontrollers, uses a Harvard architecture, with independent memory for code and data.

[7] In the main processor's address space, PRU0's data RAM starts at address 0x4a300000, PRU1's data RAM starts at 0x4a302000, PRU0's instruction RAM starts at 0x4a334000, and PRU1's instruction RAM starts at 0x4a338000. (See TRM Table 2-4 and section 4.3.2.) The PRU control registers also have addresses in the main processor's physical address space; PRU0's control registers start at address 0x4a322000 and PRU1's at 0x4a324000.

[8] The API uses prussdrv_pru_write_memory() to copy the contents into the memory-mapped region for PRU's instruction memory. The prussdrv_load_datafile API call is used to load a file into the PRU's data memory.

[9] Note that using Userspace I/O to control the PRU is the exact opposite philosophy of how the regular GPIO pins on the BeagleBone are controlled. Most of the BeagleBone's I/O pins are controlled through complex device drivers that provide access through simple file system interfaces. This is easy to use: just write 1 to /sys/class/gpio/gpio60/value to turn on GPIO 60. On the other hand, the BeagleBone gives you raw access to the PRU control registers - you can do whatever you want, but you'll need to figure out how to do it yourself. It's a bit strange that two parts of the BeagleBone have such wildly different philosophies.

[10] Device trees are discussed in more detail in my previous BeagleBone article. Also see A Tutorial on the Device Tree, Device Tree for Dummies, or Introduction to the BeagleBone Black Device Tree.

[11] Why is the example device tree overlay (and most device tree files) labeled with version 00A0? I found out (thanks to Robert Nelson) that A0 is a common initial version for contract hardware. So 00A0 is the first version of something, followed by 00A1, etc.

[12] The blink example only sends events from the PRU to the host, but what about the other direction? The host can send an event (e.g. ARM_PRU0_INTERRUPT, 21) to a PRU using prussdrv_pru_send_event(). This writes the PRU's SRSR0 (System Event Status Raw/Set Register, TRM 4.5.3.13) to generate the event. The interrupt diagram shows this event flows to R31 bit 30. The PRUs don't receive interrupts as such, but must poll the top two bits of R31 to see if an interrupt signal has arrived.

[13] The PRU has 64 system events for all the different things that can trigger interrupts, such as timers or I/O (see TRM 4.4.2.2). When generating an event with R31, bit 5 triggers an event, while the lower 4 bits select the event number (see TRM 4.4.1.2). PRU system event numbers are offset by 16 from the internal event numbers, so internal event 3 is system event 19. See PRU interrupts for the full list of events.

[14] Multiple PRU registers are set up to initialize interrupt handling. The CMR (channel map registers, TRM 4.5.3.20) map from each system event to a channel. The HMR (host interrupt map registers, TRM 4.5.3.36) map from a channel to a host interrupt. The PRU interrupt registers (INTC) start at address 0x4a320000 (TRM Table 4-8).

The BeagleBone's I/O pins: inside the software stack that makes them work

The BeagleBone is a inexpensive, credit-card sized computer with many I/O pins. These pins can be easily controlled from software, but it can be very mysterious what is really happening. To control a general purpose input/output (GPIO) pin, you simply write a character to a special file and the pin turns on or off. But how do these files control the hardware pins? In this article, I dig into the device drivers and hardware and explain exactly what happens behind the scenes. Warning: this post uses the 3.8.13-bone79 kernel; many things have changed since then. (Various web pages describe the GPIO pins, but if you just want a practical guide of how to use the GPIO pins, I recommend the detailed and informative book Exploring BeagleBone.)

This article focuses on the BeagleBone Black, the popular new member of the BeagleBoard family. If you're familiar with the Arduino, the BeagleBone is much more complex; while the Arduino is a microcontroller, the BeagleBone is a full computer running Linux. If you need more than an Arduino can easily provide (more processing, Ethernet, WiFi), the BeagleBone may be a good choice.

Beaglebone Black single-board computer. Photo by Gareth Halfacree, CC BY-SA 2.0

The BeagleBone uses the Sitara AM3358 processor chip running at 1 GHz - this is the thumbnail-sized chip in the center of the board above. This chip is surprisingly complicated; it appears they threw every feature someone might want into the chip. The diagram below shows what is inside the chip: it includes a 32-bit ARM processor, 64KB of memory, a real-time clock, USB, Ethernet, an LCD controller, a 3D graphics engine, digital audio, SD card support, various networks, I/O pins, an analog-digital converter, security hardware, a touch screen controller, and much more.[1] To support real-time applications, the Sitara chip also includes two separate 32-bit microcontrollers (on the chip itself - processors within processors!).

Functional diagram of the complex processor powering the BeagleBone Black. The TI AM3358 Sitara processor contains many functional units. Diagram from Texas Instruments.

The main document that describes the Sitara chip is the Technical Reference Manual, which I will call the TRM for short. This is a 5041 page document describing all the feature of the Sitara architecture and how to control them. But for specifics on the AM3358 chip used in the BeagleBone, you also need to check the 253 page datasheet. I've gone through these documents, so you won't need to, but I reference the relevant sections in case you want more details.

Using a GPIO pin

The chip's pins can be accessed through two BeagleBone connectors, called P8 and P9. To motivate the discussion, I'll use an LED connected to GPIO 49, which is on pin 23 of header P9 (i.e. P9_23). (How do you know this pin is GPIO 49? I explain that below.) Note that the Sitara chip's outputs can provide much less current than the Arduino, so be careful not to overload the chip; I used a 1K resistor.

To output a signal to this GPIO pin, simply write some strings to some files:

# echo 49 > /sys/class/gpio/export
# echo out > /sys/class/gpio/gpio49/direction
# echo 1 > /sys/class/gpio/gpio49/value

The first line causes a directory for gpio49 to be created. The next line sets the pin mode as output. The third line turns the pin on; writing 0 will turn the pin off.

This may seem like a strange way to control the GPIO pins, using special file system entries. It's also strange that strings ("49", "out", and "1") are used to control the pin. However, the file-based approach fits with the Unix model, is straightforward to use from any language, avoids complex APIs, and hides the complexity of the chip.

The file-system approach is very slow, capable of toggling a GPIO pin at about 160 kHz, which is rather awful performance for a 1 GHz processor.[2] In addition, these oscillations may be interrupted for several milliseconds if the CPU is being used for other tasks, as you can see from the image below. This is because file system operations have a huge amount of overhead and context switches before anything gets done. In addition, Linux is not a real-time operating system. While the processor is doing something else, the CPU won't be toggling the pins.

An I/O pin can be toggled to form an oscillator. Unfortunately, oscillations can stop for several milliseconds if the processor is doing something else.

The moral here is that controlling a pin from Linux is fine if you can handle delays of a few milliseconds. If you want to produce an oscillation, don't manually toggle the pin, use a PWM (pulse width modulator) output instead. If you need more complex real-time outputs, use one of the microcontrollers (called a PRU) inside the processor chip.

What's happening internally with GPIOs?

Next, I'll jump to the low level, explaining what happens inside the chip to control the GPIO pins. A GPIO pin is turned on or off when a chip register at a particular address is updated. To access memory directly, you can use devmem2, a simple program that allows a memory register to be read or written. Using the devmem2 program, we can turn the GPIO pin first on and then off by writing a value to the appropriate register (assuming the pin has been initialized). The first number is the address, and the second number is value written to the address.

devmem2 0x4804C194 w 0x00020000
devmem2 0x4804C190 w 0x00020000

What are all these mystery numbers? GPIO 49 is also known as GPIO1_17, the 17th pin in the GPIO1 bank (as will be explained later). The value written to the register, 0x00020000, is a 1 bit shifted left 17 positions (1<<17) to control pin 17. The address 4804C194 is the address of the register used to turn on a GPIO pin. Specifically it is SETDATAOUT register for the 32 GPIO1 pins. By writing a bit to this register, the corresponding pin is turned on. Similarly, the address 4804C190 is the address of the CLEARDATAOUT register, which turns a pin off.

How do you determine these register addresses? It's all defined in the TRM document if you know where to look. The address 4804C000 is the starting address of GPIO1's registers (see Table 2-3 in the TRM). The offset 194h is the offset of the GPIO_SETDATAOUT register, and 190h is the GPIO_CLEARDATAOUT register (see TRM 25.4.1). Combining these, we get the address 4804C194 for GPIO1's SETDATAOUT and 4804C190 for CLEARDATAOUT. Writing a 1 bit to the SETDATAOUT register will set that GPIO, while leaving others unchanged. Likewise, writing a 1 bit to the CLEARDATOUT register will clear that GPIO. (Note that this design allows the selected register to be modified without risk of modifying other GPIO values.)

Putting this all together, writing the correct bit pattern to the address 0x4804C194 or 0x4804C190 turns the GPIO pin on or off. Even if you use the filesystem API to control the pin, these registers are what end up controlling the pin.

How does devmem2 work? It uses Linux's /dev/mem, which is a device file that is an image of physical memory. devmem2 maps the relevant 4K page of /dev/mem into the process's address space and then reads or writes the address corresponding to the desired physical address.

By using mmap and toggling the register directly from a C++ program, the GPIO can be toggled at about 2.8 MHz, much faster than the device driver approach.[3] If you think this approach will solve your problems, think again. As before, there are jitter and multi-millisecond dropouts if there is any other load on the system.

The names of pins

Confusingly, each pin has multiple names and numbers. For instance, pin 23 on header 9 is GPIO1_17, which is GPIO 49, which is pin entry 17 (unrelated to GPIO1_17) which is pin V14 on the chip itself. Meanwhile, the pin has 7 other functions including GMPC_A1, which is the name used for the pin in the documentation. This section explains the different names and where they come form. (If you just want to know the pin names, see the diagrams P8 header and P9 header from Exploring Beaglebone.)

The first complication is that each pin has eight different functions since the processor has many more internal functions than physical pins.[4] (The Sitara chip has about 639 I/O signals, but only 124 output pins available for these signals.) The solution is that each physical pin has eight different internal functions mapped to it. For each pin, you select a mode 0 through 7, which selects which of the eight possible functions you want.

Figuring out "from scratch" which functions are on each BeagleBone header pin is tricky and involves multiple documents. To start, you need to determine what functions are on each physical pin (actually ball) of the chip. Table 4-1 in the chip datasheet shows which signals are associated with each physical pin (see the "ZCZ ball number"). (Or see section 4.3 for the reverse mapping, from the signal to the pin.) Then look at the BeagleBone schematic to see the connection from each pin on the chip (page 3) to an external connector (page 11).

For instance, consider the GPIO pin used earlier. With the ZCZ chip package, chip ball V14 is called GPMC_A1 (in mode 0 this pin is used for the General Purpose Memory Controller). The Beaglebone documentation names the pin with the the more useful mode 7 name "GPIO1_17", indicating GPIO 17 in GPIO bank 1. (Each GPIO bank (0-3) has 32 pins. Bank 0 is numbered 0-31, bank 1 is 32-63, and so forth. So pin 17 in bank 1 is GPIO number 32+17=49.) The schematic shows this chip ball is connected to header P9 pin 23, earning it the name P9_23. This name is relevant when connecting wires to the board. It is also the name used in BeagleScript (discussed later).

Another pin identifier is the sequential pin number based on the the pin control registers. (This number is indicated under the column $PINS in the header diagrams.) Table 9-10 in the TRM lists all the pins in order along with their control address offsets. Inconveniently, the pin names are the mode 0 names (e.g. conf_gpmc_a1), not the more relevant names (e.g. gpio1_17). From this table, we can determine that the offset of con_gpmc_a1 is 844h. Counting the entries (or computing (844h-800h)/4) shows that this is pin 17 in the register list. (It's entirely coincidental that this number 17 matches GPIO1_17). This number is necessary when using the pin multiplexer (described later).

How writing to a file toggles a GPIO

When you write a string to /sys/class/gpio/gpio49/value, how does that end up modifying the GPIO? Now that some background has been presented, this can be explained. At the hardware level, toggling a GPIO pin simply involves setting a bit in a control register, as explained earlier. But it takes thousands of instructions in many layers to get from writing to the file to updating the register and updating the GPIO pin.

The write goes through the standard Linux file system code (user library, system call code, virtual file system layer) and ends up in the sysfs virtual file system. Sysfs is an in-memory file system that exposes kernel objects through virtual files. Sysfs will dispatch the write to the gpio device driver, which processes the request and updates the appropriate GPIO control register.

In more detail, the /sys/class/gpio filesystem entries are provided by the gpio device driver (documentation for sysfs and gpio). The main gpio driver code is gpiolib.c. There are separate drivers for many different types of GPIO chips; the Sitara chip (and the related OMAP family) uses gpio-omap.c. The Linux code is in C, but you can think of gpiolib.c as the superclass, with subclasses for each different chip. C function pointers are used to access the different "methods".

The gpiolib.c code informs sysfs of the various files to control GPIO pin attributes (/active_low, /direction/, /edge, /value), causing them to appear in the file system for each GPIO pin. Writes to the /value file are linked to the function gpio_value_store, which parses the string value, does error checking and calls gpio_set_value_cansleep, which calls chip->set(). This function pointer is where handling switches from the generic gpiolib.c code to the device-specific gpio-omap.c code. It calls gpio_set, which calls _set_gpio_dataout_reg, which determines the register and bit to set. This calls raw_writel, which is inline ARM assembler code to do a STR (store) instruction. This is the point at which the GPIO control register actually gets updated, changing the GPIO pin value.[5] The key point is that a lot of code runs to get from the file system operation to the actual register modification.

How does the code know the addresses of the registers to update? The file gpio-omah.h contains constants for the GPIO register offsets. Note that these are the same values we used earlier when manually updating the registers.

#define OMAP4_GPIO_CLEARDATAOUT  0x0190
#define OMAP4_GPIO_SETDATAOUT  0x0194

But how does the code know that the registers start at 0x4804C000? And how does the system know this is the right device driver to use? These things are specified, not in the code, but in a complex set of configuration files known as Device Trees, explained next.

Device Trees

How does the Linux kernel know what features are on the BeagleBone? How does it know what each pin does? How does it know what device drivers to use and where the registers are located? The BeagleBone uses a Linux feature called the Device Tree, where the hardware configuration is defined in a device tree file.

Linux used to define the hardware configuration in the kernel. But each new ARM chip variant required inconvenient kernel changes, which led to Linus Torvald's epic rant on the situation. The solution was to move this configuration out of kernel code and into files known as the Device Tree, which specifies the hardware associated with the device. This switch, in the 3.8 kernel, is described here.

As if device trees weren't complex enough, the next problem was that BeagleBone users wanted to change pin configuration dynamically. The solution to this was device tree overlays, which allow device tree files to modify other device tree files. With a device tree overlay, the base hardware configuration can be modified by configuration in the overlay file. Then the Capemgr was implemented to dynamically load device tree fragments, to manage the installation of BeagleBoard daughter cards (known as capes).

I won't go into the whole explanation of device trees, but just the parts relevant to our story. For more information see A Tutorial on the Device Tree, Device Tree for Dummies, or Introduction to the BeagleBone Black Device Tree.

The BeagleBone's device tree is defined starting with am335x-boneblack.dts, which includes am33xx.dtsi and am335x-bone-common.dtsi.

The relevant lines are in am33xx.dtsi:

gpio1: gpio@44e07000 {
  compatible = "ti,omap4-gpio";
  ti,hwmods = "gpio1";
  gpio-controller;
  interrupt-controller;
  reg = <0x44e07000 0x1000>;
  interrupts = <96>;
};

The "compatible" line is very important as it causes the correct device driver to be loaded. While processing a device tree file, the kernel checks all the device drivers to find one with a matching "compatible" line. In this case, the winning device driver is gpio-omap.c, which we looked at earlier.

The other lines of interest specify the register base address 44e07000. This is how the kernel knows where to find the necessary registers for the chip. Thus, the Device Tree is the "glue" that lets the device drivers in the kernel know the specific details of the registers on the processor chip.

BoneScript

One easy way to control the BeagleBone pins is by using JavaScript along with BoneScript, a Node.js library. The BoneScript API is based on the Arduino model. For instance, the following code will turn on the GPIO on pin 23 of header P9.

var b = require('bonescript');
b.pinMode('P9_23', b.OUTPUT);
b.digitalWrite('P9_23', b.HIGH);

Using BoneScript is very slow: you can toggle a pin at about 370 Hz, with a lot of jitter and the occasional multi-millisecond gap. But for programs that don't require high speed, BoneScript provides a convenient programming model, especially for applications with web interaction.

For the most part, the BoneScript library works by reading and writing the file system pseudo-devices described earlier. You might expect BoneScript to have some simple code to convert the method calls to the appropriate file system operations, but BoneScript is surprisingly complex. The first problem is BoneScript supports different kernel versions with different cape managers and pin multiplexers, so it implements everything four different ways (source).

A bit surprise is that BoneScript generates and installs new device tree files as a program is running. In particular, for the common 3.8.13 kernel, BoneScript creates a new device tree overlay (e.g. /lib/firmware/bspm_P9_23_2f-00A0.dts) on the fly from a template, runs it through the dtc compiler and installs the device tree overlay through the cape manager by writing to /sys/devices/bone_capemgr.N/slots (source). That's right, when you do a pinMode() operation in the code, BoneScript runs a compiler!

Conclusion

The BeagleBone's GPIO pins can be easily controlled through the file system, but a lot goes on behind the scenes, making it very mysterious what is actually happening. Examining the documentation and the device drivers reveals how these file system writes affect the pins by writing to various control registers.[6] Hopefully after reading this article, the internal operation of the Beaglebone will be less mysterious.

Notes and references

[1] Many of the modules of the Sitara chip have cryptic names. A brief explanation of some of them, along with where they are described in the TRM:

PRU-ICSS (Programmable Real-Time Unit / Industrial Communication Subsystem, chapter 4): this is the two real-time microcontrollers included in the chip. They contain their own modules.
Industrial Ethernet is an extension of Ethernet protocols for industrial environments that require real-time, predictable communication. The Industrial Ethernet module provides timers and I/O lines that can be used to implement these protocols.
SGX (chapter 5) is a 2D/3D graphics accelerator.
GPMC is the general-purpose memory controller. OCMC is the on-chip memory controller, providing access to 64K of on-chip RAM. EMIF is the external memory interface. It is used on the BeagleBone to access the DDR memory. ELM (error location module) is used with flash memory to detect errors. Memory is discussed in chapter 7 of the TRM.
L1, L2, L3, L4: The processor has L1 instruction and data caches, and a larger L2 cache. The L3 interconnect provides high-speed communication between many of the modules of the processor using a "NoC" (Network on Chip) infrastructure. The slower L4 interconnect is used for communication with low-bandwidth modules on the chip. (See chapter 10 of the TRM.)
EDMA (enhanced direct memory access, chapter 11) provides transfers between external memory and internal modules.
TS_ADC (touchscreen, analog/digital converter, chapter 12) is a general-purpose analog to digital converter subsystem that includes support for resistive touchscreens. This module is used on the BeagleBone for the analog inputs.
LCD controller (chapter 13) provides support for LCD screens (up to 2K by 2K pixels). This module provides many signals to the TDA19988 HDMI chip that generates the BeagleBone's HDMI output.
EMAC (Ethernet Media Access Controller, chapter 14) is the Ethernet subsystem. RMII (reduced media independent interface) is the Ethernet interface used by the BeagleBone. MDIO (management data I/O) provides control signals for the Ethernet interface. GMII and RGMII are similar for gigabit Ethernet (unused on the BeagleBone).
The PWMSS (Pulse-width modulation subsystem, chapter 15) contains three modules. The eHRPWM (enhanced high resolution pulse width modulator) generates PWM signals, digital pulse trains with selectable duty cycle. These are useful for LED brightness or motor speed, for instance. These are sometimes called analog outputs, although technically they are not. This subsystem also includes the eCAP (enhanced capture) module which measures the time of incoming pulses. eQEP (Enhanced quadrature encoder pulse) module is a surprisingly complex module to process optical shaft encoder inputs (e.g. an optical encoder disk in a mouse) to determine its rotation.
MMC (multimedia card, chapter 18) provides the SD card interface on the BeagleBone.
UART (Universal Asynhronous Receiver/Transmitter, chapter 19) handles serial communication.
I2C (chapter 21) is a serial bus used for communication with devices that support this protocol.
The McASP (Multichannel audio serial port, chapter 22) provides digital audio input and output.
CAN (controller area network, chapter 23) is a bus used for communication on vehicles.
McSPI (Multichannel serial port interface, chapter 24) is used for serial communication with devices supporting the SPI protocol.

[2] The following C++ code uses the file system to switch GPIO 49 on and off. Remove the usleeps for maximum speed. Note: this code omits pin initialization; you must manually do "echo 49 > /sys/class/gpio/export" and "echo out > /sys/class/gpio/gpio49/direction".

#include <unistd.h>
#include <fcntl.h>

int main() {
  int fd =  open("/sys/class/gpio/gpio49/value", O_WRONLY);
  while (1) {
    write(fd, "1", 1);
    usleep(100000);
    write(fd, "0", 1);
    usleep(100000);
  }
}

[3] Here's a program based on devmem2 to toggle the GPIO pin by accessing the register directly. The usleeps are delays to make the flashing visible; remove them for maximum speed. For simplicity, this program does not set the pin directions; you must do that manually.

#include <fcntl.h>
#include <sys/mman.h>

int main() {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    void *map_base = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
            fd, 0x4804C000);
    while (1) {
        *(unsigned long *)(map_base + 0x194) = 0x00020000;
        usleep(100000);
        *(unsigned long *)(map_base + 0x190) = 0x00020000;
        usleep(100000);
    }
}

For more on this approach, see BeagleBone Black GPIO through /dev/mem.

[4] Selecting one of the eight functions for each pin is done through the pin multiplexer (pinmux) which uses the pinctrl-single device driver. Pin usage is defined in the device tree. The details of this are complex and change from kernel to kernel. See GPIOs on the Beaglebone Black using the Device Tree Overlays or BeagleBone and the 3.8 Kernel for details.

[5] The function names tend to change in different kernel versions. I'm describing the Linux 3.8 kernel code.

[6] I've discussed just the GPIO pins, but other pins (LED, PWM, timer, etc) have similar file system entries, device drivers, and device tree entries.

PRU tips: Understanding the BeagleBone's built-in microcontrollers

Running a "blink" program on the PRU

Documentation

The assembly code

The I/O pin

Communication between the main processor and the PRU

The Userspace I/O (UIO) driver

Device Tree

Device tree overlay

The cape manager

Interrupts

Next steps

Conclusion

Notes and references

The BeagleBone's I/O pins: inside the software stack that makes them work

Using a GPIO pin

What's happening internally with GPIOs?

The names of pins

How writing to a file toggles a GPIO

Device Trees

BoneScript

Conclusion

Notes and references

Don't miss a post!