PRU tips: Understanding the BeagleBone's built-in microcontrollers

The BeagleBone Black is an inexpensive, credit-card sized computer that has two built-in microcontrollers called PRUs. While the PRUs provide the real-time processing capability lacking in Linux, using these processors has a learning curve. In this article, I show how to run a simple program on the PRU, and then dive into the libraries and device drivers to show what is happening behind the scenes. Warning: this post uses the 3.8.13-bone79 kernel; many things have changed since then.

The BeagleBone uses the Sitara AM3358, an ARM processor chip running at 1 GHz—this is the thumbnail-sized chip in the center of the board below. If you want to perform real-time operations, the BeagleBone's ARM processor won't work well since Linux isn't a real-time operating system. However, the Sitara chip contains two 32-bit microcontrollers, called Programmable Realtime Units or PRUs. (It's almost fractal, having processors inside the processor.) By using a PRU, you can achieve fast, deterministic, real-time control of I/O pins and devices.

The BeagleBone computer is tiny. For some reason it was designed to fit inside an Altoids mint tin. The square thumbnail-sized chip in the center is the AM3358 Sitara processor chip. This chip contains an ARM processor as well as two 32-bit microcontrollers called PRUs.

The nice thing about using the PRU microcontrollers on the BeagleBone is that you get both real-time Arduino-style control, and "real computer" features such as a web server, WiFi, Ethernet, and multiple languages. The main processor on the BeagleBone can communicate with the PRUs, giving you the best of both worlds. The downside of the PRUs is there's a significant learning curve to use them since they have their own instruction set and run outside the familiar Linux world. Hopefully this article will help with the learning curve.

I wrote an article a couple weeks ago on The BeagleBone's I/O pins. That article discussed the pins controlled by the ARM processor, while this article focuses on the PRU microcontroller. If you think you've seen the present article before, they cover two different things, but I won't blame you for getting deja vu!

Running a "blink" program on the PRU

To motivate the discussion, I'll use a simple program that uses the PRU to flash an LED. This example is based on PRU GPIO example

Blinking an LED using the BeagleBone's PRU microcontroller.

The easiest way to compile and assemble PRU code is to do it on the BeagleBone, since the necessary tools are installed by default (at least if you get the Adafruit BeagleBone). Perform the following steps on the BeagleBone.

Connect an LED to BeagleBone header P8_11 through a 1K resistor.
Download the assembly code file blink.p:
Download the host file to load and run the PRU code, loader.c.
Download the device tree file, /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dts.

Compile and install the device tree file to enable the PRU:

# dtc -O dtb -I dts -o /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dtbo -b 0 -@ PRU-GPIO-EXAMPLE-00A0.dts
# echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots
# cat /sys/devices/bone_capemgr.?/slots

Assemble blink.p and compile the loader:

# pasm -b blink.p
# gcc -o loader loader.c -lprussdrv

Run the loader to execute the PRU binary:
```
# ./loader blink.bin
```

If all goes well, the LED should blink 10 times.[1]

Documentation

The most important document that describes the Sitara chip is the 5041-page Technical Reference Manual (TRM for short). This article references the TRM where appropriate, if you want more information. Information on the PRU is inconveniently split between the TRM and the AM335x PRU-ICSS Reference Guide. For specifics on the AM3358 chip used in the BeagleBone, see the 253 page datasheet. Texas Instruments' has the PRU wiki with more information.

If you're looking to use the BeagleBone and/or PRU I highly recommend the detailed and informative book Exploring BeagleBone. Helpful web pages on the PRU include BeagleBone Black PRU: Hello World, Working with the PRU and BeagleBone PRU GPIO example.

The assembly code

If you're familiar with assembly code, the PRU's instruction set should be fairly straightforward. The instruction set is documented on the wiki and in the PRU reference manual (section 5).[2]

The demonstration blink code uses register R1 to count the number of blinks (10) and register R0 to provide the delay between flashes. The delay timing illustrates an important feature of the PRU: it is entirely deterministic. Most instructions takes 5 nanoseconds (at 200 MHz), although reads are slower. There is no pipelining, branch delays, memory paging, interrupts, scheduling, or anything else that interferes with instruction execution. This makes PRU execution predictable and suitable for real-time processing. (In contrast, the main ARM processor has all the scheduling and timing issues of Linux, making it unsuitable for real-time processing.)

Since the delay loop is two instructions long and executes 0xa00000 times (an arbitrary value that gave a visible delay), we can compute that each delay is 104.858 milliseconds, regardless of what else the processor is doing. (This is shown in the oscilloscope trace below.) A similar loop on the ARM processor would have variable timing, depending on system load.

LED output from the BeagleBone PRU demo, showing the 104ms oscillations.

The I/O pin

How do we know that pin P8_11 is controlled by bit 15 of R30? We're using one of the PRU output pins called "pr1_pru0_pru_r30_15", which is a PRU 0 output pin controlled by R30 bit 15. (Note that the PRU GPIO pins are separate from the "regular" GPIO pins.)[3] The BeagleBone header chart shows that this PRU I/O pin is connected to BeagleBone header pin P8_11.[4] To tell the BeagleBone we want to use this pin with the PRU, the device tree configuration is updated as discussed below.

Communication between the main processor and the PRU

The PRU microcontrollers are part of the processor chip, yet operate independently. In the blink demo, the main processor runs a loader program that downloads the PRU code to the microcontroller, signals the PRU to start executing, and then waits for the PRU to finish. But what happens behind the scenes to make this work? The PRU Linux Application Library (PRUSSDRV) provides the API between code running under Linux on the BeagleBone and the PRU.[5]

The prussdrv_exec_program() API call loads a binary program (in our case, the blink program) into the PRU so it can be executed. To understand how this works requires a bit of background on how memory is set up.

Each PRU microcontroller has 8K (yes, just 8K) of instruction memory, starting at address 0. Each PRU also has 8K of data memory, starting at address 0 (see TRM 4.3).[6] Since the main processor can't access all these memories at address 0, they are mapped into different parts of the main processor's memory as defined by the "global memory map".[7] For instance, PRU0's instruction memory is accessed by the host starting at physical address 0x4a334000. Thus, to load the binary code into the PRU, the API code copies the PRU binary file to that address.[8]

Then, to start execution, the API code starts the PRU running by setting the enable bit in the PRU's control register, which the host can access at a specific physical address (see TRM 4.5.1.1).

To summarize, the loader program running on the main processor loads the executable file into the PRU by writing it to special memory addresses. It then starts up the PRU by writing to another special memory address that is the PRU's control register.

The Userspace I/O (UIO) driver

At this point, you may wonder how the loader program can access to physical memory and low-level chip registers - usually is not possible from a user-level program. This is accomplished through UIO, the Linux Userspace I/O driver (details). The idea behind a UIO driver is that many devices (the PRU in our case) are controlled through a set of registers. Instead of writing a complex kernel device driver to control these registers, simply expose the registers to user code, and then user-level code can access the registers to control the device. The advantage of UIO is that user-level code is easier to write and debug than kernel code. The disadvantage is there's no control over device usage—buggy or malicious user-level code can easily mess up the device operation.[9]

For the PRU, the uio_pruss driver provides UIO access. This driver exposes the physical memory associated with the PRU as a device /dev/uio0. The user-level PRUSSDRV API library code does a mmap on this device to map the address space into its own memory. This gives the loader code access to the PRU memory through addresses in its own address space. The uio_pruss driver exposes information on the mapping to the user-level code through the directory /sys/class/uio/uio0/maps.

Another photo of the BeagleBone Black single-board computer inside an Altoids mint tin. At this point in the article, you may need a break from the text.

Device Tree

How does the uio_pruss driver know the address of the PRU? This is configured through the device tree, a complicated set of configuration files used to configure the BeagleBone hardware. I won't go into the whole explanation of device trees, but just the parts relevant to our story.[10] The pruss entry in the device tree is defined in am33xx.dtsi. Important excerpts are:

pruss: pruss@4a300000 {
 compatible = "ti,pruss-v2";
 reg = <0x4a300000 0x080000>;
 status = "disabled";
 interrupts = <20 21 22 23 24 25 26 27>;
};

The address 4a300000 in the device tree corresponds to the PRU's instruction/data/control space in the processor's address space. (See TRM Table 2-4 and section 4.3.2.) This address is provided to the uio_pruss driver so it knows what memory to access.

The "ti,pruss-v2" compatible line causes the matching uio_pruss driver to be loaded. Note that the status is "disabled", so the PRU is disabled by default until enabled in a device tree overlay.

The "interrupts" line specifies the eight ARM interrupts that are handled by the driver; these will be discussed in the Interrupts section below.

Device tree overlay

To enable the PRU and the GPIO output pin, we needed to load the PRU-GPIO-EXAMPLE-00A0.dts device tree overlay file.[11] This file is discussed in detail at credentiality, so I'll just give some highlights.

As described earlier, each header pin has eight different functions, so we need to tell the pin multiplexer driver which of the eight modes to use for the pin P8_11. Looking at the P8 header diagram, pin P8_11 has internal address 0x34 and has the PRU output function in mode 6. This is expressed in the overlay file:

exclusive-use = "P8.11", "pru0";
...
    target = <&am33xx_pinmux>;
...
       pinctrl-single,pins = <
         0x34 0x06

The above entry extends the am33xx_pinmux device tree description and is passed to the pinctrl-single driver, which configures the pins as desired by updating the appropriate chip control registers.

The device tree file also overrides the previous pruss entry, changing the status to "okay", which enables the uio_pruss device driver:

    target = <&pruss>;
    __overlay__ {
      status = "okay";

The cape manager

This device tree overlay was loaded with the command "echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots". What does this actually do?

The idea of the cape manager is if you plug a board (called a cape) onto the BeagleBone, the cape manager reads an EEPROM on the board, figures out what resources the board needs, and automatically loads the appropriate device tree fragments (from /lib/firmware) to configure the board (source, details, documentation). The cape manager is configured in the device tree file am335x-bone-common.dtsi.

Examples of BeagleBone capes. Image from beaglebone.org, CC BY-SA 3.0.

In our case, we aren't installing a new board, but just want to manually install a new device tree file. The cape manager exposes a bunch of files in /sys/devices/bone_capemgr.*. Sending a name to the file "slots" causes the cape manager to load the corresponding device tree file. This enables the PRU and the desired output pin.

Interrupts

Interrupts can be used to communicate between the PRU code and the host code. At the end of the PRU blink code, it sends an interrupt to the host by writing 35 to register R31. This wakes up the loader program, which is waiting with the API call "prussdrv_pru_wait_event(PRU_EVTOUT_0)". This process seems like it should be straightforward, but it's surprisingly complex. In this section, I'll explain how PRU interrupts work.

I'll start with the host (loader program) side. The PRU can send eight different events to the host program. The prussdrv_pru_wait_event(N) call waits for event N (0 through 7) by reading from /dev/uioN. This read will block until an interrupt happens, and then return. (This seems like a strange way to receive interrupts, but using the file system fits the Unix philosophy.)[12]

If you look in the file system, there are 8 uio files: /dev/uio0 through /dev/uio7. You might expect /dev/uio1 provides access to PRU1, but that's not the case. Instead, there are 8 event channels between the PRUs and the main processor, allowing 8 different types of interrupts. Each one of these has a separate /dev/uioN file; reading from the file waits for the corresponding event, labeled PRU_EVTOUT0 through PRU_EVTOUT7.

Jumping now to the PRU side, the interrupt is triggered by a write to R31. Register R31 is a special register - writing to it sends an event to the interrupt system.[13] The blink code generates PRU internal event 3 (with the crazy name pr1_pru_mst_intr[3]_intr_req), which is mapped to system event 19.

The following diagram shows how an event (right) gets mapped to a channel, then a host, and finally generates an interrupt (left). In our case, system event 19 from the PRU is given the label PRU0_ARM_INTERRUPT. This interrupt is mapped to channel 2, then host 2 which goes to ARM interrupt PRU_EVTOUT0, which is ARM interrupt 20 (see TRM 6.3). The device tree configured the pruss_uio driver to handle events 20 through 27, so the driver receives the interrupt and unblocks /dev/uio0, informing the host process of the PRU event.

Interrupt handling on the BeagleBone between the PRU microcontrollers and the ARM processor. System events (right) are mapped to channels, then hosts, finally generating interrupts (left).

How does the above complex mapping get set up? You might guess the device tree, but it's done by the PRUSSDRV user-level library code. The interrupt configuration is defined in PRUSS_INTC_INITDATA, a struct that defines the various mappings shown above. This struct is passed to prussdrv_pruintc_init(), which writes to the appropriate PRU interrupt registers (mapped into memory).[14]

Next steps

A real application would use a host program much more powerful than the example loader. For instance, the host program could run a web server and actively communicate with the PRU. Or the host program could do computation on data received from the PRU. Such a system uses the ARM processor to get all the advantages of Linux, and the PRU microcontrollers to perform real-time activities impossible in Linux.

It's easier to program the PRU in C using an IDE. That's a topic for another post, but for now you can look here, here, here or here for information on using the C compiler.

Conclusion

The PRU microcontrollers give the BeagleBone real-time, deterministic processing. Combining the BeagleBone's full Linux system with the PRU microcontrollers yields a very powerful system. The Linux side can provide powerful processing, network connectivity, web servers, and so forth, while the PRUs can interface with the external world. If you're using an Arduino but want more (Ethernet, web connectivity, USB, WiFi), the BeagleBone is an alternative to consider.

There is a significant learning curve to using the PRU microcontrollers, however. Much of what happens may seem like magic, hidden behind APIs and device tree files. Hopefully this article has filled in the gaps, explaining what happens behind the scene.

Notes and references

[1] If you get the error "prussdrv_open() failed" the problem is the PRU didn't get enabled. Make sure you loaded the device tree file into the cape manager.

[2] One of the instructions in the PRUs instruction set is LBBO (load byte burst), which reads a block of memory into the register file. That sounded to me like an obscure instruction I'd never need, but it turns out that this instruction is used for any memory reads. So don't make my mistake of ignoring this instruction!

[3] To understand the pin name "pr1_pru0_pru_r30_15": pru0 indicates the pin is controlled by PRU 0 (rather than PRU 1), r30 indicates the pin is an output pin, controlled by the output register R30 (as opposed to the input register R31), and 15 indicates I/O pin 15, which is controlled by bit 15 of the register. A mnemonic to remember that R30 is the output register and R31 is the input register: 0 looks like O for output, 1 looks like I for input.

[4] Most PRU pins conflict with HDMI or another subsystem, so you need to disable HDMI to use them. To avoid this problem, I picked one of the few non-conflicting PRU pins. Disabling HDMI is described here and here. See this chart for a list of available PRU pins and potential conflicts.

[5] The PRU library is documented in PRU Linux Application Loader API Guide. The library is often installed on the BeagleBone by default but can also be downloaded in the am335x_pru_package along with PRU documentation and some coding examples. Library source code is here.

[6] The PRU, like many microcontrollers, uses a Harvard architecture, with independent memory for code and data.

[7] In the main processor's address space, PRU0's data RAM starts at address 0x4a300000, PRU1's data RAM starts at 0x4a302000, PRU0's instruction RAM starts at 0x4a334000, and PRU1's instruction RAM starts at 0x4a338000. (See TRM Table 2-4 and section 4.3.2.) The PRU control registers also have addresses in the main processor's physical address space; PRU0's control registers start at address 0x4a322000 and PRU1's at 0x4a324000.

[8] The API uses prussdrv_pru_write_memory() to copy the contents into the memory-mapped region for PRU's instruction memory. The prussdrv_load_datafile API call is used to load a file into the PRU's data memory.

[9] Note that using Userspace I/O to control the PRU is the exact opposite philosophy of how the regular GPIO pins on the BeagleBone are controlled. Most of the BeagleBone's I/O pins are controlled through complex device drivers that provide access through simple file system interfaces. This is easy to use: just write 1 to /sys/class/gpio/gpio60/value to turn on GPIO 60. On the other hand, the BeagleBone gives you raw access to the PRU control registers - you can do whatever you want, but you'll need to figure out how to do it yourself. It's a bit strange that two parts of the BeagleBone have such wildly different philosophies.

[10] Device trees are discussed in more detail in my previous BeagleBone article. Also see A Tutorial on the Device Tree, Device Tree for Dummies, or Introduction to the BeagleBone Black Device Tree.

[11] Why is the example device tree overlay (and most device tree files) labeled with version 00A0? I found out (thanks to Robert Nelson) that A0 is a common initial version for contract hardware. So 00A0 is the first version of something, followed by 00A1, etc.

[12] The blink example only sends events from the PRU to the host, but what about the other direction? The host can send an event (e.g. ARM_PRU0_INTERRUPT, 21) to a PRU using prussdrv_pru_send_event(). This writes the PRU's SRSR0 (System Event Status Raw/Set Register, TRM 4.5.3.13) to generate the event. The interrupt diagram shows this event flows to R31 bit 30. The PRUs don't receive interrupts as such, but must poll the top two bits of R31 to see if an interrupt signal has arrived.

[13] The PRU has 64 system events for all the different things that can trigger interrupts, such as timers or I/O (see TRM 4.4.2.2). When generating an event with R31, bit 5 triggers an event, while the lower 4 bits select the event number (see TRM 4.4.1.2). PRU system event numbers are offset by 16 from the internal event numbers, so internal event 3 is system event 19. See PRU interrupts for the full list of events.

[14] Multiple PRU registers are set up to initialize interrupt handling. The CMR (channel map registers, TRM 4.5.3.20) map from each system event to a channel. The HMR (host interrupt map registers, TRM 4.5.3.36) map from a channel to a host interrupt. The PRU interrupt registers (INTC) start at address 0x4a320000 (TRM Table 4-8).

6 comments:

curveshipAugust 31, 2016 at 12:11 PM
Great article! Wish I had it when I was getting started with the PRU. One small factual correction. You write:

> Each instruction takes 5 nanoseconds (at 200 MHz).

That's not correct for read instructions, some of which have a best-case of over 100 cycles. See http://processors.wiki.ti.com/index.php/AM335x_PRU_Read_Latencies .

If you need very precise timing, you'll probably need to poll the IEP timer, which ticks at a guaranteed 200MHz.
Ken ShirriffAugust 31, 2016 at 12:29 PM
Thanks, curveship - very good point! I've updated the article.
przemekSeptember 6, 2016 at 12:10 PM
You use the UIO infrastructure, which is older and well-established, but it is being phased out by the general remoteproc/remotemsg infrastructure for coprocessors and multi-cpu machines. Newer kernels used to only contain remoteproc, which made uio people unhappy, but Robert Nelson and others managed to backport uio so it's also available now. Still, remoteproc is more general and looks like will be the thing to use in the future.
WilliamKingNovember 21, 2016 at 1:48 AM
nice
StanJanuary 14, 2017 at 3:17 PM
Thanks a lot! This page is helping me really out to make my first steps with the PRU. Unfortunately there is the issue with the cape-manager not being enabled since the kernel version 3.8. Now I'm still trying to figure out how to connect the PRU-GPIO-EXAMPLE with my Device Tree on my BBB with the Linux Version 4.4.9. Now the cape manager moved to /sys/devices/platform/bone_capemgr/slots and the way to handle it is described in the manual to be something like:

sh -c "echo 'PRU-GPIO-EXAMPLE' > /sys/devices/platform/bone_capemgr/slots"

But still I couldn't get it running due to some I/O Problems.

UnknownSeptember 27, 2017 at 6:42 PM
I am very happy to find and read this article, The most well explained about PRUs on BBB.

I got BBB and using rt kernel (4.8.13-rt6) on it. In this case, rt kernel automatically uses two PRUs?

Thanks in advance.

Jerry