Using the Raspberry Pi as a DSP?

A reader asked why I created a separate DSP board instead of just adding analogue inputs and outputs to the Raspberry Pi and doing the DSP processing in software on the Raspberry Pi CPU. I had never thought about this option, but it might be interesting.

Operating system

Independent of the processing power of the Raspberry Pi CPU (we will come to this later), one problem is the operating system. The OS does not have real-time capabilities, which means the kernel can block any user process for as long as it wants. However, this is more of a theoretical problem than a practical one. With some buffering this will most likely work well. But do not expect delays below 1 ms; they will be longer than that!

CPU

The CPU runs at 700 MHz and can even be clocked up to 1 GHz, which is far faster than the DSP chip on the HiFiBerry DSP light, which runs at only 50 MHz. However, clock rate does not say much about performance. DSP chips are designed specifically for the algorithms used in digital signal processing, while a general-purpose CPU is optimized to do most tasks reasonably well. Let’s have a look at the Raspberry Pi CPU: the SoC is produced by Broadcom and uses an ARM1176JZF-S core. It is based on the ARMv6 architecture, which is quite old (the first CPUs based on this architecture shipped in 2002). One good thing about it is that it features a floating-point coprocessor, a VFPv2. It even has a DSP instruction set. However, the DSP instructions are of little use for our purposes, because they work only with 16-bit data. It would be possible to emulate 32-bit operations with them, but the performance would be relatively low. Therefore we will have a look at the floating-point unit.

The VFPv2 floating point unit

After a look at the technical reference manual, it seems that the floating-point unit can process most floating-point operations in a single clock cycle. On a Raspberry Pi clocked at 1 GHz, this would mean a theoretical floating-point performance of 1 GFlops. However, this is a purely theoretical number: data has to be transferred between main memory and the VFP, so the practical performance will be much lower. Still, it looks promising.

Simple test in C

Let’s see what happens if we use a simple C program that runs a floating-point multiplication over and over. Our program uses this inner multiplication (both arrays hold floats). It loops exactly 1,000,000,000 times over this operation.

f1[i] *= f2[i];

We use arrays so that the data takes the least efficient path (from memory to the CPU and back to memory). This is therefore a worst-case scenario. With 1 GFlops performance, the program should finish in 1 second. Let’s see:

~/fp$ time ./ft

real	1m9.358s
user	1m7.960s
sys	0m0.050s

Oops, that’s almost 70 seconds. That corresponds to a floating-point performance of only about 14 MFlops. And yes, we did use the hardware VFP unit; the program was compiled that way.
That’s a lot less than even the simple HiFiBerry DSP light board. At a 48 kHz sample rate this leaves only about 300 operations per sample. That’s not much.

Now let’s use some code that does not need memory access all the time. It uses only two variables:

f *= g

This should need less memory access.

~/fp$ time ./ft

real	0m44.142s
user	0m43.340s
sys	0m0.010s

It looks better, but the performance is still less than 25MFlops.

Interestingly, the bare loop without the multiplication already takes about 14 seconds. Subtracting that loop overhead, the actual floating-point performance is somewhat better: in the second case it comes to a bit more than 30 MFlops (1,000,000,000 operations in roughly 44 - 14 = 30 seconds).

Conclusion: Using C code, the Raspberry Pi can be used only for simple DSP operations. However, there might be a chance to dramatically improve the performance by using highly optimized assembler code.

4 thoughts on “Using the Raspberry Pi as a DSP?”

  1. Pingback: MP3 แบบเร็วๆ | Raspberry Pi Thailand

  2. g.g

    There’s also the QPU hardware that could be usable for DSP purposes. Have you looked into that, now that the docs from Broadcom seem to have been released?

  3. Shining Surya Rao

    You have used a C program. Would there be any improvement if we used FORTRAN, which can do number crunching more efficiently? (I don’t know myself; I have heard it can do math pretty well.)
    Also, what if we tweak the job control of the Linux system and give the process the highest priority?

  4. erotavlas

    Hi everyone,
    I read that the new Raspberry Pi 2 CPU is based on the Cortex-A7 architecture https://en.wikipedia.org/wiki/ARM_Cortex-A7 which supports the NEON SIMD extensions. What about the new performance on FFT? The GPU of the Raspberry Pi 2 is the same as that of the Raspberry Pi, and as a consequence it has the same performance when computing FFTs http://www.raspberrypi.org/accelerating … g-the-gpu/.
    What about the DSP contained in the Broadcom BCM2836 CPU of the Raspberry Pi 2?
    Thank you
