First off, I wasn't sure whether I should post this as an Ubuntu question or here, but I'm guessing it's more of a Python question than an OS one.
My Python application
OK. I have the answer to my own question. Yes, it's taken me over 3 months to get this far.
The massive 'system' CPU spikes and the associated pauses appear to be caused by GIL thrashing in Python. Here is a good explanation of where the thrashing comes from. That presentation also pointed me in the right direction.
Python 3.2 introduced a new GIL implementation to avoid this thrashing. The result can be shown with a simple threaded example (taken from the presentation above):
from threading import Thread
import psutil

def countdown():
    n = 100000000
    while n > 0:
        n -= 1

t1 = Thread(target=countdown)
t2 = Thread(target=countdown)
t1.start(); t2.start()
t1.join(); t2.join()
print(psutil.Process().cpu_times())
On my MacBook Pro with Python 2.7.9, this uses 14.7s of 'user' CPU and 13.2s of 'system' CPU.
Python 3.4 uses 15.0s of 'user' (slightly more) but only 0.2s of 'system'.
So the GIL is still in place, and the threaded code still runs no faster than it would single-threaded, but the new implementation avoids the GIL contention of Python 2 that manifests as kernel ('system') CPU time. This contention, I believe, is what was causing the issues in the original question.
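If you want to check the "no faster than single-threaded" part on your own machine, something like the following rough sketch should do. It is not from the presentation: the helper names (measure, run_sequential, run_threaded) and the smaller loop count are my own assumptions, and it uses os.times() from the standard library instead of psutil so there is nothing to install. It assumes Python 3.6 or later.

```python
import os
import time
from threading import Thread


def countdown(n):
    # Pure-Python CPU-bound loop; the running thread holds the GIL.
    while n > 0:
        n -= 1


def measure(func):
    # Hypothetical helper: returns (wall, user, system) seconds spent in func().
    cpu_before = os.times()
    wall_before = time.perf_counter()
    func()
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()
    return (wall,
            cpu_after.user - cpu_before.user,
            cpu_after.system - cpu_before.system)


def run_sequential(n):
    # Two countdowns, one after the other, in the main thread.
    countdown(n)
    countdown(n)


def run_threaded(n):
    # The same two countdowns, each in its own thread.
    threads = [Thread(target=countdown, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    N = 10 ** 7  # assumed value, smaller than the original so it finishes quickly
    for label, runner in (("sequential", run_sequential),
                          ("threaded", run_threaded)):
        wall, user, system = measure(lambda: runner(N))
        print(f"{label:>10}: wall={wall:.2f}s user={user:.2f}s system={system:.2f}s")
```

The numbers will vary by machine and Python version, so I have not listed any here. The thing to compare is the 'system' column between the two runs, and whether the threaded wall time is any better than the sequential one.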
OpenCV/TBB was found to be an additional cause of the CPU problem. That is fully documented in this SO question.