I wrote a short CUDA program that uses the highly-optimized CUB library to demonstrate that one core from an old, quad-core Intel Q6600 processor (all four are supposedly ca
Thanks to Robert Crovella, I found the problem: I was building in "Debug" mode, which is notoriously slow, instead of "Release" mode.