GPUs have programmer-controllable dedicated caches (such as OpenCL local memory or CUDA shared memory), while CPUs handle branching better. Other than that, compute performance depends on SIMD width, integer core density, and instruction-level parallelism.
Another important factor is how far the data is from the CPU or GPU. (Your data could be an OpenGL buffer on a discrete GPU, and you may need to download it to RAM before computing on the CPU. The same effect shows up in the other direction when a host buffer in RAM has to be processed on a discrete GPU.)
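
To make the data-distance point concrete, here is a rough back-of-the-envelope sketch in Python. It is only a cost model, not a benchmark: the function names (`total_time_s`, `pick_device`), the link bandwidth, and the compute times are all hypothetical values chosen for illustration, and you would need to measure the real ones on your own hardware.

```python
def total_time_s(bytes_to_move, link_bytes_per_s, compute_s):
    """Estimated time: transfer over the CPU<->GPU link plus compute, in seconds."""
    return bytes_to_move / link_bytes_per_s + compute_s


def pick_device(buffer_bytes, data_on, cpu_compute_s, gpu_compute_s, link_bytes_per_s):
    """Return (device, estimated_seconds) for the cheaper option.

    data_on is "cpu" (buffer lives in host RAM) or "gpu" (buffer lives in
    the discrete GPU's memory). A device only pays the transfer cost when
    the buffer is not already local to it.
    """
    cpu_move = buffer_bytes if data_on == "gpu" else 0
    gpu_move = buffer_bytes if data_on == "cpu" else 0
    cpu_t = total_time_s(cpu_move, link_bytes_per_s, cpu_compute_s)
    gpu_t = total_time_s(gpu_move, link_bytes_per_s, gpu_compute_s)
    return ("cpu", cpu_t) if cpu_t <= gpu_t else ("gpu", gpu_t)


if __name__ == "__main__":
    # Assumed numbers, for illustration only: a 1 GB buffer, a 10 GB/s link,
    # and a kernel that takes 50 ms on the CPU and 10 ms on the GPU.
    one_gb = 1_000_000_000
    link = 10e9
    print(pick_device(one_gb, "cpu", 0.05, 0.01, link))
    print(pick_device(one_gb, "gpu", 0.05, 0.01, link))
```

With these assumed numbers the faster kernel does not decide the outcome by itself: whichever device already holds the buffer wins, because moving 1 GB across the link costs more than the difference in compute time. The model ignores copying results back, overlapping transfers with compute, and integrated GPUs that share RAM with the CPU.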