I am not sure whether I should post this question here, because this seems to be a programming-oriented website.
Anyway, I think there must be some gurus here who know
I used to work on a fairly large signal processing system that ran on a large cluster. Our rough estimate was that, for heavy maths crunching, the Intel compiler gave us about 10% less CPU load than GCC. That's very unscientific, but it was our experience (about 18 months ago).
What would have been interesting is if we'd also been able to use Intel's math libraries, which make more efficient use of their chipsets.