We've always been an Intel shop. All the developers use Intel machines, the recommended platform for end users is Intel, and if end users want to run on AMD it's their lookou
Sorry if you hit my general button.
This is on the subject of low-level optimization, so it only matters for code that 1) the program counter spends a significant share of its time in, and 2) the compiler actually sees. For example, if the PC spends most of its time in library routines that you don't compile, how well your own code is optimized shouldn't matter very much.
Whether or not conditions 1 & 2 are met, here's my experience of how optimization goes:
Several iterations of sampling and fixing are done. In each of these, a problem is identified and most often it is not about where the program counter is. Rather it is that there are function calls at mid-levels of the call stack that, since performance is paramount, could be replaced. To find them quickly, I do this.
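For a concrete picture of what a stack sample is, here is a minimal sketch in Python. It is not the tool referred to above, just one way to get snapshots of a running call stack. The names `sample_stacks`, `n_samples` and `interval` are made up for illustration; `sys._current_frames()` and `traceback.extract_stack()` are standard CPython facilities.

```python
import sys
import threading
import time
import traceback


def sample_stacks(target, n_samples=20, interval=0.01):
    """Run target() in a worker thread and snapshot its call stack up to n_samples times."""
    worker = threading.Thread(target=target)
    worker.start()
    samples = []
    while worker.is_alive() and len(samples) < n_samples:
        time.sleep(interval)
        frame = sys._current_frames().get(worker.ident)
        if frame is None:  # the worker finished between the check and the lookup
            break
        # One sample = the whole stack, outermost call first.
        # Each entry identifies a call site as (function name, line number).
        samples.append([(f.name, f.lineno) for f in traceback.extract_stack(frame)])
    worker.join()  # let the target run to completion
    return samples
```

The point of keeping the whole stack, not just the innermost frame, is that the interesting calls are usually in the middle of it.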
Keep in mind that if a function call instruction is on the stack for a significant fraction of execution time, whether in a few long invocations or a great many short ones, that call is responsible for that fraction of time, so removing it or executing it less often can save a lot of time. And those savings far exceed anything low-level optimization can deliver.
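To put numbers on that, here is a small sketch, assuming you already have a handful of stack samples in hand. The function names and the sample data are hypothetical. The fraction of samples in which a call site appears estimates the fraction of time it is responsible for, and if that call could be eliminated entirely, the overall speedup would be 1 / (1 - fraction).

```python
from collections import Counter


def call_site_fractions(samples):
    """Return {call_site: fraction of samples in which it appears on the stack}."""
    if not samples:
        return {}
    counts = Counter()
    for stack in samples:
        # set() counts a call site once per sample, even if recursion repeats it.
        counts.update(set(stack))
    return {site: n / len(samples) for site, n in counts.items()}


def speedup_if_removed(fraction):
    """Overall speedup factor if a call costing `fraction` of run time is eliminated."""
    return 1.0 / (1.0 - fraction)


# Hypothetical samples: each is one call stack, outermost call first.
samples = [
    ["main:10", "load:42", "parse:7"],
    ["main:10", "load:42", "log:3"],
    ["main:10", "load:42", "parse:7"],
    ["main:12", "render:5"],
]
fractions = call_site_fractions(samples)
print(fractions["load:42"])                      # 0.75
print(speedup_if_removed(fractions["load:42"]))  # 4.0
```

With only four samples the 0.75 is a rough estimate, but a call that shows up on three stacks out of four is clearly worth a look, whatever its exact percentage turns out to be.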
The program can now be many times faster than it was to begin with. I've never seen any good-sized program, no matter how carefully written, that could not benefit from this process. If the process has not been done, it should not be assumed that low-level optimization is the only way to speed up the program.
After this process has been done to the point where it simply can't be done any more, and if samples show that the PC is in code that the compiler sees, then low-level optimization can make a difference.