How much should I worry about the Intel C++ compiler emitting suboptimal code for AMD?

心不动则不痛 submitted on 2019-12-18 13:55:07

Question


We've always been an Intel shop. All the developers use Intel machines, recommended platform for end users is Intel, and if end users want to run on AMD it's their lookout. Maybe the test department had an AMD machine somewhere to check we didn't ship anything completely broken, but that was about it.

Up until a few years ago we just used the MSVC compiler, and since it doesn't really offer many processor tuning options beyond the SSE level, no one worried too much about whether the code might favour one x86 vendor over another. More recently, however, we've been using the Intel compiler a lot. Our stuff definitely gets some significant performance benefits from it (on our Intel hardware), and its vectorization capabilities mean less need to drop down to asm/intrinsics. However, people are starting to get a bit nervous about whether the Intel compiler may not be doing such a good job for AMD hardware. Certainly if you step into the Intel CRT or IPP libraries you see a lot of cpuid queries that apparently set up jump tables to optimised functions. It seems unlikely that Intel goes to much trouble to do anything good for AMD's chips, though.
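For readers unfamiliar with those cpuid queries: the first thing such dispatch code usually reads is the 12-character vendor ID ("GenuineIntel", "AuthenticAMD"). The sketch below shows how that string can be read; it assumes GCC or Clang on x86 (the `__get_cpuid` helper from `<cpuid.h>`), `cpu_vendor` is a hypothetical name, and it is not a copy of what Intel's CRT does internally.

```cpp
#include <cassert>
#include <cstring>
#include <string>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>  // GCC/Clang helper; MSVC provides __cpuid in <intrin.h> instead
#endif

// Returns the CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD",
// or "unknown" when this build has no CPUID helper available.
std::string cpu_vendor() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // Leaf 0: EAX holds the highest basic leaf; the vendor string is
    // spread across EBX, EDX, ECX, in that order, 4 bytes each.
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        char vendor[13];
        std::memcpy(vendor + 0, &ebx, 4);
        std::memcpy(vendor + 4, &edx, 4);
        std::memcpy(vendor + 8, &ecx, 4);
        vendor[12] = '\0';
        return std::string(vendor);
    }
#endif
    return "unknown";
}
```

Whether a library then uses this string, or only the feature bits reported by other CPUID leaves, to pick a code path is exactly the concern raised here.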

Can anyone with any experience in this area comment on whether it's a big deal or not in practice? (We've yet to actually do any performance testing on AMD ourselves.)

Update 2010-01-04: Well, the need to support AMD never became concrete enough for me to do any testing myself. There are some interesting reads on the issue here, here and here, though.

Update 2010-08-09: It seems the Intel-FTC settlement has something to say about this issue - see "Compilers and Dirty Tricks" section of this article.


Answer 1:


Buy an AMD box and run it on that. That seems like the only responsible thing to do, rather than trusting strangers on the internet ;)

Apart from that, I believe part of AMD's lawsuit against Intel is based on the claim that Intel's compiler specifically produces code that runs inefficiently on AMD processors. I don't know whether that's true or not, but AMD seems to believe so.

But even if they don't willfully do that, there's no doubt that Intel's compiler optimizes specifically for Intel processors and nothing else.

That said, I doubt it would make a huge difference. AMD CPUs would still benefit from all the auto-vectorization and other clever features of the compiler.
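To illustrate the auto-vectorization point, here is the kind of loop vectorizers handle well: a counted loop over contiguous arrays with no loop-carried dependency. This is a minimal sketch (`saxpy` is just an illustrative name). When a compiler vectorizes it, the output is ordinary SSE/AVX instructions that any CPU supporting those extensions, AMD included, can execute, which is presumably why the benefit would carry over. Whether a given loop actually gets vectorized depends on the compiler, flags and target; with GCC, `-fopt-info-vec` reports which loops were vectorized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i]. Simple, dependency-free loops like this are
// typical auto-vectorization candidates; the compiler may add a runtime
// check for overlap between x and y before using the vector path.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const float* xs = x.data();
    float* ys = y.data();
    for (std::size_t i = 0; i < n; ++i) {
        ys[i] = a * xs[i] + ys[i];
    }
}
```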




Answer 2:


What we have seen is that wherever the Intel compiler must make a runtime choice about the available instruction set, if it does not recognize an Intel CPU, it falls back to its "standard" code path (which, as you might expect, may not be optimal).

Note that even though I used the word "compiler" above, this mainly happens in Intel's supplied (pre-compiled) libraries and intrinsics, which check the instruction set and call the best code.
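To make the difference concrete, here is a minimal sketch contrasting dispatch that is gated on the vendor string (the behaviour described in this answer) with dispatch gated only on the features the CPU reports. All names (`CodePath`, `CpuInfo`, `dispatch_by_vendor`, `dispatch_by_features`) are hypothetical, and gathering the feature flags from CPUID is omitted; this illustrates the pattern, not Intel's library code.

```cpp
#include <cassert>
#include <string>

// Hypothetical code paths a library might choose between at startup.
enum class CodePath { Generic, SSE2, AVX2 };

// What the CPU reports (filling this in from CPUID is omitted here).
struct CpuInfo {
    std::string vendor;  // e.g. "GenuineIntel", "AuthenticAMD"
    bool has_sse2;
    bool has_avx2;
};

// Vendor-gated: any unrecognized vendor gets the generic path,
// even if it supports the same instruction-set extensions.
CodePath dispatch_by_vendor(const CpuInfo& cpu) {
    if (cpu.vendor != "GenuineIntel") return CodePath::Generic;
    if (cpu.has_avx2) return CodePath::AVX2;
    if (cpu.has_sse2) return CodePath::SSE2;
    return CodePath::Generic;
}

// Feature-gated: the choice depends only on the reported features.
CodePath dispatch_by_features(const CpuInfo& cpu) {
    if (cpu.has_avx2) return CodePath::AVX2;
    if (cpu.has_sse2) return CodePath::SSE2;
    return CodePath::Generic;
}
```

With identical feature flags, an AMD CPU ends up on `CodePath::Generic` under the first function but on `CodePath::AVX2` under the second, while an Intel CPU gets the same path from both.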




Answer 3:


I'm surely stating the obvious, but if performance is crucial for your application, you'd better do some testing on all combinations of hardware and compiler. There are no guarantees. As outsiders, we can only give you our guesses and biases. Your software may have unique characteristics that are unlike anything we've seen.

My experience:

I used to work at Intel, and developed an in-house (C++) application where performance was critical. We tried Intel's C++ compiler, and it always underperformed gcc, even after doing profile runs, recompiling with the profile information (which icc supposedly uses to optimize) and re-running on the exact same dataset (this was in 2005-2007; things may be different now). So, based on my experience, you might want to try gcc in addition to icc and MSVC; you may well get better performance that way and side-step the question. It shouldn't be too hard to switch compilers if your build process is reasonable.
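A minimal timing harness for that kind of comparison might look like the sketch below, assuming a standard C++11 toolchain; `median_ns` is a hypothetical helper, not part of any library. The idea is to build the same benchmark with each compiler (icc, gcc, MSVC) and run it on each machine (Intel and AMD) with the exact same input data, filling in the compiler/hardware matrix suggested above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` `runs` times and returns the median wall-clock time in
// nanoseconds. The median is less sensitive than the mean to one-off
// interference such as other processes or cold caches.
template <typename F>
long long median_ns(F&& work, std::size_t runs = 11) {
    assert(runs > 0);
    std::vector<long long> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    }
    // Partially sort so the middle element is in place
    // (this is the upper median when `runs` is even).
    std::nth_element(samples.begin(), samples.begin() + samples.size() / 2, samples.end());
    return samples[samples.size() / 2];
}
```

An odd number of runs keeps the median well defined; the work passed in should be the real hot path of the application rather than a toy loop.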

Now I work at a different company, where the IT folks do extensive hardware testing. For a while Intel and AMD hardware performed comparably, but the latest generation of Intel hardware significantly outperformed the AMD. As a result, I believe they purchased significant numbers of Intel CPUs and recommend the same to our customers who run our software.

But back to the question of whether the Intel compiler specifically targets AMD hardware to run slowly: I doubt Intel bothers with that. It could be that certain optimizations that rely on knowledge of the internals of Intel's CPU architecture or chipsets run slower on AMD hardware, but I doubt they specifically target AMD hardware.




Answer 4:


Sorry if you hit my general button.

This is on the subject of low-level optimization, so it only matters for code that 1) the program counter spends much time in, and 2) the compiler actually sees. For example, if the PC spends most of its time in library routines that you don't compile, it shouldn't matter very much.

Whether or not conditions 1 & 2 are met, here's my experience of how optimization goes:

Several iterations of sampling and fixing are done. In each of these, a problem is identified and most often it is not about where the program counter is. Rather it is that there are function calls at mid-levels of the call stack that, since performance is paramount, could be replaced. To find them quickly, I do this.

Keep in mind that if there is a function call instruction that is on the stack for a significant fraction of execution time, whether in a few long invocations, or a great many short ones, that call is responsible for that fraction of time, so removing it or executing it less often can save a lot of time. And, that savings far exceeds any low-level optimization.
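The measure described here, the fraction of time a call is on the stack, can be computed directly from stack samples, however they are collected. A minimal sketch with hypothetical names (`StackSample`, `fraction_on_stack`); as a simplification it counts functions rather than individual call instructions, and it counts each function at most once per sample so that recursion does not inflate its share.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// One stack sample: the functions on the call stack at the moment
// the program was interrupted.
using StackSample = std::vector<std::string>;

// For each function, the fraction of samples whose stack contains it.
std::map<std::string, double> fraction_on_stack(const std::vector<StackSample>& samples) {
    std::map<std::string, double> fraction;
    if (samples.empty()) return fraction;
    for (const StackSample& stack : samples) {
        // Deduplicate so a function appearing twice (recursion) counts once.
        const std::set<std::string> unique(stack.begin(), stack.end());
        for (const std::string& fn : unique) fraction[fn] += 1.0;
    }
    for (auto& entry : fraction) entry.second /= static_cast<double>(samples.size());
    return fraction;
}
```

With four hypothetical samples `{main, load, parse}`, `{main, load, parse}`, `{main, compute}`, `{main, load, alloc}`, `load` is on the stack in 0.75 of them, so avoiding that call would remove roughly three quarters of the run time, regardless of how little time the program counter spends inside `load` itself.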

The program can now be many times faster than it was to begin with. I've never seen any good-sized program, no matter how carefully written, that could not benefit from this process. If the process has not been done, it should not be assumed that low-level optimization is the only way to speed up the program.
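The arithmetic behind "many times faster" can be written out. Assuming each fix removes all of the time its call was on the stack, a call on the stack for fraction $f_i$ of the current run time yields the speedup below, and successive fixes multiply, because each $f_i$ is measured on the program as it stands after the previous fixes:

```latex
S_i = \frac{1}{1 - f_i}, \qquad
S_{\text{total}} = \prod_{i=1}^{k} S_i = \prod_{i=1}^{k} \frac{1}{1 - f_i}
```

For example, with hypothetical fractions $f_1 = 0.5$, $f_2 = 0.5$, $f_3 = 0.3$, the total is $2 \times 2 \times \tfrac{1}{0.7} \approx 5.7$, which is far more than typical low-level code generation differences between compilers.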

After this process has been done to the point where it simply can't be done any more, and if samples show that the PC is in code that the compiler sees, then the low-level optimization can make a difference.




Answer 5:


At the time this thread was started, Microsoft C++ defaulted to code generation that was good in some cases for AMD and bad for Intel. Their more recent compilers default to the blend option, which is good for both, particularly after both brands of CPU had worked out their peculiar performance bugs. When I first worked at Intel, their compilers reserved some optimizations for Intel-specific architecture settings. I guess that might have been a topic of some FTC depositions, although it didn't come up in my 10 hours of testimony, and the practice was already on the way out, both because the optimization requirements of up-to-date CPU models were converging and because compiler development time needed to be used more productively. If you used one of those obsolete compilers on an up-to-date Intel CPU, you might see some of the same performance deficiencies.




Answer 6:


It's pointless to worry if you can't act. Possible actions are: Not buying AMD, or using a different compiler. So the obvious things to do are:

(1) Buy one AMD box, and measure the speed of the code compiled with the Intel compiler. Is it fast enough? If yes, you're done, you can buy AMD, don't worry.

(2) If no: Compile the code with a different compiler and run it on the AMD box. Is it fast enough? If no, you're done, you can't buy AMD, don't worry.

(3) If yes: Run the same code on an Intel box. Is it fast enough? If yes, you're done, you can buy AMD but have to switch compilers, don't worry.

(4) If no: Possibilities are: Don't buy AMD, throw all Intel computers out, or compile with two different compilers. Pick one.
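The four steps above form a small decision procedure. A minimal sketch encoding it, where each flag is the yes/no answer to "is it fast enough?" from an actual measurement; `Decision` and `decide` are hypothetical names.

```cpp
#include <cassert>

// Outcomes of the four-step procedure.
enum class Decision {
    BuyAmd,                 // step 1: Intel-compiled code is fast enough on AMD
    DontBuyAmd,             // step 2: another compiler is still too slow on AMD
    BuyAmdSwitchCompilers,  // step 3: the other compiler is fast enough on Intel too
    ChooseAmongOptions      // step 4: don't buy AMD, drop Intel, or use two compilers
};

// Later flags are only consulted if the earlier steps did not settle it.
Decision decide(bool icc_on_amd_ok, bool other_on_amd_ok, bool other_on_intel_ok) {
    if (icc_on_amd_ok) return Decision::BuyAmd;
    if (!other_on_amd_ok) return Decision::DontBuyAmd;
    if (other_on_intel_ok) return Decision::BuyAmdSwitchCompilers;
    return Decision::ChooseAmongOptions;
}
```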




Answer 7:


I have directly experienced purposeful crippling of technology when a vendor attempted to prevent a Lotus product from reaching market before their offering. A working technology was available, but Lotus was forbidden to use it. Ah well...

A few years back there were blog posts showing that patching a single byte in the Intel compiler caused it to emit "optimal" code that was not crippled when run on AMD. I have not looked for those blog entries in years.

I am inclined to believe that such competitive behavior continues. I have no other evidence to offer.



Source: https://stackoverflow.com/questions/839667/how-much-should-i-worry-about-the-intel-c-compiler-emitting-suboptimal-code-fo
