问题
I have one strange problem. I have following piece of code:
template<clss index, class policy>
inline int CBase<index,policy>::func(const A& test_in, int* srcPtr ,int* dstPtr)
{
int width = test_in.width();
int height = test_in.height();
double d = 0.0; //here is the problem
for(int y = 0; y < height; y++)
{
//Pointer initializations
//multiplication involving y
//ex: int z = someBigNumber*y + someOtherBigNumber;
for(int x = 0; x < width; x++)
{
//multiplication involving x
//ex: int z = someBigNumber*x + someOtherBigNumber;
if(soemCondition)
{
// floating point calculations
}
*dstPtr++ = array[*srcPtr++];
}
}
}
The inner loop gets executed nearly 200,000 times and the entire function takes 100 ms for completion. ( profiled using AQTimer)
I found an unused variable double d = 0.0;
outside the outer loop and removed the same. After this change, suddenly the method is taking 500ms for the same number of executions. ( 5 times slower).
This behavior is reproducible in different machines with different processor types. (Core2, dualcore processors).
I am using VC6 compiler with optimization level O2
.
Follwing are the other compiler options used :
-MD -O2 -Z7 -GR -GX -G5 -X -GF -EHa
I suspected compiler optimizations and removed the compiler optimization /O2
. After that function became normal and it is taking 100ms as old code.
Could anyone throw some light on this strange behavior?
Why compiler optimization should slow down performance when I remove unused variable ?
Note: The assembly code (before and after the change) looked same.
回答1:
If the assembly code looks the same before and after the change the error is somehow connected to how you time the function.
回答2:
VC6 is buggy as hell. It is known to generate incorrect code in several cases, and its optimizer isn't all that advanced either. The compiler is over a decade old, and hasn't even been supported for many years.
So really, the answer is "you're using a buggy compiler. Expect buggy behavior, especially when optimizations are enabled."
I don't suppose upgrading to a modern compiler (or simply testing the code on one) is an option?
Obviously, the generated assembly can not be the same, or there would be no performance difference.
The only question is where the difference lies. And with a buggy compiler, it may well be some completely unrelated part of the code that suddenly gets compiled differently and breaks. Most likely though, the assembly code generated for this function is not the same, and the differences are just so subtle you didn't notice them.
回答3:
Declare width
and height
as const {unsigned} ints. {The unsigned should be used since heights and widths are never negative.}
const int width = test_in.width();
const int height = test_in.height();
This helps the compiler with optimizing. With the values as const
, it can place them in the code or in registers, knowing that they won't change. Also, it relieves the compiler of having to guess whether the variables are changing or not.
I suggest printing out the assembly code of the versions with the unused double
and without. This will give you an insight into the compiler's thought process.
来源:https://stackoverflow.com/questions/2735245/compiler-optimization-causing-the-performance-to-slow-down