I was changing my for loop to increment using ++i
instead of i++
and got to thinking, is this really necessary anymore? Surely today\'s compilers
If there is no cost to the optimization, do it. When writing the code, ++i
is just as easy to write as i++
, so prefer the former. There is no cost to it.
On the other hand, going back and making this change afterwards takes time, and it most likely won't make a noticeable difference, so you probably shouldn't bother with it.
But yes, it can make a difference. On built-in types, probably not, but for complex classes, the compiler is unlikely to be able to optimize it away. The reason for this is that the increment operation no is no longer an intrinsic operation, built into the compiler, but a function defined in the class. The compiler may be able to optimize it like any other function, but it can not, in general, assume that pre-increment can be used instead of post-increment. The two functions may do entirely different things.
So when determining which optimizations can be done by the compiler, consider whether it has enough information to perform it. In this case, the compiler doesn't know that post-increment and pre-increment perform the same modifications to the object, so it can not assume that one can be replaced with the other. But you have this knowledge, so you can safely perform the optimization.
Many of the others you mention can usually be done very efficiently by the compiler: Inlining can be done by the compiler, and it's usually better at it than you. All it needs to know is how large a proportion of the function consists of function call over head, and how often is it called? A big function that is called often probably shouldn't be inlined, because you end up copying a lot of code, resulting in a larger executable, and more instruction cache misses. Inlining is always a tradeoff, and often, the compiler is better at weighing all the factors than you.
Loop unrolling is a purely mechanic operation, and the compiler can do that easily. Same goes for strength reduction. Swapping inner and outer loops is trickier, because the compiler has to prove that the changed order of traversal won't affect the result, which is difficult to do automatically. So here is an optimization you should do yourself.
But even in the simple ones that the compiler is able to do, you sometimes have information your compiler doesn't. If you know that a function is going to be called extremely often, even if it's only called from one place, it may be worth checking whether the compiler automatically inlines it, and do it manually if not.
Sometimes you may know more about a loop than the compiler as well (for example, that the number of iterations will always be a multiple of 4, so you can safely unroll it 4 times). The compiler may not have this information, so if it were to inline the loop, it would have to insert an epilog to ensure that the last few iterations get performed correctly.
So such "small-scale" optimizations can still be necessary, if 1) you actually need the performance, and 2) you have information that the compiler doesn't.
You can't outperform the compiler on purely mechanical optimizations. But you may be able to make assumptions that the compiler can't, and that is when you're able to optimize better than the compiler.
Those optimizations are still relevant. Regarding your example, using ++i or i++ on a built-in arithmetic type has no effect.
In case of user defined increment/decrement operators, ++i is preferable, because it doesn't imply copying the incremented object.
So a good coding style is to use prefix increment/decrement in for loops.
Only if you know for certain that they're relevant. This means either you've investigated this issue before on your particular compiler or you've already done the following:
If you've done all of these things then often the best thing to do is get your compiler to emit something lower level that you can examine yourself (like assembly) and make specific judgments based on that. In my experience every compiler is a little different. Sometimes an optimization on one causes another to produce larger less efficient code.
If you haven't already done these things then I'd call it premature optimization and I'd recommend against it. Optimizing before doing these things gives rewards which are disproportionately small when compared to the cost involved.
This is a well-worn subject, and SO contains oodles of good and bad advice on it.
Let me just tell you what I have found from much experience doing performance tuning.
There are tradeoff curves between performance and other things like memory and clarity, right? And you would expect that to get better performance you would have to give something up, right?
That is only true if the program is on the tradeoff curve. Most software, as first written, is miles away from the tradeoff curve. Most of the time, it is irrelevant and ignorant to talk about giving up one thing to get another.
The method I use is not measurement, it is diagnosis. I don't care how fast various routines are or how often they are called. I want to know precisely which instructions are causing slowness, and why.
The dominant and primary cause of poor performance in good-sized software efforts (not little one-person projects) is galloping generality. Too many layers of abstraction are employed, each of which extracts a performance penalty. This is not usually a problem - until it is a problem - and then it's a killer.
So what I do is tackle one issue at a time. I call these "slugs", short for "slowness bugs". Each slug that I remove yields a speedup of anywhere from 1.1x to 10x, depending on how bad it is. Every slug that is removed makes the remaining ones take a larger fraction of the remaining time, so they become easier to find. In this way, all the "low-hanging fruit" can be quickly disposed of.
At that point, I know what is costing the time, but the fixes may be more difficult, such as partial redesign of the software, possibly by removing extraneous data structure or using code generation. If it is possible to do this, that can set off a new round of slug-removal until the program is many times not only faster than it was to begin with, but smaller and clearer as well.
I recommend getting experience like this for yourself, because then when you design software you will know what not to do, and you will make better (and simpler) designs to begin with. At the same time, you will find yourself at odds with less experienced colleagues who can't begin thinking about a design without conjuring up a dozen classes.
ADDED: Now, to try to answer your question, low-level optimization should be done when diagnosis says you have a hot-spot (i.e. some code at the bottom of the call stack appears on enough call stack samples (10% or more) to be known to be costing significant time). AND if the hot-spot is in code you can edit. If you've got a hot-spot in "new", "delete", or string-compare, look higher up the stack for things to get rid of.
Hope that helps.