i++ less efficient than ++i, how to show this?

暖寄归人 2020-12-03 14:30

I am trying to show by example that the prefix increment is more efficient than the postfix increment.

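In theory this makes sense: i++ needs to be able to return the unincremented original value and therefore store it, whereas ++i can return the incremented value without storing the previous value.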

9 Answers
  • 2020-12-03 14:46

    OK, all this prefix/postfix "optimization" is just... a big misunderstanding.

    The main idea is that i++ returns a copy of its original value and therefore requires copying that value.

    This may be true for some inefficient iterator implementations. However, in 99% of cases, even with STL iterators, there is no difference, because the compiler knows how to optimize it away and the actual iterators are just pointers dressed up as classes. And of course there is no difference for primitive types like integers or pointers.

    So... forget about it.

    EDIT: Clarification

    As I mentioned, most STL iterator classes are just pointers wrapped in classes whose member functions are all inlined, which allows the compiler to optimize away such an irrelevant copy.

    And yes, if you have your own iterators without inlined member functions, they may be slower. But you should simply understand what the compiler does and does not do.

    As a small proof, take this code:

    #include <set>
    #include <vector>
    using namespace std;
    
    int sum1(vector<int> const &v)
    {
        int n = 0;  // must be initialized; reading an uninitialized int is undefined
        for(auto x=v.begin();x!=v.end();x++)
                n+=*x;
        return n;
    }
    
    int sum2(vector<int> const &v)
    {
        int n = 0;
        for(auto x=v.begin();x!=v.end();++x)
                n+=*x;
        return n;
    }
    
    int sum3(set<int> const &v)
    {
        int n = 0;
        for(auto x=v.begin();x!=v.end();x++)
                n+=*x;
        return n;
    }
    
    int sum4(set<int> const &v)
    {
        int n = 0;
        for(auto x=v.begin();x!=v.end();++x)
                n+=*x;
        return n;
    }
    

    Compile it to assembly and compare sum1 with sum2, and sum3 with sum4...

    I can just tell you... gcc gives exactly the same code with -O2.

  • 2020-12-03 14:51

    In response to Mihail, this is a somewhat more portable version of his code:

    #include <cstdio>
    #include <ctime>
    using namespace std;
    
    #define SOME_BIG_CONSTANT 100000000
    #define OUTER 40
    int main( int argc, char * argv[] ) {
    
        unsigned d = 0;  // unsigned: the running sum wraps, and signed overflow would be undefined behavior
        time_t now = time(0);
        if ( argc == 1 ) {
            for ( int n = 0; n < OUTER; n++ ) {
                int i = 0;
                while(i < SOME_BIG_CONSTANT) {
                    d += i++;
                }
            }
        }
        else {
            for ( int n = 0; n < OUTER; n++ ) {
                int i = 0;
                while(i < SOME_BIG_CONSTANT) {
                    d += ++i;
                }
            }
        }
        int t = time(0) - now;  
        printf( "%d\n", t );
        return d % 2;
    }
    

    The outer loops are there to allow me to fiddle the timings to get something suitable on my platform.

    I don't use VC++ any more, so I compiled it (on Windows) with:

    g++ -O3 t.cpp
    

    I then ran it by alternating:

    a.exe   
    

    and

    a.exe 1
    

    My timings were approximately the same for both cases. Sometimes one version was faster by up to 20%, and sometimes the other. I would guess this is due to other processes running on my system.

  • 2020-12-03 14:52

    Try using a while loop, or do something with the returned value, e.g.:

    #include <windows.h>
    #include <tchar.h>
    #include <cstdio>
    
    #define SOME_BIG_CONSTANT 1000000000
    
    int _tmain(int argc, _TCHAR* argv[])
    {
        int i = 1;
        unsigned d = 0;  // unsigned: the sum of 1..1e9 overflows int, and signed overflow is undefined
    
        DWORD d1 = GetTickCount();
        while(i < SOME_BIG_CONSTANT + 1)
        {
            d += i++;
        }
        DWORD t1 = GetTickCount() - d1;
    
        printf("%u", d);
        printf("\ni++ > %lu <\n", t1);  // DWORD is unsigned long
    
        i = 0;
        d = 0;
    
        d1 = GetTickCount();
        while(i < SOME_BIG_CONSTANT)
        {
            d += ++i;
    
        }
        t1 = GetTickCount() - d1;
    
        printf("%u", d);
        printf("\n++i > %lu <\n", t1);
    
        return 0;
    }
    

    Compiled with VS 2005 using /O2 or /Ox, and tried on both my desktop and my laptop.

    On the laptop I consistently get roughly the following; on the desktop the numbers are a bit different, but the ratio is about the same:

    i++ > 8xx < 
    ++i > 6xx <
    

    Here xx means the exact numbers vary, e.g. 813 vs 640, which is still around a 20% speedup.

    One more point: if you replace "d +=" with "d =", you will see a nice optimization trick:

    i++ > 935 <
    ++i > 0 <
    

    However, that is quite a specific case. All in all, I don't see any reason to change my mind and believe there is no difference :)

  • 2020-12-03 14:52

    Perhaps you could just show the theoretical difference by writing out both versions in x86 assembly? As many people have pointed out, the compiler will always make its own decisions about how best to compile/assemble the program.

    If the example is meant for students not familiar with the x86 instruction set, you might consider using the MIPS32 instruction set instead; for some reason, many people seem to find it easier to comprehend than x86 assembly.

  • 2020-12-03 14:57

    This code and its comments should demonstrate the differences between the two.

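    #include <array>
    
    // Stand-in for a type that is expensive to copy (hypothetical placeholder).
    struct some_ridiculously_big_type {
        std::array<char, 4096> data{};
    };
    
    struct a {  // struct, so main() can read index
        int index = 0;
        some_ridiculously_big_type big;
    
        //etc...
    
    };
    
    // prefix ++a: increment in place and return a reference, no copy
    a& operator++ (a& _a) {
        ++_a.index;
        return _a;
    }
    
    // postfix a++: must return the old value, so it copies the whole object
    a operator++ (a& _a, int) {
        a old = _a;
        _a.index++;
        return old;
    }
    
    // now the program
    int main (void) {
        a my_a;
    
        // prefix:
        // 1. updates my_a.index
        // 2. copies my_a.index to b
        int b = (++my_a).index; 
    
        // postfix
        // 1. creates a copy of my_a, including the *big* member.
        // 2. updates my_a.index
        // 3. copies index out of the **copy** of my_a that was created in step 1
        int c = (my_a++).index; 
    }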
    

    You can see that the postfix version has an extra step (step 1), which creates a copy of the object. This has implications for both memory consumption and runtime. That is why prefix can be more efficient than postfix for non-basic types.

    Depending on some_ridiculously_big_type, and on whatever you do with the result of the increment, you'll be able to see the difference either with or without optimizations.

  • 2020-12-03 14:59

    You won't see any difference with integers. You need to use iterators, or something else where postfix and prefix increment really do something different. And you need to turn all optimisations on, not off!
