Loop unrolling behaviour in GCC

醉酒成梦 2021-01-04 18:01

This question is in part a follow-up to GCC 5.1 Loop unrolling.

According to the GCC documentation, and as stated in my answer to the above question, flags

1 answer
  • 2021-01-04 18:44

    Why does GCC perform loop unrolling even though the flags corresponding to this behaviour are disabled?

    Think of it pragmatically: what do you actually want when you pass such a flag to the compiler? No C++ developer asks GCC to unroll (or not unroll) loops just for the sake of seeing loops in the assembly code; there is a goal behind it. With -fno-unroll-loops the goal is, for example, to sacrifice a bit of speed in order to reduce the size of the binary, say for embedded software with limited storage. With -funroll-loops, on the other hand, you tell the compiler that you do not care about the size of your binary, so it should not hesitate to unroll loops.

    But that does not mean that the compiler will blindly unroll all of your loops, or none of them!

    In your example the reason is simple: the loop body is a single instruction, only a few bytes on any platform, and the compiler knows that this is negligible. The unrolled copies take almost the same space as the assembly code needed to keep the loop (sub + mov + jne on x86-64).

    This is why GCC 6.2, with -O3 -fno-unroll-loops, turns this code:

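    int mul(int k, int j)
    {
      for (int i = 0; i < 5; ++i)
        volatile int k = j;   // a new volatile local that shadows parameter k; the store must be emitted

      return k;               // returns the parameter k, untouched by the loop
    }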
    

    ... into the following assembly code:

     mul(int, int):
      mov    DWORD PTR [rsp-0x4],esi
      mov    eax,edi
      mov    DWORD PTR [rsp-0x4],esi
      mov    DWORD PTR [rsp-0x4],esi
      mov    DWORD PTR [rsp-0x4],esi
      mov    DWORD PTR [rsp-0x4],esi  
      ret    
    

    It does not listen to you because doing so would hardly change the size of the binary (almost, depending on the architecture), while the unrolled version is faster. However, if you increase the loop count a bit...

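    int mul(int k, int j)
    {
      for (int i = 0; i < 20; ++i)
        volatile int k = j;   // same shadowing volatile store, now 20 iterations

      return k;               // returns the parameter k, untouched by the loop
    }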
    

    ... it follows your hint:

     mul(int, int):
      mov    eax,edi
      mov    edx,0x14
      nop    WORD PTR [rax+rax*1+0x0]
      sub    edx,0x1
      mov    DWORD PTR [rsp-0x4],esi
      jne    400520 <mul(int, int)+0x10>
      repz ret 
    

    You will get the same behavior if you keep the loop count at 5 but add more code to the loop body.
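    As a rough illustration of that last point, here is a hypothetical variant with the same trip count of 5 but a larger body. The function name, the sink variable, and the exact statements are my own assumptions, not code from the question. Whether a given GCC version keeps the loop depends on its size heuristics, so check the output of g++ -O3 -fno-unroll-loops -S yourself.

```cpp
// Hypothetical variant: trip count is still 5, but the body is larger.
// Names (mul_bigger_body, sink) are illustrative only.
int mul_bigger_body(int k, int j)
{
  volatile int sink = 0;        // volatile stores cannot be optimized away
  for (int i = 0; i < 5; ++i) {
    sink = j;
    sink = j + i;
    sink = j * i;
    sink = j - i;
  }
  return k;                     // the loop never modifies k
}
```

    With a body this size, five copies would cost noticeably more space than keeping the loop, which is the kind of case where I would expect the compiler to follow the flag.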

    To sum up, think of all these optimization flags as hints to the compiler, and look at them from a pragmatic developer's point of view. It is always a trade-off, and when you build software you never really want all loops unrolled or none at all.
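    If what you want is control over one particular loop rather than a global policy, newer GCC versions (8 and later, so not the 6.2 used above) document a per-loop pragma, #pragma GCC unroll n. The sketch below is a minimal example under that assumption; the function names are made up, and other compilers may ignore the pragma with a warning.

```cpp
// Assumes GCC 8 or newer. Function names are illustrative only.
int sum_no_unroll(const int* data, int n)
{
  int total = 0;
#pragma GCC unroll 1            // per the GCC docs, a value of 0 or 1 blocks unrolling of this loop
  for (int i = 0; i < n; ++i)
    total += data[i];
  return total;
}

int sum_unroll4(const int* data, int n)
{
  int total = 0;
#pragma GCC unroll 4            // requests an unroll factor of 4 for this loop only
  for (int i = 0; i < n; ++i)
    total += data[i];
  return total;
}
```

    Both functions compute the same sum; only the generated code should differ, so as before the assembly output is the place to confirm what the compiler actually did.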

    As a final note, a very similar example is the -f(no-)inline-functions flag. Every day I fight the compiler to inline (or not!) some of my functions, using the inline keyword and GCC's __attribute__ ((noinline)), and when I check the assembly code I see that this smartass still sometimes does what it wants, for instance when I try to inline a function that is definitely too long for its taste. Most of the time, that is the right thing to do and I am happy!
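    For reference, this is roughly what those inlining annotations look like in source. It is only a sketch: the function names are invented, and always_inline is a GCC attribute I am adding for contrast, not something mentioned above.

```cpp
// GCC-specific attributes; function names are illustrative only.
__attribute__((noinline)) int add_noinline(int a, int b)
{
  return a + b;                 // GCC will not consider this function for inlining
}

inline __attribute__((always_inline)) int add_always_inline(int a, int b)
{
  return a + b;                 // asks GCC to inline this regardless of its usual limits
}

inline int add_hint(int a, int b)
{
  return a + b;                 // plain inline: the decision is left to GCC's heuristics
}

int caller(int x)
{
  return add_noinline(x, 1) + add_always_inline(x, 2) + add_hint(x, 3);
}
```

    In the assembly for caller you would expect to see a call to add_noinline only, but the plain inline version is exactly the case where the compiler decides for itself.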
