How to find out which nested for loop is better?

无人及你 2021-01-21 16:14

Somebody asked me a question: which of the two scenarios below is faster?

Case 1: assume int count = 0;

for (int i = 0; i < 10; i++)
{
           


        
5 answers
  •  野的像风
    2021-01-21 16:29

    Okay, I tested this on my system. With full optimization, the compiler simply set count = 50, no questions asked. Without optimization, the second version was usually the slightest bit faster, but the difference was completely negligible.
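
    A minimal sketch of how such a comparison could be reproduced. The helper names `nested_count` and `time_ns` are hypothetical (they are not from the question), and `volatile` is used here only as one way to stop the optimizer from folding the loops into a constant, which is an assumption about how you would want to measure:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs an outer x inner nested loop and returns the count.
// `volatile` forces a real load/store per increment, so the loops are not
// collapsed into a single constant even with optimization enabled.
int nested_count(int outer, int inner) {
    volatile int count = 0;
    for (int i = 0; i < outer; i++) {
        for (int j = 0; j < inner; j++) {
            count = count + 1;
        }
    }
    return count;
}

// Hypothetical helper: wall-clock nanoseconds spent on `reps` runs of the loop.
long long time_ns(int outer, int inner, int reps) {
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; r++) {
        int c = nested_count(outer, inner);
        assert(c == outer * inner);  // both orderings must do the same work
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

    Calling `time_ns(100, 50, reps)` and `time_ns(50, 100, reps)` with a large `reps` gives two numbers to compare. Expect them to be noisy and close together, so repeat the measurement several times before drawing any conclusion.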

    The disassembly: both versions compile to exactly the same code, except that the two compares swap their bounds, one against 100 and the other against 50 (I increased the numbers a bit to allow for a longer execution time).

        for(int i = 0; i< 100; i++) {
    00F9140B  mov         dword ptr [i],0  
    00F91412  jmp         main+5Dh (0F9141Dh)  
    00F91414  mov         eax,dword ptr [i]  
    00F91417  add         eax,1  
    00F9141A  mov         dword ptr [i],eax  
    00F9141D  cmp         dword ptr [i],64h  
    00F91421  jge         main+88h (0F91448h)  
    
            for(int j = 0; j< 50; j++)
    00F91423  mov         dword ptr [j],0  
    00F9142A  jmp         main+75h (0F91435h)  
    00F9142C  mov         eax,dword ptr [j]  
    00F9142F  add         eax,1  
    00F91432  mov         dword ptr [j],eax  
    00F91435  cmp         dword ptr [j],32h  
    00F91439  jge         main+86h (0F91446h)  
            {
                count++;
    00F9143B  mov         eax,dword ptr [count]  
    00F9143E  add         eax,1  
    00F91441  mov         dword ptr [count],eax  
            }
    00F91444  jmp         main+6Ch (0F9142Ch)  
        }
    00F91446  jmp         main+54h (0F91414h)  
    

    The only difference between "big loop outside, small loop inside" and "small loop outside, big loop inside" is how often you have to do the jump from

    00F91439  jge         main+86h (0F91446h)  
    to
    00F91446  jmp         main+54h (0F91414h)  
    

    and how often you run the initialization of the inner loop variable:

    00F91423  mov         dword ptr [j],0  
    00F9142A  jmp         main+75h (0F91435h)  
    

    which happens once for every new inner loop, while skipping the increment below:

    00F9142C  mov         eax,dword ptr [j]  
    00F9142F  add         eax,1  
    00F91432  mov         dword ptr [j],eax  
    

    Extra instructions for each iteration of the inner loop: mov, add, mov (the increment), but no mov / jmp (the initialization).

    Extra instructions each time an inner loop is initialized: mov, jmp, and the JGE coming true more often.
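
    The bookkeeping above can be sketched by counting the events directly. `LoopCost` and `tally` are hypothetical names introduced for illustration; the counters model the instruction groups in the disassembly, they do not measure real cycles:

```cpp
#include <cassert>

// Hypothetical tally of the events discussed in the disassembly.
struct LoopCost {
    long body;        // executions of count++
    long inner_init;  // inner loop set up (mov dword ptr [j],0 / jmp)
    long inner_exit;  // inner jge taken, followed by the jmp back to the outer loop
};

LoopCost tally(int outer, int inner) {
    LoopCost c{0, 0, 0};
    for (int i = 0; i < outer; i++) {
        c.inner_init++;               // once per outer iteration
        for (int j = 0; j < inner; j++) {
            c.body++;                 // identical total for both orderings
        }
        c.inner_exit++;               // once per outer iteration
    }
    assert(c.body == static_cast<long>(outer) * inner);
    return c;
}
```

    `tally(100, 50)` and `tally(50, 100)` both report 5000 body executions, but 100 versus 50 inner-loop initializations and exits. That gap of 50 short instruction sequences is the whole difference between the two orderings.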

    Thus, if the inner loop is started 50 times (the 50-iteration loop is outside), that JGE only comes true 50 times and you do 50 jumps there, while if the inner loop is started 100 times, you have to jump 100 times. That's the ONLY difference in the code. In this case it makes hardly any difference, and most of the time memory access will slow you down a LOT more than your loop ordering. The only exception: if you know you can order your loops so as to avoid branch mispredictions. So two things are worth ordering your loops one way or the other for:

    -memory access

    -branch prediction

    For everything else the impact is completely negligible.
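
    To illustrate the memory access point, here is a minimal sketch that assumes a matrix stored row-major in a flat `std::vector<int>`. The function names and the layout are assumptions made for illustration, not part of the original test:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rows outside, columns inside: consecutive iterations touch adjacent ints.
long long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; r++)
        for (std::size_t c = 0; c < cols; c++)
            total += m[r * cols + c];  // contiguous access
    return total;
}

// Columns outside, rows inside: same result, but each step jumps `cols` ints ahead.
long long sum_col_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; c++)
        for (std::size_t r = 0; r < rows; r++)
            total += m[r * cols + c];  // strided access
    return total;
}
```

    Both functions return the same sum. On large matrices the strided version typically uses the cache less effectively, which is the kind of effect that can outweigh the loop overhead counted in the disassembly.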
