Why the array index is faster than pointer? Isn\'t pointer supposed to be faster than array index?
** i used time.h clock_t to tested two functions, each loop 2 mil
Compiler optimizations are pattern matching.
When your compiler optimizes, it looks for known code patterns, and then transforms the code according to some rule. Your two code snippets seem to trigger different transforms, and thus produce slightly different code.
This is one of the reasons why we always insist to actually measure the resulting performance when it comes to optimizations: You can never be sure what your compiler turns your code into unless you test it.
If you are really curious, try compiling your code with gcc -S -Os
, this produces the most readable, yet optimized assembler code. On your two functions, I get the following assembler with that:
pointer code:
.L2:
cmpq %rax, %rdi
jnb .L5
movl $0, (%rdi)
addq $4, %rdi
jmp .L2
.L5:
index code:
.L7:
cmpl %eax, %esi
jle .L9
movl $0, (%rdi,%rax,4)
incq %rax
jmp .L7
.L9:
The differences are slight, but may indeed trigger a performance difference, most importantly the difference between using addq
and incq
could be significant.
Array dereference p[i]
is *(p + i)
. Compilers make use of instructions that do math + dereference in 1 or 2 cycles (e.g. x86 LEA instruction) to optimize for speed.
With the pointer loop, it splits the access and offset into to separate parts and the compiler cannot optimize it.
No, never ever pointers are supposed to be faster than array index. If one of the code is faster than the other, it's mostly because some address computations might be different. The question also should provide information of compiler and optimization flags as it can heavily affect the performance.
Array index in your context (array bound is not known) is exactly identical to the pointer operation. From a viewpoint of compilers, it is just different expression of pointer arithmetic. Here is an example of an optimized x86 code in Visual Studio 2010 with full optimization and no inline.
3: void myPointer(int a[], int size)
4: {
013E1800 push edi
013E1801 mov edi,ecx
5: int *p;
6: for(p = a; p < &a[size]; p++)
013E1803 lea ecx,[edi+eax*4]
013E1806 cmp edi,ecx
013E1808 jae myPointer+15h (13E1815h)
013E180A sub ecx,edi
013E180C dec ecx
013E180D shr ecx,2
013E1810 inc ecx
013E1811 xor eax,eax
013E1813 rep stos dword ptr es:[edi]
013E1815 pop edi
7: {
8: *p = 0;
9: }
10: }
013E1816 ret
13: void myIndex(int a[], int size)
14: {
15: int i;
16: for(i = 0; i < size; i++)
013E17F0 test ecx,ecx
013E17F2 jle myIndex+0Ch (13E17FCh)
013E17F4 push edi
013E17F5 xor eax,eax
013E17F7 mov edi,edx
013E17F9 rep stos dword ptr es:[edi]
013E17FB pop edi
17: {
18: a[i] = 0;
19: }
20: }
013E17FC ret
At a glance, myIndex
looks faster because the number of instructions are less, however, the two pieces of the code are essentially the same. Both eventually use rep stos
, which is a x86's repeating (loop) instruction. The only difference is because of the computation of the loop bound. The for
loop in myIndex
has the trip count size
as it is (i.e., no computation is needed). But, myPointer
needs some computation to get the trip count of the for
loop. This is the only difference. The important loop operations are just the same. Thus, the difference is negligible.
To summarize, the performance of myPointer
and myIndex
in an optimized code should be identical.
FYI, if the array's bound is known at compile time, e.g., int A[constant_expression]
, then the accesses on this array may be much faster than the pointer one. This is mostly because the array accesses are free from the pointer analysis problem. Compilers can perfectly compute the dependency information on computations and accesses on a fixed-size array, so it can do advanced optimizations including automatic parallelization.
However, if computations are pointer based, compilers must perform pointer analysis for further optimization, which is pretty much limited in C/C++. It generally ends up with conservative results on pointer analysis and results in a few optimization opportunity.
The times are so close together that if you did them repeatedly, you may not see much of a difference. Both code segments compile to the exact same assembly. By definition, there is no difference.