Why is this faster on 64 bit than 32 bit?

前端 未结 5 1186
盖世英雄少女心
盖世英雄少女心 2020-12-01 10:36

I\'ve been doing some performance testing, mainly so I can understand the difference between iterators and simple for loops. As part of this I created a simple set of tests

相关标签:
5条回答
  • 2020-12-01 11:25

    The long datatype is 64-bits and in a 64-bit process, it is processed as a single native-length unit. In a 32-bit process, it is treated as 2 32-bit units. Math, especially on these "split" types will be processor-intensive.

    0 讨论(0)
  • 2020-12-01 11:26

    As others said, doing 64-bit arithmetic on a 32-bit machine is going to take some extra manipulation, more-so if doing multiplication or division.

    Back to your concern about iterators vs. simple for loops, iterators can have fairly complex definitions, and they will only be fast if inlining and compiler-optimization is capable of replacing them with the equivalent simple form. It really depends on the type of iterator and the underlying container implementation. The simplest way to tell if it has been optimized reasonably well is to examine the generated assembly code. Another way is to put it in a long-running loop, pause it, and look at the stack to see what it's doing.

    0 讨论(0)
  • 2020-12-01 11:40

    Not sure of "why" but I would make sure to call your "method" at least once outside your timer loop so you're not counting 1st-time jitting. (Since this looks like C# to me).

    0 讨论(0)
  • 2020-12-01 11:42

    x64 processors contain 64 bit general purpose registers with which they can calculate operations on 64 bit integers in a single instruction. 32 bit processors does not have that. This is especially relevant to your program as it's heavily using long (64-bit integer) variables.

    For instance, in x64 assembly, to add a couple 64 bit integers stored in registers, you can simply do:

    ; adds rbx to rax
    add rax, rbx
    

    To do the same operation on a 32 bit x86 processor, you'll have to use two registers and manually use the carry of the first operation in the second operation:

    ; adds ecx:ebx to edx:eax
    add eax, ebx
    adc edx, ecx
    

    More instructions and less registers mean more clock cycles, memory fetches, ... which will ultimately result in reduced performance. The difference is very notable in number crunching applications.

    For .NET applications, it seems that the 64-bit JIT compiler performs more aggressive optimizations improving overall performance.

    Regarding your point about array iteration, the C# compiler is clever enough to recognize foreach over arrays and treat them specially. The generated code is identical to using a for loop and it's in recommended that you use foreach if you don't need to change the array element in the loop. Besides that, the runtime recognizes the pattern for (int i = 0; i < a.Length; ++i) and omits the bound checks for array accesses inside the loop. This will not happen in the LongLength case and will result in decreased performance (both for 32 bit and 64 bit case); and since you'll be using long variables with LongLength, the 32 bit performance will get degraded even more.

    0 讨论(0)
  • 2020-12-01 11:42

    Oh, that's easy. I assume that you are using x86 technology. What do you need for doing the loops in assembler ?

    1. One index variable i
    2. One result variable result
    3. An long array of results.

    So you need three variables. Variable access is fastest if you can store them in registers; if you need to move them in and out to memory, you are losing speed. For 64bit longs you need two registers on 32bit and we have only four registers, so chances are high that all variables cannot be stored in registers, but must be stored in intermediate storage like the stack. This alone will slow down access considerably.

    Addition of numbers: Addition must be two times; the first time without carry bit and the second time with carry bit. 64bit can it do in one cycle.

    Moving/Loading: For every 1-cycle 64bit var you need two cycles for 32bit to load/unload a long integer into memory.

    Every component datatype (datatypes which consists of more bits than register/address bits) will lose considerable speed. The speed gains of an order of magnitude is the reason GPUs still prefer floats (32bit) instead of doubles (64bit).

    0 讨论(0)
提交回复
热议问题