Speed up x64 assembler ADD loop

悲&欢浪女 2021-02-20 05:27

I'm working on arithmetic for multiplying very long integers (on the order of 100,000 decimal digits). As part of my library, I need to add two long numbers.

Profiling shows that

3 Answers
  •  被撕碎了的回忆
    2021-02-20 06:13

    Try prefetching the data first (for example, load several data blocks into x64 registers and only then do the calculations), check that the data is properly aligned in memory, align the loop's entry label to a 16-byte boundary, and try to remove SIB addressing.

    You could also try to shorten your code to:

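    mov rax, QWORD PTR [rdx+r11*8-64]   ; load one 64-bit word of the first operand
    adc rax, QWORD PTR [r8+r11*8-64]    ; add the matching word of the second operand plus the carry flag
    mov QWORD PTR [rcx+r11*8-64], rax   ; store the result word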
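
When tuning an assembly loop like this, it can help to keep a plain C reference to compare results against. The sketch below assumes the numbers are stored as arrays of 64-bit words (limbs), least significant first, which fits the QWORD loads above but is not stated in the thread. The name `add_limbs` and its parameters are hypothetical. Each iteration computes what one mov/adc/mov triple computes, with the carry flag modelled as an ordinary variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper (not from the thread):
   dst = a + b over n 64-bit limbs, least significant limb first.
   Returns the carry out of the top limb (0 or 1). */
static uint64_t add_limbs(uint64_t *dst, const uint64_t *a,
                          const uint64_t *b, size_t n)
{
    uint64_t carry = 0;                 /* plays the role of the CPU carry flag */
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;      /* can wrap only if carry == 1 */
        uint64_t c1 = s < carry;        /* 1 if adding the carry wrapped */
        s += b[i];
        uint64_t c2 = s < b[i];         /* 1 if adding b[i] wrapped */
        dst[i] = s;
        carry = c1 | c2;                /* at most one of the two can be set */
    }
    return carry;
}
```

Running the assembly routine and this function on the same random inputs and comparing both the output limbs and the final carry is one simple way to confirm that an unrolled or reordered loop still propagates the carry correctly.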
    
