I'm looking for a fast method to efficiently compute (a ⋅ b) modulo n (in the mathematical sense of that) for
OK, how about this (not tested):
modmul:
; rcx = a
; rdx = b
; r8 = n
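mov rax, rdx    ; rax = b
mul rcx         ; rdx:rax = a * b (full 128-bit product)
div r8          ; rax = quotient, rdx = remainder of rdx:rax / n
mov rax, rdx    ; return the remainder
ret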
The precondition is that a * b / n <= ~0ULL (the quotient has to fit in 64 bits), otherwise there will be a divide error. That's a slightly less strict condition than a < n && b < n: one of them can be bigger than n as long as the other is small enough.
Unfortunately, it has to be assembled and linked in separately, because MSVC doesn't support inline asm for 64-bit targets.
It's also still slow. The real problem is that 64-bit div, which can take nearly a hundred cycles (seriously, up to 90 cycles on Nehalem, for example).
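For comparison, here is a rough C sketch of the same computation. It assumes a compiler with the `unsigned __int128` extension (GCC and Clang have it, MSVC does not), and the names `modmul_u128` and `modmul_div_is_safe` are made up for illustration. The second function just spells out the precondition of the asm routine: the quotient fits in 64 bits exactly when the high half of the product is smaller than n.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the asm above): (a * b) mod n via the
   128-bit integer extension of GCC/Clang. Requires n != 0. */
static uint64_t modmul_u128(uint64_t a, uint64_t b, uint64_t n)
{
    assert(n != 0);
    unsigned __int128 product = (unsigned __int128)a * b; /* full 128-bit product */
    return (uint64_t)(product % n);                       /* remainder is < n, fits in 64 bits */
}

/* Hypothetical check for the asm routine's precondition: `div r8` only
   succeeds if the quotient fits in 64 bits, i.e. n != 0 and the high
   64 bits of a * b are strictly less than n. */
static int modmul_div_is_safe(uint64_t a, uint64_t b, uint64_t n)
{
    unsigned __int128 product = (unsigned __int128)a * b;
    uint64_t high = (uint64_t)(product >> 64);
    return n != 0 && high < n;
}
```

Unlike the asm routine, `modmul_u128` has no overflow precondition beyond n != 0, but don't expect it to be faster: for a 128-bit modulus the compiler typically emits a call to a runtime helper rather than a single div.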