Why is the MOD operation more expensive than multiplication by a bit more than a factor of 2? Please be more specific about how CPU perfor
Mod is essentially the same process as division (some systems provide a "divmod" operation for this reason).
The big difference between binary long multiplication and binary long division is that long division requires you to perform an overflow test after each subtraction, while long multiplication performs the additions unconditionally after the initial masking process.
That means you can easily rearrange and parallelise the additions in long multiplication, but you can't do the same for long division, because each step depends on the result of the previous one. I wrote a longer answer about this at https://stackoverflow.com/a/53346554/5083516
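To make the difference concrete, here is a minimal Python sketch of both algorithms for unsigned integers. It is only an illustration of the data dependencies, not a model of how any particular CPU implements them; the function names `long_multiply` and `long_divmod` and the `bits` parameter are made up for this example, and the inputs are assumed to fit in `bits` bits.

```python
def long_multiply(a, b, bits=32):
    """Binary long multiplication of two unsigned integers."""
    # Masking step: each partial product depends only on `a` and a single
    # bit of `b`, so all of them can be formed independently of each other.
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(bits)]
    # The additions are unconditional, so they can be done in any order
    # (hardware can sum them as a tree); sum() is just one valid order.
    return sum(partials)


def long_divmod(n, d, bits=32):
    """Binary long division of unsigned n by d; returns (quotient, remainder)."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    quotient, remainder = 0, 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next bit of the dividend.
        remainder = (remainder << 1) | ((n >> i) & 1)
        # Test whether the subtraction fits (equivalent to subtracting and
        # checking for a borrow). The outcome depends on the remainder left
        # by the previous iteration, so the steps cannot be reordered.
        if remainder >= d:
            remainder -= d
            quotient |= 1 << i
    return quotient, remainder
```

In `long_multiply` nothing inside the list comprehension looks at the result of an earlier step, which is what allows the additions to be rearranged. In `long_divmod` every iteration needs the `remainder` produced by the one before it, so the loop is inherently serial. For example, `long_divmod(100, 7)` gives `(14, 2)`, with the quotient and the mod result falling out of the same loop.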