What are the fastest divisibility tests? Say, given a little-endian architecture and a 32-bit signed integer: how to calculate very fast that a number is divisible by 2,3,4,
Assume number is unsigned (32 bits). Then the following are very fast ways to test divisibility by each divisor up to 16. (I haven't measured, but the assembly code indicates so.)
bool divisible_by_2 = number % 2 == 0;
bool divisible_by_3 = number * 2863311531u <= 1431655765u;
bool divisible_by_4 = number % 4 == 0;
bool divisible_by_5 = number * 3435973837u <= 858993459u;
bool divisible_by_6 = divisible_by_2 && divisible_by_3;
bool divisible_by_7 = number * 3067833783u <= 613566756u;
bool divisible_by_8 = number % 8 == 0;
bool divisible_by_9 = number * 954437177u <= 477218588u;
bool divisible_by_10 = divisible_by_2 && divisible_by_5;
bool divisible_by_11 = number * 3123612579u <= 390451572u;
bool divisible_by_12 = divisible_by_3 && divisible_by_4;
bool divisible_by_13 = number * 3303820997u <= 330382099u;
bool divisible_by_14 = divisible_by_2 && divisible_by_7;
bool divisible_by_15 = number * 4008636143u <= 286331153u;
bool divisible_by_16 = number % 16 == 0;
Regarding divisibility by d, the following rules hold:
When d is a power of 2:
As pointed out by James Kanze, you can use is_divisible_by_d = (number % d == 0). Compilers are clever enough to implement this as (number & (d - 1)) == 0, which is very efficient but obfuscated.
However, when d is not a power of 2, it looks like the obfuscations shown above are more efficient than what current compilers do. (More on that later.)
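When d is odd:
The technique takes the form is_divisible_by_d = number * a <= b, where a and b are cleverly obtained constants: a is the multiplicative inverse of d modulo 2^32, and b is the largest 32-bit unsigned value divided by d, rounded down. Notice that all we need is 1 multiplication and 1 comparison.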
When d is even but not a power of 2:
Write d = p * q, where p is a power of 2 and q is odd, and use the "tongue in cheek" suggestion by unpythonic, that is, is_divisible_by_d = is_divisible_by_p && is_divisible_by_q. Again, only 1 multiplication (in the calculation of is_divisible_by_q) is performed.
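The constants in the listing above can be reproduced rather than memorised. The following is a minimal sketch, not the author's own code: the function names (inverse_mod_2_32, is_divisible_by_odd, is_divisible, check_small_range) are made up for illustration, and it assumes uint32_t arithmetic wraps modulo 2^32, as it does when unsigned int is 32 bits wide. The test works because multiplying by the inverse of d is a bijection on 32-bit values that sends the multiples of d exactly onto the range [0, b].

```cpp
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd d modulo 2^32 (hypothetical helper).
// Newton's iteration: each step doubles the number of correct low bits.
constexpr std::uint32_t inverse_mod_2_32(std::uint32_t d) {
    std::uint32_t x = d;            // d * d == 1 (mod 8), so 3 bits are correct
    for (int i = 0; i < 4; ++i) {   // 3 -> 6 -> 12 -> 24 -> 48 correct bits
        x *= 2u - d * x;
    }
    return x;
}

// Odd d: one multiplication and one comparison.
constexpr bool is_divisible_by_odd(std::uint32_t n, std::uint32_t d) {
    return n * inverse_mod_2_32(d) <= UINT32_MAX / d;
}

// General d != 0: split d = p * q with p a power of 2 and q odd.
constexpr bool is_divisible(std::uint32_t n, std::uint32_t d) {
    const std::uint32_t p = d & (0u - d);   // largest power of 2 dividing d
    const std::uint32_t q = d / p;          // odd part of d
    return (n & (p - 1u)) == 0 && is_divisible_by_odd(n, q);
}

// The derived constants match the ones used in the listing above.
static_assert(inverse_mod_2_32(3) == 2863311531u, "a for d = 3");
static_assert(UINT32_MAX / 3 == 1431655765u, "b for d = 3");
static_assert(inverse_mod_2_32(7) == 3067833783u, "a for d = 7");
static_assert(UINT32_MAX / 7 == 613566756u, "b for d = 7");

// Compare against the plain % operator on a small range.
inline void check_small_range() {
    for (std::uint32_t d = 1; d <= 16; ++d) {
        for (std::uint32_t n = 0; n < 1000; ++n) {
            assert(is_divisible(n, d) == (n % d == 0));
        }
    }
}
```

With a compile-time constant d, a compiler would be expected to fold a and b into immediates, leaving the multiplication and comparison; that is an expectation, not something measured here.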
Many compilers (I've tested clang 5.0.0, gcc 7.3, icc 18 and msvc 19 using godbolt) replace number % d == 0 by (number / d) * d == number. They use a clever technique (see references in Olof Forshell's answer) to replace the division by a multiplication and a bit shift. They end up doing 2 multiplications. In contrast, the techniques above perform only 1 multiplication.
Update 01-Oct-2018
Looks like the algorithm above is coming to GCC soon (already in trunk):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853
GCC's implementation seems even more efficient. Indeed, the implementation above has three parts: 1) divisibility by the divisor's even (power-of-2) part; 2) divisibility by the divisor's odd part; 3) && to connect the results of the two previous steps. By using an assembler instruction that is not efficiently available in standard C++ (ror), GCC wraps up the three parts into a single one, which is very similar to that of divisibility by the odd part. Great stuff! With this implementation available, it's better (for both clarity and performance) to fall back to % at all times.
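To illustrate how a rotate can fold the three parts into one, here is a sketch of the idea as I understand it. It is not GCC's actual code; the names rotate_right and is_divisible_ror are hypothetical, and it again assumes 32-bit wrapping arithmetic. For d = 2^k * q with q odd, multiplying by the inverse of q and rotating right by k leaves a value no greater than the largest 32-bit value divided by d exactly when number is a multiple of d: any nonzero low bit lost to the power-of-2 part is rotated into the high bits and makes the comparison fail.

```cpp
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd q modulo 2^32 (Newton's iteration).
constexpr std::uint32_t inverse_mod_2_32(std::uint32_t q) {
    std::uint32_t x = q;
    for (int i = 0; i < 4; ++i) {
        x *= 2u - q * x;
    }
    return x;
}

// Portable rotate; the masks keep both shift counts in [0, 31].
constexpr std::uint32_t rotate_right(std::uint32_t x, unsigned k) {
    return (x >> (k & 31u)) | (x << ((32u - k) & 31u));
}

// Single multiply, rotate, and compare. Requires d != 0.
constexpr bool is_divisible_ror(std::uint32_t n, std::uint32_t d) {
    unsigned k = 0;
    std::uint32_t q = d;
    while ((q & 1u) == 0) {   // strip the power-of-2 part: d == q << k
        q >>= 1;
        ++k;
    }
    return rotate_right(n * inverse_mod_2_32(q), k) <= UINT32_MAX / d;
}

static_assert(is_divisible_ror(36, 12), "36 is a multiple of 12");
static_assert(!is_divisible_ror(18, 12), "18 is not a multiple of 12");

// Compare against the plain % operator on a small range.
inline void check_ror_small_range() {
    for (std::uint32_t d = 1; d <= 16; ++d) {
        for (std::uint32_t n = 0; n < 1000; ++n) {
            assert(is_divisible_ror(n, d) == (n % d == 0));
        }
    }
}
```

Mainstream compilers commonly recognise this rotate idiom and emit a single rotate instruction for it, but that depends on the compiler and target.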
Update 05-May-2020
My articles on the subject have been published:
Quick Modular Calculations (Part 1), Overload Journal 154, December 2019, pages 11-15.
Quick Modular Calculations (Part 2), Overload Journal 155, February 2020, pages 14-17.
Quick Modular Calculations (Part 3), Overload Journal 156, April 2020, pages 10-13.
Fast tests for divisibility depend heavily on the base in which the number is represented. When the base is 2, I think you can only do "fast tests" for divisibility by powers of 2. A binary number is divisible by 2^n iff the last n binary digits of that number are 0. For other tests I don't think you can generally find anything faster than %.
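The power-of-2 rule can be written directly as a mask test. This is a minimal sketch; the name is_divisible_by_pow2 is made up for illustration, and the exponent is assumed to lie in [0, 31].

```cpp
#include <cassert>
#include <cstdint>

// True when the k lowest bits of n are all zero, i.e. n is a multiple of 2^k.
// Assumes 0 <= k < 32.
constexpr bool is_divisible_by_pow2(std::uint32_t n, unsigned k) {
    return (n & ((std::uint32_t{1} << k) - 1u)) == 0;
}

static_assert(is_divisible_by_pow2(48, 4), "48 == 3 * 16");
static_assert(!is_divisible_by_pow2(40, 4), "40 is not a multiple of 16");

// Compare against the plain % operator on a small range.
inline void check_pow2_small_range() {
    for (unsigned k = 0; k < 5; ++k) {
        for (std::uint32_t n = 0; n < 1000; ++n) {
            assert(is_divisible_by_pow2(n, k) == (n % (1u << k) == 0));
        }
    }
}
```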
A bit tongue in cheek, but assuming you get the rest of the answers:
Divisible_by_6 = Divisible_by_3 && Divisible_by_2;
Divisible_by_10 = Divisible_by_5 && Divisible_by_2;
Divisible_by_12 = Divisible_by_4 && Divisible_by_3;
Divisible_by_14 = Divisible_by_7 && Divisible_by_2;
Divisible_by_15 = Divisible_by_5 && Divisible_by_3;
You should just use (i % N) == 0 as your test.
My compiler (a fairly old version of gcc) generated good code for all the cases I tried. Where bit tests were appropriate, it used them. Where N was a constant, it never generated the obvious "divide"; it always used some "trick".
Just let the compiler generate the code for you; it will almost certainly know more about the architecture of the machine than you do :) And these are easy optimisations where you are unlikely to think up something better than the compiler does.
It's an interesting question though. I can't list the tricks used by the compiler for each constant, as I have to compile on a different computer. But I'll update this reply later on if nobody beats me to it :)
You can replace division by a non-power-of-two constant by a multiplication, essentially multiplying by the reciprocal of your divisor. The details to get the exact result by this method are complicated.
Hacker's Delight discusses this at length in chapter 10 (unfortunately not available online).
From the quotient you can get the modulus by another multiplication and a subtraction.
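As a concrete instance for the divisor 3, here is a small sketch of the reciprocal trick. The function names div3 and mod3 are hypothetical; the constant 0xAAAAAAAB is 2^33 / 3 rounded up, and for this particular divisor a plain multiply and shift gives the exact quotient for every 32-bit unsigned value. Other divisors can need a different shift or an extra correction step.

```cpp
#include <cassert>
#include <cstdint>

// Quotient n / 3 via multiplication by a scaled reciprocal.
// 0xAAAAAAAB == ceil(2^33 / 3); the 64-bit product is shifted right by 33.
constexpr std::uint32_t div3(std::uint32_t n) {
    return static_cast<std::uint32_t>((std::uint64_t{n} * 0xAAAAAAABu) >> 33);
}

// Remainder from the quotient: one more multiplication and a subtraction.
constexpr std::uint32_t mod3(std::uint32_t n) {
    return n - div3(n) * 3u;
}

static_assert(div3(10) == 3 && mod3(10) == 1, "10 == 3 * 3 + 1");
static_assert(div3(UINT32_MAX) == UINT32_MAX / 3, "largest input");

// Compare against the built-in operators on a small range.
inline void check_div3_small_range() {
    for (std::uint32_t n = 0; n < 1000; ++n) {
        assert(div3(n) == n / 3 && mod3(n) == n % 3);
    }
}
```

A divisibility test built this way, mod3(n) == 0, costs two multiplications, which is the pattern described earlier in the thread for compiler-generated code.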
The LCM of these numbers seems to be 720720. It's quite small, so you can perform a single modulus operation and use the remainder as the index into a precomputed LUT.
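One way to lay out such a table is sketched below, under assumptions of my own: each entry is a 16-bit mask with one bit per divisor from 1 to 16, and the names kLcm, build_divisibility_table and is_divisible_lut are made up for illustration. The lookup is valid because every divisor d in that range divides 720720, so n % d equals (n % 720720) % d.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kLcm = 720720;   // lcm(1, 2, ..., 16)

// table[r] has bit (d - 1) set when r is divisible by d, for d = 1..16.
inline std::vector<std::uint16_t> build_divisibility_table() {
    std::vector<std::uint16_t> table(kLcm, 0);
    for (std::uint32_t d = 1; d <= 16; ++d) {
        for (std::uint32_t r = 0; r < kLcm; r += d) {   // every multiple of d
            table[r] = static_cast<std::uint16_t>(table[r] | (1u << (d - 1)));
        }
    }
    return table;
}

// One modulus by the LCM, then a table lookup. Requires 1 <= d <= 16.
inline bool is_divisible_lut(const std::vector<std::uint16_t>& table,
                             std::uint32_t n, std::uint32_t d) {
    return ((table[n % kLcm] >> (d - 1)) & 1u) != 0;
}

// Compare against the plain % operator on a small range.
inline void check_lut_small_range() {
    const std::vector<std::uint16_t> table = build_divisibility_table();
    for (std::uint32_t d = 1; d <= 16; ++d) {
        for (std::uint32_t n = 0; n < 1000; ++n) {
            assert(is_divisible_lut(table, n, d) == (n % d == 0));
        }
    }
}
```

With this layout the table holds 720720 two-byte entries, roughly 1.4 MB, and a single lookup answers the question for all sixteen divisors at once.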