Conditional tests in primality by trial division

清酒与你 2020-12-04 01:31

My question is about the conditional test in trial division. There seems to be some debate on what conditional test to employ. Let's look at the code for this from Roset

4 answers
  • 2020-12-04 02:12

    sqrt(n) is accurate enough as long as your sqrt is monotonically increasing, returns exact results for perfect squares, and every unsigned int can be represented exactly as a double. All three of these are the case on every platform I know of.

    You can get around these issues (if you consider them to be issues) by implementing a function unsigned int sqrti(unsigned int n) that returns the floor of the square root of an unsigned int using Newton's method. (This is an interesting exercise if you've never done it before!)
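
    For readers who want to try the exercise, here is one possible sketch of such a function. The name sqrti comes from the answer above; the body, including the choice of starting guess, is an assumption of this example and not the answerer's own code.

```c
/* Floor of the square root of n, using Newton's method on integers.
   Illustrative sketch only: the starting guess n/2 + 1 is an assumption
   of this example, chosen because it is >= sqrt(n) for every n >= 2. */
unsigned int sqrti(unsigned int n)
{
    if (n < 2)
        return n;                      /* sqrt(0) = 0, sqrt(1) = 1 */

    unsigned int x = n / 2 + 1;        /* initial guess, never below sqrt(n) */
    unsigned int y = (x + n / x) / 2;  /* one Newton step */

    /* The iterates decrease until they reach floor(sqrt(n)). */
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;
    }
    return x;
}
```

    With this, the loop bound can be computed once as sqrti(n) without touching floating point at all.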

  • 2020-12-04 02:20

    An answer to only a small portion of this post.

    Case 2 fix to deal with overflow.

    #include <limits.h>
    
    int is_prime(unsigned n) {
      unsigned p;
      if (!(n & 1) || n < 2)
        return n == 2;
    
      #define UINT_MAX_SQRT (UINT_MAX >> (sizeof(unsigned)*CHAR_BIT/2))
      unsigned limit = n;
      if (n >= UINT_MAX_SQRT * UINT_MAX_SQRT)
        limit = UINT_MAX_SQRT * UINT_MAX_SQRT - 1;
    
      for (p = 3; p * p <= limit; p += 2)  /* <= so that squares of primes (9, 25, ...) are caught */
        if (!(n % p))
          return 0;
    
      if (n != limit)
        if (!(n % p))
          return 0;
      return 1;
    }
    

    The limit calculation fails if both sizeof(unsigned) and CHAR_BIT are odd - a rare situation.

  • 2020-12-04 02:28

    UPD: This is evidently a compiler optimization issue. While MinGW used only one div instruction in the loop body, both GCC on Linux and MSVC failed to reuse the quotient from the previous iteration.

    I think the best we could do is explicitly define quo and rem and calculate them in the same basic instruction block, to show the compiler we want both quotient and remainder.

    #include <stdint.h>
    
    int is_prime(uint64_t n)
    {
        uint64_t p = 3, quo, rem;
        if (!(n & 1) || n < 2) return n == 2;
    
        quo = n / p;
        for (; p <= quo; p += 2){
            quo = n / p; rem = n % p;
            if (!(rem)) return 0;
        }
        return 1;
    }
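
    A related idea, sketched here as an assumption and not as something benchmarked in this answer: if the compiler will not fuse the division and the modulo, the remainder can be rebuilt from the quotient with one multiplication and one subtraction, so that only a single division appears per iteration. The name is_prime_onediv is hypothetical.

```c
#include <stdint.h>

/* Trial division with exactly one division per iteration.
   is_prime_onediv is a hypothetical name used only for this sketch. */
int is_prime_onediv(uint64_t n)
{
    if (!(n & 1) || n < 2) return n == 2;

    for (uint64_t p = 3; ; p += 2) {
        uint64_t quo = n / p;            /* the only division in the loop */
        if (p > quo) return 1;           /* p * p > n: no divisor was found */
        if (n - quo * p == 0) return 0;  /* remainder rebuilt as n - quo * p */
    }
}
```

    Whether this is actually faster depends on the target and the compiler, so it is worth measuring before adopting it.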
    

    I tried your code from http://coliru.stacked-crooked.com/a/69497863a97d8953 with a MinGW-w64 compiler, and case 1 is faster than case 2.


    So I guess you are compiling for a 32-bit target while using the uint64_t type. Your assembly doesn't use any 64-bit registers.

    If I've got that right, here is the reason.

    On a 32-bit architecture, a 64-bit number is held in two 32-bit registers, and the compiler generates the code that combines the two halves. 64-bit addition, subtraction and multiplication are simple to do inline, but division and modulo are done by calls to small helper functions: ___udivdi3 and ___umoddi3 with GCC, _aulldiv and _aullrem with MSVC.

    So each iteration actually needs one ___udivdi3 call and one ___umoddi3 call in case 1, but only one ___umoddi3 call plus an inline 64-bit multiplication in case 2. That's why case 1 appears to be about twice as slow as case 2 in your test.

    What you really get in case 1:

    L5:
        addl    $2, %esi
        adcl    $0, %edi
        movl    %esi, 8(%esp)
        movl    %edi, 12(%esp)
        movl    %ebx, (%esp)
        movl    %ebp, 4(%esp)
        call    ___udivdi3         // A call for div
        cmpl    %edi, %edx
        ja  L6
        jae L21
    L6:
        movl    %esi, 8(%esp)
        movl    %edi, 12(%esp)
        movl    %ebx, (%esp)
        movl    %ebp, 4(%esp)
        call    ___umoddi3        // A call for modulo.
        orl %eax, %edx
        jne L5
    

    What you really get in case 2:

    L26:
        addl    $2, %esi
        adcl    $0, %edi
        movl    %esi, %eax
        movl    %edi, %ecx
        imull   %esi, %ecx
        mull    %esi
        addl    %ecx, %ecx
        addl    %ecx, %edx
        cmpl    %edx, %ebx
        ja  L27
        jae L41
    L27:
        movl    %esi, 8(%esp)
        movl    %edi, 12(%esp)
        movl    %ebp, (%esp)
        movl    %ebx, 4(%esp)
        call    ___umoddi3         // Just one call for modulo
        orl %eax, %edx
        jne L26
    

    MSVC failed to reuse the result of the div. The optimization is broken by the early return. Try this code:

    __declspec(noinline) int is_prime_A(unsigned int n)
    {
        unsigned int p;
        int ret = -1;
        if (!(n & 1) || n < 2) return n == 2;
    
        /* comparing p*p <= n can overflow */
        p = 1;
        do {
            p += 2;
            if (!(n % p)) ret = 0;
            if (p > n / p) ret = 1; /* Checked last so that n == p (e.g. n = 3) counts as prime; we return later, outside the loop. */
        } while (ret < 0);
        return ret;
    }
    
    __declspec(noinline) int is_prime_B(unsigned int n)
    {
        unsigned int p;
        if (!(n & 1) || n < 2) return n == 2;
    
        /* comparing p*p <= n can overflow */
        p = 1;
        do {
            p += 2;
            if (p > n / p) return 1; /* The common routine. */
            if (!(n % p)) return 0;
        } while (1);
    }
    

    is_prime_B will be about twice as slow as is_prime_A on MSVC / ICC for Windows.

  • 2020-12-04 02:29

    About your first question: why is (2) faster than (1)?
    Well, that may depend on the compiler.
    In general, though, one can expect division to be a more expensive operation than multiplication.

    About your 2nd question: is sqrt() an accurate function?

    In general, it is accurate.
    The only case that could cause problems is when sqrt(n) is exactly an integer.
    For example, if n == 9 and sqrt(n) == 2.9999999999999 on your system, you are in trouble, because the integer part is 2 but the exact value is 3.
    However, these rare cases can easily be handled by adding a not-so-small double constant, say 0.1.
    Thus, you can write:

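      double stop = sqrt(n) + 0.1;  /* needs <math.h>; the slack guards against a slightly low sqrt() */
      int prime = n == 2 || (n > 2 && n % 2 != 0);  /* handle 2 and the even numbers first */
      for (unsigned int d = 3; prime && d <= stop; d += 2)  /* then odd divisors only */
           if (n % d == 0)
               prime = 0;  /*  not prime!! */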
    

    The added term 0.1 could add one iteration to your algorithm, which is not a big issue at all.

    Finally, the obvious choice for your algorithm is (3), the sqrt() approach, because the loop condition then involves no extra calculation (multiplication or division) and the value stop is computed just once.

    Another improvement that you can have is the following:

    • Note that every prime p >= 5 has the form 6k - 1 or 6k + 1.

    So, starting from d = 5, you can alternate the increments of the variable d between 2 and 4: 2, 4, 2, 4, and so on.
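
    As a rough sketch of that idea, assuming the multiples of 2 and 3 are ruled out first: the function name is_prime_6k is hypothetical, and the bound d <= n / d is used here in place of sqrt() simply to keep the example free of floating point.

```c
/* Trial division over candidates of the form 6k - 1 and 6k + 1.
   is_prime_6k is a hypothetical name used only for this sketch. */
int is_prime_6k(unsigned int n)
{
    if (n < 4)
        return n >= 2;                 /* 2 and 3 are prime, 0 and 1 are not */
    if (n % 2 == 0 || n % 3 == 0)
        return 0;                      /* multiples of 2 or 3 */

    unsigned int step = 2;             /* increments alternate 2, 4, 2, 4, ... */
    for (unsigned int d = 5; d <= n / d; d += step, step = 6 - step)
        if (n % d == 0)
            return 0;                  /* d = 5, 7, 11, 13, 17, 19, ... */
    return 1;
}
```

    This tests only about one third of all integers up to the square root as divisors, compared with one half when stepping through the odd numbers.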
