Brute-force, single-threaded prime factorization

前端 未结 7 644
误落风尘
误落风尘 2020-12-07 22:08

Up for consideration is the following function which can be used to (relatively quickly) factor a 64-bit unsigned integer into its prime factors. Note that the factoring is

相关标签:
7条回答
  • 2020-12-07 22:30

    Compare such an approach to a (pre-generated) sieve. Modulo is expensive, so both approaches essentially do two things: generate potential factors, and perform modulo operations. Either program should reasonably generate a new candidate factor in less cycles than modulo takes, so either program is modulo bound.

    The given approach filters out a constant proportion of all integers, namely the multiples of 2 and 3, or 75%. One in four (as given) numbers is used as an argument to the modulo operator. I'll call it a skip filter.

    On the other hand, a sieve uses only primes as arguments to the modulo operator, and the average difference between successive primes is governed by the prime number theorem to be 1/ln(N). For example, e^20 is just under 500 million, so numbers over 500 million have under a 5% chance of being prime. If all numbers up to 2^32 are considered, 5% is a good rule of thumb.

    Therefore, a sieve will spend 5 times less time on div operations as your skip filter. The next factor to consider is the speed at which the sieve produces primes, i.e. reads them from memory or disk. If fetching one prime is faster than 4 divs, then the sieve is faster. According to my tables div throughput on my Core2 is at most one per 12 cycles. These will be hard division problems, so let's conservatively budget 50 cycles per prime. For a 2.5 GHz processor, that's 20 nanoseconds.

    In 20 ns, a 50 MB/sec hard drive can read about one byte. The simple solution is to use 4 bytes per prime, so the drive will be slower. But, we can be more clever. If we want to encode all the primes in order, we can just encode their differences. Again, the expected difference is 1/ln(N). Also, they're all even, which saves an extra bit. And they are never zero, which makes extension to a multibyte encoding free. So using one byte per prime, differences up to 512 can be stored in one byte, which gets us up to 303371455241 according to that Wikipedia article.

    Therefore, depending on the hard drive, a stored list of primes should be about equal in speed at verifying primality. If it can be stored in RAM (it's 203 MB, so subsequent runs will probably hit the disk cache), then the problem goes away entirely, as the FSB speed typically differs from the processor speed by a factor less than the FSB width in bytes — i.e., the FSB can transfer more than one prime per cycle. Then factor of improvement is the reduction in division operations, i.e. five times. This is borne out by the experimental results below.

    Of course, then there is multithreading. Ranges of either primes or skip-filtered candidates can be assigned to different threads, making either approach embarrassingly parallel. There are no optimizations that don't involve increasing the number of parallel divider circuits, unless you somehow eliminate the modulo.

    Here is such a program. It's templated so you could add bignums.

    /*
     *  multibyte_sieve.cpp
     *  Generate a table of primes, and use it to factorize numbers.
     *
     *  Created by David Krauss on 10/12/10.
     *
     */
    
    #include <cmath>
    #include <bitset>
    #include <limits>
    #include <memory>
    #include <fstream>
    #include <sstream>
    #include <iostream>
    #include <iterator>
    #include <stdint.h>
    using namespace std;
    
    char const primes_filename[] = "primes";
    enum { encoding_base = (1<< numeric_limits< unsigned char >::digits) - 2 };
    
    template< typename It >
    unsigned decode_gap( It &stream ) {
        unsigned gap = static_cast< unsigned char >( * stream ++ );
    
        if ( gap ) return 2 * gap; // only this path is tested
    
        gap = ( decode_gap( stream )/2-1 ) * encoding_base; // deep recursion
        return gap + decode_gap( stream ); // shallow recursion
    }
    
    template< typename It >
    void encode_gap( It &stream, uint32_t gap ) {
        unsigned len = 0, bytes[4];
    
        gap /= 2;
        do {
            bytes[ len ++ ] = gap % encoding_base;
            gap /= encoding_base;
        } while ( gap );
    
        while ( -- len ) { // loop not tested
            * stream ++ = 0;
            * stream ++ = bytes[ len + 1 ];
        }
        * stream ++ = bytes[ 0 ];
    }
    
    template< size_t lim >
    void generate_primes() {
        auto_ptr< bitset< lim / 2 > > sieve_p( new bitset< lim / 2 > );
        bitset< lim / 2 > &sieve = * sieve_p;
    
        ofstream out_f( primes_filename, ios::out | ios::binary );
        ostreambuf_iterator< char > out( out_f );
    
        size_t count = 0;
    
        size_t last = sqrtl( lim ) / 2 + 1, prev = 0, x = 1;
        for ( ; x != last; ++ x ) {
            if ( sieve[ x ] ) continue;
            size_t n = x * 2 + 1; // translate index to number
            for ( size_t m = x + n; m < lim/2; m += n ) sieve[ m ] = true;
            encode_gap( out, ( x - prev ) * 2 );
            prev = x;
        }
    
        for ( ; x != lim / 2; ++ x ) {
            if ( sieve[ x ] ) continue;
            encode_gap( out, ( x - prev ) * 2 );
            prev = x;
        }
    
        cout << prev * 2 + 1 << endl;
    }
    
    template< typename I >
    void factorize( I n ) {
        ifstream in_f( primes_filename, ios::in | ios::binary );
        if ( ! in_f ) {
            cerr << "Could not open primes file.\n"
                    "Please generate it with 'g' command.\n";
            return;
        }
    
        while ( n % 2 == 0 ) {
            n /= 2;
            cout << "2 ";
        }
        unsigned long factor = 1;
    
        for ( istreambuf_iterator< char > in( in_f ), in_end; in != in_end; ) {
            factor += decode_gap( in );
    
            while ( n % factor == 0 ) {
                n /= factor;
                cout << factor << " ";
            }
    
            if ( n == 1 ) goto finish;
        }
    
        cout << n;
    finish:
        cout << endl;
    }
    
    int main( int argc, char *argv[] ) {
        if ( argc != 2 ) goto print_help;
    
        unsigned long n;
    
        if ( argv[1][0] == 'g' ) {
            generate_primes< (1ul<< 32) >();
        } else if ( ( istringstream( argv[1] ) >> n ).rdstate() == ios::eofbit )
            factorize( n );
        } else goto print_help;
    
        return 0;
    
    print_help:
        cerr << "Usage:\n\t" << argv[0] << " <number> -- factorize number.\n"
                "\t" << argv[0] << " g -- generate primes file in current directory.\n";
    }
    

    Performance on a 2.2 GHz MacBook Pro:

    dkrauss$ time ./multibyte_sieve g
    4294967291
    
    real    2m8.845s
    user    1m15.177s
    sys    0m2.446s
    dkrauss$ time ./multibyte_sieve 18446743721522234449
    4294967231 4294967279 
    
    real    0m5.405s
    user    0m4.773s
    sys 0m0.458s
    dkrauss$ time ./mike 18446743721522234449
    4294967231 4294967279
    real    0m25.147s
    user    0m24.170s
    sys 0m0.096s
    
    0 讨论(0)
  • 2020-12-07 22:30

    Fermat's factorization method is simple and quick for finding pairs of large prime factors as long as you stop it before it goes too far and becomes slow. However, in my tests on random numbers such cases have been too rare to see any improvement.

    ...without using a probabalistic approach (e.g., Miller-Rabin) for determining primality

    With uniform distribution, 75% of your inputs are going to need a billion loop iterations, so it's worth spending a million operations on less deterministic techniques first even if you get an inconclusive answer and have to revert back to trial division.

    I've found the Brent variation of Pollard's Rho Method to be very good, though more complicated to code and understand. The best example I've seen is from this forum discussion. The method relies on luck, but helps often enough to be worthwhile.

    The Miller-Rabin primality test is actually deterministic up to about 10^15, which can save you the trouble of a fruitless search.

    I tried a few dozen variations and settled on the following for factoring int64 values:

    1. Trial division on small factors. (I use first 8000 precomputed primes.)
    2. 10 attempts with Pollard's Rho, each using 16 iterations
    3. Trial division to sqrt(n).

    Note that Pollard's Rho finds factors that are not necessarily prime, so recursion can be used to factor those.

    0 讨论(0)
  • 2020-12-07 22:31

    This code is rather slower, and I'm pretty sure I understand why. It's not incredibly slower, but definitely slower in the 10-20% range. The division should not be done once for every loop through, but the only way to do that is to call sqrt or something similar.

    // system specific typedef for ulong should go here (or use boost::uint64_t)
    typedef std::vector<ulong> ULongVector;
    
    void GetFactors(ULongVector &factors, ulong num)
    {
      if (num<2)
        return;
    
      ulong workingNum=num;
      ulong nextOffset=2;
    
      while (workingNum%2==0)
      {
        factors.push_back(2);
        workingNum/=2;
      }
    
      while (workingNum%3==0)
      {
        factors.push_back(3);
        workingNum/=3;
      }
    
      ulong n = 5;
      while ((workingNum != 1) && ((workingNum / n) >= n)) {
        // Is workingNum divisible by n?
        if (workingNum%n==0)
        {
          // n is a factor!
          // so is the number multiplied by n to get workingNum
    
          // Insert n into the list of factors
          factors.push_back(n);
    
          // Divide working number by n
          workingNum/=n;
        } else {
          n+=nextOffset;
          nextOffset=(nextOffset==2UL ? 4UL : 2UL);
        }
      }
    
      if (workingNum != 1) {
        // workingNum is prime, add it to the list of factors        
        factors.push_back(workingNum);
      }
    }
    
    0 讨论(0)
  • 2020-12-07 22:37

    My other answer is rather long and quite different from this one, so here's something else.

    Rather than just filter out multiples of the first two primes, or encode all relevant primes in one byte each, this program filters out multiples of all of the primes that fit in eight bits, specifically 2 through 211. So instead of passing 33% of numbers, this passes about 10% on to the division operator.

    The primes are kept in three SSE registers, and their moduli with the running counter are kept in another three. If the modulus of any prime with the counter is equal to zero, the counter cannot be prime. Also, if any modulus is equal to one, then (counter+2) cannot be prime, etc, up to (counter+30). Even numbers are ignored, and offsets like +3, +6, and +5 are skipped. Vector processing allows updating sixteen byte-sized variables at once.

    After throwing a kitchen sink full o' micro-optimizations at it (but nothing more platform specific than an inline directive), I got a 1.78x performance boost (24 s vs 13.4 s on my laptop). If using a bignum library (even a very fast one), the advantage is greater. See below for a more readable, pre-optimization version.

    /*
     *  factorize_sse.cpp
     *  Filter out multiples of the first 47 primes while factorizing a number.
     *
     *  Created by David Krauss on 10/14/10.
     *
     */
    
    #include <cmath>
    #include <sstream>
    #include <iostream>
    #include <xmmintrin.h>
    using namespace std;
    
    inline void remove_factor( unsigned long &n, unsigned long factor ) __attribute__((always_inline));
    void remove_factor( unsigned long &n, unsigned long factor ) {
        while ( n % factor == 0 ) {
            n /= factor;
            cout << factor << " ";
        }
    }
    
    int main( int argc, char *argv[] ) {
        unsigned long n;
    
        if ( argc != 2
            || ( istringstream( argv[1] ) >> n >> ws ).rdstate() != ios::eofbit ) {
            cerr << "Usage: " << argv[0] << " <number>\n";
            return 1;
        }
    
        int primes[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
                         53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127,
                         131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211 };
        for ( int *p = primes; p < primes + sizeof primes/sizeof *primes; ++ p ) {
            remove_factor( n, * p );
        }
    
        //int histo[8] = {}, total = 0;
    
        enum { bias = 15 - 128 };
        __m128i const prime1 =       _mm_set_epi8( 21, 21, 21, 22, 22, 26, 26, 17, 19, 23, 29, 31, 37, 41, 43, 47 ),
                prime2 =             _mm_set_epi8( 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127 ),
                prime3 =             _mm_set_epi8( 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211 ),
                vbias = _mm_set1_epi8( bias ),
                v3 = _mm_set1_epi8( 3+bias ), v5 = _mm_set1_epi8( 5+bias ), v6 = _mm_set1_epi8( 6+bias ), v8 = _mm_set1_epi8( 8+bias ),
                v9 = _mm_set1_epi8( 9+bias ), v11 = _mm_set1_epi8( 11+bias ), v14 = _mm_set1_epi8( 14+bias ), v15 = _mm_set1_epi8( 15+bias );
        __m128i mod1 = _mm_add_epi8( _mm_set_epi8(  3, 10, 17,  5, 16,  6, 19,  8,  9, 11, 14, 15, 18, 20, 21, 23 ), vbias ),
                mod2 = _mm_add_epi8( _mm_set_epi8( 26, 29, 30, 33, 35, 36, 39, 41, 44, 48,  50,  51,  53,  54,  56,  63 ), vbias ),
                mod3 = _mm_add_epi8( _mm_set_epi8(  65,  68,  69,  74,  75,  78,  81,  83,  86,  89,  90,  95,  96,  98,  99, 105 ), vbias );
    
        for ( unsigned long factor = 1, limit = sqrtl( n ); factor <= limit + 30; factor += 30 ) {
            if ( n == 1 ) goto done;
    
            // up to 2^32, distribution of number candidates produced (0 up to 7) is
            // 0.010841     0.0785208   0.222928    0.31905     0.246109    0.101023    0.0200728   0.00145546 
            unsigned candidates[8], *cand_pen = candidates;
            * cand_pen = 6;
            cand_pen += !( _mm_movemask_epi8( _mm_cmpeq_epi8( mod1,  v3 ) ) | _mm_movemask_epi8( _mm_or_si128( _mm_cmpeq_epi8( mod2,  v3 ), _mm_cmpeq_epi8( mod3,  v3 ) ) ) );
            * cand_pen = 10;                                                                                                                                            
            cand_pen += !( _mm_movemask_epi8( _mm_cmpeq_epi8( mod1,  v5 ) ) | _mm_movemask_epi8( _mm_or_si128( _mm_cmpeq_epi8( mod2,  v5 ), _mm_cmpeq_epi8( mod3,  v5 ) ) ) );
            * cand_pen = 12;                                                                                                                            
            cand_pen += !( _mm_movemask_epi8( _mm_cmpeq_epi8( mod1,  v6 ) ) | _mm_movemask_epi8( _mm_or_si128( _mm_cmpeq_epi8( mod2,  v6 ), _mm_cmpeq_epi8( mod3,  v6 ) ) ) );
            * cand_pen = 16;                                                                                                                            
            cand_pen += !( _mm_movemask_epi8( _mm_cmpeq_epi8( mod1,  v8 ) ) | _mm_movemask_epi8( _mm_or_si128( _mm_cmpeq_epi8( mod2,  v8 ), _mm_cmpeq_epi8( mod3,  v8 ) ) ) );
            * cand_pen = 18;                                                                                                                            
            cand_pen += !( _mm_movemask_epi8( _mm_cmpeq_epi8( mod1,  v9 ) ) | _mm_movemask_epi8( _mm_or_si128( _mm_cmpeq_epi8( mod2,  v9 ), _mm_cmpeq_epi8( mod3,  v9 ) ) ) );
            * cand_pen = 22;                                                                                                                            
            cand_pen += !( _mm_movemask_epi8( _mm_cmpeq_epi8( mod1, v11 ) ) | _mm_movemask_epi8( _mm_or_si128( _mm_cmpeq_epi8( mod2, v11 ), _mm_cmpeq_epi8( mod3, v11 ) ) ) );
            * cand_pen = 28;                                                                                                                            
            cand_pen += !( _mm_movemask_epi8( _mm_cmpeq_epi8( mod1, v14 ) ) | _mm_movemask_epi8( _mm_or_si128( _mm_cmpeq_epi8( mod2, v14 ), _mm_cmpeq_epi8( mod3, v14 ) ) ) );
            * cand_pen = 30;                                                                                                                            
            cand_pen += !( _mm_movemask_epi8( _mm_cmpeq_epi8( mod1, v15 ) ) | _mm_movemask_epi8( _mm_or_si128( _mm_cmpeq_epi8( mod2, v15 ), _mm_cmpeq_epi8( mod3, v15 ) ) ) );
    
            /*++ total;
            ++ histo[ cand_pen - candidates ];
    
            cout << "( ";
            while ( cand_pen != candidates ) cout << factor + * -- cand_pen << " ";
            cout << ")" << endl; */
    
            mod1 = _mm_sub_epi8( mod1, _mm_set1_epi8( 15 ) ); // update residuals
            __m128i mask1 = _mm_cmplt_epi8( mod1, _mm_set1_epi8( 1+bias ) );
            mask1 = _mm_and_si128( mask1, prime1 ); // residual goes to zero or negative?
            mod1 = _mm_add_epi8( mask1, mod1 ); // combine reset into zero or negative
    
            mod2 = _mm_sub_epi8( mod2, _mm_set1_epi8( 15 ) );
            __m128i mask2 = _mm_cmplt_epi8( mod2, _mm_set1_epi8( 1+bias ) );
            mask2 = _mm_and_si128( mask2, prime2 );
            mod2 = _mm_add_epi8( mask2, mod2 );
    
            mod3 = _mm_sub_epi8( mod3, _mm_set1_epi8( 15 ) );
            __m128i mask3 = _mm_cmplt_epi8( mod3, _mm_set1_epi8( 1+bias ) );
            mask3 = _mm_and_si128( mask3, prime3 );
            mod3 = _mm_add_epi8( mask3, mod3 );
    
            if ( cand_pen - candidates == 0 ) continue;
            remove_factor( n, factor + candidates[ 0 ] );
            if ( cand_pen - candidates == 1 ) continue;
            remove_factor( n, factor + candidates[ 1 ] );
            if ( cand_pen - candidates == 2 ) continue;
            remove_factor( n, factor + candidates[ 2 ] );
            if ( cand_pen - candidates == 3 ) continue;
            remove_factor( n, factor + candidates[ 3 ] );
            if ( cand_pen - candidates == 4 ) continue;
            remove_factor( n, factor + candidates[ 4 ] );
            if ( cand_pen - candidates == 5 ) continue;
            remove_factor( n, factor + candidates[ 5 ] );
            if ( cand_pen - candidates == 6 ) continue;
            remove_factor( n, factor + candidates[ 6 ] );
        }
    
        cout << n;
    done:
        /*cout << endl;
        for ( int hx = 0; hx < 8; ++ hx ) cout << (float) histo[hx] / total << " ";*/
        cout << endl;
    }
    

    .

    dkrauss$ /usr/local/bin/g++ main.cpp -o factorize_sse -O3 --profile-use
    dkrauss$ time ./factorize_sse 18446743721522234449
    4294967231 4294967279 
    
    real    0m13.437s
    user    0m13.393s
    sys 0m0.011s
    

    Below is the first draft of the above. Optimizations included

    • Make the cyclic counter merge unconditional (avoid a branch).
    • Obtain ILP by unrolling loop by a factor of 15, increasing stride to 30.
      • Inspired by your optimization.
      • 30 seems to be a sweet spot, as it removes multiples of 2, 3, and 5 for free.
      • Primes between 5 and 15 may have several multiples in one stride, so put several copies at varied phase in the vector.
    • Factor out remove_factor.
    • Change conditional, unpredictable remove_factor calls to non-branching array writes.
    • Fully unroll final loop with remove_factor call and make sure the function is always inlined.
      • Eliminate the final unrolled iteration as there is always a multiple of 7 among the candidates.
    • Add another vector containing all the remaining primes that are small enough.
    • Make more space by adding a bias to the counters, and add yet another vector. Now there are only six more primes that could possibly be filtered without bumping up to 16 bits, and I've also run out of registers: the loop needs 3 prime vectors, 3 modulus vectors, 8 constants to search for, and one constant each to increment by and to do the range check. That makes 16.
      • The gain is minimal (but positive) in this application, but the original purpose of this technique was to filter primes for the sieve in the other answer. So stay tuned…

    Readable version:

    /*
     *  factorize_sse.cpp
     *  Filter out multiples of the first 17 primes while factorizing a number.
     *
     *  Created by David Krauss on 10/14/10.
     *
     */
    
    #include <cmath>
    #include <sstream>
    #include <iostream>
    #include <xmmintrin.h>
    using namespace std;
    
    int main( int argc, char *argv[] ) {
        unsigned long n;
    
        if ( argc != 2
            || ( istringstream( argv[1] ) >> n >> ws ).rdstate() != ios::eofbit ) {
            cerr << "Usage: " << argv[0] << " <number>\n";
            return 1;
        }
    
        int primes[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59 };
        for ( int *p = primes; p < primes + sizeof primes/sizeof *primes; ++ p ) {
            while ( n % * p == 0 ) {
                n /= * p;
                cout << * p << " ";
            }
        }
    
        if ( n != 1 ) {
            __m128i       mod   = _mm_set_epi8( 1, 2, 3,  5,  6,  8,  9, 11, 14, 15, 18, 20, 21, 23, 26, 29 );
            __m128i const prime = _mm_set_epi8( 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59 ),
                          one = _mm_set1_epi8( 1 );
    
            for ( unsigned long factor = 1, limit = sqrtl( n ); factor < limit; ) {
                factor += 2;
                __m128i mask = _mm_cmpeq_epi8( mod, one ); // residual going to zero?
                mod = _mm_sub_epi8( mod, one ); // update other residuals
                if ( _mm_movemask_epi8( mask ) ) {
                    mask = _mm_and_si128( mask, prime ); // reset cycle if going to zero
                    mod = _mm_or_si128( mask, mod ); // combine reset into zeroed position
    
                } else while ( n % factor == 0 ) {
                    n /= factor;
                    cout << factor << " ";
                    if ( n == 1 ) goto done;
                }
            }
            cout << n;
        }
    done:
        cout << endl;
    }
    
    0 讨论(0)
  • 2020-12-07 22:45

    I'm not sure how effective these would be, but instead of

    while (workingNum%2==0)
    

    you could do

    while (workingNum & 1 == 0)
    

    I'm not sure if gcc or msvc (or whatever compiler you are using) is clever enough to change the workingNum%2 expression, but odds are it's doing the division and looking at the modulus in edx...

    My next suggestion could be completely unnecessary depending on your compiler, but you could try putting the workingNum /= 3 before the method call. G++ might be smart enough to see the unnecessary division and just use the quotient in eax (you could do this inside your larger loop too). Or, a more thorough (but painful) approach would be to inline assembly the following code.

    while (workingNum%3==0)
    {
      factors.push_back(3);
      workingNum/=3;
    }
    

    The compiler is probably translating the modular operation into a divison and then looking at the modulus in edx. The issue is, you are performing the division again (and I doubt the compiler is seeing that you just implicitly performed the division in the condition of the loop). So, you could inline assemble this. This presents two issues:

    1) The method call for push_back(3). This might mess with the registers, making this completely unnecessary.

    2) Getting a register for workingNum, but this can be determined by doing an initial modular check (to force it into %eax), or at the current moment, it will/should be in eax.

    You could write the loop as (assuming workingNum is in eax, and this is 32bit AT&T syntax, only because I don't know 64bit assembly or Intel syntax)

    asm( "
         movl     $3, %ebx
      WorkNumMod3Loop: 
         movl     %eax, %ecx    # just to be safe, backup workingNUm
         movl     $0, %edx      # zero out edx
         idivl    $3            # divide by 3. quotient in eax, remainder in edx
         cmpl     $0, %edx      # compare if it's 0
         jne      AfterNumMod3Loop    # if 0 is the remainder, jump out
    
         # no need to perform division because new workingNum is already in eax
         #factors.push_back(3) call
    
         je       WorkNumMod3Loop
      AfterNumMod3Loop: 
         movl     %ecx, %eax"
    );
    

    You should look at the assembly output for those loops. It is possible your compiler is already making these optimizations, but I doubt it. I would not be surprised if putting the workingNum /= n before the method call improves performance a little in some cases.

    0 讨论(0)
  • 2020-12-07 22:47

    Incorporating some ideas from Omnifarious, plus other improvements:

    // system specific typedef for ulong should go here (or use boost::uint64_t)
    typedef unsigned __int64 ulong;
    typedef std::vector<ulong> ULongVector;
    
    // Caller needs to pass in an empty factors vector
    void GetFactors(ULongVector &factors, ulong num)
    {
      if (num<2)
        return;
    
      ulong workingNum=num;
    
      // Factor out factors of 2
      while (workingNum%2==0)
      {
        factors.push_back(2);
        workingNum/=2;
      }
    
      // Factor out factors of 3
      while (workingNum%3==0)
      {
        factors.push_back(3);
        workingNum/=3;
      }
    
      if (workingNum==1)
        return;
    
      // Factor out factors >=5
      ulong nextOffset=2;
      char nextShift = 1;
      ulong n = 5;
      ulong nn = 25;
      do {
        // Is workingNum divisible by n?
        if (workingNum%n==0)
        {
          // n is a factor!
          // so is the number multiplied by n to get workingNum
    
          // Insert n into the list of factors
          factors.push_back(n);
    
          // Divide working number by n
          workingNum/=n;
    
          // Test for done...
          if (workingNum==1)
            return;
    
          // Try n again
        }  
        else {
          nn += (n << (nextShift+1)) + (1<<(nextShift*2)); // (n+b)^2 = n^2 + 2*n*b + b*2
          n += nextOffset;
          nextOffset ^= 6;
          nextShift ^= 3;
          // invariant: nn == n*n
          if (n & 0x100000000LL) break; // careful of integer wraparound in n^2
        }
      } while (nn <= workingNum);
    
      // workingNum is prime, add it to the list of factors        
      factors.push_back(workingNum);
    }
    
    0 讨论(0)
提交回复
热议问题