Factor a large number efficiently with gmp

I need to get all the prime factors of large numbers that can easily get to 1k bits. The numbers are practically random so it shouldn't be hard. How do I do it efficiently? I use C++ with GMP library.

EDIT: I guess you all misunderstood me.
What I mean by prime a number is to get all prime factors of the number.
Sorry for my English, in my language prime and factor are the same :)

clarification (from OP's other post):

What I need is a way to efficiently factor (find the prime factors of) large numbers (may get to 2048 bits) using C++ and GMP (GNU Multiple Precision library) or, less preferably, any other way. The numbers are practically random, so there is little chance a given number will be hard to factor, and even if it is, I can re-roll the number (I can't choose it, though).

A good start would be some pre-filtering with small primes, say all primes below 100 000 or so. Simply try to divide by every single one of them (create a table which you then load at runtime, or keep it as static data in your code). It might seem slow and stupid, but if the number is totally random, this will give you some factors very fast with high probability. Then look at the remaining cofactor and decide what to do next. If it is quite small (what "small" means is up to you), you could run a primality test on it (GMP has mpz_probab_prime_p for this), and if the test says it is prime, you can trust that in most cases. Otherwise you have to factor it further.
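The pre-filtering step can be sketched roughly like this. It is only an illustration on 64-bit integers, not GMP code: the function name stripSmallFactors is made up, and it walks every integer up to the limit instead of loading a prime table (composite trial divisors never divide, because their prime factors have already been removed). With mpz_t the loop body would use GMP's division functions instead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: divides out every prime factor of n that is <= limit.
// Returns the factors found (with multiplicity) and leaves the remaining
// cofactor in n, which still needs a primality test or further factoring.
std::vector<std::uint64_t> stripSmallFactors(std::uint64_t& n, std::uint64_t limit) {
    assert(n != 0);  // zero has no prime factorization
    std::vector<std::uint64_t> found;
    for (std::uint64_t d = 2; d <= limit && n > 1; ++d) {
        while (n % d == 0) {  // remove repeated factors as well
            found.push_back(d);
            n /= d;
        }
    }
    return found;
}
```

For example, starting from n = 12000036 with a limit of 100, the call returns 2, 2, 3 and leaves n = 1000003, which is the part you would then hand to the primality test.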

If your numbers are really huge and you care about performance, then you definitely need something more sophisticated than plain trial division. Look at the Quadratic Sieve (try Wikipedia). It is quite simple but very powerful. If you are up to the challenge, try MPQS, a variant of the quadratic sieve algorithm. This forum is a good source of information. There are even existing implementations of a tool you need - see for example this.

Note though that numbers with 1k bits are huge by any measure. Factoring such a number (even with MPQS or others) might take years if you are lucky and forever if not. I think that MPQS performs well with numbers of about 100-400 bits (if they are composed of two primes of almost equal size, which is the hardest case of course).

Below is a sample algorithm in Java (it's not C++ with GMP, but converting should be pretty straightforward) that:

1. generates a random number x of bitlength Nbits
2. tries to factor out all prime factors < 100, keeping a list of prime factors that divide x.
3. tests to see if the remaining factor is prime using Java's isProbablePrime method
4. If the remaining factor product is prime with sufficient probability, we have succeeded in factoring x. (STOP)
5. Otherwise the remaining factor product is definitely composite (see the isProbablePrime docs).
6. While we still have time, we run the Pollard rho algorithm until we find a divisor d.
7. If we run out of time, we have failed. (STOP)
8. We have found a divisor d. So we factor out d, add the prime factors of d to the list of prime factors of x, and go to step 4.

All the parameters of this algorithm are near the beginning of the program listing. I looked for 1024-bit random numbers, with a timeout of 250 milliseconds, and I keep running the program until I get a number x with at least 4 prime factors (sometimes the program finds a number with 1, 2, or 3 prime factors first). With this parameter set, it usually takes about 15-20 seconds on my 2.66 GHz iMac.

Pollard's rho algorithm isn't really that efficient, but it's simple, compared to the quadratic sieve (QS) or the general number field sieve (GNFS) -- I just wanted to see how the simple algorithm worked.

Why this works: (despite the claim of many of you that this is a hard problem)

The plain fact of it is, that prime numbers aren't that rare. For 1024-bit numbers, the Prime Number Theorem says that about 1 in every 1024 ln 2 (= about 710) numbers is prime.
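As a quick check of that figure: the Prime Number Theorem puts the density of primes near N at about 1 / ln N, and for N around 2^bits that is 1 / (bits * ln 2). A tiny sketch (the function name averagePrimeGap is made up for illustration):

```cpp
#include <cassert>
#include <cmath>

// Approximate average spacing between primes near 2^bits, using
// ln(2^bits) = bits * ln 2 from the Prime Number Theorem.
double averagePrimeGap(int bits) {
    assert(bits > 0);
    return bits * std::log(2.0);
}
```

averagePrimeGap(1024) comes out at roughly 709.8, which is where the "about 710" above comes from.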

So if I generate a random number x that is prime, and I accept probabilistic prime detection, I've successfully factored x.

If it's not prime, but I quickly factor out a few small factors, and the remaining factor is (probabilistically) prime, then I've successfully factored x.

Otherwise I just give up and generate a new random number (which the OP says is acceptable).

Most of the numbers successfully factored will have 1 large prime factor and a few small prime factors.

The numbers that are hard to factor are the ones that have no small prime factors and at least 2 large prime factors (these include cryptographic keys that are the product of two large primes; the OP has said nothing about cryptography), and I can just skip them when I run out of time.

package com.example;

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class FindLargeRandomComposite {
    final static private int[] smallPrimes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 
        31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 
        73, 79, 83, 89, 97};

    final static private int maxTime = 250;
    final static private int Nbits = 1024;
    final static private int minFactors = 4;
    final static private int NCERTAINTY = 4096;

    private interface Predicate { public boolean isTrue(); }

    static public void main(String[] args)
    {
        Random r = new Random();
        boolean found = false;
        BigInteger x=null;
        List<BigInteger> factors=null;
        long startTime = System.currentTimeMillis();
        while (!found)
        {
            x = new BigInteger(Nbits, r);
            factors = new ArrayList<BigInteger>();
            Predicate keepRunning = new Predicate() {
                final private long stopTime = System.currentTimeMillis() + maxTime;
                public boolean isTrue() {
                    return System.currentTimeMillis() < stopTime;
                }
            };
            found = factor(x, factors, keepRunning);
            System.out.println((found?(factors.size()+" factors "):"not factored ")+x+"= product: "+factors);
            if (factors.size() < minFactors)
                found = false;
        }
        long stopTime = System.currentTimeMillis();
        System.out.println("Product verification: "+(x.equals(product(factors))?"passed":"failed"));
        System.out.println("elapsed time: "+(stopTime-startTime)+" msec");
    }

    private static BigInteger product(List<BigInteger> factors) {
        BigInteger result = BigInteger.ONE;
        for (BigInteger f : factors)
            result = result.multiply(f);
        return result;
    }

    private static BigInteger findFactor(BigInteger x, List<BigInteger> factors,
            BigInteger divisor)
    {
        BigInteger[] qr = x.divideAndRemainder(divisor);
        if (qr[1].equals(BigInteger.ZERO))
        {
            factors.add(divisor);
            return qr[0];
        }
        else
            return x;
    }

    private static BigInteger findRepeatedFactor(BigInteger x,
            List<BigInteger> factors, BigInteger p) {
        BigInteger xprev = null;
        while (xprev != x)
        {
            xprev = x;
            x = findFactor(x, factors, p);
        }
        return x;
    }

    private static BigInteger f(BigInteger x, BigInteger n)
    {
        return x.multiply(x).add(BigInteger.ONE).mod(n);
    }

    private static BigInteger gcd(BigInteger a, BigInteger b) {
        while (!b.equals(BigInteger.ZERO))
        {
            BigInteger nextb = a.mod(b);
            a = b;
            b = nextb;
        }
        return a;
    }
    private static BigInteger tryPollardRho(BigInteger n,
            List<BigInteger> factors, Predicate keepRunning) {
        BigInteger x = new BigInteger("2");
        BigInteger y = x;
        BigInteger d = BigInteger.ONE;
        while (d.equals(BigInteger.ONE) && keepRunning.isTrue())
        {
            x = f(x,n);
            y = f(f(y,n),n);
            d = gcd(x.subtract(y).abs(), n);
        }
        if (d.equals(n))
            return n;   // the sequence cycled without splitting n: hand n back unchanged
        BigInteger[] qr = n.divideAndRemainder(d);
        if (!qr[1].equals(BigInteger.ZERO))
            throw new IllegalStateException("Huh?");
        // d is a factor of n. But it may not be prime, so run it through the factoring algorithm.
        factor(d, factors, keepRunning);
        return qr[0];
    }

    private static boolean factor(BigInteger x0, List<BigInteger> factors,
            Predicate keepRunning) {

        BigInteger x = x0;
        for (int p0 : smallPrimes)
        {
            BigInteger p = new BigInteger(Integer.toString(p0));
            x = findRepeatedFactor(x, factors, p);          
        }
        boolean done = false;
        while (!done && keepRunning.isTrue())
        {
            done = x.equals(BigInteger.ONE) || x.isProbablePrime(NCERTAINTY);
            if (!done)
            {
                x = tryPollardRho(x, factors, keepRunning);
            }
        }
        if (!x.equals(BigInteger.ONE))
            factors.add(x);
        return done;
    }
}

At the moment GMP itself has no function for factoring a bigint. You can convert your bigint to another library's format and use that library's factoring algorithms. Note that factoring integers with many more than 20 digits needs specialized algorithms, and the running time grows nearly exponentially with the size of the number.

Check out:

You could use the Pollard p-1 factorization algorithm if the number you want to factor has a prime factor p for which p - 1 is b-smooth (all prime factors of p - 1 are below the bound b). It has factored out a 30-digit prime factor of the number 2^740 + 1. ECM is a similar but sub-exponential algorithm, though it is more difficult to implement. The running time of p-1 depends on what the bound b is set to.
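To see the smoothness condition on a number small enough to check by hand, here is a minimal sketch on 64-bit integers rather than mpz_t. The helper names (gcd64, powMod, isSmallPrime, pMinusOne) are made up for illustration, and n is assumed to be below 2^32 so that intermediate products do not overflow. Take n = 299 = 13 * 23 with bound 5: 13 - 1 = 12 = 2^2 * 3 is 5-smooth while 23 - 1 = 22 = 2 * 11 is not, so the method should split off 13.

```cpp
#include <cassert>
#include <cstdint>

std::uint64_t gcd64(std::uint64_t a, std::uint64_t b) {
    while (b != 0) { std::uint64_t t = a % b; a = b; b = t; }
    return a;
}

// Modular exponentiation; assumes mod < 2^32 so products fit in 64 bits.
std::uint64_t powMod(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    assert(mod > 0 && mod < (1ULL << 32));
    std::uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

bool isSmallPrime(std::uint64_t q) {
    if (q < 2) return false;
    for (std::uint64_t d = 2; d * d <= q; ++d)
        if (q % d == 0) return false;
    return true;
}

// Returns a non-trivial factor of n, or 1 if this bound does not split n.
std::uint64_t pMinusOne(std::uint64_t n, std::uint64_t bound) {
    std::uint64_t c = 2;  // base, fixed at 2 in this sketch
    for (std::uint64_t q = 2; q <= bound; ++q) {
        if (!isSmallPrime(q)) continue;
        std::uint64_t qe = q;
        while (qe <= bound / q) qe *= q;  // largest power of q not exceeding bound
        c = powMod(c, qe, n);
    }
    std::uint64_t g = gcd64((c + n - 1) % n, n);  // gcd(c - 1, n)
    return (g > 1 && g < n) ? g : 1;
}
```

pMinusOne(299, 5) returns 13. With bound 11 the exponent also covers the prime factor 11 of 22, so the gcd comes out as 299 itself and the sketch reports failure by returning 1. The bound has to be large enough for one prime factor but not for all of them at once.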

//Pollard p - 1 factorization algorithm
#include <stdio.h>//included before gmp.h so that mpz_out_str is declared
#include <gmp.h>
#include <vector>

//on return g holds a non-trivial factor of n, or 1 if none was found
void factor(mpz_t g, mpz_t n, long b)
{
    //sieve for primes: r[i] stays true only if i is prime
    std::vector<bool> r(b, true);

    for(long i = 2; i * i < b; i++)
        if(r[i])
            for(long j = i * i; j < b; j += i)
                r[j] = false;

    std::vector<long> p;//the primes below b
    std::vector<long> a;//a[i] = the maximum power with p[i] ^ a[i] <= b
    for(long i = 2; i < b; i++)
        if(r[i])
        {
            p.push_back(i);//Append the prime on to the vector

            //integer loop instead of floor(log(b) / log(i)) to avoid rounding errors
            long e = 1;
            for(long pw = i; pw <= b / i; pw *= i)
                e++;
            a.push_back(e);
        }

    long m = p.size();//m = number of primes under bound b

    mpz_t c;// c is the number which will be exponentiated
    mpz_init(c);

    // try bases x = 2, 3, ... until a factor is found or x exceeds b
    for(long x = 2; ; x++)
    {
        mpz_set_si(c, x);

        //powering ladder: c = x ^ (product of p[i] ^ a[i]) mod n
        for(long i = 0; i < m; i++)
            for(long j = 0; j < a[i]; j++)
                mpz_powm_ui(c, c, p[i], n);

        //check if a factor has been found
        mpz_sub_ui(c, c, 1);
        mpz_gcd(g, c, n);

        //if g is a non-trivial factor stop, else try the next base
        if(mpz_cmp_si(g, 1) > 0 && mpz_cmp(g, n) < 0)
            break;
        if(x > b)
        {
            mpz_set_ui(g, 1);//give up and report failure
            break;
        }
    }

    mpz_clear(c);
}


int main()
{
    mpz_t x;
    mpz_t g;

    //intialize g and x
    mpz_init(g);
    mpz_init_set_str(x,"167698757698757868925234234253423534235342655234234235342353423546435347",10);

    //p-1 will factor x as long as it has a factor p where p - 1 is b-smooth(has all prime factors less than bound b)
    factor(g , x, 1000);

    //output the factor, it will output 1 if algorithm fails
    mpz_out_str(NULL, 10, g);

    return 0;
}

Outputs - 7465647 Execution time - 0.003 seconds

Another factoring algorithm created by J. Pollard is Pollard's rho algorithm, which is not that quick but requires very little space. There are also ways to parallelize it. Its expected complexity is O(n^(1/4)).

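//Pollard rho factoring algorithm
#include <stdio.h>//included before gmp.h so that mpz_out_str is declared
#include <gmp.h>

//one step of the sequence: x = (x * x + 1) mod n
//(f was not shown in the original post; this is the usual choice and
//the one the Java listing above uses)
void f(mpz_t x, mpz_t n)
{
    mpz_mul(x, x, x);
    mpz_add_ui(x, x, 1);
    mpz_mod(x, x, n);
}

//on return g holds a factor of n; g is 1 if n is (probably) prime and
//may equal n itself if the sequence cycles without splitting n
void rho(mpz_t g, mpz_t n)
{
    mpz_set_ui(g, 1);

    if(mpz_probab_prime_p(n, 25) != 0)
        return;//n is (probably) prime according to GMP's probabilistic test

    mpz_t x;
    mpz_t y;
    mpz_t temp;
    mpz_init_set_ui(x, 2);
    mpz_init_set_ui(y, 2);//initialize x and y as 2
    mpz_init(temp);

    long count = 0;//number of steps taken so far
    long t1 = 0;
    long t2 = 1;
    long nextTerm = t1 + t2;
    while(mpz_cmp_ui(g, 1) < 1)
    {
        f(x, n);//x is changed
        f(y, n);//y is going through the sequence twice as fast
        f(y, n);

        if(count == nextTerm)//calculate gcd every fibonacci number
        {
            mpz_sub(temp, x, y);
            mpz_gcd(g, temp, n);

            t1 = t2;
            t2 = nextTerm;
            nextTerm = t1 + t2;//calculate next fibonacci number
        }

        count++;
    }

    mpz_clear(x);
    mpz_clear(y);
    mpz_clear(temp);
}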

int main()
{
    mpz_t x;
    mpz_t g;

    //intialize g and x
    mpz_init(g);
    mpz_init_set_str(x,"167698757698757868925234234253423",10);


    rho(g , x);

    //output the factor, it will output 1 if algorithm fails
    mpz_out_str(NULL, 10, g);

    return 0;
}

Outputs - 353 Execution time - 0.003s

Source: https://stackoverflow.com/questions/4301434/factor-a-large-number-efficiently-with-gmp

Tags: c++, math, primes, gmp, factorization