Is there a good way to optimize the multiplication of two BigNums?

Question

I have a class BigNum:

#include <algorithm>
#include <vector>
using namespace std;

struct BigNum{
    vector <int> d; // decimal digits, least significant first

    BigNum(vector <int> data){
        for(int item : data){d.push_back(item);}
    }

    int get_digit(size_t index){
        return (index >= d.size() ? 0 : d[index]);
    }
};

and I'm trying to write code to multiply two BigNums. Currently, I'm using the traditional method of multiplication: multiplying the first number by each digit of the other and adding each result to a running total. Here's my code:

BigNum add(BigNum a, BigNum b){ // traditional adding: goes digit by digit and keeps a "carry" variable
    vector <int> ret;

    int carry = 0;
    for(size_t i = 0; i < max(a.d.size(), b.d.size()); ++i){
        int curr = a.get_digit(i) + b.get_digit(i) + carry;

        ret.push_back(curr%10);
        carry = curr/10;
    }

    // leftover from carrying values
    while(carry != 0){
        ret.push_back(carry%10);
        carry /= 10;
    }

    return BigNum(ret);
}

BigNum mult(BigNum a, BigNum b){
    BigNum ret({0});

    for(size_t i = 0; i < a.d.size(); ++i){
        vector <int> row(i, 0); // account for the zeroes at the end of each row

        int carry = 0;
        for(size_t j = 0; j < b.d.size(); ++j){
            int curr = a.d[i] * b.d[j] + carry;

            row.push_back(curr%10);
            carry = curr/10;
        }

        while(carry != 0){ // leftover from carrying
            row.push_back(carry%10);
            carry /= 10;
        }

        ret = add(ret, BigNum(row)); // add the current row to our running sum
    }

    return ret;
}

This code is still quite slow: it takes around a minute to calculate the factorial of 1000. Is there a better way to multiply two BigNums? If not, is there a better way to represent large numbers that will speed up this code?

Answer 1:

If you use a different base, say 2^16 instead of 10, the multiplication will be much faster.

But printing the result in decimal will take longer, since it then needs a base conversion.
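To make the cost of decimal output concrete, here is a minimal sketch, assuming the number is stored as base-2^16 limbs, least significant first, in a `std::vector<uint16_t>`; the function name `to_decimal` is hypothetical. It repeatedly divides the whole number by 10,000 and collects the remainders as four-digit groups. Each pass touches every limb, so the conversion is quadratic in the length of the number, which is the extra printing work this answer refers to.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: limbs are base-2^16 digits, least significant first.
std::string to_decimal(std::vector<uint16_t> limbs) {
    // Drop leading zero limbs so the loop below terminates cleanly.
    while (!limbs.empty() && limbs.back() == 0) limbs.pop_back();
    if (limbs.empty()) return "0";

    std::vector<uint32_t> chunks;  // base-10,000 groups, least significant first
    while (!limbs.empty()) {
        // Short division of the whole number by 10,000, from the top limb down.
        uint32_t rem = 0;
        for (std::size_t i = limbs.size(); i-- > 0;) {
            uint32_t cur = (rem << 16) | limbs[i];  // rem < 10,000, so cur < 2^30
            limbs[i] = static_cast<uint16_t>(cur / 10000);
            rem = cur % 10000;
        }
        chunks.push_back(rem);
        while (!limbs.empty() && limbs.back() == 0) limbs.pop_back();
    }

    // Most significant group unpadded, every later group padded to four digits.
    std::string out = std::to_string(chunks.back());
    for (std::size_t i = chunks.size() - 1; i-- > 0;) {
        std::string part = std::to_string(chunks[i]);
        out += std::string(4 - part.size(), '0') + part;
    }
    return out;
}
```

For example, `to_decimal({0, 1})` is 1 × 2^16, so it yields "65536".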

Answer 2:

Get a ready made bignum library. Those tend to be optimized to death, all the way down to specific CPU models, with assembly where necessary.

GMP and MPIR are two popular ones. The latter is more Windows-friendly.

Answer 3:

One way is to use a larger base than ten. It's a huge waste, in both time and space, to take an int, which can hold values up to about four billion (in its unsigned variant), and use it to store a single digit.

What you can do is use unsigned int/long values for a start, then choose a base such that the square of that base will fit into the value. For example, the square root of the largest 32-bit unsigned int is a touch over 65,000, so you choose 10,000 (the largest power of ten below that) as the base.
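A quick check of the headroom: in schoolbook multiplication each step adds a bigdigit product, the partial sum already stored in that position, and a carry. The product is at most (B − 1)² and the other two are at most B − 1 each, so with B = 10,000 the worst case still fits comfortably in 32 bits:

```latex
\begin{aligned}
\sqrt{2^{32} - 1} &\approx 65{,}535.99 \;\Rightarrow\; B = 10^4 \\
(B-1)^2 + 2(B-1) &= 9{,}999^2 + 2 \times 9{,}999 = 99{,}999{,}999 = B^2 - 1 < 2^{32}
\end{aligned}
```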

So a "bigdigit" (I'll use that term for a digit in the base-10,000 scheme) is effectively equal to four decimal digits (just "digits" from here on), and this has several effects:

much less space taken up (about a quarter of the space);
still no chance of overflow when you multiply two bigdigits;
faster multiplications, doing four digits at a time rather than one; and
still easy printing, since it's in a base-ten-to-the-power-of-something format.

Those last two points warrant some explanation.

On the second-last point, it should be something like sixteen times faster since, to multiply 1234 and 5678, each digit in the first has to be multiplied by every digit in the second. With normal digits that's sixteen multiplications, while with bigdigits it's only one.
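Here is a minimal sketch of the base-10,000 scheme, assuming bigdigits are stored least significant first (the same order the question uses) in a `std::vector<uint32_t>`. The names `Big`, `BASE`, `mult`, `from_uint`, `factorial` and `big_to_string` are hypothetical. It is still the schoolbook algorithm from the question; only the base changes, and the inputs are passed by const reference.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout: base-10,000 "bigdigits", least significant first.
using Big = std::vector<uint32_t>;
const uint32_t BASE = 10000;

// Schoolbook multiplication, but each step handles four decimal digits.
Big mult(const Big& a, const Big& b) {
    Big res(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        uint32_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            // Worst case 9,999 * 9,999 + 9,999 + 9,999 = 99,999,999: fits in 32 bits.
            uint32_t cur = a[i] * b[j] + res[i + j] + carry;
            res[i + j] = cur % BASE;
            carry = cur / BASE;
        }
        res[i + b.size()] = carry;  // this slot is still zero at this point
    }
    while (res.size() > 1 && res.back() == 0) res.pop_back();  // drop leading zeros
    return res;
}

Big from_uint(uint32_t n) {
    Big r;
    do { r.push_back(n % BASE); n /= BASE; } while (n != 0);
    return r;
}

Big factorial(uint32_t n) {
    Big result{1};
    for (uint32_t k = 2; k <= n; ++k) result = mult(result, from_uint(k));
    return result;
}

// Leading bigdigit unpadded, every later one zero-padded to four digits.
std::string big_to_string(const Big& x) {
    std::string out = std::to_string(x.back());
    for (std::size_t i = x.size() - 1; i-- > 0;) {
        std::string part = std::to_string(x[i]);
        out += std::string(4 - part.size(), '0') + part;
    }
    return out;
}
```

For example, `big_to_string(factorial(20))` gives "2432902008176640000", and the string for 1000! has 2568 digits.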

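Since the bigdigits are exactly four digits, the output is still relatively easy. Assuming node holds unsigned int values with node[0] as the most significant bigdigit, something like:

printf("%u", node[0]);            // leading bigdigit: no zero padding
for (int i = 1; i < node_count; ++i) {
    printf("%04u", node[i]);      // later bigdigits: always four digits
}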

Beyond that, and the normal C++ optimisations like passing const references rather than copying whole objects, you can examine the same tricks used by MPIR and GMP. I tend to avoid those libraries myself since they have (or did have at some point) a rather nasty habit of just violently exiting programs when they ran out of memory, something I find inexcusable in a general-purpose library. In any case, I have routines built up over time that, while doing nowhere near as much as GMP, certainly do more than I need (and use the same algorithms in many cases).

One of the tricks for multiplication is the Karatsuba algorithm (to be honest, I wasn't sure whether GMP/MPIR use it, but GMP's documentation lists Karatsuba among its multiplication algorithms, alongside Toom-Cook and FFT methods for larger operands).

It basically involves splitting each number into a high part and a low part, so that a = a₁a₀ and b = b₁b₀. In other words:

a = a₁ x B^p + a₀
b = b₁ x B^p + b₀

B^p is just some integer power of the base you're actually using, and is generally chosen close to the square root of the larger number (so each part has about half as many digits).

You then work out:

c₂ = a₁ x b₁
c₀ = a₀ x b₀
c₁ = (a₁ + a₀) x (b₁ + b₀) - c₂ - c₀

That last line is the tricky part, but it has been proven mathematically. If you want to go into that level of depth, I'm not the best person for the job. At some point, even I, the consummate "don't believe anything you can't prove yourself" type, have to take the expert opinions as fact :-)
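The identity only needs the product expanded: subtracting c₂ and c₀ from (a₁ + a₀) x (b₁ + b₀) leaves exactly the two cross terms, and those are the middle coefficient of a x b:

```latex
\begin{aligned}
(a_1 + a_0)(b_1 + b_0) &= a_1 b_1 + a_1 b_0 + a_0 b_1 + a_0 b_0 = c_2 + (a_1 b_0 + a_0 b_1) + c_0 \\
\Rightarrow\quad c_1 &= a_1 b_0 + a_0 b_1 \\
a \times b = (a_1 B^p + a_0)(b_1 B^p + b_0) &= a_1 b_1 B^{2p} + (a_1 b_0 + a_0 b_1) B^p + a_0 b_0
\end{aligned}
```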

Then you work some add/shift magic (multiplication looks to be involved but, since it's multiplication by a power of the base, it's really just a matter of shifting values left).

c = c₂ x B^2p + c₁ x B^p + c₀
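A small worked example, using decimal digits (B = 10) with p = 2 so each number splits into two-digit halves, applied to the 1234 x 5678 product mentioned earlier:

```latex
\begin{aligned}
a &= 1234 = 12 \times 10^2 + 34, \qquad b = 5678 = 56 \times 10^2 + 78 \\
c_2 &= 12 \times 56 = 672 \\
c_0 &= 34 \times 78 = 2652 \\
c_1 &= (12 + 34)(56 + 78) - 672 - 2652 = 46 \times 134 - 3324 = 6164 - 3324 = 2840 \\
c &= 672 \times 10^4 + 2840 \times 10^2 + 2652 = 7{,}006{,}652 = 1234 \times 5678
\end{aligned}
```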

Now you may be wondering why three multiplications are a better approach than one, but you need to take into account that these multiplications use far fewer digits than the original. If you remember the comment I made above about doing one multiplication rather than sixteen when switching from base-10 to base-10,000, you'll realise that the number of digit multiplications is proportional to the square of the number of digits.
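To put rough numbers on that: if each number has n bigdigits, one level of splitting replaces n² bigdigit multiplications with three multiplications of size (n/2)². Applying the split recursively gives the standard Karatsuba bound, since the extra adds and shifts at each level are only linear in n:

```latex
\begin{aligned}
\text{schoolbook:}\quad & n^2 \\
\text{one split:}\quad & 3\left(\tfrac{n}{2}\right)^2 = \tfrac{3}{4}\,n^2 \\
\text{recursive:}\quad & T(n) = 3\,T(n/2) + O(n) \;\Rightarrow\; T(n) = O\!\left(n^{\log_2 3}\right) \approx O\!\left(n^{1.585}\right)
\end{aligned}
```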

That means it can be better to perform three smaller multiplications even with some extra shifting and adding. And the beauty of this solution is that you can recursively apply it to the smaller numbers until you get down to the point where you're just multiplying two unsigned int values.
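Here is a minimal recursive sketch in C++, assuming numbers are stored as base-10,000 bigdigits, least significant first, in a `std::vector<uint32_t>`. All names (`Big`, `karatsuba`, `schoolbook`, `add_shifted` and so on) are hypothetical, and `THRESHOLD = 32` is an arbitrary cut-over point, not a tuned value: below it the sketch falls back to schoolbook multiplication instead of recursing all the way down to single bigdigits. It uses the sum form of c₁ given above, computed with two borrowing subtractions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Big = std::vector<uint32_t>;  // base-10,000 bigdigits, least significant first
const uint32_t BASE = 10000;
const std::size_t THRESHOLD = 32;   // assumed cut-over to schoolbook; tune by measuring

void trim(Big& x) {
    while (x.size() > 1 && x.back() == 0) x.pop_back();
}

Big add(const Big& a, const Big& b) {
    Big r(std::max(a.size(), b.size()) + 1, 0);
    uint32_t carry = 0;
    for (std::size_t i = 0; i < r.size(); ++i) {
        uint32_t cur = carry + (i < a.size() ? a[i] : 0) + (i < b.size() ? b[i] : 0);
        r[i] = cur % BASE;
        carry = cur / BASE;
    }
    trim(r);
    return r;
}

// a - b with borrowing; requires a >= b, which holds for both subtractions in c1.
Big sub(const Big& a, const Big& b) {
    Big r(a.size(), 0);
    int32_t borrow = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        int32_t cur = int32_t(a[i]) - borrow - int32_t(i < b.size() ? b[i] : 0);
        borrow = cur < 0 ? 1 : 0;
        r[i] = uint32_t(cur + borrow * int32_t(BASE));
    }
    trim(r);
    return r;
}

// res += x * BASE^shift: multiplying by a power of the base is just a shift.
void add_shifted(Big& res, const Big& x, std::size_t shift) {
    if (res.size() < x.size() + shift + 1) res.resize(x.size() + shift + 1, 0);
    uint32_t carry = 0;
    for (std::size_t i = 0; i < x.size() || carry != 0; ++i) {
        if (shift + i == res.size()) res.push_back(0);
        uint32_t cur = res[shift + i] + carry + (i < x.size() ? x[i] : 0);
        res[shift + i] = cur % BASE;
        carry = cur / BASE;
    }
}

Big schoolbook(const Big& a, const Big& b) {
    Big res(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        uint32_t carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            uint32_t cur = a[i] * b[j] + res[i + j] + carry;  // <= 99,999,999
            res[i + j] = cur % BASE;
            carry = cur / BASE;
        }
        res[i + b.size()] = carry;
    }
    trim(res);
    return res;
}

Big karatsuba(const Big& a, const Big& b) {
    if (std::min(a.size(), b.size()) <= THRESHOLD) return schoolbook(a, b);
    std::size_t p = std::max(a.size(), b.size()) / 2;  // split at B^p, B = BASE
    auto low = [p](const Big& x) {
        Big r(x.begin(), x.begin() + std::min(p, x.size()));
        trim(r);
        return r;
    };
    auto high = [p](const Big& x) { return x.size() > p ? Big(x.begin() + p, x.end()) : Big{0}; };
    Big a0 = low(a), a1 = high(a), b0 = low(b), b1 = high(b);

    Big c2 = karatsuba(a1, b1);
    Big c0 = karatsuba(a0, b0);
    Big c1 = sub(sub(karatsuba(add(a1, a0), add(b1, b0)), c2), c0);

    Big c = c0;  // c = c2 * B^2p + c1 * B^p + c0
    add_shifted(c, c1, p);
    add_shifted(c, c2, 2 * p);
    trim(c);
    return c;
}
```

Recursing this way brings the running time down from O(n²) to roughly O(n^1.585) for n-bigdigit inputs; the best threshold depends on the machine and is worth measuring rather than guessing.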

I probably haven't done the concept justice, and you do need to watch for, and adjust, any negative intermediate values while computing c₁ but, if you want raw speed, this is the sort of thing you'll have to look into.

And, as my more advanced math buddies will tell me (quite often), if you're not willing to have your entire head explode, you probably shouldn't be doing math :-)

Source: https://stackoverflow.com/questions/62441306/is-there-a-good-way-to-optimize-the-multiplication-of-two-bignums

Tags

c++

struct

bignum