Fastest way to convert a BigInteger to a decimal (Base 10) string?

≯℡__Kan透↙ submitted on 2019-11-29 14:52:18

Save the BigInteger data in binary or hex format. It is readable to the computer, and to sufficiently dedicated humans. ;>

Spending extra effort to make the output "human readable" is a waste of time. No human is going to be able to make sense out of 450,000 digits regardless of whether they are base 10, base 16, base 2, or anything else.
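As a minimal sketch of the binary route, assuming the value only has to be read back by a program: `BigInteger.ToByteArray()` and the `BigInteger(byte[])` constructor round-trip the value with no base conversion at all. The class name `BigIntegerStorage` and the `path` parameter are made up for illustration.

```csharp
using System.IO;
using System.Numerics;

public static class BigIntegerStorage
{
    // Writes the little-endian two's-complement bytes of n to a file.
    public static void Save(BigInteger n, string path)
    {
        File.WriteAllBytes(path, n.ToByteArray());
    }

    // Reads the bytes back and rebuilds the same value.
    public static BigInteger Load(string path)
    {
        return new BigInteger(File.ReadAllBytes(path));
    }
}
```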

UPDATE

Looking into the Base 10 conversion a little more closely, it is possible to cut the running time of ToString almost in half by using multiple threads on a multi-core system. The main obstacle is that the largest consumer of time across the entire conversion is the first division operation on the original 450k-digit number, which has to finish on a single thread before any work can be handed to the others.

Stats on my quad core P7: 
Generating a 500k digit random number using power and multiply: 5 seconds
Dividing that big number by anything just once: 11 seconds
ToString(): 22 seconds
ToQuickString: 18 seconds
ToStringMT: 12.9 seconds
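The harness behind these numbers is not shown. A measurement along the same lines could be sketched as follows, assuming a `Stopwatch` around a single call; `ConversionTiming`, `MakeTestValue` and the base 1234567891 are hypothetical choices, not the author's.

```csharp
using System;
using System.Diagnostics;
using System.Numerics;

public static class ConversionTiming
{
    // Builds a large test value by exponentiation; base and exponent are arbitrary.
    public static BigInteger MakeTestValue(int exponent)
    {
        return BigInteger.Pow(new BigInteger(1234567891), exponent);
    }

    // Runs the given conversion once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Func<BigInteger, string> convert, BigInteger n)
    {
        var sw = Stopwatch.StartNew();
        convert(n);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

A call such as `ConversionTiming.Time(x => x.ToString(), ConversionTiming.MakeTestValue(50000))` would time the built-in conversion; a single run is noisy, so repeated runs give a fairer comparison.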

using System;
using System.Collections.Generic;
using System.Numerics;
using System.Text;
using System.Threading.Tasks;

public static class BigIntExtensions
{
    // powersOfTen[0] = 1, powersOfTen[k] = 10^(2^(k-1)) for k >= 1
    private static List<BigInteger> powersOfTen;

    // Must be called before ToStringMT()
    public static void InitPowersOfTen(BigInteger n)
    {
        powersOfTen = new List<BigInteger>();

        powersOfTen.Add(1);

        for (BigInteger i = 10; i < n; i *= i)
            powersOfTen.Add(i);
    }

    public static string ToStringMT(this BigInteger n)
    {
        if (powersOfTen == null)
            throw new InvalidOperationException("Call InitPowersOfTen() before ToStringMT().");

        // Negative and small values are not worth splitting; this also keeps the logarithms below well defined.
        if (n < 10000)
            return n.ToString();

        // compute the index into the powersOfTen table for the given parameter. This is very fast.
        var m = (int)Math.Ceiling(Math.Log(BigInteger.Log10(n), 2));

        // Stay inside the table. A smaller m only means the leading piece is larger than ideal.
        m = Math.Min(m, powersOfTen.Count - 1);
        if (m < 2)
            return n.ToString();

        BigInteger r1;
        // the largest amount of execution time happens right here:
        BigInteger q1 = BigInteger.DivRem(n, powersOfTen[m], out r1);

        // split the remaining work across 4 threads - 3 new threads plus the current thread
        var t1 = Task.Factory.StartNew<string>(() =>
        {
            BigInteger r1r2;
            BigInteger r1q2 = BigInteger.DivRem(r1, powersOfTen[m - 1], out r1r2);
            var t2 = Task.Factory.StartNew<string>(() => BuildString(r1r2, m - 2));
            return BuildString(r1q2, m - 2) + t2.Result;
        });
        BigInteger q1r2;
        BigInteger q1q2 = BigInteger.DivRem(q1, powersOfTen[m - 1], out q1r2);
        var t3 = Task.Factory.StartNew<string>(() => BuildString(q1r2, m - 2));
        var sb = new StringBuilder();
        sb.Append(BuildString(q1q2, m - 2));
        sb.Append(t3.Result);
        sb.Append(t1.Result);

        // Every piece is zero-padded to its full width, so drop the zeros in front of the number.
        return sb.ToString().TrimStart('0');
    }

    // same as ToQuickString, but bails out before m == 0 to reduce call overhead.
    // BigInteger.ToString() is faster than DivRem for smallish numbers.
    // The result is left-padded with zeros to 2^m digits so that remainders keep their leading zeros.
    private static string BuildString(BigInteger n, int m)
    {
        if (m <= 8)
            return n.ToString().PadLeft(1 << m, '0');

        BigInteger remainder;
        BigInteger quotient = BigInteger.DivRem(n, powersOfTen[m], out remainder);
        return BuildString(quotient, m - 1) + BuildString(remainder, m - 1);
    }
}

For ToQuickString() and ToStringMT(), the powers of 10 array needs to be initialized prior to using these functions. Initializing this array shouldn't be included in function execution time measurements because the array can be reused across subsequent calls, so its initialization cost is amortized over the lifetime of the program, not individual function calls.

For a production system I would set up a more automatic initialization, such as initializing a reasonable number of entries in the class static constructor and then checking in ToQuickString() or ToStringMT() to see if there are enough entries in the table to handle the given BigInteger. If not, go add enough entries to the table to handle the current BigInteger, then continue with the operation.
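The on-demand initialization described above could look roughly like this. It is a sketch under assumptions, not part of the code in this answer: `PowersOfTenTable`, `EnsureCapacity` and `Get` are hypothetical names, and a simple lock stands in for whatever synchronization a real system would choose.

```csharp
using System.Collections.Generic;
using System.Numerics;

public static class PowersOfTenTable
{
    private static readonly object gate = new object();

    // Entries: 1, 10, 10^2, 10^4, 10^8, ... (from 10 on, each entry is the square of the previous one).
    private static readonly List<BigInteger> powers = new List<BigInteger> { 1, 10 };

    // Appends entries while the next square is still smaller than n,
    // then returns the number of entries in the table.
    public static int EnsureCapacity(BigInteger n)
    {
        lock (gate)
        {
            BigInteger last = powers[powers.Count - 1];
            while (true)
            {
                BigInteger next = last * last;
                if (next >= n)
                    break;
                powers.Add(next);
                last = next;
            }
            return powers.Count;
        }
    }

    public static BigInteger Get(int index)
    {
        lock (gate)
        {
            return powers[index];
        }
    }
}
```

The table only ever grows, so a conversion function can call `EnsureCapacity(n)` on entry and pay the squaring cost only the first time it sees a larger number.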

This ToStringMT function constructs the worker tasks manually to spread the remaining work out across 4 threads on the available execution cores in a multi core CPU. You could instead just make the original ToQuickString() function spin off half of its work into another thread on each recursion, but this quickly creates too many tasks and gets bogged down in task scheduling overhead. The recursion drills all the way down to individual decimal digits. I modified the BuildString() function to bail out earlier (m <= 8 instead of m == 0) because BigInteger.ToString() is faster than DivRem for smallish numbers.
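One middle ground between hand-built tasks and forking on every recursion is to fork only for the first few levels. The following is a sketch of that idea, not the code measured above: `DepthLimitedConversion`, `ToDecimal` and the `parallelDepth` parameter are hypothetical, it uses `Task.Run`, and it is intended for non-negative input.

```csharp
using System.Collections.Generic;
using System.Numerics;
using System.Threading.Tasks;

public static class DepthLimitedConversion
{
    // Converts n to decimal, starting a new task only in the first
    // parallelDepth levels of the recursion.
    public static string ToDecimal(BigInteger n, int parallelDepth)
    {
        // powers[0] = 1, powers[k] = 10^(2^(k-1)); all entries from 10 on are <= n.
        var powers = new List<BigInteger> { 1 };
        for (BigInteger i = 10; i <= n; i *= i)
            powers.Add(i);

        // n < 10^(2^m) for m = Count - 1, so 2^m digits are enough.
        string padded = Build(n, powers.Count - 1, parallelDepth, powers);
        string trimmed = padded.TrimStart('0');
        return trimmed.Length == 0 ? "0" : trimmed;
    }

    // Returns n as a string left-padded with zeros to 2^m digits.
    private static string Build(BigInteger n, int m, int depth, List<BigInteger> powers)
    {
        if (m <= 8)
            return n.ToString().PadLeft(1 << m, '0');

        BigInteger remainder;
        BigInteger quotient = BigInteger.DivRem(n, powers[m], out remainder);

        if (depth > 0)
        {
            // Low half on another task, high half on the current thread.
            var low = Task.Run(() => Build(remainder, m - 1, depth - 1, powers));
            string high = Build(quotient, m - 1, depth - 1, powers);
            return high + low.Result;
        }

        return Build(quotient, m - 1, 0, powers) + Build(remainder, m - 1, 0, powers);
    }
}
```

With `parallelDepth` set to 2 this creates at most three extra tasks, which is in the same spirit as the four-way split; the first `DivRem` still runs on a single thread either way.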

90% of ToStringMT()'s execution time is taken up by the first DivRem call. It converges very quickly after that, but the first one is really painful.

First I'd calculate all numbers of the form 10^(2^m) smaller than n. Then I'd use DivRem with the largest of these to split the problem into two subproblems. Repeat that recursively until you're down to individual digits.

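// n is the positive number to convert.
// powersOfTen[0] = 1, powersOfTen[k] = 10^(2^(k-1)) for k >= 1.
var powersOfTen = new List<BigInteger>();
powersOfTen.Add(1);
for (BigInteger i = 10; i < n; i = i * i)
    powersOfTen.Add(i);

// Splits n at powersOfTen[m] and recurses on both halves down to single digits.
// Remainders keep their leading zeros because every leaf is exactly one digit.
string ToString(BigInteger n, int m)
{
    if (m == 0)
        return n.ToString();
    BigInteger remainder;
    BigInteger quotient = BigInteger.DivRem(n, powersOfTen[m], out remainder);
    return ToString(quotient, m - 1) + ToString(remainder, m - 1);
}

// Start at the last table index and drop the zeros in front of the number.
string digits = ToString(n, powersOfTen.Count - 1).TrimStart('0');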

You can also optimize out the string concatenation entirely by directly writing into a character array.
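A sketch of the character-array variant, under the assumption that n is non-negative: each recursive call receives the offset of its own slice, so no intermediate strings are built. `CharBufferConversion`, `ToDecimal` and `Fill` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Numerics;

public static class CharBufferConversion
{
    // Assumes n >= 0.
    public static string ToDecimal(BigInteger n)
    {
        // powers[0] = 1, powers[k] = 10^(2^(k-1)); all entries from 10 on are <= n.
        var powers = new List<BigInteger> { 1 };
        for (BigInteger i = 10; i <= n; i *= i)
            powers.Add(i);

        int m = powers.Count - 1;
        var buffer = new char[1 << m];      // n < 10^(2^m), so 2^m digits suffice
        Fill(n, m, buffer, 0, powers);

        // Skip the padding zeros, but keep one digit for n == 0.
        int start = 0;
        while (start < buffer.Length - 1 && buffer[start] == '0')
            start++;
        return new string(buffer, start, buffer.Length - start);
    }

    // Writes n as exactly 2^m digits into buffer, starting at offset.
    private static void Fill(BigInteger n, int m, char[] buffer, int offset, List<BigInteger> powers)
    {
        if (m == 0)
        {
            buffer[offset] = (char)('0' + (int)n);   // n is a single digit here
            return;
        }

        BigInteger remainder;
        BigInteger quotient = BigInteger.DivRem(n, powers[m], out remainder);
        int half = 1 << (m - 1);
        Fill(quotient, m - 1, buffer, offset, powers);
        Fill(remainder, m - 1, buffer, offset + half, powers);
    }
}
```

Recursing all the way to single digits keeps the sketch short; in practice a larger base case that copies `n.ToString()` right-aligned into the slice would likely be faster.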


Alternatively, you could consider using base 1,000,000,000 (10^9) during all the calculations, storing nine decimal digits per list element. That way you don't need the base conversion at the end at all. That's probably much faster for factorial calculation.

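// Multiplies a base-10^9 number (least significant element first) by a non-negative int.
List<int> multiply(List<int> f1, int f2)
{
  var result = new List<int>(f1.Count + 1);
  long carry = 0;
  for (int i = 0; i < f1.Count; i++)
  {
    // the carry from the previous element has to be added in
    long product = (long)f1[i] * f2 + carry;
    carry = product / 1000000000;
    result.Add((int)(product % 1000000000));
  }
  if (carry != 0)
    result.Add((int)carry);
  return result;
}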

Now conversion to a base 10 string is trivial and cheap.
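To illustrate why it is cheap: each element already holds nine decimal digits, so the string is just the elements printed from most significant to least significant, with every element except the first padded to nine digits. This sketch assumes the least-significant-first layout used by `multiply` and a non-zero top element; `Base1e9` and `ToDecimalString` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;

public static class Base1e9
{
    // limbs: least significant first, each value in [0, 999999999].
    public static string ToDecimalString(List<int> limbs)
    {
        if (limbs.Count == 0)
            return "0";

        var sb = new StringBuilder(limbs.Count * 9);

        // The most significant element is printed without padding.
        sb.Append(limbs[limbs.Count - 1]);

        // Every lower element contributes exactly nine digits.
        for (int i = limbs.Count - 2; i >= 0; i--)
            sb.Append(limbs[i].ToString("D9"));

        return sb.ToString();
    }
}
```

For example, the list `{ 5, 1 }` stands for 1 * 10^9 + 5 and is printed as `1000000005`.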
