Unexpected results when working with very big integers on interpreted languages

Posted by 断了今生、忘了曾经 on 2019-11-28 02:51:11

Python works:

>>> sum(x for x in xrange(1000000000 + 1))
500000000500000000

Or:

>>> sum(xrange(1000000000+1))
500000000500000000

Python's int automatically promotes to a Python long, which supports arbitrary precision. It will produce the correct answer on both 32-bit and 64-bit platforms.

This can be seen by raising 2 to a power far greater than the bit width of the platform:

>>> 2**99
633825300114114700748351602688L

You can demonstrate (with Python) that the erroneous values you are getting in PHP arise because PHP promotes to a float once a value exceeds 2**31-1, the largest signed 32-bit integer:

>>> int(sum(float(x) for x in xrange(1000000000+1)))
500000000067108992

Your Go code uses integer arithmetic with enough bits to give an exact answer. I have never touched PHP or Node.js, but from the results I suspect the math is done using floating-point numbers, which should not be expected to be exact for numbers of this magnitude.

user568109

The reason is that the value of your integer variable sum exceeds the maximum integer value, so the sum you get is the result of floating-point arithmetic, which involves rounding. Since the other answers did not mention the exact limits, I decided to post them.

The max integer value for PHP for:

  • 32-bit version is 2147483647
  • 64-bit version is 9223372036854775807

So it means you are using either a 32-bit CPU, a 32-bit OS, or a 32-bit build of PHP. The limit can be found using PHP_INT_MAX. The sum would be calculated correctly if you do it on a 64-bit machine.

The max integer value in JavaScript is 9007199254740992. The largest exact integral value you can work with is 2^53 (taken from this question). The sum exceeds this limit.

If the integer value does not exceed these limits, then you are good. Otherwise you will have to look for arbitrary precision integer libraries.
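
As a quick check of the limits listed above, here is a small Python 3 sketch that compares the expected sum against each of them. The names `LIMITS` and `fits` are made up for this illustration. Python is used only because its integers are arbitrary precision, so the comparison itself cannot overflow.

```python
# Expected value of 0 + 1 + ... + 1_000_000_000, computed with exact integers.
N = 1_000_000_000
EXPECTED = N * (N + 1) // 2

# Limits quoted in the answer above (the dictionary keys are illustrative labels).
LIMITS = {
    "PHP 32-bit PHP_INT_MAX": 2**31 - 1,
    "PHP 64-bit PHP_INT_MAX": 2**63 - 1,
    "JavaScript exact-integer limit (2^53)": 2**53,
}

def fits(value, limit):
    """Return True if value does not exceed limit."""
    return value <= limit

for name, limit in LIMITS.items():
    verdict = "fits" if fits(EXPECTED, limit) else "exceeded"
    print(f"{name}: {verdict}")
```

Only the 64-bit integer limit is large enough, which matches the results reported in the question.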

CyberSkull

Here is the answer in C, for completeness:

#include <stdio.h>

int main(void)
{
    unsigned long long sum = 0, i;

    for (i = 0; i <= 1000000000; i++)    //one billion
        sum += i;

    printf("%llu\n", sum);  //500000000500000000

    return 0;
}

The key in this case is using C99's long long data type. It provides the biggest primitive storage C can manage and it runs really, really fast. The long long type will also work on most any 32 or 64-bit machine.

There is one caveat: compilers provided by Microsoft explicitly do not support the 14-year-old C99 standard, so getting this to run in Visual Studio is a crapshoot.

My guess is that when the sum exceeds the capacity of a native int (2^31-1 = 2,147,483,647), Node.js and PHP switch to a floating-point representation and you start getting round-off errors. A language like Go will probably try to stick with an integer form (e.g., 64-bit integers) as long as possible (if, indeed, it didn't start with that). Since the answer fits in a 64-bit integer, the computation is exact.
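
To make that guess concrete, here is a rough Python 3 emulation of the two behaviours at the 32-bit boundary. It is only a sketch: `add_wrapping_int32` and `add_promoting_to_float` are hypothetical helpers written for this illustration, not functions from PHP, Node.js, or Go.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def add_wrapping_int32(a, b):
    """Add two ints as a fixed-width signed 32-bit register would (wraps around)."""
    return (a + b + 2**31) % 2**32 - 2**31

def add_promoting_to_float(a, b):
    """Stay an int while the result fits in 32 bits, otherwise become a float."""
    result = a + b
    if isinstance(result, int) and not (INT32_MIN <= result <= INT32_MAX):
        return float(result)
    return result

print(add_wrapping_int32(INT32_MAX, 1))      # -2147483648 (wrapped around)
print(add_promoting_to_float(INT32_MAX, 1))  # 2147483648.0 (now a float)
print(INT32_MAX + 1)                         # 2147483648 (Python int, exact)
```

The promoted float is still exact at this size. Rounding errors only appear much later, once the running sum needs more bits than a double can hold exactly.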

This Perl script gives the expected result:

use warnings;
use strict;

my $sum = 0;
for(my $i = 0; $i <= 1_000_000_000; $i++) {
    $sum += $i;
}
print $sum, "\n";  #<-- prints: 500000000500000000
dognose

The answer to this is "surprisingly" simple:

First, as most of you might know, a 32-bit integer ranges from −2,147,483,648 to 2,147,483,647. So, what happens if PHP gets a result that is LARGER than this?

Usually, one would expect an immediate "overflow", causing 2,147,483,647 + 1 to turn into −2,147,483,648. However, that is NOT the case. If PHP encounters a larger number, it returns a FLOAT instead of an INT.

If PHP encounters a number beyond the bounds of the integer type, it will be interpreted as a float instead. Also, an operation which results in a number beyond the bounds of the integer type will return a float instead.

http://php.net/manual/en/language.types.integer.php

That said, and knowing that PHP's float implementation follows the IEEE 754 double-precision format, PHP is able to deal with integers of up to 53 bits without losing precision (on a 32-bit system).

So, at the point where your sum hits 9,007,199,254,740,992 (which is 2^53), the float value returned by PHP's math will no longer be precise enough.

E:\PHP>php -r "$x=bindec(\"100000000000000000000000000000000000000000000000000000\"); echo number_format($x,0);"

9,007,199,254,740,992

E:\PHP>php -r "$x=bindec(\"100000000000000000000000000000000000000000000000000001\"); echo number_format($x,0);"

9,007,199,254,740,992

E:\PHP>php -r "$x=bindec(\"100000000000000000000000000000000000000000000000000010\"); echo number_format($x,0);"

9,007,199,254,740,994

This example shows the point where PHP starts losing precision. First, the least significant bit is dropped, causing the first two expressions to produce the same number, even though their inputs differ.

From NOW ON, the whole math will go wrong, when working with default data-types.
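
The same effect can be reproduced outside PHP, since it comes from the IEEE 754 double format rather than from PHP itself. Below is a Python 3 sketch under that assumption. `survives_double` is a name invented for this example.

```python
BASE = 2**53  # 9,007,199,254,740,992

def survives_double(n):
    """Return True if the integer n is unchanged by a round trip through a float."""
    return int(float(n)) == n

# Walk the first few integers from 2^53 upward and show what a double stores.
for offset in range(4):
    exact = BASE + offset
    stored = int(float(exact))
    print(f"{exact:,} -> {stored:,} (exact: {survives_double(exact)})")
```

From 2^53 upward, only every second integer has an exact double representation, so odd values are rounded to a neighbouring even one.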

• Is it the same problem for other interpreted languages such as Python or Perl?

I don't think so. I think this is a problem of languages that have no type safety. While an integer overflow as mentioned above WILL happen in every language that uses fixed data types, languages without type safety might try to catch it by switching to other data types. However, once they hit their "natural" (system-given) limit, they might return anything but the right result.

However, each language may handle such a scenario differently.

The other answers already explained what is happening here (floating point precision as usual).

One solution is to use an integer type that is big enough, or to hope the language will choose one if needed.

The other solution is to use a summation algorithm that knows about the precision problem and works around it. Below you find the same summation, first with a 64-bit integer, then with 64-bit floating point, and then using floating point again, but with the Kahan summation algorithm.

Written in C#, but the same holds for other languages, too.

long sum1 = 0;
for (int i = 0; i <= 1000000000; i++)
{
    sum1 += i ;
}
Console.WriteLine(sum1.ToString("N0"));
// 500.000.000.500.000.000

double sum2 = 0;
for (int i = 0; i <= 1000000000; i++)
{
    sum2 += i ;
}
Console.WriteLine(sum2.ToString("N0"));
// 500.000.000.067.109.000

double sum3 = 0;
double error = 0;
for (int i = 0; i <= 1000000000; i++)
{
    double corrected = i - error;
    double temp = sum3 + corrected;
    error = (temp - sum3) - corrected;
    sum3 = temp;
}
Console.WriteLine(sum3.ToString("N0"));
//500.000.000.500.000.000

The Kahan summation gives a beautiful result. It does of course take a lot longer to compute. Whether you want to use it depends a) on your performance vs. precision needs, and b) how your language handles integer vs. floating point data types.
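
For readers who do not use C#, here is a Python 3 port of the same idea. It is a sketch rather than a tuned implementation, and it uses a tiny hand-picked input instead of the billion-element loop so that it runs instantly. The input values are my own choice: they start at 2^53, where adding 1.0 to a double has no effect on its own.

```python
def naive_sum(values):
    """Plain left-to-right float summation.

    A hand-written loop is used instead of the built-in sum(), whose
    handling of floats differs between Python versions.
    """
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Kahan (compensated) summation, mirroring the C# loop above."""
    total = 0.0
    error = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        corrected = v - error
        temp = total + corrected
        error = (temp - total) - corrected
        total = temp
    return total

values = [2.0**53, 1.0, 1.0, 1.0, 1.0]
print(int(naive_sum(values)))  # 9007199254740992 (the four 1.0s are lost)
print(int(kahan_sum(values)))  # 9007199254740996 (the exact sum)
```

The compensation term carries each lost 1.0 forward until it is large enough to register in the total.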

If you have 32-Bit PHP, you can calculate it with bc:

<?php

$value = 1000000000;
echo bcdiv( bcmul( $value, $value + 1 ), 2 );
//500000000500000000

In JavaScript you have to use an arbitrary-precision number library, for example BigInteger:

var value = new BigInteger(1000000000);
console.log( value.multiply(value.add(1)).divide(2).toString());
//500000000500000000

Even with languages like Go and Java you will eventually have to use an arbitrary-precision number library. Your number just happened to be small enough for 64-bit but too large for 32-bit.
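
To illustrate how close that boundary is, the Python 3 sketch below checks the same sum for two upper bounds against signed 32-bit and 64-bit ranges. `triangular` and `fits_signed` are illustrative helpers, and the choice of 10**10 as the second bound is mine.

```python
def triangular(n):
    """Exact value of 0 + 1 + ... + n using Python's arbitrary-precision ints."""
    return n * (n + 1) // 2

def fits_signed(value, bits):
    """Return True if value is representable as a signed integer of the given width."""
    return -(2**(bits - 1)) <= value <= 2**(bits - 1) - 1

for n in (10**9, 10**10):
    total = triangular(n)
    print(f"n={n}: sum={total}, "
          f"fits 32-bit: {fits_signed(total, 32)}, "
          f"fits 64-bit: {fits_signed(total, 64)}")
```

Raising the upper bound by just one order of magnitude is already enough to overflow a signed 64-bit integer.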

In Ruby:

sum = 0
1.upto(1000000000).each{|i|
  sum += i
}
puts sum

Prints 500000000500000000, but takes a good 4 minutes on my 2.6 GHz Intel i7.


Magnuss and Jaunty have a much more Ruby-like solution:

1.upto(1000000000).inject(:+)

To run a benchmark:

$ time ruby -e "puts 1.upto(1000000000).inject(:+)"
ruby -e "1.upto(1000000000).inject(:+)"  128.75s user 0.07s system 99% cpu 2:08.84 total

I use node-bigint for big integer stuff:
https://github.com/substack/node-bigint

var bigint = require('bigint');
var sum = bigint(0);
for(var i = 0; i <= 1000000000; i++) { 
  sum = sum.add(i); 
}
console.log(sum);

It's not as quick as something that can use native 64-bit stuff for this exact test, but if you get into bigger numbers than 64-bit, it uses libgmp under the hood, which is one of the faster arbitrary precision libraries out there.

Took ages in Ruby, but gives the correct answer:

(1..1000000000).reduce(:+)
 => 500000000500000000 

To get the correct result in PHP I think you'd need to use the BC math operators: http://php.net/manual/en/ref.bc.php

Here is the correct answer in Scala. You have to use Longs otherwise you overflow the number:

println((1L to 1000000000L).reduce(_ + _)) // prints 500000000500000000

There's actually a cool trick to this problem.

Assume it was 1-100 instead.

1 + 2 + 3 + 4 + ... + 50 +

100 + 99 + 98 + 97 + ... + 51

= (101 + 101 + 101 + 101 + ... + 101) = 101*50

Formula:

For N= 100: Output = N/2*(N+1)

For N = 1e9: Output = N/2*(N+1)

This is much faster than looping through all of that data. Your processor will thank you for it. And here is an interesting story regarding this very problem:

http://www.jimloy.com/algebra/gauss.htm
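
One detail is worth a sketch. Written literally as N/2*(N+1) with integer division, the formula only works when N is even. Multiplying first and dividing last avoids that, because N*(N+1) is always even. The Python 3 function name `gauss_sum` below is just for illustration.

```python
def gauss_sum(n):
    """Sum of 1..n via the closed form, multiplying before dividing."""
    return n * (n + 1) // 2

print(gauss_sum(100))      # 5050
print(gauss_sum(10**9))    # 500000000500000000

# Dividing first truncates when n is odd:
n = 101
print(n // 2 * (n + 1))    # 5100 (wrong)
print(gauss_sum(n))        # 5151 (correct)
```

In a language with fixed-width integers the intermediate product N*(N+1) needs roughly twice as many bits as N, so the integer width still has to be chosen with care.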

This gives the proper result in PHP by forcing the integer cast.

$sum = (int) $sum + $i;

Common Lisp is one of the fastest interpreted* languages and handles arbitrarily large integers correctly by default. This takes about 3 seconds with SBCL:

* (time (let ((sum 0)) (loop :for x :from 1 :to 1000000000 :do (incf sum x)) sum))

Evaluation took:
  3.068 seconds of real time
  3.064000 seconds of total run time (3.044000 user, 0.020000 system)
  99.87% CPU
  8,572,036,182 processor cycles
  0 bytes consed

500000000500000000
* By interpreted, I mean that I ran this code from the REPL. SBCL may have done some JITing internally to make it run fast, but the dynamic experience of running code immediately is the same.

I don't have enough reputation to comment on @postfuturist's Common Lisp answer, but it can be optimized to complete in ~500ms with SBCL 1.1.8 on my machine:

CL-USER> (compile nil '(lambda () 
                        (declare (optimize (speed 3) (space 0) (safety 0) (debug 0) (compilation-speed 0))) 
                        (let ((sum 0))
                          (declare (type fixnum sum))
                          (loop for i from 1 to 1000000000 do (incf sum i))
                          sum)))
#<FUNCTION (LAMBDA ()) {1004B93CCB}>
NIL
NIL
CL-USER> (time (funcall *))
Evaluation took:
  0.531 seconds of real time
  0.531250 seconds of total run time (0.531250 user, 0.000000 system)
  100.00% CPU
  1,912,655,483 processor cycles
  0 bytes consed

500000000500000000

Racket v 5.3.4 (MBP; time in ms):

> (time (for/sum ([x (in-range 1000000001)]) x))
cpu time: 2943 real time: 2954 gc time: 0
500000000500000000

Works fine in Rebol:

>> sum: 0
== 0

>> repeat i 1000000000 [sum: sum + i]
== 500000000500000000

>> type? sum
== integer!

This was using Rebol 3, which uses 64-bit integers despite being compiled as 32-bit (unlike Rebol 2, which used 32-bit integers).

I wanted to see what happened in CF Script

<cfscript>
ttl = 0;

for (i=0;i LTE 1000000000 ;i=i+1) {
    ttl += i;
}
writeDump(ttl);
abort;
</cfscript>

I got 5.00000000067E+017

This was a pretty neat experiment. I'm fairly sure I could have coded this a bit better with more effort.

ActivePerl v5.10.1 on 32bit windows, intel core2duo 2.6:

$sum = 0;
for ($i = 0; $i <= 1000000000 ; $i++) {
  $sum += $i;
}
print $sum."\n";

result: 5.00000000067109e+017 in 5 minutes.

With "use bigint" the script ran for two hours and would have run longer, but I stopped it. Too slow.

Blacksad

For the sake of completeness, in Clojure (beautiful but not very efficient):

(reduce + (take 1000000000 (iterate inc 1))) ; => 500000000500000000

AWK:

BEGIN { s = 0; for (i = 1; i <= 1000000000; i++) s += i; print s }

produces the same wrong result as PHP:

500000000067108992

It seems AWK uses floating point when the numbers are really big, so at least the answer is the right order-of-magnitude.

Test runs:

$ awk 'BEGIN { s = 0; for (i = 1; i <= 100000000; i++) s += i; print s }'
5000000050000000
$ awk 'BEGIN { s = 0; for (i = 1; i <= 1000000000; i++) s += i; print s }'
500000000067108992

In the category of other interpreted languages:

Tcl:

If using Tcl 8.4 or older, it depends on whether it was compiled as 32-bit or 64-bit. (8.4 is end of life.)

If using Tcl 8.5 or newer which has arbitrary big integers, it will display the correct result.

proc test limit {
    for {set i 0} {$i <= $limit} {incr i} {
        incr result $i
    }
    return $result
}
test 1000000000 

I put the test inside a proc to get it byte-compiled.

For the PHP code, the answer is here:

The size of an integer is platform-dependent, although a maximum value of about two billion is the usual value (that's 32 bits signed). 64-bit platforms usually have a maximum value of about 9E18. PHP does not support unsigned integers. Integer size can be determined using the constant PHP_INT_SIZE, and maximum value using the constant PHP_INT_MAX since PHP 4.4.0 and PHP 5.0.5.

Harbour:

proc Main()

   local sum := 0, i

   for i := 0 to 1000000000
      sum += i
   next

   ? sum

   return

Results in 500000000500000000. (on both windows/mingw/x86 and osx/clang/x64)

Erlang works:

from_sum(From,Max) ->
    from_sum(From,Max,Max).
from_sum(From,Max,Sum) when From =:= Max ->
    Sum;
from_sum(From,Max,Sum) when From =/= Max -> 
    from_sum(From+1,Max,Sum+From).

Results:

41> useless:from_sum(1,1000000000).
500000000500000000

Funny thing, PHP 5.5.1 gives 499999999500000000 (in ~ 30s), while Dart2Js gives 500000000067109000 (which is to be expected, since it's JS that gets executed). CLI Dart gives the right answer ... instantly.

Erlang gives the expected result too.

sum.erl:

-module(sum).
-export([iter_sum/2]).

iter_sum(Begin, End) -> iter_sum(Begin,End,0).
iter_sum(Current, End, Sum) when Current > End -> Sum;
iter_sum(Current, End, Sum) -> iter_sum(Current+1,End,Sum+Current).

And using it:

1> c(sum).
{ok,sum}
2> sum:iter_sum(1,1000000000).
500000000500000000
Samuel Henry

Smalltalk:

(1 to: 1000000000) inject: 0 into: [:subTotal :next | subTotal + next ]. 

"500000000500000000"