One of the programming problems I have come across involves calculating the factorials of large numbers (numbers up to 10^5). I have seen a simple recursive Haskell solution for this, and it runs much faster than my iterative Java version that uses BigInteger. Why is the Haskell version so much faster?
The explanations below are obviously not enough. Here are some slides that explain the transformation a function goes through when its parameters are strict (like the above example) and no thunks are generated: http://www.slideshare.net/ilyasergey/static-analyses-and-code-optimizations-in-glasgow-haskell-compiler
The Haskell version will be doing just the calculation, storing only the previous result and then applying the next multiplication, e.g. 6 x 4, whereas the Java version is doing caching (all the historic values), memory management, GC and the like.
It's doing strictness analysis and it's automatically caching the previous computation. See: http://neilmitchell.blogspot.com.au/2008/03/lazy-evaluation-strict-vs-speculative.html?m=1
More details are on the Haskell Wiki: "Optimising compilers like GHC try to reduce the cost of laziness using strictness analysis, which attempts to determine which function arguments are always evaluated by the function, and hence can be evaluated by the caller instead."
"Strictness analysis can spot the fact that the argument n is strict, and can be represented unboxed. The resulting function won't use any heap while it is running, as you'd expect."
"Strictness analysis is a process by which GHC attempts to determine, at compile-time, which data definitely will 'always be needed'. GHC can then build code to just calculate such data, rather than the normal (higher overhead) process for storing up the calculation and executing it later."
http://www.haskell.org/haskellwiki/Performance/Strictness http://www.haskell.org/haskellwiki/GHC_optimisations
Here's a related question: https://softwareengineering.stackexchange.com/q/149167/26988
It seems that in this particular case, you're seeing the difference in optimizations of a pure vs impure function. In Haskell, all functions are pure unless they're doing IO (see link).
I think what's going on is that GHC is able to optimize the code better because of the guarantee of purity. Even though there isn't a tail call, it knows there aren't any side effects (because of purity guarantee), so it can do some optimizations the Java code can't (like automatic caching and whatnot as @andrew mentions in his answer).
A better solution in Haskell would be to use the built-in product function:
factorial n = product [1..n]
This is able to do tail-call optimization because it's just iteration. The same can be done in Java with a for loop as in your example, but it won't have the benefit of being functionally pure.
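For reference, here is a minimal sketch of the kind of Java loop being referred to; this is my reconstruction of what the question's version presumably looks like, not the original code:

import java.math.BigInteger;

public class Factorial {
    // Iterative factorial over BigInteger, roughly the shape the answers assume.
    static BigInteger factorial(int n) {
        BigInteger f = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            f = f.multiply(BigInteger.valueOf(i));
        }
        return f;
    }

    public static void main(String[] args) {
        System.out.println(factorial(20)); // 2432902008176640000
    }
}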
Edit:
I assumed tail-call elimination was happening, but it apparently is not. Here's the original answer for reference (it still has useful information about why Haskell may be faster than Java in some recursive contexts).
Functional programming languages like Haskell take advantage of tail call elimination.
In most programming languages, recursive calls maintain a call stack. Each recursive call allocates a new stack frame, which isn't cleaned up until it returns. For example:
call fact()
    call fact()
        call fact()
        cleanup
    cleanup
cleanup
Functional languages, however, don't need to maintain a stack. In procedural languages, it's often difficult to tell if the return value will be used by the calling function, so it's hard to optimize. In FP, however, the return value only makes sense when the recursion is complete, so you can eliminate the call stack and end up with something like this:
call fact()
call fact()
call fact()
cleanup
The call fact() lines can all happen in the same stack frame because the return value isn't needed in intermediate calculations.
Now, to answer your question, you can solve this problem in a variety of ways, all of which aim to eliminate the call stack:
Here are some related questions with examples for the above:
Note:
It's not guaranteed that recursive calls will reuse the same stack frame, so some implementations may reallocate on each recursive call. This is often easier and still provides the same memory safety as reusing the stack frame.
For more information about this, see these articles:
The difference is, as shachaf said, that GHC (by default) uses GMP for Integer computations that exceed the Int range, and GMP is rather well optimised. It has nothing to do with purity, caching, tail-call optimisation or such.
Java's BigInteger uses more or less the naive schoolbook algorithms. If you look at the code for multiply (openjdk7), the workhorse is
/**
 * Multiplies int arrays x and y to the specified lengths and places
 * the result into z. There will be no leading zeros in the resultant array.
 */
private int[] multiplyToLen(int[] x, int xlen, int[] y, int ylen, int[] z) {
    int xstart = xlen - 1;
    int ystart = ylen - 1;

    if (z == null || z.length < (xlen + ylen))
        z = new int[xlen + ylen];

    long carry = 0;
    for (int j = ystart, k = ystart + 1 + xstart; j >= 0; j--, k--) {
        long product = (y[j] & LONG_MASK) *
                       (x[xstart] & LONG_MASK) + carry;
        z[k] = (int)product;
        carry = product >>> 32;
    }
    z[xstart] = (int)carry;

    for (int i = xstart - 1; i >= 0; i--) {
        carry = 0;
        for (int j = ystart, k = ystart + 1 + i; j >= 0; j--, k--) {
            long product = (y[j] & LONG_MASK) *
                           (x[i] & LONG_MASK) +
                           (z[k] & LONG_MASK) + carry;
            z[k] = (int)product;
            carry = product >>> 32;
        }
        z[i] = (int)carry;
    }
    return z;
}
a quadratic digit-by-digit multiplication (the digits are of course not base 10). That doesn't hurt too much here, since one of the factors is always single-digit, but it indicates that not too much work has yet been put into optimising BigInteger computations in Java.
One thing that can be seen from the source is that in Java, products of the form smallNumber * largeNumber are faster than largeNumber * smallNumber (in particular, if the small number is single-digit, having that as the first number means the second loop with the nested loop doesn't run at all, so you have altogether less loop-control overhead, and the loop that is run has a simpler body).
So changing
f = f.multiply(BigInteger.valueOf(i));
in your Java version to
f = BigInteger.valueOf(i).multiply(f);
gives a considerable speedup (increasing with the argument, ~2× for 25000, ~2.5× for 50000, ~2.8× for 100000).
The computation is still much slower than the GHC/GMP combination by a factor of roughly 4 in the tested range on my box, but, well, GMP's implementation is plain better optimised.
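To see the reordering effect directly, here is a crude timing sketch of my own (single run, no JIT warm-up, so treat the numbers as rough):

import java.math.BigInteger;

public class MultiplyOrderBench {
    // largeNumber * smallNumber: the order in the original Java version.
    static BigInteger largeTimesSmall(int n) {
        BigInteger f = BigInteger.ONE;
        for (int i = 2; i <= n; i++) f = f.multiply(BigInteger.valueOf(i));
        return f;
    }

    // smallNumber * largeNumber: the suggested faster order.
    static BigInteger smallTimesLarge(int n) {
        BigInteger f = BigInteger.ONE;
        for (int i = 2; i <= n; i++) f = BigInteger.valueOf(i).multiply(f);
        return f;
    }

    public static void main(String[] args) {
        int n = 25000;
        long t0 = System.nanoTime();
        largeTimesSmall(n);
        long t1 = System.nanoTime();
        smallTimesLarge(n);
        long t2 = System.nanoTime();
        System.out.printf("f.multiply(valueOf(i)): %d ms, valueOf(i).multiply(f): %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}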
If you make computations that often multiply two large numbers, the algorithmic difference between the quadratic BigInteger multiplication and GMP's, which uses Karatsuba or Toom-Cook when the factors are large enough (and FFT for really large numbers), would show.
However, if multiplying is not all that you do, if you print out the factorials and hence convert them to a String, you get hit by the fact that BigInteger's toString method is abominably slow (it is roughly quadratic, so since the computation of the factorial is altogether quadratic in the length of the result, you get no [much] higher algorithmic complexity, but you get a big constant factor on top of the computation). The Show instance for Integer is much better, O(n * (log n)^x) [not sure what x is, between 1 and 2], so converting the result to a String adds just a little to the computation time.
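A crude way to observe the toString cost on the Java side (again a sketch of my own, not a careful benchmark):

import java.math.BigInteger;

public class ToStringCost {
    public static void main(String[] args) {
        int n = 50000;
        long t0 = System.nanoTime();
        BigInteger f = BigInteger.ONE;
        for (int i = 2; i <= n; i++) f = BigInteger.valueOf(i).multiply(f);
        long t1 = System.nanoTime();
        String s = f.toString();   // decimal conversion, roughly quadratic
        long t2 = System.nanoTime();
        System.out.printf("compute: %d ms, toString: %d ms, digits: %d%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, s.length());
    }
}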
I first want to point out two factors which are clearly not the reason for the speed difference, but have been mentioned nevertheless, in the question and some answers.
The question mentions caching, and some of the answers mention memoization. But the factorial function does not benefit from memoization, because it recursively calls itself with different arguments. So we would never hit an entry in the cache that's already filled, and the whole caching is unnecessary. Maybe people were thinking of the fibonacci function here?
For the record, Haskell would not provide automatic memoization anyway.
Both the Java and the Haskell program look already pretty optimal to me. Both programs use the iteration mechanism of choice of their respective languages: Java uses a loop, Haskell uses recursion. Both programs use a standard type for big integer arithmetic.
If anything, the Haskell version should be slower because it is not tail recursive, whereas the Java version uses a loop which is the fastest looping construct available in Java.
I don't see much scope for clever high-level optimizations a compiler could make to these programs. I suspect that the observed speed difference is due to low-level details about how big integers are implemented.
The Haskell compiler has built-in and reasonable support for Integer. That seems to be less so with Java implementations and the big integer class. I googled for "BigInteger slow" and the results suggest that the question really should be: Why is Java's BigInteger so slow? There seem to be other big integer classes that are faster. I'm not a Java expert, so I cannot answer this variant of the question in any detail.
I think the difference has nothing to do with tail-call optimization, or optimization at all. The reason I think that is that optimization can, at its best, only achieve something like your iterative Java version.
The real reason is, IMHO, that Java BigIntegers are slow compared with Haskell's.
To establish this, I propose 2 experiments:
Use the same algorithms, but use long. (The results will be garbage for higher numbers because of overflow, but the computations will be done nevertheless; see the sketch after this list.) Here, the Java version should be on par with Haskell.
Use a faster big-integer library in the Java version. The performance should improve accordingly. There are wrappers for GMP out there, as well as improvements to the Java big integers like here. The many-fold performance improvements possible for multiplication of large numbers are telling.
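For the first experiment, a minimal sketch of the Java side (my own illustration; the value overflows long past 20!, but only the running time matters here):

public class FactorialLong {
    // Same iteration as the BigInteger version, but on a primitive long.
    // The result is garbage for large n due to overflow; the point is that
    // the loop itself should now be roughly on par with a Haskell version
    // that uses Int instead of Integer.
    static long factorial(int n) {
        long f = 1;
        for (int i = 2; i <= n; i++) f *= i;
        return f;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long r = factorial(100000);
        long t1 = System.nanoTime();
        System.out.println("result (overflowed): " + r
                + ", time: " + (t1 - t0) / 1_000_000 + " ms");
    }
}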