I don\'t Java much.
I am writing some optimized math code and I was shocked by my profiler results. My code collects values, interleaves the data and then chooses th
First of all: Your profiler shows that your spending 99% of the cpu time in the floor function. This does not indicate floor is slow. If you do nothing but floor() thats totally sane. Since other languages seem to implement floor more efficient your assumption may be correct, however.
I know from school that a naive implementation of floor (which works only for positive numbers and is one off for negative ones) can be done by casting to an integer/long. That is language agnostic and some sort of general knowledge from CS courses.
Here are some micro benches. Works on my machine and backs what I learned in school ;)
rataman@RWW009 ~/Desktop
$ javac Cast.java && java Cast
10000000 Rounds of Casts took 16 ms
rataman@RWW009 ~/Desktop
$ javac Floor.java && java Floor
10000000 Rounds of Floor took 140 ms
#
public class Cast/Floor {
private static final int ROUNDS = 10000000;
public static void main(String[] args)
{
double[] vals = new double[ROUNDS];
double[] res = new double[ROUNDS];
// awesome testdata
for(int i = 0; i < ROUNDS; i++)
{
vals[i] = Math.random() * 10.0;
}
// warmup
for(int i = 0; i < ROUNDS; i++)
{
res[i] = floor(vals[i]);
}
long start = System.currentTimeMillis();
for(int i = 0; i < ROUNDS; i++)
{
res[i] = floor(vals[i]);
}
System.out.println(ROUNDS + " Rounds of Casts took " + (System.currentTimeMillis() - start) +" ms");
}
private static double floor(double arg)
{
// Floor.java
return Math.floor(arg);
// or Cast.java
return (int)arg;
}
}
You might want to give a try to FastMath.
Here is a post about the performance of Math in Java vs. Javascript. There are a few good hints about why the default math lib is slow. They are discussing other operations than floor
, but I guess their findings can be generalized. I found it interesting.
EDIT
According to this bug entry, floor has been implemented a pure java code in 7(b79), 6u21(b01) resulting in better performance. The code of floor in the JDK 6 is still a bit longer than the one in FastMath, but might not be responsible for such a perf. degradation. What JDK are you using? Could you try with a more recent version?
Math.floor()
is insanely fast on my machine at around 7 nanoseconds per call in a tight loop. (Windows 7, Eclipse, Oracle JDK 7). I'd expect it to be very fast in pretty much all circumstances and would be extremely surprised if it turned out to be the bottleneck.
Some ideas:
Math.floor()
that are likely to be inlined.FastMath
class in the excellent Apache Commons Math library, which includes a new implementation of floor. I'd be really surprised if it is faster, but you never know.java.lang.Math
functions including Math.floor()
)It is worth noting that monitoring a method takes some overhead and in the case of VisualVM, this is fairly high. If you have a method which is called often but does very little it can appear to use lots of CPU. e.g. I have seen Integer.hashCode() as a big hitter once. ;)
On my machine a floor takes less 5.6 ns, but a cast takes 2.3 ns. You might like to try this on your machine.
Unless you need to handle corner cases, a plain cast is faster.
// Rounds to zero, instead of Negative infinity.
public static double floor(double a) {
return (long) a;
}
public static void main(String... args) {
int size = 100000;
double[] a = new double[size];
double[] b = new double[size];
double[] c = new double[size];
for (int i = 0; i < a.length; i++) a[i] = Math.random() * 1e6;
for (int i = 0; i < 5; i++) {
timeCast(a, b);
timeFloor(a, c);
for (int j = 0; j < size; j++)
if (b[i] != c[i])
System.err.println(a[i] + ": " + b[i] + " " + c[i]);
}
}
public static double floor(double a) {
return a < 0 ? -(long) -a : (long) a;
}
private static void timeCast(double[] from, double[] to) {
long start = System.nanoTime();
for (int i = 0; i < from.length; i++)
to[i] = floor(from[i]);
long time = System.nanoTime() - start;
System.out.printf("Cast took an average of %.1f ns%n", (double) time / from.length);
}
private static void timeFloor(double[] from, double[] to) {
long start = System.nanoTime();
for (int i = 0; i < from.length; i++)
to[i] = Math.floor(from[i]);
long time = System.nanoTime() - start;
System.out.printf("Math.floor took an average of %.1f ns%n", (double) time / from.length);
}
prints
Cast took an average of 62.1 ns
Math.floor took an average of 123.6 ns
Cast took an average of 61.9 ns
Math.floor took an average of 6.3 ns
Cast took an average of 47.2 ns
Math.floor took an average of 6.5 ns
Cast took an average of 2.3 ns
Math.floor took an average of 5.6 ns
Cast took an average of 2.3 ns
Math.floor took an average of 5.6 ns
Here's a sanity check for your hypothesis that the code is really spending 99% of its time in floor
. Let's assume that you have Java and C++ versions of the algorithm that are both correct in terms of the outputs they produce. For the sake of the argument, let us assume that the two versions call the equivalent floor
functions the same number of times. So a time function is
t(input) = nosFloorCalls(input) * floorTime + otherTime(input)
where floorTime
is the time taken for a call to floor
on the platform.
Now if your hypothesis is correct, and floorTime
is vastly more expensive on Java (to the extent that it takes roughly 99% of the execution time) then you would expect the Java version of the application to run a large factor (50 times or more) slower than the C++ version. If you don't see this, then your hypothesis most likely is false.
If the hypothesis is false, here are two alternative explanations for the profiling results.
This is a measurement anomaly; i.e. the profiler has somehow got it wrong. Try using a different profiler.
There is a bug in the Java version of your code that is causing it to call floor
many, many more times than in the C++ version of the code.