How to really benchmark the memory usage of a Java application

I want to compare different implementations of Java programs in terms of their memory usage efficiency. There are different usage scenarios formulated as JUnit test cases. Actually, all the code is open source at: https://github.com/headissue/cache2k-benchmark

The general wisdom to get to the used memory of a Java program is this: Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory(), of course it is also possible to use the JMX interface to get these values.

However, the determined values of used memory is not reliable. Possible reasons:

There may be uncollected garbage
There is fragmentation, if the GC did no compaction

So far I experimented with switching to serial GC and to force garbage collection with Runtime.getRuntime().gc() before reading out the value. I've put the experimental code for this at: https://github.com/cruftex/java-memory-benchmark

If I do three gc calls before reading the values, I get this output (mvn test | grep loopCount with jdk1.7.0_51):

testBaseline1: used=1084168, loopCount=0, total=124780544
testBaseline2: used=485632, loopCount=0, total=124780544
testBaseline3: used=483760, loopCount=0, total=124780544
testBaseline4: used=483800, loopCount=0, total=124780544
testBaseline: used=484160, loopCount=0, total=124780544
test100MBytes: used=105341496, loopCount=0, total=276828160
test127MBytes: used=133653088, loopCount=0, total=469901312
test27MBytes: used=28795528, loopCount=0, total=317755392
test10MBytes: used=10969776, loopCount=0, total=124784640

With four gc calls (as checked in) I get:

testBaseline1: used=483072, loopCount=0, total=124780544
testBaseline2: used=483728, loopCount=0, total=124780544
testBaseline3: used=483768, loopCount=0, total=124780544
testBaseline4: used=483808, loopCount=0, total=124780544
testBaseline: used=483848, loopCount=0, total=124780544
test100MBytes: used=105341504, loopCount=0, total=276828160
test127MBytes: used=133653096, loopCount=0, total=469901312
test27MBytes: used=28795536, loopCount=0, total=139239424
test10MBytes: used=10969784, loopCount=0, total=124784640

So it is empirically shown, that with four GC calls, the results seem to be correct. From the GC statistics output I can see that the first GC fills the tenured space and the fourth GC call reduces it:

2015-01-08T02:30:35.069+0100: [Full GC2015-01-08T02:30:35.069+0100: [Tenured: 0K->1058K(83968K)
2015-01-08T02:30:35.136+0100: [Full GC2015-01-08T02:30:35.136+0100: [Tenured: 1058K->1058K(83968K)
2015-01-08T02:30:35.198+0100: [Full GC2015-01-08T02:30:35.198+0100: [Tenured: 1058K->1058K(83968K)
2015-01-08T02:30:35.263+0100: [Full GC2015-01-08T02:30:35.264+0100: [Tenured: 1058K->471K(83968K)

The final code, to get the memory usage value is:

try {
  Runtime.getRuntime().gc();
  Thread.sleep(55);
  Runtime.getRuntime().gc();
  Thread.sleep(55);
  Runtime.getRuntime().gc();
  Thread.sleep(55);
  Runtime.getRuntime().gc();
  Thread.sleep(55);
} catch (Exception ignore) { }
long _usedMem;
long _total;
long _total2;
long _count = -1;
// loop to get a stable reading, since memory may be resized between the method calls
do {
  _count++;
  _total = Runtime.getRuntime().totalMemory();
  try {
    Thread.sleep(12);
  } catch (Exception ignore) { }
  long _free = Runtime.getRuntime().freeMemory();
  _total2 = Runtime.getRuntime().totalMemory();
  _usedMem = _total - _free;
} while (_total != _total2);
System.out.println(_testName + ": used=" + _usedMem + ", loopCount=" + _count + ", total=" + _total);

I am pretty unsure about whether this approach is producing reliable results all the time. So some questions:

Is there some best practice to get reliable and comparable benchmark values from Java programs?
Any ideas how to tune (or actually detune) the GC for that usage case?
Is there a reliable source and a reliable behavior explaining the needed four GC calls? (BTW: java 8 is performing the same way)
Is there a way to say the JVM: "Do best possible garbage collection, I'll wait"?
In general, what might be the most "future proof" and reliable solution for the problem statement?

Update:

Although some questions above are GC related, the actual problem is not. I like to find out the memory usage of an application for a single point in time. A possible solution would also to do a depth search of all object and sum up the sizes.

Update 2:

Meanwhile I did write and extensive blog article about that problem discussing different ways how to measure the actual memory usage:

https://cruftex.net/2017/03/28/The-6-Memory-Metrics-You-Should-Track-in-Your-Java-Benchmarks.html

I also struggled with this issue and interested to know if there is any standard way.

The best I could do was to tell JVM to do its best to gather garbage as much as possible by calling the following method after a run and before the next one:

GcFinalization.awaitFullGc();

This method is from the Guava test-lib package, which can be added as a Maven dependency as:

 <dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava-testlib</artifactId>
    <version>18.0</version>
</dependency>

The implementation looks like this:

public static void awaitFullGc() {
   final CountDownLatch finalizerRan = new CountDownLatch(1);
   WeakReference<Object> ref = new WeakReference<Object>(
      new Object() {
         @Override protected void finalize() { finalizerRan.countDown(); }
      });

   await(finalizerRan);
   awaitClear(ref);

   // Hope to catch some stragglers queued up behind our finalizable object
   System.runFinalization();
 }

This gave me very consistent results for each run and makes the CPU user time (from ThreadMXBean) very close to nano time (from System.currentTimeMills). My main concern in those measurements was running time, but the memory usage was also consistent, compared to the version without this call in between.

First, you should look at JMH to find out how a proper Java bechmarking should be done.

Calling Runtime.getRuntime().gc() is definitely a bad practice - both in real-life and as well as when benchmarking GCs. To name at least one reason, by forcing a GC cycle you are direclty penalizing a performance of any GC algorithm.

Further, you cannot compare various GC algorithms by having them execute only ~4 GC cycles. You should run a proper GC benchmark - see JMH, and you need to run at least for a considerably long time - depending on Heap size, this can be 10 minutes, or a few hours...

I think your best bet for a start is to run for a JMH-like benchmark for a long time(~30minutes), collect GC logs and process the GC logs for various statistics... at least to have some kind of reasonable comparison to start with.

I want to compare different implementations of Java programs in terms of their memory usage efficiency.

One option is to run the program with:

-Xloggc:gc.log_impl1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

then, switch to implementation 2 and re-run with

-Xloggc:gc.log_impl2 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

Then download HPjmeter, load both files into the console and use the compare gc feature. There may be a little bit of skew in the graphs, but you'll get a pretty good idea of how the programs memory profiles differ.

I would not try to invoke the GC synthetically.

来源：https://stackoverflow.com/questions/27831778/how-to-really-benchmark-the-memory-usage-of-a-java-application

标签

java

performance

memory

garbage-collection

benchmarking