I have a program that is constantly running. Normally it seems to garbage collect and remain under about 8MB of memory usage. However, every weekend it refuses to garbage collect, and its memory usage climbs steadily until it crashes.
If possible, I would set up the process to dump the heap if it runs out of memory, so you can analyze it if (when) it happens again. Not an answer, but a potential route to a solution.
Here are the JVM options, taken from Oracle's Java HotSpot VM Options page. (This assumes you have an Oracle JVM):
-XX:HeapDumpPath=./java_pid.hprof
Path to directory or filename for heap dump. Manageable. (Introduced in 1.4.2 update 12, 5.0 update 7.)
-XX:-HeapDumpOnOutOfMemoryError
Dump heap to file when java.lang.OutOfMemoryError is thrown. Manageable. (Introduced in 1.4.2 update 12, 5.0 update 7.)
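Note that in Oracle's listing the leading `-` in `-XX:-HeapDumpOnOutOfMemoryError` indicates the flag is *off* by default; you enable it with `+`. A hypothetical launch command (the main class, heap size, and dump path are placeholders for your own application):

```shell
# Sketch of a launch command with heap dumping enabled.
# '+' turns the dump-on-OOME flag on (Oracle's page shows it with '-'
# because it is disabled by default). com.example.MyApp is a placeholder.
java -Xmx8m \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/log/myapp/heapdump.hprof \
     com.example.MyApp
```

You can then open the resulting `.hprof` file in a heap analyzer to see what was holding the memory.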
However, the only reason this issue was noticed is that it actually crashed from running out of memory one weekend, i.e. it must have reached the maximum heap size without running the garbage collector.
I think your diagnosis is incorrect. Unless there is something seriously broken about your JVM, the application will only throw an OOME after it has just run a full garbage collection and discovered that it still doesn't have enough free heap to proceed*.
I suspect that what is going on here is one or more of the following:
Your application has a slow memory leak. Each time you restart the application, the leaked memory gets reclaimed. So, if you restart the application regularly during the week, this could explain why it only crashes on the weekend.
Your application is doing computations that require varying amounts of memory to complete. On that weekend, someone sent it a request that required more memory than was available.
Running the GC by hand is not actually going to solve the problem in either case. What you need to do is to investigate the possibility of memory leaks, and also look at the application memory size to see if it is large enough for the tasks that are being performed.
If you can capture heap stats over a long period, a memory leak will show up as a downwards trend over time in the amount of memory available after full garbage collections. (That is, the height of the longest "teeth" of the sawtooth pattern.) A workload-related memory shortage will probably show up as an occasional sharp downwards trend in the same measure over a relatively short period of time, followed by a recovery. If you see both patterns, then you could have both things happening.
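One cheap way to start capturing those heap stats is to sample used heap periodically with the standard `java.lang.management` API and log it as CSV for plotting. A minimal sketch (the class name `HeapSampler` is mine; a real deployment would sample right after full collections, e.g. by parsing GC logs, but even a periodic sample reveals the long-term baseline trend):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Logs "timestamp,usedHeapBytes" lines that can be redirected to a file
// and plotted later to look for a rising baseline (a leak) or sudden
// spikes (a workload-related shortage).
public class HeapSampler {
    public static long usedHeapBytes() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        // Bounded loop for illustration; in practice you would run this
        // in a daemon thread for the life of the application.
        for (int i = 0; i < 3; i++) {
            System.out.printf("%d,%d%n",
                    System.currentTimeMillis(), usedHeapBytes());
            Thread.sleep(1000); // sample interval; tune to taste
        }
    }
}
```

Note that instantaneous "used" heap includes garbage not yet collected, so it is the post-full-GC samples (the floor of the sawtooth) that matter for leak detection.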
* Actually, the criteria for deciding when to give up with an OOME are a bit more complicated than this. They depend on certain JVM tuning options, and can include the percentage of time spent running the GC.
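For reference, HotSpot exposes tuning flags for this "give up" behaviour. A sketch of the relevant options (the values shown are the commonly cited defaults; verify on your own JVM with `java -XX:+PrintFlagsFinal -version`):

```shell
# UseGCOverheadLimit:  enables the "GC overhead limit exceeded" OOME check.
# GCTimeLimit:         throw OOME when more than this % of total time
#                      is being spent in garbage collection...
# GCHeapFreeLimit:     ...while less than this % of the heap is recovered.
java -XX:+UseGCOverheadLimit -XX:GCTimeLimit=98 -XX:GCHeapFreeLimit=2 MyApp
```

Disabling the check with `-XX:-UseGCOverheadLimit` is possible but usually just trades a fast OOME for an application that grinds along almost-frozen.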
FOLLOWUP
@Ogre - I'd need a lot more information about your application to be able to answer that question (about memory leaks) with any specificity.
With your new evidence, there are two further possibilities:
Your application may be getting stuck in a loop that leaks memory as a result of the clock time-warping.
The clock time-warping may cause the GC to think that it is taking too large a percentage of run time and trigger an OOME as a result. This behaviour depends on your JVM settings.
Either way, you should lean hard on your client to get them to stop adjusting the system clock like that. (A 32-minute timewarp is way too much!) Get them to install a system service to keep the clock in sync with network time hour by hour (or more frequently). Critically, get them to use a service with an option to adjust the clock in small increments.
(Re the 2nd bullet: there is a GC monitoring mechanism in the JVM that measures the percentage of overall time that the JVM is spending running the GC, relative to doing useful work. This is designed to prevent the JVM from grinding to a halt when your application is really running out of memory.
This mechanism would be implemented by sampling the wall-clock time at various points. But if the wall-clock time is timewarped at a critical point, it is easy to see how the JVM may think that a particular GC run took much longer than it actually did ... and trigger the OOME.)
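The distinction at play here is between the JVM's two clocks: `System.currentTimeMillis()` follows the system (wall) clock, so a backwards or forwards step shows up directly in any interval measured with it, while `System.nanoTime()` is specified for measuring elapsed time and does not jump with wall-clock adjustments. A small illustrative sketch (not the JVM's actual internal code):

```java
// Shows why measuring a duration with the wall clock is fragile:
// if the system clock is stepped mid-measurement, wallElapsed can be
// wildly wrong (even negative), whereas the nanoTime-based measurement
// is unaffected.
public class ClockDemo {
    public static void main(String[] args) throws InterruptedException {
        long wallStart = System.currentTimeMillis();
        long monoStart = System.nanoTime();

        Thread.sleep(100); // stand-in for "a GC run"

        long wallElapsedMs = System.currentTimeMillis() - wallStart;
        long monoElapsedMs = (System.nanoTime() - monoStart) / 1_000_000;

        // If the clock were warped 32 minutes during the sleep, the JVM's
        // wall-clock view of "time spent in GC" would be off by 32 minutes.
        System.out.println("wall: " + wallElapsedMs
                + " ms, elapsed: " + monoElapsedMs + " ms");
    }
}
```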
Ok guys, thanks for all your help. The correct answer, however, turned out to have nothing to do with the program itself.
It seems that at the time the memory usage started its steady climb, the server was synchronising its time from somewhere internal, though our client's IT contact has no idea where. Obviously, wherever it was coming from was not a good clock, since the time was half an hour behind. We turned off this synchronisation, and now that I have checked it again this morning, the problem did not occur. So if the time on your system suddenly changes, apparently this causes problems for the garbage collector. At least, that's what this implies to me.
As for why this wasn't occurring in any other parts of our system on this server (which are also written in Java): we probably simply haven't noticed, as they don't deal with as large a number of objects, so they would never have hit their out-of-memory state.
I find this strange, since I would have thought that the invoking of the garbage collector would be completely related to memory usage, and not to the system time at all. Clearly, my understanding of how the garbage collector works is woefully inadequate.