I\'m trying to avoid the Full GC (from gc.log sample below) running a Grails application in Tomcat in production. Any suggestions on how to better configure the GC?
I'm serving requests and expect that beyond a certain amount of shared objects, every other objects are useful only to the request at hand. That's the theory, but any kind of caches can easily void that assumption and create objects that live beyond the request.
As others have noted neither your huge young generation nor the extended tenuring seems to work.
You should profile your application and analyze the age-distribution of objects. I'm pretty sure Grails caches all kinds of things beyond the scope of a request and that's what leaks into the old gen.
What you're essentially trying is to sacrifice the young generation pause times (for a young gen of 2GB) to postpone the inevitable - an old gen collection of 6GB. This is not exactly a good tradeoff you're making there.
Instead you probably should aim for better young gen pause times and allow CMS to burn more CPU time: more conrrent-phase GC threads (can't remember the option for that one), higher GCTimeRatio
, a MaxGCPauseMillis
> MaxGCMinorPauseMillis
to take pressure of the minor collections and allow them to meet their pause goals instead of having to resize to fit the major collection limit.
To make major GCs less painful you might want to read this: http://blog.ragozin.info/2012/03/secret-hotspot-option-improving-gc.html (this patch should be in j7u4). CMSParallelRemarkEnabled
should be enabled too, not sure if this is the default.
Personally I have some horrible experiences with G1GC working itself into a corner due to some very large LRU-like workloads and then falling back to a large, stop-the-world collection far more often than CMS experienced concurrent mode failures for the same workload.
But for other workloads (like yours) it might actually do the job and collect the old generation incrementally, while also compacting and thus avoiding any big pauses.
Give it a try if you haven't already. Again, update to the newest java7 before you do so, G1 still has some issues with its heuristics that they're trying to iron out.
Edit: Oracle has improved G1GC's heuristics and some bottlenecks since I have written this answer. It should definitely be worth a try now.
As you're already using a parallel collector for a 2GB young gen and get away with 200ms pause times... why not try the parallel old gen collector on your 6G heap? It would probably take less than the 10s+ major collections you're seeing with CMS. Whenever CMS runs into one of its failure modes it does a single-threaded, stop-the-world collection.