GC Tuning - preventing a Full GC

清歌不尽 · 2021-01-30 14:37

I'm trying to avoid the Full GC (from the gc.log sample below) when running a Grails application in Tomcat in production. Any suggestions on how to better configure the GC?

5 Answers
  •  醉话见心
    2021-01-30 14:52

    The log snippet posted shows you have a substantial number of objects that stay live for >320s (approximately 40s per young collection, with objects surviving 8 collections before promotion). The remainder then bleed into tenured, and eventually you hit an apparently unexpected full GC which doesn't actually collect very much.

    3453285K->3099828K(4194304K)

    i.e. you have a 4G tenured which is ~82% full (3453285/4194304) when it is triggered and is ~74% full after 13 long seconds.

    This means it took 13s to collect a grand total of ~350M which, in the context of a 6G heap, is not very much.
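
    Spelling out the arithmetic from that log line:

        before:    3453285K / 4194304K ≈ 82% of tenured in use
        after:     3099828K / 4194304K ≈ 74%
        reclaimed: 3453285K - 3099828K = 353457K ≈ 345M, for ~13s of pause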

    This basically means your heap is not big enough or, perhaps more likely, you have a memory leak. A leak like this is a terrible thing for CMS because a concurrent tenured collection is a non-compacting event: tenured is managed as a set of free lists, so fragmentation can become a big problem, and your utilisation of tenured becomes increasingly inefficient. That in turn raises the probability of promotion failure events (though if this were such an event I'd expect to see a log message saying so): the collector wants to promote (or thinks it will need to promote) X MB into tenured but has no contiguous free block >= X MB available. A promotion failure triggers an unexpected tenured collection, which is a fully stop-the-world event, not remotely concurrent. And if you actually have very little to collect (as you do), it's no surprise you're left sitting twiddling your thumbs.
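
    To see which of those it is, it may help to turn GC logging and diagnostics up. A minimal sketch using standard HotSpot flags (pre-Java 9 syntax, which matches a CMS setup; the log path is a placeholder):

        -Xloggc:/path/to/gc.log            # placeholder path
        -XX:+PrintGCDetails
        -XX:+PrintGCDateStamps
        -XX:+PrintTenuringDistribution     # shows object ages in the survivor spaces
        -XX:+HeapDumpOnOutOfMemoryError    # capture a heap dump if it does eventually OOM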

    Some general pointers, to a large extent reiterating what Vladimir Sitnitov has said...

    • using iCMS (incremental CMS, enabled by -XX:+CMSIncrementalMode) on a multicore box makes no sense unless you have lots of JVMs or other processes running on that box such that the JVM really is short of CPU, so remove this switch
    • your young collections are unnecessarily long because of the cost of copying relatively substantial quantities of memory between the survivor spaces on every collection; 150-200ms is a really quite massive ParNew pause
      • the right answer to the young gen issue depends on what the allocation behaviour really is: perhaps you'd be better off tenuring early and reducing the impact of fragmentation on tenured collections, OR perhaps you'd be better off with a much bigger new gen, reducing the frequency of young collections so that fewer objects are promoted and there is minimal bleed into tenured (both directions are sketched after this list)
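
    A hedged sketch of what those two directions might look like as JVM options (the values here are placeholders, not recommendations; they would need to be validated against your real allocation profile):

        # Option A (assumption): tenure early, shrinking survivor-to-survivor copying
        -XX:MaxTenuringThreshold=1

        # Option B (assumption): a much larger young gen so fewer objects get promoted
        -Xmn3g
        -XX:SurvivorRatio=8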

    Some questions...

    • does it eventually go OOM (OutOfMemoryError) or does it recover?
    • is the application in a steady state (subject to consistent load at some point well beyond startup) during this log snippet or is it under stress?
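
    If you're not sure, a quick way to answer both is to watch the generations live and diff two class histograms (a sketch; <pid> stands for the Tomcat process id):

        # sample generation occupancy every 5s (jstat ships with the JDK)
        jstat -gcutil <pid> 5s

        # live-object histogram; compare snapshots taken some minutes apart under load
        jmap -histo:live <pid>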
