Setting -XX:+DisableExplicitGC in production: what could go wrong?

一个人的身影 2020-12-08 09:48

We just had a meeting to address some performance issues in a web application that is used to calculate insurance rates. The calculations are implemented in a C/C++ module,

3 Answers
  • 2020-12-08 10:19

    If you use -XX:+DisableExplicitGC with CMS, you might also want -XX:+CMSClassUnloadingEnabled to remove another trigger for full GCs (a full PermGen). Other than that, I haven't had problems with the option, though I've since switched to -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses, because my only source of explicit GCs was RMI, not application code.
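To confirm which of these flags a running JVM was actually started with, something like the following could help. It is only a sketch: the class name `GcFlagCheck` and the helper `hasFlag` are made up for illustration, and it only sees flags that were passed explicitly, not defaults the JVM chose on its own.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;

final class GcFlagCheck {
    // Flags named in this answer; the list is illustrative, not exhaustive.
    static final List<String> FLAGS = Arrays.asList(
        "-XX:+DisableExplicitGC",
        "-XX:+CMSClassUnloadingEnabled",
        "-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses");

    // True if the exact flag string appears among the given JVM arguments.
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments the JVM was launched with (excluding the main class and its args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String flag : FLAGS) {
            System.out.println(flag + " -> " + (hasFlag(jvmArgs, flag) ? "set" : "not set"));
        }
    }
}
```

Inside an app server the same `getInputArguments()` call would have to run in the server's own JVM (for example from a diagnostic servlet) or be read remotely over JMX.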

  • 2020-12-08 10:21

    I've been wrestling with this same issue, and based on everything I've been able to find, there definitely appears to be some risk. Per the comments on your original post from @millimoose, as well as https://bugs.openjdk.java.net/browse/JDK-6200079 , setting -XX:+DisableExplicitGC looks like a bad idea if NIO direct buffers are in use. They do appear to be used in the internal implementation of the WebSphere 8.5 app server we're running. Here's the stack trace I captured while debugging this:

    3XMTHREADINFO      "WebContainer : 25" J9VMThread:0x0000000006FC5D00, j9thread_t:0x00007F60E41753E0, java/lang/Thread:0x000000060B735590, state:R, prio=5
    3XMJAVALTHREAD            (java/lang/Thread getId:0xFE, isDaemon:true)
    3XMTHREADINFO1            (native thread ID:0x1039, native priority:0x5, native policy:UNKNOWN)
    3XMTHREADINFO2            (native stack address range from:0x00007F6067621000, to:0x00007F6067662000, size:0x41000)
    3XMCPUTIME               CPU usage total: 80.222215853 secs
    3XMHEAPALLOC             Heap bytes allocated since last GC cycle=1594568 (0x1854C8)
    3XMTHREADINFO3           Java callstack:
    4XESTACKTRACE                at java/lang/System.gc(System.java:329)
    4XESTACKTRACE                at java/nio/Bits.syncReserveMemory(Bits.java:721)
    5XESTACKTRACE                   (entered lock: java/nio/Bits@0x000000060000B690, entry count: 1)
    4XESTACKTRACE                at java/nio/Bits.reserveMemory(Bits.java:766(Compiled Code))
    4XESTACKTRACE                at java/nio/DirectByteBuffer.<init>(DirectByteBuffer.java:123(Compiled Code))
    4XESTACKTRACE                at java/nio/ByteBuffer.allocateDirect(ByteBuffer.java:306(Compiled Code))
    4XESTACKTRACE                at com/ibm/ws/buffermgmt/impl/WsByteBufferPoolManagerImpl.allocateBufferDirect(WsByteBufferPoolManagerImpl.java:706(Compiled Code))
    4XESTACKTRACE                at com/ibm/ws/buffermgmt/impl/WsByteBufferPoolManagerImpl.allocateCommon(WsByteBufferPoolManagerImpl.java:612(Compiled Code))
    4XESTACKTRACE                at com/ibm/ws/buffermgmt/impl/WsByteBufferPoolManagerImpl.allocateDirect(WsByteBufferPoolManagerImpl.java:527(Compiled Code))
    4XESTACKTRACE                at com/ibm/io/async/ResultHandler.runEventProcessingLoop(ResultHandler.java:507(Compiled Code))
    4XESTACKTRACE                at com/ibm/io/async/ResultHandler$2.run(ResultHandler.java:905(Compiled Code))
    4XESTACKTRACE                at com/ibm/ws/util/ThreadPool$Worker.run(ThreadPool.java:1864(Compiled Code))
    3XMTHREADINFO3           Native callstack:
    4XENATIVESTACK               (0x00007F61083DD122 [libj9prt26.so+0x13122])
    4XENATIVESTACK               (0x00007F61083EA79F [libj9prt26.so+0x2079f])
    ....
    

    Just what the full ramifications of setting -XX:+DisableExplicitGC are when NIO direct byte buffers are in use isn't entirely clear to me yet (does it introduce a memory leak?), but there does appear to be at least some risk. If you're using an app server other than WebSphere, you may want to verify that the app server itself isn't invoking System.gc() via NIO before disabling it. I've posted a related question that will hopefully clarify the exact impact on the NIO libraries: "Impact of setting -XX:+DisableExplicitGC when NIO direct buffers are used".
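The trace above shows System.gc() being called from java.nio.Bits while direct memory is being reserved. One rough way to gauge whether a given JVM is using direct buffers at all is to read the platform buffer-pool MXBeans. The sketch below assumes the pool is exposed under the name "direct" (true on the HotSpot JVMs I've checked; treat it as an assumption elsewhere), and the class and method names are hypothetical.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class DirectBufferReport {
    // Returns {bufferCount, bytesUsed} for the JVM's "direct" buffer pool,
    // or {0, 0} if this JVM does not expose a pool with that name.
    static long[] directPoolStats() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return new long[] { pool.getCount(), pool.getMemoryUsed() };
            }
        }
        return new long[] { 0L, 0L };
    }

    public static void main(String[] args) {
        // Illustrative allocation only, so the report has something to show.
        ByteBuffer demo = ByteBuffer.allocateDirect(1024);
        long[] stats = directPoolStats();
        System.out.println("direct buffers: " + stats[0] + ", bytes used: " + stats[1]);
        System.out.println("demo capacity: " + demo.capacity());
    }
}
```

A non-zero count in the server's JVM (read in-process or over JMX) would suggest that some component, possibly the app server itself, allocates direct buffers and may therefore reach the System.gc() call seen in the trace.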

    Incidentally, WebSphere also seems to invoke System.gc() manually several times during boot: usually twice within the first couple of seconds after the app server is launched, and a third time within the first 1-2 minutes (possibly when the application is being deployed). This is why we started investigating in the first place, since all the System.gc() calls appear to come directly from the app server and never from our application code.

    It should also be noted that, in addition to the NIO libraries, the JDK's internal implementation of RMI distributed garbage collection also calls System.gc(). See "Unexplained System.gc() calls due to Remote Method Invocation" and "System.gc() calls by core APIs".

    Whether enabling -XX:+DisableExplicitGC will wreak havoc with RMI DGC is also a little unclear to me. The only reference I've found that even addresses this is the first one above, which states:

    "However, in most cases regular GC activity is sufficient for effective DGC"

    That 'in most cases' qualifier sounds awfully wishy-washy to me, so again, it seems there's at least some risk in just shutting off all System.gc() calls. You'd be better off fixing the calls in your own code if at all possible, and only shutting them off entirely as a last resort.

  • 2020-12-08 10:36

    You are not alone in fixing stop-the-world GC events by setting the -XX:+DisableExplicitGC flag. Unfortunately (and in spite of the disclaimers in the documentation), many developers decide they know better than the JVM when to collect memory and introduce exactly this type of issue.

    I'm aware of many instances where -XX:+DisableExplicitGC improved the production environment, and zero instances where it had any negative side effects.

    The safe thing to do is to run your current production code, under load, with that flag set in a stress test environment and perform a normal QA cycle.

    If you cannot do that, I would suggest that the risk of setting the flag is less than the cost of not setting it in most cases.
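For the stress-test run suggested above, a small probe can help confirm that the flag is actually taking effect in the test JVM. This is a minimal sketch with hypothetical names: it compares the total collection count before and after an explicit System.gc() call. An unrelated collection could still happen in between, so an unchanged count is the more telling result.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class ExplicitGcProbe {
    // Sums collection counts across all collectors; collectors that report
    // -1 (count undefined) are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc(); // expected to be a no-op when -XX:+DisableExplicitGC is in effect
        long after = totalCollections();
        System.out.println("collections before=" + before + ", after=" + after);
    }
}
```

Running it once with and once without -XX:+DisableExplicitGC should make the difference visible before the same flag goes anywhere near production.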
