I'm testing a Jetty-based API vs a Netty-based one. With the only difference in the experiment being which API I use (same application, same servers, same memory config, sa…
Not all STW pauses (the mechanism used to trigger them is called a safepoint) are caused by the GC. Use -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
to print the other safepoint causes.
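For example (a minimal sketch for JDK 8, where these flags exist; the main class and heap size are placeholders), running with these flags prints a summary line for every safepoint, including the VM operation that triggered it:

    java -XX:+PrintSafepointStatistics \
         -XX:PrintSafepointStatisticsCount=1 \
         -Xmx4g com.example.Main

Look for causes other than the collection ones, e.g. Deoptimize, RevokeBias, or ThreadDump; if those dominate, the pauses are not a GC tuning problem.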
Secondly, if the pauses are caused by the GC, then the lines you pasted do not themselves contain the cause; an adjacent block in the GC log should, something like [GC pause (G1 Evacuation Pause) (young), 0.0200285 secs]
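If you don't have such lines yet, detailed GC logging has to be enabled first (again a JDK 8 sketch; the log path is a placeholder):

    java -XX:+PrintGCDetails \
         -XX:+PrintGCDateStamps \
         -XX:+PrintGCApplicationStoppedTime \
         -Xloggc:/var/log/myapp/gc.log com.example.Main

-XX:+PrintGCApplicationStoppedTime is particularly useful here because it reports the total time threads were stopped, which includes non-GC safepoints too.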
Additionally, you may want to monitor disk IO latency and match its timestamps to the safepoint pauses. Any synchronous IO or paging that hits slow storage during a safepoint can stall the entire safepoint. Putting logfiles and /tmp
on a tmpfs or on SSDs may help there.
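A minimal way to do that on Linux (iostat comes from the sysstat package; the tmpfs size is an arbitrary example):

    # sample extended per-device stats every second, with timestamps
    # to correlate against the GC/safepoint log
    iostat -x -t 1

    # put /tmp on tmpfs so the JVM's mmap'ed writes never hit EBS
    mount -t tmpfs -o size=512m tmpfs /tmp

Watch the await column for your EBS device; spikes that line up with long safepoints point at storage rather than the GC.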
To add some closure to this: the issue was that this was not, technically, a GC pause; it was a combination of several factors:
Other parts of our application had reached the EBS throttling threshold, so when the JVM tried to write to /tmp during a STW pause, every thread in the JVM ended up queued behind the AWS throttling point.
It seems the Netty/Jetty difference was a red herring.
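The write in question is the JVM's performance-counter file, which HotSpot mmaps into /tmp by default. You can see it for any running JVM (the user is whatever your process runs as):

    ls -l /tmp/hsperfdata_$USER/
    # one file per JVM, named after its pid; jps and jstat read these

If the pages backing that file live on throttled or slow storage, write-back can stall the process at a safepoint, exactly as described in the blog post linked below.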
We need our application to survive in this kind of environment, so our solution was to disable this JVM behavior, at the cost of losing support for several JVM tools we had added:
-XX:+PerfDisableSharedMem
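For reference (the jar name is a placeholder), the flag just goes on the command line, and the trade-off is that the perf counters are no longer exported:

    java -XX:+PerfDisableSharedMem -jar app.jar
    # the JVM keeps its perf counters off the /tmp-backed mmap,
    # so jps/jstat can no longer discover or sample this process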
More info on this issue can be found in this excellent blog post: http://www.evanjones.ca/jvm-mmap-pause.html