jHiccup analysis doesn't add up

前端 未结 1 2065
一整个雨季
一整个雨季 2021-02-06 00:12

I have the following jHiccup result.

\"jHiccup

Obviously there are huge peaks of few secs

相关标签:
1条回答
  • 2021-02-06 00:33

    Glad to see you are using jHiccup, and that it seems to show reality-based hiccups.

    jHiccup observes "hiccups" that would also be seen by application threads running on the JVM. It does not glean the reason - just reports the fact. Reasons can be anything that would cause a process to not run perfectly ready-to-run code: GC pauses are a common cause, but a temporary ^Z at the keyboard, or one of those "live migration" things across virtualized hosts would be observed just as well.. There are a multitude of possible reasons, including scheduling pressure at the OS or hypervisor level (if one exists), power management craziness, swapping, and many others. I've seen Linux file system pressure and Transparent Huge Page "background" defragmentation cause multi-second hiccups as well...

    A good first step at isolating the cause of the pause is to use the "-c" option in jHiccup: It launches a separate control process (with an otherwise idle workload). If both your application and the control process show hiccups that are roughly correlated in size and time, you'll know you are looking for a system-level (as opposed to process-local) reason. If they do not correlate, you'll know to suspect the insides of your JVM - which most likely indicates your JVM paused for something big; either GC or something else, like a lock debiasing or a class-loading-deriven-deoptimization which can take a really long [and often unreported in logs] time on some JVMs if time-to-safepoint is long for some reason (and on most JVMs, there are many possible causes for a long time-to-safepoint).

    jHiccup's measurement is so dirt-simple that it's hard to get wrong. The entire thing is less than 650 lines of java code, so you can look at the logic for yourself. jHiccup's HiccupRecorder thread repeatedly goes to sleep for 1msec, and when it wakes up it records any difference in time (from before the sleep) that is greater that 1msec as a hiccup. The simple assumption is that if one ready-to-run thread (the HiccupRecorder) did not get to run for 5 seconds, other threads in the same process also saw a similar sized hiccup.

    As you note above, jHiccups observations seem to be corroborated in your independent network logs, where you saw a 5 seconds response time, Note that not all hiccups would have been observed by the network logs, as only requests actually made during the hiccups would have been observed by a network logger. In contrast, no hiccup larger than ~1msec can hide from jHiccup, since it will attempt a wakeup 1,000 times per second even with no other activity.

    This may not be GC, but before you rule out GC, I'd suggest you look into the GC logging a bit more. To start with, a JVM hint to limit pauses to 200msec is useless on all known JVMs. A pause hint is the equivalent of saying "please". In addition, don't believe your GC logs unless you include -XX:+PrintGCApplicationStoppedTime in options (and Suspect them even then). There are pauses and parts of pauses that can be very long and go unreported unless you include this flag. E.g. I've seen pauses caused by the occasional long running counted loop taking 15 seconds to reach a safe point, where GC only reported only the .08 seconds part of the pause where it actually did some work. There are also plenty of pauses whose causes that are not considered part of "GC" and can thereby go unreported by GC logging flags.

    -- Gil. [jHiccup's author]

    0 讨论(0)
提交回复
热议问题