How is “app server” time related to “browser time” and “transaction time” in New Relic?

别跟我提以往 2021-02-11 08:21

I'm monitoring a PHP app with New Relic, and I'm very confused about some of the numbers shown in the overview of my application.

My app consists of a PHP webapp, that

1 answer
  •  梦如初夏
    2021-02-11 08:52

    Well, I finally figured this out :) The key concept I was missing here was "percentiles". Let me explain a little bit.

    In my question, I mentioned I was getting average response times of 1560ms, which didn't seem to make sense given that our backend always has to process for about 15 secs to produce a response. The following picture is what I'm getting in the "overview" of my webapp.

    web transactions average times

    As you can see, the average response times don't seem to be that bad. However, I'm also seeing transactions that take up to 15 secs.

    Next, if you expand the "Web Transactions response time" selector and select the percentage sign ("%"), you will get the "Percentiles" graph. Mine is as follows:

    web transaction percentiles

    In this new graph:

    • The green line represents the average response time, which corresponds to the green area of the first graph. Here we see that transactions do in fact take an average of under 2 secs to complete. So far so good.
    • The orange-ish line corresponds to the "95%". This is the key to understanding how all these numbers come together. This "95%" is the "95th percentile" of your requests, meaning that 95% of your requests take less than this time. But of course it also means 5% of your requests are taking more than that!
    • The blue line corresponds to the "99%" or "99th percentile" of your requests, meaning that 99% of your requests take less than the time shown by this line, but again, 1% take more.
    • The red line corresponds to the "median", which in fact is a synonym for "50%" or "50th percentile". At this point you can imagine what this is: 50% of your requests take less than this time, and the other 50% take more (hence the name "median"). Note that this measure can differ considerably from the "average", because the average sums up all times and divides by the total number of transactions, so the transactions at the extremes of the sampled times get hidden in the high volume of the sample.

    Now, it all begins to make sense. On average, my requests are in fact taking no more than 2 secs. But I have so many requests that are extremely fast (those below the red line) that those taking the incredible amount of time of 15 secs are not noticeable in the average. Those are evident only when you look at the long tail of your sampled requests, i.e. the 95th and 99th percentiles.
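
    To see how an average can hide a slow tail, here is a minimal Python sketch. The sample is made up for illustration (900 requests at 150ms and 100 requests at 15 secs), and the `nearest_rank_percentile` helper is hypothetical: it uses the simple nearest-rank definition, which is not necessarily how New Relic computes its percentiles.

```python
def nearest_rank_percentile(sorted_values, p):
    """Return the smallest value such that at least p% of the samples are <= it."""
    n = len(sorted_values)
    # Integer ceiling of p * n / 100, to avoid floating-point rounding surprises.
    rank = max((p * n + 99) // 100, 1)
    return sorted_values[rank - 1]

# Hypothetical response times in milliseconds: mostly fast, with a slow tail.
response_times_ms = sorted([150] * 900 + [15000] * 100)

average_ms = sum(response_times_ms) / len(response_times_ms)
median_ms = nearest_rank_percentile(response_times_ms, 50)
p95_ms = nearest_rank_percentile(response_times_ms, 95)
p99_ms = nearest_rank_percentile(response_times_ms, 99)

print(f"average: {average_ms:.0f} ms")
print(f"median:  {median_ms} ms")
print(f"95th:    {p95_ms} ms")
print(f"99th:    {p99_ms} ms")
```

    With this sample the average lands well under 2 secs even though one request in ten takes 15 secs; only the 95th and 99th percentiles expose the slow requests.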

    To wrap it up, this can be confirmed by selecting the "histogram" option in the graph. Mine is as follows:

    web transactions histogram

    Notice that the vast majority of requests take under 200ms, but 8.29% of transactions take more than 7 secs to complete (and if we could scroll to the right of the histogram, we would find that the requests taking more than 15 secs are in fact in the last 5% and 1%, as the percentile analysis above showed).
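
    The same reading of a histogram can be reproduced by hand. The following is only a sketch in Python with invented timings and hypothetical helper names (`build_histogram`, `share_above`); the bucket width and the 7 secs threshold are assumptions chosen to mirror the description above, not values taken from New Relic.

```python
from collections import Counter

def build_histogram(samples_ms, bucket_width_ms):
    """Count samples per bucket, keyed by the lower bound of each bucket."""
    counts = Counter((t // bucket_width_ms) * bucket_width_ms for t in samples_ms)
    return dict(sorted(counts.items()))

def share_above(samples_ms, threshold_ms):
    """Percentage of samples strictly slower than the threshold."""
    slow = sum(1 for t in samples_ms if t > threshold_ms)
    return 100 * slow / len(samples_ms)

# Hypothetical timings in milliseconds: a fast majority plus a slow tail.
timings_ms = [150] * 900 + [8000] * 40 + [15000] * 60

buckets = build_histogram(timings_ms, 1000)
for lower_bound, count in buckets.items():
    print(f"{lower_bound:>6} ms and up: {count}")

slow_share = share_above(timings_ms, 7000)
print(f"{slow_share:.1f}% of requests took more than 7000 ms")
```

    The tall first bucket holds the fast majority, while the small buckets far to the right are the requests that push the 95th and 99th percentiles up.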

    (This article pointed me in the right direction: https://blog.newrelic.com/2013/10/23/histograms-percentiles-new-relic-style/)

    This had me disoriented for a long time. Hope it helps someone!
