I am currently running a load test with JMeter against our system, which is built on Grails 3 and runs on Tomcat. After sending 20k requests per second I got “no live upstreams while connec
For me, the issue was with my proxy_pass entry. I had
location / {
    ...
    proxy_pass http://localhost:5001;
}
This caused upstream requests to go to either the IPv4 or the IPv6 localhost address, but every now and again nginx would use the name localhost without a port number, resulting in the upstream error seen in the logs below.
[27/Sep/2018:16:23:37 +0100] <request IP> - - - <requested URI> to: [::1]:5001: GET /api/hc response_status 200
[27/Sep/2018:16:24:37 +0100] <request IP> - - - <requested URI> to: 127.0.0.1:5001: GET /api/hc response_status 200
[27/Sep/2018:16:25:38 +0100] <request IP> - - - <requested URI> to: localhost: GET /api/hc response_status 502
[27/Sep/2018:16:26:37 +0100] <request IP> - - - <requested URI> to: 127.0.0.1:5001: GET /api/hc response_status 200
[27/Sep/2018:16:27:37 +0100] <request IP> - - - <requested URI> to: [::1]:5001: GET /api/hc response_status 200
As you can see, I get a 502 status for the request whose upstream is logged as "localhost:".
Changing my proxy_pass to http://127.0.0.1:5001 means that all requests now use IPv4 with a port.
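As a rough sketch, the corrected block would look something like the following. It assumes the backend listens on the IPv4 loopback address at port 5001, as in the config above; the settings elided there are left elided here.

```nginx
location / {
    # ... other settings unchanged (elided in the original config)

    # An IP literal means nginx no longer resolves "localhost" to both
    # 127.0.0.1 and [::1], so there is a single, explicit upstream address.
    proxy_pass http://127.0.0.1:5001;
}
```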
This StackOverflow answer was a big help in finding the issue, as it explains how to change the log format so that the problem becomes visible in the access log.
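A log format along these lines would produce entries similar to the ones shown above. This is only an approximation reconstructed from the log excerpt, not the exact format from the linked answer: the format name upstream_log, the choice of $server_name for the third field, and the log file path are assumptions. The variables themselves are standard nginx variables.

```nginx
# Goes in the http context.
# $upstream_addr records which upstream address handled each request.
log_format upstream_log '[$time_local] $remote_addr - $remote_user - $server_name '
                        'to: $upstream_addr: $request_method $request_uri '
                        'response_status $status';

# Path is an assumption; adjust to your installation.
access_log /var/log/nginx/access.log upstream_log;
```

With $upstream_addr in the log, a request that never reached a live backend stands out because the field shows a name rather than an address:port pair.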
I have seen this behavior many times during performance tests.
Under heavy load, your upstream server(s) may not be able to keep up, and the upstream module may mark them as unavailable.
The relevant parameters of the server directive are:
max_fails=number
sets the number of unsuccessful attempts to communicate with the server that should happen in the duration set by the fail_timeout parameter to consider the server unavailable for a duration also set by the fail_timeout parameter. By default, the number of unsuccessful attempts is set to 1. The zero value disables the accounting of attempts. What is considered an unsuccessful attempt is defined by the proxy_next_upstream directive and the corresponding directives of the other modules (for example, fastcgi_next_upstream).
fail_timeout=time
sets:
the time during which the specified number of unsuccessful attempts to communicate with the server should happen to consider the server unavailable;
and the period of time the server will be considered unavailable.
By default, the parameter is set to 10 seconds.
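To illustrate how these parameters fit together, here is a minimal sketch of an upstream group for a load test. The group name backend, the second server on port 5002, and the specific values are hypothetical; tune them to what your backends can actually sustain.

```nginx
# Goes in the http context.
upstream backend {
    # A server is marked unavailable only after 5 failed attempts within
    # 10 seconds, and is then skipped for 10 seconds before being retried.
    server 127.0.0.1:5001 max_fails=5 fail_timeout=10s;
    server 127.0.0.1:5002 max_fails=5 fail_timeout=10s;   # hypothetical second backend
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        # Only connection errors and timeouts count as unsuccessful attempts.
        proxy_next_upstream error timeout;
    }
}
```

Setting max_fails=0 on a server turns the failure accounting off entirely, so that server is never marked unavailable. This can hide the "no live upstreams" error during a load test, but requests will then keep being sent to a backend that is struggling.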