Here\'s my scenario (designed by my predecessor):
Two Apache servers serving reverse proxy duty for a number of mixed backend web servers (Apache, IIS, Tomcat, etc.). T
There is a property 'ping' in the 'BalancerMember parameters'
Reading the documentation it sounds like 'ping' set to 500ms will send a request before mod_proxy directs you to a BalancerMember. mod_proxy will wait 500ms for a response from a BalancerMember, and if mod_proxy doen't get a response it will but the BalancerMember into an error state.
I tired implementing this but it did not appear to help with directing to a live BalancerMember.
<Proxy balancer://APICluster>
BalancerMember https://api01 route=qa-api1 ttl=5 ping=500ms
BalancerMember https://api02 route=qa-api2 ttl=5 ping=500ms
ProxySet lbmethod=bybusyness stickysession=ROUTEID
</Proxy>
http://httpd.apache.org/docs/2.4/mod/mod_proxy.html
Ping property tells the webserver to "test" the connection to the backend before forwarding the request. For AJP, it causes mod_proxy_ajp to send a CPING request on the ajp13 connection (implemented on Tomcat 3.3.2+, 4.1.28+ and 5.0.13+). For HTTP, it causes mod_proxy_http to send a 100-Continue to the backend (only valid for HTTP/1.1 - for non HTTP/1.1 backends, this property has no effect). In both cases, the parameter is the delay in seconds to wait for the reply. This feature has been added to avoid problems with hung and busy backends. This will increase the network traffic during the normal operation which could be an issue, but it will lower the traffic in case some of the cluster nodes are down or busy. By adding a postfix of ms, the delay can be also set in milliseconds.
http://httpd.apache.org/docs/2.4/mod/mod_proxy.html Section "BalancerMember parameters", property=retry:
If the connection pool worker to the backend server is in the error state, Apache httpd will not forward any requests to that server until the timeout expires. This enables [one] to shut down the backend server for maintenance, and bring it back online later. A value of 0 means always retry workers in an error state with no timeout.
However there are other failure conditions that wouldn't be caught using mod_whatever, for example, IIS backend running an application which is down. IIS is up so a connection can be made and a page can be read, it's just that the page will always be 500 internal server error. Here you will have to use failonerror to catch it and force the worker into an error state.
In all cases once the worker is in an error state traffic will not be directed to it. I've been trying different ways of consuming that first failure and retrying it but there always seems to be cases where an error page makes it back to the client.