Question
Here's my scenario (designed by my predecessor):
Two Apache servers handle reverse proxy duty for a number of mixed backend web servers (Apache, IIS, Tomcat, etc.). For some sites we have multiple backend web servers, and in those cases we do something like:
```apache
<Proxy balancer://www.example.com>
    BalancerMember http://192.168.1.40:80
    BalancerMember http://192.168.1.41:80
</Proxy>
<VirtualHost *:80>
    ServerName www.example.com:80
    CustomLog /var/log/apache2/www.example.com.log combined
    <Location />
        Order allow,deny
        Allow from all
        ProxyPass balancer://www.example.com/
        ProxyPassReverse balancer://www.example.com/
    </Location>
</VirtualHost>
```
So in this example, I've got one site (www.example.com) in the proxy servers' configs, and that site is proxied to one or the other of the two backend servers, 192.168.1.40 and .41.
I'm evaluating this to make sure that we are fault tolerant on all of our web services (I've already put the two reverse proxy servers into a shared IP cluster for this reason), and I want to make sure that the load-balanced backend servers are fault tolerant as well. But I'm having trouble figuring out if backend failure detection (and the logic to avoid the failed backend server) is built into the mod_proxy_balancer module...
So if 192.168.1.40 goes down, will Apache detect this (I'll understand if it takes a failed request first) and automatically route all requests to the other backend, 192.168.1.41? Or will it continue to balance requests between the failed backend and the operational backend?
I've found some clues in the Apache documentation for mod_proxy and mod_proxy_balancer that seem to indicate that failure can be detected ("maxattempts = Maximum number of failover attempts before giving up.", "failonstatus = A single or comma-separated list of HTTP status codes. If set this will force the worker into error state when the backend returns any status code in the list."), but after a few days of searching, I've found nothing conclusive saying for sure that it will (or at least "should") detect backend failure and recovery.
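For reference, maxattempts is a balancer-level parameter, so it goes on a ProxySet line inside the <Proxy balancer://...> section rather than on an individual BalancerMember. A minimal sketch against the existing www.example.com balancer; the value is illustrative, and with two members the documented default already amounts to one failover attempt:

```apache
<Proxy balancer://www.example.com>
    BalancerMember http://192.168.1.40:80
    BalancerMember http://192.168.1.41:80
    # Balancer-level setting: when the chosen member cannot be used,
    # try at most one other member before giving up on the request.
    ProxySet maxattempts=1
</Proxy>
```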
I will say that most of the search results reference using the AJP protocol to pass the traffic to the backend servers, and this apparently does support failure detection-- but my backends are a mixture of Apache, IIS, Tomcat and others, and I am fairly sure that many of them don't support AJP. They are also a mixture of Windows 2k3/2k8 and Linux (mostly Ubuntu Lucid) boxes running various different applications with various different requirements, so add-on modules like Backhand and LVS aren't an option for me.
I've also tried to empirically test this feature, by creating a new test site like this:
```apache
<Proxy balancer://test.example.com>
    BalancerMember http://192.168.1.40:80
    BalancerMember http://192.168.1.200:80
</Proxy>
<VirtualHost *:80>
    ServerName test.example.com:80
    CustomLog /var/log/apache2/test.example.com.log combined
    LogLevel debug
    <Location />
        Order allow,deny
        Allow from all
        ProxyPass balancer://test.example.com/
        ProxyPassReverse balancer://test.example.com/
    </Location>
</VirtualHost>
```
Where 192.168.1.200 is a bogus address that isn't running any web server, to simulate a backend failure. The test site was served up without a problem for a bunch of different client machines, but even with the LogLevel set to debug, I didn't see anything logged to indicate that it detected that one of the backend servers was down... And I'd like to make 100% sure that I can take our load-balanced backends down for maintenance (one at a time, of course) without affecting production sites.
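Two things may explain the silence in the logs. LogLevel only affects the error log, not the access log that CustomLog writes, and this vhost defines no ErrorLog, so any proxy messages would land in the server-wide error log. Also, one likely reason clients never saw an error is the balancer's failover (maxattempts): when the connection to a member fails, the request is retried on another member. A sketch of the test vhost with its own error log and a balancer-manager status page (provided by mod_proxy_balancer) that shows each member's current state. It assumes the original <Proxy balancer://test.example.com> block stays as it is; the ProxyPass lines move from <Location /> to the vhost level so the status page can be excluded; the error log file name and the 192.168.1 admin network are hypothetical:

```apache
<VirtualHost *:80>
    ServerName test.example.com:80
    CustomLog /var/log/apache2/test.example.com.log combined
    # LogLevel applies to the error log, not to the access log written by
    # CustomLog. Hypothetical file name: a per-vhost error log makes the
    # messages about the unreachable member easy to find.
    ErrorLog /var/log/apache2/test.example.com-error.log
    LogLevel debug

    # Exclusions must come before the catch-all ProxyPass; otherwise the
    # status page below would itself be forwarded to the backends.
    ProxyPass /balancer-manager !
    ProxyPass / balancer://test.example.com/
    ProxyPassReverse / balancer://test.example.com/

    # Status page listing each BalancerMember and its current state.
    # Assumption: only the 192.168.1.x admin network may view it
    # (2.2-style access control, matching the rest of this config).
    <Location /balancer-manager>
        SetHandler balancer-manager
        Order deny,allow
        Deny from all
        Allow from 192.168.1
    </Location>
</VirtualHost>
```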
Answer 1:
From http://httpd.apache.org/docs/2.4/mod/mod_proxy.html, section "BalancerMember parameters", parameter "retry":
If the connection pool worker to the backend server is in the error state, Apache httpd will not forward any requests to that server until the timeout expires. This enables [one] to shut down the backend server for maintenance, and bring it back online later. A value of 0 means always retry workers in an error state with no timeout.
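A sketch of setting retry explicitly on the members of the example balancer; 30 is an illustrative value (the documented default is 60 seconds):

```apache
<Proxy balancer://www.example.com>
    # retry=30: once a member is in the error state, httpd sends it no
    # requests for 30 seconds, then tries it again.
    BalancerMember http://192.168.1.40:80 retry=30
    BalancerMember http://192.168.1.41:80 retry=30
</Proxy>
```

During maintenance on one member, requests go to the other; once the retry period has expired and the member answers again, it rejoins the rotation.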
However, there are other failure conditions that this mechanism won't catch on its own, for example an IIS backend running an application which is down. IIS is up, so a connection can be made and a page can be read; it's just that the page will always be a 500 Internal Server Error. Here you will have to use failonstatus to catch it and force the worker into an error state.
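For the IIS case described above, a sketch using failonstatus, which is a balancer-level parameter set with ProxySet. The status list 500,503 is an assumption, and failonstatus is only available in newer httpd releases, so check the documentation for your version before relying on it:

```apache
<Proxy balancer://www.example.com>
    BalancerMember http://192.168.1.40:80
    BalancerMember http://192.168.1.41:80
    # Put a member into the error state when it answers with 500 or 503,
    # e.g. IIS is up but the application behind it is failing. Recovery
    # then follows the same retry rules as any other worker error.
    ProxySet failonstatus=500,503
</Proxy>
```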
In all cases, once the worker is in an error state, traffic will not be directed to it. I've been trying different ways of consuming that first failure and retrying it, but there always seem to be cases where an error page makes it back to the client.
Answer 2:
There is a 'ping' property among the 'BalancerMember parameters'.
Reading the documentation, it sounds like setting 'ping' to 500ms will make mod_proxy test the connection before it directs you to a BalancerMember: mod_proxy will wait 500ms for a response from the BalancerMember, and if it doesn't get one, it will put the BalancerMember into an error state.
I tried implementing this, but it did not appear to help with directing requests to a live BalancerMember.
```apache
<Proxy balancer://APICluster>
    BalancerMember https://api01 route=qa-api1 ttl=5 ping=500ms
    BalancerMember https://api02 route=qa-api2 ttl=5 ping=500ms
    ProxySet lbmethod=bybusyness stickysession=ROUTEID
</Proxy>
```
http://httpd.apache.org/docs/2.4/mod/mod_proxy.html
Ping property tells the webserver to "test" the connection to the backend before forwarding the request. For AJP, it causes mod_proxy_ajp to send a CPING request on the ajp13 connection (implemented on Tomcat 3.3.2+, 4.1.28+ and 5.0.13+). For HTTP, it causes mod_proxy_http to send a 100-Continue to the backend (only valid for HTTP/1.1 - for non HTTP/1.1 backends, this property has no effect). In both cases, the parameter is the delay in seconds to wait for the reply. This feature has been added to avoid problems with hung and busy backends. This will increase the network traffic during the normal operation which could be an issue, but it will lower the traffic in case some of the cluster nodes are down or busy. By adding a postfix of ms, the delay can be also set in milliseconds.
Source: https://stackoverflow.com/questions/11868988/apache-proxy-load-balancing-backend-server-failure-detection