Twitter - twemproxy - memcached - Retry not working as expected

问题

Simple setup:

1 node running twemproxy (vcache:22122)
2 nodes running memcached (vcache-1, vcache-2) both listening on 11211

I have the following twemproxy config:

default:
  auto_eject_hosts: true
  distribution: ketama
  hash: fnv1a_64
  listen: 0.0.0.0:22122
  server_failure_limit: 1
  server_retry_timeout: 600000 # 600sec, 10m
  timeout: 100
  servers:
    - vcache-1:11211:1
    - vcache-2:11211:1

The twemproxy node can resolve all hostnames. As part of testing I took down vcache-2. In theory for every attempt to interface with vcache:22122, twemproxy will contact a server from the pool to facilitate the attempt. However, if one of the cache nodes is down, then twemproxy is supposed to "auto eject" it from the pool, so subsequent requests will not fail.

It is up to the app layer to determine if a failed interface attempt with vcache:22122 was due to infrastructure issue, and if so, try again. However I am finding that on the retry, the same failed server is being used, so instead of subsequent attempts being passed to a known good cache node (in this case vcache-1) they are still being passed to the ejected cache node (vcache-2).

Here's the php code snippet which attempts the retry:

....

// $this is a Memcached object with vcache:22122 in the server list

$retryCount = 0;

do {

    $status = $this->set($key, $value, $expiry);

    if (Memcached::RES_SUCCESS === $this->getResultCode()) {

        return true;
    }


} while (++$retryCount < 3);

return false;

-- Update --

Link to Issue opened on Github for more info: Issue #427

回答1:

I can't see anything wrong with your configuration. As you know the important settings are in place:

default:
  auto_eject_hosts: true
  server_failure_limit: 1

The documentation suggests connection timeouts might be an issue.

Relying only on client-side timeouts has the adverse effect of the original request having timedout on the client to proxy connection, but still pending and outstanding on the proxy to server connection. This further gets exacerbated when client retries the original request.

Is your PHP script closing the connection and retrying before twemproxy failed its first attempt and removed the server from the pool? Perhaps adding a timeout value in the twemproxy lower than the connection timeout used in PHP solves the issue.

From your discussion on Github though it sounds like support for healthcheck, and perhaps auto ejection, aren't stable in twemproxy. If you're building against old packages you might be better to find a package which has been stable for some time. Is mcrouter (with interesting article) suitable?