I have a gRPC client pointing to a URL which resolves to 2 IP addresses. The problem is that when one server node goes down and then comes back up, it is not picked up by the gRPC client, and all the load goes to a single node.
I tried the recommendation to change the networkaddress.cache.ttl property, but it didn't help.
My code (in Scala):
java.security.Security.setProperty("networkaddress.cache.ttl", "30")
System.setProperty("networkaddress.cache.ttl", "30")
val channel = NettyChannelBuilder.forAddress(host, port).nameResolverFactory(
new DnsNameResolverProvider).usePlaintext().build
val client = MyServiceGrpc.newStub(channel)
grpc version: 1.32.1
Assuming that DNS returns both IPs all the time (probably shuffled), the problem is not the DNS cache. The problem is that gRPC already has a working connection, so it won't choose to reconnect and won't perform new DNS queries.
You should configure your server with MAX_CONNECTION_AGE to force clients to reconnect occasionally to rebalance the load. When clients are disconnected from the server they trigger a new DNS resolution, so this can also be used to find new addresses (although reconnections do not wait for the DNS resolution to complete).
In grpc-java, MAX_CONNECTION_AGE is available via NettyServerBuilder.maxConnectionAge():
.maxConnectionAge(30, TimeUnit.MINUTES)
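For context, here is a minimal sketch of where that call might sit when building the server. The object name `ServerSetup`, the `buildServer` helper, and its `port` and `service` parameters are hypothetical placeholders, and the one-minute grace period is only an illustrative value, not a recommendation.

```scala
import java.util.concurrent.TimeUnit

import io.grpc.{BindableService, Server}
import io.grpc.netty.NettyServerBuilder

object ServerSetup {
  // `port` and `service` are hypothetical placeholders for your own values.
  def buildServer(port: Int, service: BindableService): Server =
    NettyServerBuilder
      .forPort(port)
      // Ask clients to reconnect once a connection is about 30 minutes old.
      .maxConnectionAge(30, TimeUnit.MINUTES)
      // Illustrative value: give in-flight RPCs up to a minute to finish
      // before an aged connection is closed.
      .maxConnectionAgeGrace(1, TimeUnit.MINUTES)
      .addService(service)
      .build()
}
```

The returned server still has to be started with `start()`; the grace setting is optional and only bounds how long RPCs already in progress may keep an aged connection open.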
Use as large an age as you can accept. With an age of 30 minutes, each client will rebalance every 30 minutes. So 15 minutes after the server restarts, that server would have roughly ¼ of the load, and after 30 minutes it would have roughly ½.
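The ¼ and ½ figures can be reproduced with a rough back-of-the-envelope model. This sketch assumes two servers, connection ages spread evenly across the 30-minute window, and that each reconnecting client is equally likely to land on either server; the object and function names are hypothetical.

```scala
object RebalanceEstimate {
  // Expected share of clients on the restarted server, `minutesSinceRestart`
  // minutes after it came back. Assumes every client was on the other
  // server(s) at restart and that connection ages are evenly spread.
  def restartedShare(minutesSinceRestart: Double,
                     maxAgeMinutes: Double,
                     servers: Int = 2): Double = {
    // Fraction of clients whose connection has hit the max age and reconnected.
    val reconnected = math.min(minutesSinceRestart / maxAgeMinutes, 1.0)
    // Each reconnecting client picks one of the servers with equal probability.
    reconnected / servers
  }

  def main(args: Array[String]): Unit =
    Seq(0.0, 15.0, 30.0, 60.0).foreach { t =>
      println(s"$t min after restart: share = ${restartedShare(t, 30.0)}")
    }
}
```

Under these assumptions the share grows linearly until one full max-age period has passed and then stays at an even split, which is why a shorter age rebalances faster at the cost of more reconnects.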
It seems that configuring the load-balancing policy does the job:
NettyChannelBuilder.forAddress(host, port).defaultLoadBalancingPolicy("round_robin").usePlaintext().build()