问题
I have a zuul gateway application, that receives requests from a client app and forwards the requests using load balanced rest template to a micro service with 2 endpoints, say endpoint1 and endpoint2 (load balancing between the two end points in round robbin which is okay for now, although I want it to be availability based).
Here are the issues I am facing -
- I brought down one of the end points, say endpoint2 and tried calling the zuul route and I see that when the request is going to the endpoint2 - zuul takes 2 mins or so before failing with HTTP 503 and does not retry on the next request. the error is just cascaded back to the caller.
- Also, even after setting the read time out and connect timeout configurations, I don't see ribbon respecting the configuration and still takes 2 mins to throw the error from the server.
- I tried enabling logs at the netflix package level, but I am unable to see logs unless I pass a custom http client to rest template.
I am new to netflix stack of components... please advise if I am missing something obvious. Thanks
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.mycomp</groupId>
<artifactId>zuul-gateway</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>zuul-gateway</name>
<description>Spring Boot Zuul</description>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>1.5.9.RELEASE</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<java.version>1.8</java.version>
<spring-cloud.version>Edgware.SR1</spring-cloud.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-zuul</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-eureka</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-netflix-ribbon</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-lambda</artifactId>
<version>1.11.242</version>
</dependency>
<dependency>
<groupId>org.codehaus.groovy</groupId>
<artifactId>groovy-all</artifactId>
<version>2.3.10</version>
</dependency>
<dependency>
<groupId>com.netflix.netflix-commons</groupId>
<artifactId>netflix-commons-util</artifactId>
<version>0.1.1</version>
</dependency>
<dependency>
<groupId>org.springframework.retry</groupId>
<artifactId>spring-retry</artifactId>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-dependencies</artifactId>
<version>${spring-cloud.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
and my application.yml looks like below -
eureka:
client:
healthcheck:
enabled: true
lease:
duration: 5
service-url:
defaultZone: http://localhost:8761/eureka/
ingestWithOutEureka:
ribbon:
eureka:
enabled: false
NIWSServerListClassName: com.netflix.loadbalancer.ConfigurationBasedServerList
listOfServers: http://demo-nlb-6a67d59c901ecd128.elb.us-west-2.amazonaws.com,http://demo-nlb-124321w2a123ecd128.elb.us-west-2.amazonaws.com
okToRetryOnAllOperations: true
ConnectTimeout: 500
ReadTimeout: 1000
MaxAutoRetries: 5
MaxAutoRetriesNextServer: 5
MaxTotalHttpConnections: 500
MaxConnectionsPerHost: 100
retryableStatusCodes: 404,503
okhttp:
enabled: true
zuul:
debug:
request: true
parameter: true
ignored-services: '*'
routes:
ingestServiceELB:
path: /ingestWithoutEureka/ingest/**
retryable: true
url: http://dummyURL
management.security.enabled : false
spring:
application:
name: zuul-gateway
cloud:
loadbalancer:
retry:
enabled: true
logging:
level:
org:
apache:
http: DEBUG
com:
netflix: DEBUG
hystrix:
command:
default:
execution:
isolation:
strategy: THREAD
thread:
timeoutInMilliseconds: 60000
and my application class looks like below
@SpringBootApplication
@EnableZuulProxy
@EnableDiscoveryClient
public class ZuulGatewayApplication {
@Bean
public InterceptionFilter addInterceptionFilter() {
return new InterceptionFilter();
}
public static void main(String[] args) {
SpringApplication.run(ZuulGatewayApplication.class, args);
}
}
and lastly my zuul filter looks like below - package com.zuulgateway.filter;
import com.netflix.zuul.ZuulFilter;
import com.netflix.zuul.context.RequestContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.client.discovery.DiscoveryClient;
import org.springframework.cloud.client.loadbalancer.LoadBalanced;
import org.springframework.cloud.client.loadbalancer.LoadBalancerClient;
import org.springframework.web.client.RestTemplate;
import javax.servlet.http.HttpServletRequest;
import java.io.IOException;
import java.util.stream.Collectors;
public class InterceptionFilter extends ZuulFilter{
private static final String REQUEST_PATH = "/ingestWithoutEureka";
@LoadBalanced
@Bean
RestTemplate loadBalanced() {
//RestTemplate restTemplate = new RestTemplate(new HttpComponentsClientHttpRequestFactory());
RestTemplate restTemplate = new RestTemplate();
return restTemplate;
}
@Autowired
@LoadBalanced
private RestTemplate loadBalancedRestTemplate;
@Override
public String filterType() {
return "route";
}
@Override
public int filterOrder() {
return 0;
}
@Override
public boolean shouldFilter() {
RequestContext context = RequestContext.getCurrentContext();
HttpServletRequest request = context.getRequest();
String method = request.getMethod();
String requestURI = request.getRequestURI();
return requestURI.startsWith(REQUEST_PATH);
}
@Override
public Object run() {
RequestContext ctx = RequestContext.getCurrentContext();
try {
String requestPayload = ctx.getRequest().getReader().lines().collect(Collectors.joining(System.lineSeparator()));
String response = loadBalancedRestTemplate.postForObject("http://ingestWithOutEureka/ingest", requestPayload, String.class);
ctx.setResponseStatusCode(200);
ctx.setResponseBody(response);
} catch (IOException e) {
ctx.setResponseStatusCode(500);
ctx.setResponseBody("{ \"error\" : " + e.getMessage() + " }");
System.out.println("Exception during feign call - " + e.getMessage());
e.printStackTrace();
} finally {
ctx.setSendZuulResponse(false);
ctx.getResponse().setContentType("application/json");
}
return null;
}
}
回答1:
So, here are the solutions that worked for me -
Issue 1 - Retry was not working in spite of configuring ribbon.<client>.OkToRetryOnAllOperations: true
. Ribbon was clearly ignoring my configuration.
Solution: - It's strange, but after some debugging I had noticed that Ribbon was picking up client level configuration only if a global configuration was present in the first place.
Once I set the global "OkToRetryOnAllOperations" as either "true" or "false" as shown below, ribbon started picking up the ribbon.<client>.OkToRetryOnAllOperations
as expected and I could see the retries happening.
ribbon:
OkToRetryOnAllOperations: false
Issue 2 - Also, even after setting the read time out and connect timeout configurations, I don't see ribbon respecting the configuration and still takes 2 mins to throw the error from the server
Solution 2 - Though ribbon started retrying requests after the changes suggested in solution 1 above, I did not see ribbon honoring <client>.ribbon.ReadTimeout
and <client>.ribbon.ConnectTimeout
.
After spending some time, I figure that this is because of using RestTemplate
.
While spring documentation mentions that you could use load balanced RestTemplate
for achieving retries, it does not mention that the timeouts wont work with it. Based on this SO answer from 2014, it looks like while ribbon has been added as a interceptor when using load balanced RestTemplate
to achieve serviceId to URI resolution, ribbon does not use the underlying HTTP client and uses the http client provided by the RestTemplate
. Thus, the ribbon specific <client>.ribbon.ReadTimeout
and <client>.ribbon.ConnectTimeout
are NOT honored. After I added timeouts to RestTemplate
, requests have started timing out at expected intervals.
Lastly, Issue 3 - I enabled logs by passing a custom http client to rest template.
@LoadBalanced
@Bean
RestTemplate loadBalanced() {
RestTemplate restTemplate = new RestTemplate(new HttpComponentsClientHttpRequestFactory());
System.out.println("returning load balanced rest client");
((HttpComponentsClientHttpRequestFactory)restTemplate.getRequestFactory()).setReadTimeout(1000*30);
((HttpComponentsClientHttpRequestFactory)restTemplate.getRequestFactory()).setConnectTimeout(1000*3);
((HttpComponentsClientHttpRequestFactory)restTemplate.getRequestFactory()).setConnectionRequestTimeout(1000*3);
return restTemplate;
}
@Bean
LoadBalancedBackOffPolicyFactory backOffPolicyFactory() {
return new LoadBalancedBackOffPolicyFactory() {
@Override
public BackOffPolicy createBackOffPolicy(String service) {
return new ExponentialBackOffPolicy();
}
};
}
With all the changes I see that the request retries are happening and with the timeouts and with exponential backoff and with request / responses logs visible as expected. Good luck!
来源:https://stackoverflow.com/questions/48584013/ribbon-retry-properties-not-respected