ZuulProxy fails with “RibbonCommand timed-out and no fallback available” when it should do failover

半阙折子戏 2021-01-30 18:45

Short description: I'm trying to get a ZuulProxy to handle instance failover, but it throws "ZuulException: Forwarding error" instead of responding with a result.

1 answer
  • 2021-01-30 19:01

    1/ TIMEOUT

    Zuul requests are wrapped in Hystrix commands, whose purpose here is to apply timeouts to long-running requests.

    Hystrix provides two different ways to execute commands and enforce timeouts: SEMAPHORE and THREAD execution isolation.

    When THREAD isolation is used, Hystrix commands are executed on a separate thread taken from a thread pool. Hystrix then "pauses" the thread holding the incoming request until a response is received from the downstream server or a timeout occurs, so the caller is released as soon as the timeout elapses.

    When SEMAPHORE isolation is used, Hystrix commands are executed on the request thread. Timeouts are detected only after a response is received from the downstream server. So if you configure Zuul/Hystrix with a timeout of 5s and your service takes 30s to complete, your client is notified of the timeout only after 30s, even if the service responded successfully (!)

    Netflix recommends THREAD isolation by default, except in some rare cases. Unfortunately, the Spring Cloud Zuul integration changed it to SEMAPHORE for reasons unknown to me. See "Why is ZUUL forcing a SEMAPHORE isolation to execute its Hystrix commands?" for more information.

    This explains why you receive a 500 error although the remaining live server was successfully contacted.
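
    If the timeout or the isolation mode needs adjusting, the settings below are a minimal sketch rather than a recommended configuration. The timeout value is an arbitrary placeholder, and zuul.ribbonIsolationStrategy is an assumption: it is only available in Spring Cloud Netflix releases that expose it, so check your version before relying on it.

    # Hystrix timeout applied to every command, in milliseconds (placeholder value)
    hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds = 20000

    # Assumption: run the Zuul/Ribbon commands with THREAD isolation instead of SEMAPHORE
    zuul.ribbonIsolationStrategy = THREAD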

    2/ RETRY

    Ribbon makes the actual call to the remote service. It uses the information provided by Eureka to determine the available service instances and their addresses. Eureka uses a local cache that is updated every 30 seconds. So, as @spencergibb said, it is likely to hold obsolete information (a dead server) for a while, but this is expected.
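
    For reference, the refresh intervals involved are configurable. The two properties below are a sketch of where to look; the values shown are, to the best of my knowledge, the defaults, and lowering them trades extra registry traffic for fresher server lists.

    # How often the Eureka client refreshes its local copy of the registry, in seconds
    eureka.client.registryFetchIntervalSeconds = 30

    # How often Ribbon refreshes its server list from that copy, in milliseconds
    ribbon.ServerListRefreshInterval = 30000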

    Ribbon automatically retries when it fails to connect to or contact a service. It can be configured to retry the same server a couple of times before trying another one. I don't remember the default values, but personally I have been using the following settings:

    # Max number of retries on the same server (excluding the first try)
    ribbon.MaxAutoRetries = 1
    
    # Max number of next servers to retry (excluding the first server)
    ribbon.MaxAutoRetriesNextServer = 2
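
    The same keys can also be scoped to a single Ribbon client instead of being set globally. The sketch below uses a hypothetical client name, "myservice", which is not taken from the question. Depending on the Spring Cloud version, Zuul may additionally need retries enabled (zuul.retryable) and Spring Retry on the classpath; treat that as an assumption to verify against your release.

    # Hypothetical client "myservice": keys under <clientName>.ribbon apply to that client only
    myservice.ribbon.MaxAutoRetries = 0
    myservice.ribbon.MaxAutoRetriesNextServer = 2

    # Whether to also retry non-idempotent requests such as POST; keep false unless duplicates are safe
    myservice.ribbon.OkToRetryOnAllOperations = false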
    

    3/ CONNECT TIMEOUT

    From your logs, it appears to take about 1s for the connect attempt to the remote service to fail. This is very long for a stopped service. Attempts to connect to a TCP port with no service listening should fail immediately (at least if the host/IP is reachable and the connect attempt doesn't vanish into the void)...

    The connect timeout is controlled by the first of the following properties; make sure you set it to a decent value:

    # Connect timeout used by Apache HttpClient
    ribbon.ConnectTimeout=3000
    
    # Read timeout used by Apache HttpClient
    ribbon.ReadTimeout=5000
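
    These timeouts compound with the retry settings from section 2, so it is worth comparing the total against the Hystrix timeout. A back-of-the-envelope worst case, assuming every attempt uses up its full connect and read timeout:

    # attempts per server = MaxAutoRetries + 1           = 1 + 1       = 2
    # servers tried       = MaxAutoRetriesNextServer + 1 = 2 + 1       = 3
    # time per attempt    = ConnectTimeout + ReadTimeout = 3000 + 5000 = 8000 ms
    # worst case          = 8000 ms x 2 x 3                            = 48000 ms

    If the Hystrix timeout is shorter than this total, a timeout can be reported even though a later retry would have succeeded.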
    

    Hope this information helps you to troubleshoot your problem ;-)
