Dubbo Cluster Fault Tolerance

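When a call to a provider fails, the consumer needs a policy for handling that failure.

In a Reference, the Invoker that is returned is the one generated by the configured fault-tolerance mechanism.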

<dubbo:reference id="testService"  interface="com.test.ITestService"  cluster="failfast"/>

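failover cluster (default): on failure, automatically switches to another server and retries. The number of retries is set with retries=2.
failfast cluster: fails fast; only one call is made. Suited to write operations such as inserting a record, i.e. non-idempotent requests.
failsafe cluster: fail-safe; when an exception occurs it is simply ignored. Suited to operations such as writing logs.
failback cluster: automatic recovery from failure; failed requests are recorded in the background and resent on a timer.
forking cluster: calls several servers in parallel and returns as soon as one succeeds. Only suitable for read requests.
broadcast cluster: calls every provider, one by one. If any one of them reports an error, the call ends with an exception.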

@SPI(FailoverCluster.NAME)
public interface Cluster {
    @Adaptive
    <T> Invoker<T> join(Directory<T> directory) throws RpcException;
}
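
The cluster attribute on the reference and the @SPI default can be pictured as a lookup by name. What follows is only a minimal sketch of that idea, not Dubbo's extension-loading code: the ClusterLookupSketch class and its resolve method are hypothetical, and the map simply pairs each strategy name listed above with the invoker class discussed in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; Dubbo resolves the name through its own SPI mechanism.
final class ClusterLookupSketch {

    // Mirrors @SPI(FailoverCluster.NAME): used when no cluster attribute is configured.
    static final String DEFAULT_NAME = "failover";

    // Strategy name -> invoker class name, as listed in this article.
    private static final Map<String, String> INVOKERS = new LinkedHashMap<>();
    static {
        INVOKERS.put("failover", "FailoverClusterInvoker");
        INVOKERS.put("failfast", "FailfastClusterInvoker");
        INVOKERS.put("failsafe", "FailsafeClusterInvoker");
        INVOKERS.put("failback", "FailbackClusterInvoker");
        INVOKERS.put("forking", "ForkingClusterInvoker");
        INVOKERS.put("broadcast", "BroadcastClusterInvoker");
    }

    /** Returns the invoker class name for the configured cluster name. */
    static String resolve(String clusterName) {
        String name = (clusterName == null || clusterName.isEmpty()) ? DEFAULT_NAME : clusterName;
        String invoker = INVOKERS.get(name);
        if (invoker == null) {
            throw new IllegalArgumentException("unknown cluster: " + name);
        }
        return invoker;
    }

    public static void main(String[] args) {
        System.out.println(resolve("failfast")); // FailfastClusterInvoker
        System.out.println(resolve(null));       // FailoverClusterInvoker (the default)
    }
}
```

With cluster="failfast" as in the reference above, the lookup lands on the fail-fast invoker; leaving the attribute out falls back to failover.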

FailoverCluster

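On failure, the call is retried on other servers. This is typically used for read operations, although retries add latency. The default retry count is 2; counting the initial call, that gives up to 3 calls in total.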

public class FailoverCluster implements Cluster {

    public final static String NAME = "failover";

    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        return new FailoverClusterInvoker<T>(directory);
    }
}

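public class FailoverClusterInvoker<T> extends AbstractClusterInvoker<T> {

  public FailoverClusterInvoker(Directory<T> directory) {
      super(directory);
  }

  public Result doInvoke(Invocation invocation,
    final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
      List<Invoker<T>> copyinvokers = invokers;
      // validate the invoker list
      checkInvokers(copyinvokers, invocation);
      // default is 2 retries + 1 initial call
      int len = getUrl().getMethodParameter(invocation.getMethodName(),
        "retries", 2) + 1;
      if (len <= 0) {
          len = 1;
      }
      // retry loop.
      RpcException le = null; // last exception.
      List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyinvokers.size());
      Set<String> providers = new HashSet<String>(len);
      // one iteration per attempt
      for (int i = 0; i < len; i++) {
          // On a retry, select again, because the invoker list may have changed.
          // Note: if the list did change, the invoked check no longer applies,
          // because the invoker instances are different.
          if (i > 0) {
              // check whether this invoker has been destroyed
              checkWheatherDestoried();
              // fetch the invoker list again
              copyinvokers = list(invocation);
              // validate it again
              checkInvokers(copyinvokers, invocation);
          }
          // select an invoker, avoiding those already tried
          Invoker<T> invoker = select(loadbalance, invocation,
            copyinvokers, invoked);
          invoked.add(invoker);
          RpcContext.getContext().setInvokers((List)invoked);
          try {
              // invoke
              Result result = invoker.invoke(invocation);
              return result;
          } catch (RpcException e) {
              if (e.isBiz()) { // business exceptions are not retried
                  throw e;
              }
              le = e;
          } catch (Throwable e) {
              le = new RpcException(e.getMessage(), e);
          } finally {
              providers.add(invoker.getUrl().getAddress());
          }
      }
      // every attempt failed: report the last exception as the cause
      throw new RpcException("", le);
  }

}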
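
To make the "retries + 1" arithmetic concrete, here is a minimal, self-contained sketch of the same loop outside Dubbo. It rests on assumptions: FailoverSketch and its invoke method are hypothetical names, providers are modeled as plain Supplier objects, and a simple round-robin index stands in for the load balancer's select.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified model of the failover loop; not Dubbo's API.
final class FailoverSketch {

    /** Tries providers in turn, making at most retries + 1 attempts. */
    static <T> T invoke(List<Supplier<T>> providers, int retries) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        int len = Math.max(retries + 1, 1); // the initial call plus the retries
        RuntimeException last = null;        // last failure seen
        for (int i = 0; i < len; i++) {
            // Assumption: round-robin replaces the load balancer's select(...)
            Supplier<T> provider = providers.get(i % providers.size());
            try {
                return provider.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next provider
            }
        }
        throw new IllegalStateException("failed after " + len + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        List<Supplier<String>> providers = new ArrayList<>();
        providers.add(() -> { calls[0]++; throw new RuntimeException("provider A down"); });
        providers.add(() -> { calls[0]++; throw new RuntimeException("provider B down"); });
        providers.add(() -> { calls[0]++; return "ok from C"; });
        // With retries = 2 the third attempt is still allowed, so C answers.
        System.out.println(invoke(providers, 2) + " after " + calls[0] + " calls");
    }
}
```

With retries set to 2 and two failing providers, the third attempt reaches the healthy one; with retries set to 1 the same list would fail after two attempts.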

FailfastClusterInvoker
Fails fast: only one call is made, and a failure is reported immediately. Typically used for non-idempotent write operations.

public class FailfastClusterInvoker<T> extends AbstractClusterInvoker<T>{

  public FailfastClusterInvoker(Directory<T> directory) {
      super(directory);
  }
  
  public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, 
  LoadBalance loadbalance) throws RpcException {
      checkInvokers(invokers, invocation);
      // select an invoker and invoke it; any exception is rethrown immediately
      Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
      try {
          return invoker.invoke(invocation);
      } catch (Throwable e) {
          throw new RpcException("", e); // keep the original failure as the cause
      }
  }
}

BroadcastClusterInvoker
Calls every provider, one by one. If any of them fails, an exception is thrown once all providers have been called.

 public Result doInvoke(final Invocation invocation, List<Invoker<T>> 
    invokers, LoadBalance loadbalance) throws RpcException {
    checkInvokers(invokers, invocation);
    RpcContext.getContext().setInvokers((List)invokers);
    RpcException exception = null;
    Result result = null;
    // call every provider in turn; if any call failed, the last exception is thrown after the loop
    for (Invoker<T> invoker: invokers) {
        try {
            result = invoker.invoke(invocation);
        } catch (RpcException e) {
            exception = e;
            logger.warn(e.getMessage(), e);
        } catch (Throwable e) {
            exception = new RpcException(e.getMessage(), e);
            logger.warn(e.getMessage(), e);
        }
    }
    if (exception != null) {
        throw exception;
    }
    return result;
}
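
The broadcast behavior (keep going after a failure, then report it at the end) can be tried in isolation with a small sketch. It is not Dubbo code: BroadcastSketch is a hypothetical name and providers are modeled as Supplier objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified model of broadcast invocation; not Dubbo's API.
final class BroadcastSketch {

    /** Calls every provider; throws the last failure, if any, only after all were called. */
    static <T> T invoke(List<Supplier<T>> providers) {
        RuntimeException last = null;
        T result = null;
        for (Supplier<T> provider : providers) {
            try {
                result = provider.get(); // a later result overwrites an earlier one
            } catch (RuntimeException e) {
                last = e;                // remember the failure but keep broadcasting
            }
        }
        if (last != null) {
            throw last;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        List<Supplier<String>> providers = new ArrayList<>();
        providers.add(() -> { calls[0]++; throw new RuntimeException("node 1 failed"); });
        providers.add(() -> { calls[0]++; return "node 2 ok"; });
        try {
            invoke(providers);
        } catch (RuntimeException e) {
            // node 2 was still called even though node 1 failed first
            System.out.println(e.getMessage() + ", providers called: " + calls[0]);
        }
    }
}
```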

FailbackClusterInvoker
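Automatic recovery from failure: failed requests are recorded in the background and resent on a timer. Typically used for message notification operations.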

  // swallow the exception, record the invocation, and wait for the timed retry
 protected Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, 
    LoadBalance loadbalance) throws RpcException {
     try {
         checkInvokers(invokers, invocation);
         Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
         return invoker.invoke(invocation);
     } catch (Throwable e) {
         addFailed(invocation, this);
         return new RpcResult(); 
     }
 }
 // start the timer on the first failure; it retries every 5 seconds
private void addFailed(Invocation invocation, AbstractClusterInvoker<?> router) {
  if (retryFuture == null) {
      synchronized (this) {
          if (retryFuture == null) {
              retryFuture = 
              scheduledExecutorService.scheduleWithFixedDelay(new Runnable() {
                  public void run() {
                      // retry the recorded failures
                      try {
                          retryFailed();
                      } catch (Throwable t) { // defensive catch so the timer keeps running
                      }
                  }
              }, 5 * 1000, 5 * 1000, TimeUnit.MILLISECONDS);
          }
      }
  }
  failed.put(invocation, router);
}

void retryFailed() {
    if (failed.size() == 0) {
        return;
    }
    // re-invoke each recorded invocation (iterating over a copy); remove it once it succeeds
    for (Map.Entry<Invocation, AbstractClusterInvoker<?>> entry : new 
    HashMap<Invocation, AbstractClusterInvoker<?>>(failed).entrySet()) {
        Invocation invocation = entry.getKey();
        Invoker<?> invoker = entry.getValue();
        try {
            invoker.invoke(invocation);
            failed.remove(invocation);
        } catch (Throwable e) {
        }
    }
}
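
The record-then-retry cycle above can be sketched without Dubbo as follows. This is an illustration under assumptions: FailbackSketch, its string keys, and the Runnable-based calls are hypothetical stand-ins for Invocation and Invoker, and the timer delay is a parameter rather than the fixed 5 seconds used in the source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of failback; not Dubbo's API.
final class FailbackSketch {

    // Failed calls waiting for a retry, keyed by a caller-chosen id.
    private final Map<String, Runnable> failed = new ConcurrentHashMap<>();

    /** Runs the call once; on failure it is recorded instead of being reported. */
    void invoke(String key, Runnable call) {
        try {
            call.run();
        } catch (RuntimeException e) {
            failed.put(key, call);
        }
    }

    /** Retries every recorded call and forgets the ones that now succeed. */
    void retryFailed() {
        Map<String, Runnable> snapshot = new HashMap<>(failed); // iterate over a copy
        for (Map.Entry<String, Runnable> entry : snapshot.entrySet()) {
            try {
                entry.getValue().run();
                failed.remove(entry.getKey());
            } catch (RuntimeException e) {
                // still failing: keep it for the next round
            }
        }
    }

    int pending() {
        return failed.size();
    }

    /** Starts the periodic retry; the caller shuts the returned scheduler down. */
    ScheduledExecutorService startRetryTimer(long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                retryFailed();
            } catch (Throwable t) {
                // defensive: an escaping exception would cancel the schedule
            }
        }, delayMillis, delayMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        FailbackSketch failback = new FailbackSketch();
        boolean[] providerUp = {false};
        failback.invoke("notify-1", () -> {
            if (!providerUp[0]) throw new IllegalStateException("provider down");
        });
        System.out.println("pending after first call: " + failback.pending());
        ScheduledExecutorService timer = failback.startRetryTimer(50);
        providerUp[0] = true;  // the provider recovers
        Thread.sleep(300);     // give the timer a few rounds
        timer.shutdownNow();
        System.out.println("pending after retries: " + failback.pending());
    }
}
```

The caller never sees the failure, which is why this style fits notifications: the request is eventually delivered once the provider is back.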

FailsafeClusterInvoker
Fail-safe: when an exception occurs, it is simply ignored. Typically used for operations such as writing audit logs.

 public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, 
 LoadBalance loadbalance) throws RpcException {
    try {
        checkInvokers(invokers, invocation);
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        return invoker.invoke(invocation);
    } catch (Throwable e) {
        return new RpcResult(); // ignore
    }
}

ForkingClusterInvoker
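Calls in parallel and returns as soon as one call succeeds. Typically used for operations with tight real-time requirements, at the cost of extra service resources.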

public Result doInvoke(final Invocation invocation, List<Invoker<T>> 
  invokers, LoadBalance loadbalance) throws RpcException {
    checkInvokers(invokers, invocation);
    final List<Invoker<T>> selected;
    final int forks = getUrl().getParameter("forks",2);
    final int timeout = getUrl().getParameter("timeout", 1000);
    if (forks <= 0 || forks >= invokers.size()) {
        selected = invokers;
    } else {
        selected = new ArrayList<Invoker<T>>();
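        // pick up to `forks` distinct invokers
        for (int i = 0; i < forks; i++) {
            // select skips invokers already in selected; if too few remain, it may return a duplicate
            Invoker<T> invoker = select(loadbalance, invocation, 
             invokers, selected);
            if (!selected.contains(invoker)) { // avoid adding the same invoker twice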
                selected.add(invoker);
            }
        }
    }
    RpcContext.getContext().setInvokers((List)selected);
    final AtomicInteger count = new AtomicInteger();
    // result queue: the caller blocks on it until one call succeeds or every call has failed
    final BlockingQueue<Object> ref = new LinkedBlockingQueue<Object>();
    for (final Invoker<T> invoker : selected) {
        executor.execute(new Runnable() {
            public void run() {
                try {
                    Result result = invoker.invoke(invocation);
                    // on success, hand the result to the waiting caller
                    ref.offer(result);
                } catch(Throwable e) {
                    int value = count.incrementAndGet();
                    // on failure, wake the caller only once every selected invoker has failed
                    if (value >= selected.size()) {
                        ref.offer(e);
                    }
                }
            }
        });
    }
    try {
        Object ret = ref.poll(timeout, TimeUnit.MILLISECONDS);
        if (ret instanceof Throwable) {
            Throwable e = (Throwable) ret;
            throw new RpcException("", e); // keep the provider's failure as the cause
        }
        return (Result) ret;
    } catch (InterruptedException e) {
        throw new RpcException("");
    }
}
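
The queue-based hand-off above (first success wins, a failure is reported only when all calls have failed) can be reproduced with the standard library alone. The sketch below is an assumption-laden illustration, not Dubbo code: ForkingSketch is a hypothetical name, providers are plain Callable objects expected to return non-null values, and a timeout is reported as an exception rather than a null result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of forking; not Dubbo's API.
final class ForkingSketch {

    /** Calls all providers in parallel and returns the first successful result. */
    static <T> T invoke(List<Callable<T>> providers, long timeoutMillis) {
        ExecutorService executor = Executors.newCachedThreadPool();
        BlockingQueue<Object> ref = new LinkedBlockingQueue<>();
        AtomicInteger failures = new AtomicInteger();
        int total = providers.size();
        try {
            for (Callable<T> provider : providers) {
                executor.execute(() -> {
                    try {
                        ref.offer(provider.call()); // first success wakes the caller
                    } catch (Throwable e) {
                        // publish a failure only when every provider has failed
                        if (failures.incrementAndGet() >= total) {
                            ref.offer(e);
                        }
                    }
                });
            }
            Object ret = ref.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (ret == null) {
                throw new IllegalStateException("no answer within " + timeoutMillis + " ms");
            }
            if (ret instanceof Throwable) {
                throw new IllegalStateException("all providers failed", (Throwable) ret);
            }
            @SuppressWarnings("unchecked")
            T value = (T) ret;
            return value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            executor.shutdownNow(); // cancel the calls that are still running
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> providers = new ArrayList<>();
        providers.add(() -> { Thread.sleep(500); return "slow"; });
        providers.add(() -> "fast");
        System.out.println(invoke(providers, 1000));
    }
}
```

Every provider does the work even though only one answer is used, which is the resource cost mentioned above.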

Source: CSDN

Author: cynthina1

Link: https://blog.csdn.net/liyue1090041509/article/details/103771877
