What is Bulkhead Pattern used by Hystrix?

后端 未结 2 1963
北海茫月
北海茫月 2021-01-29 19:39

Hystrix, a Netflix API for latency and fault tolerance in complex distributed systems uses Bulkhead Pattern technique for thread isolation. Can someone please elaborate on it.

相关标签:
2条回答
  • 2021-01-29 20:16

    General

    In general, the goal of the bulkhead pattern is to avoid faults in one part of a system to take the entire system down. The term comes from ships where a ship is divided in separate watertight compartments to avoid a single hull breach to flood the entire ship; it will only flood one bulkhead.

    Implementations of the bulkhead pattern can take many forms depending on what kind of faults you want to protect the system from. I will only discuss the type of faults Hystrix handles in this answer.

    I think the bulkhead pattern was popularized by the book Release It! by Michael T. Nygard.

    What Hystrix Solves

    The bulkhead implementation in Hystrix limits the number of concurrent calls to a component. This way, the number of resources (typically threads) that is waiting for a reply from the component is limited.

    Assume you have a request based, multi threaded application (for example a typical web application) that uses three different components, A, B, and C. If requests to component C starts to hang, eventually all request handling threads will hang on waiting for an answer from C. This would make the application entirely non-responsive. If requests to C is handled slowly we have a similar problem if the load is high enough.

    Hystrix' implementation of the bulkhead pattern limits the number of concurrent calls to a component and would have saved the application in this case. Assume we have 30 request handling threads and there is a limit of 10 concurrent calls to C. Then at most 10 request handling threads can hang when calling C, the other 20 threads can still handle requests and use components A and B.

    Hystrix' approaches

    Hystrix' has two different approaches to the bulkhead, thread isolation and semaphore isolation.

    Thread Isolation

    The standard approach is to hand over all requests to component C to a separate thread pool with a fixed number of threads and no (or a small) request queue.

    Semaphore Isolation

    The other approach is to have all callers acquire a permit (with 0 timeout) before requests to C. If a permit can't be acquired from the semaphore, calls to C are not passed through.

    Differences

    The advantage of the thread pool approach is that requests that are passed to C can be timed out, something that is not possible when using semaphores.

    0 讨论(0)
  • 2021-01-29 20:23

    Here is a good example with runtime explanation for bulkhead in Resilience4j which is inspired by Netflix Hystrix.

    Below example configurations might give some clarity of usage.

    Example configurations: Allow maximum 5 concurrent calls at any given time. Keep other calls waiting for until one of the in-process 5 concurrent finishes or until maximum of 2 seconds.

    Idea is not to burden any system with load more than they can consume. If incoming load is greater than consumption, then wait for reasonable time or just timeout & go for alternate path.

    0 讨论(0)
提交回复
热议问题