Service Fabric Deactivate (pause) vs Deactivate (restart)?

终归单人心 2021-02-07 21:18

When I log in to Service Fabric Explorer and try to disable a node for an OS upgrade I am presented with two options:

  • Deactivate (Pause)
  • Deactivate (Resta
1 answer
  • 2021-02-07 22:22

    Service Fabric has APIs that let you manage nodes (in C# these are DeactivateNodeAsync and ActivateNodeAsync, in PS they're Enable/Disable-ServiceFabricNode). First of all, most of these are holdovers from when people managed their own clusters, and should be less commonly used in the Azure Hosted Service Fabric Cluster environment compared to when you run your own clusters. Either way when deactivating a node there are several different options, which we call Intents.

    You can think of these as representing increasingly severe operations on the nodes, which you'd use under different situations, and you use them to communicate to Service Fabric what is being done to the node.

    The four different options are:

    1. Pause - effectively "pauses" the node: Services on it will continue to run, but no services should move in or out of the node unless they fail on their own, or unless moving a service to the node is necessary to prevent outage or inconsistency.
    2. Restart - this will move all of the in-memory stateful and stateless services off the node, and then shut down (close) any persistent services (if it is safe to do so, if not we'll build spares).
    3. RemoveData - this will close down all of the services on the node, again building spares first if it is necessary for safety. The user is responsible for ensuring that if the node does come back, it comes back empty.
    4. RemoveNode - this will close down all of the services on the node, again building spares first if necessary for safety. In this case though you're specifically telling SF that this node isn't coming back. SF performs an additional check to make sure that the node which is being removed isn't a SeedNode (one of the nodes currently responsible for maintaining the underlying cluster). Other than that, this is the same as RemoveData.

    Now let's talk about when you'd use each.

    Pause is most common if you want to debug a given service, process, machine, etc., and would like it not to change (to the degree possible) while you are looking at it. It would be a little awkward if you went to diagnose some behavior of a service only to find that we had just moved it on you.

    Restart (the most common of these that we see used) is for when you want to move all the workloads off the node. For example, Service Fabric uses this itself when upgrading the Service Fabric bits on a node: first we deactivate the node with intent Restart, then we wait for that to complete (so we know your services are not running) before we shut down and upgrade our own code on that node.

    RemoveData is for when you know the node is being deprovisioned and will not be coming back (say the hard drives are going to be swapped out, or the hardware is being completely removed), or you know that if the node does come back it will be empty (say you're reimaging the machine). The difference between Restart and RemoveData is that for Restart we know the node is coming back, so we keep the knowledge of the replicas on that node. For persistent replicas this means that we don't have to build the replicas again immediately. For RemoveData we know the replicas are not coming back, so we need to build any spares immediately, before confirming that the node is safe to restart.

    RemoveNode builds on top of RemoveData, and is an additional indicator that you have no specific plans to bring this node back. Since it's important to keep the SeedNodes up, SF will fail the call if the node to be removed is currently a Seed. If you really want to remove that specific node, you can reconfigure the cluster to use a different node as a seed. An example of when you'd use RemoveNode rather than RemoveData is scaling down a cluster: you'd explicitly call RemoveNode, since you intend for the nodes not to come back and want to make sure you're taking the right ones away so the underlying cluster doesn't collapse.
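    The rules of thumb above can be condensed into a small decision helper. This is only an illustrative sketch in Python, not part of any Service Fabric SDK: the function `choose_intent` and its three parameters are hypothetical names invented here, and only the four intent names come from the answer itself.

```python
from enum import Enum


class Intent(Enum):
    """The four deactivation intents described in the answer."""
    PAUSE = "Pause"
    RESTART = "Restart"
    REMOVE_DATA = "RemoveData"
    REMOVE_NODE = "RemoveNode"


def choose_intent(workloads_must_move: bool,
                  state_survives: bool,
                  node_returns: bool) -> Intent:
    """Map a maintenance plan to an intent (hypothetical helper).

    workloads_must_move: services should be moved off or closed on the node
    state_survives:      data on the node will still be there afterwards
    node_returns:        you plan to bring the node back into the cluster
    """
    if not workloads_must_move:
        # Debugging in place: keep everything where it is.
        return Intent.PAUSE
    if not node_returns:
        # Scale-down or permanent removal. RemoveData would also cover this,
        # but RemoveNode adds the seed-node check described in the answer.
        return Intent.REMOVE_NODE
    if not state_survives:
        # Reimage or disk swap: the node comes back, but empty.
        return Intent.REMOVE_DATA
    # Patch or upgrade: the node comes back with its replicas intact.
    return Intent.RESTART


# The OS upgrade from the question: workloads move, state survives, node returns.
print(choose_intent(True, True, True).value)  # Restart
```

    For the OS upgrade in the question, this lands on Restart, which matches how Service Fabric treats its own upgrades.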

    Once the operation (whatever it is) is done and you want to re-enable the node, the corresponding call is Activate/Enable. Restarting a node doesn't cause it to become automatically re-enabled. So if you are done with the software patch (or whatever caused you to use intent Restart, for example), and you want services to be placed on the node again, you would call Enable/Activate with the appropriate node Name.
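    The point that a restart does not re-enable the node can be pictured with a toy state model. This is a simplified sketch in Python, assuming a node is either enabled or deactivated with some intent: `NodeModel` and its methods are hypothetical names for illustration, not Service Fabric types, and the model ignores the cases where Pause still lets a service move in to prevent an outage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeModel:
    """Toy stand-in for a cluster node; not a Service Fabric type."""
    name: str
    deactivation_intent: Optional[str] = None  # None means the node is enabled
    is_up: bool = True

    def deactivate(self, intent: str) -> None:
        # Stands in for the Deactivate/Disable call with a chosen intent.
        self.deactivation_intent = intent

    def restart(self) -> None:
        # The machine reboots and comes back up; the deactivation is untouched.
        self.is_up = True

    def activate(self) -> None:
        # Stands in for the Activate/Enable call.
        self.deactivation_intent = None

    @property
    def accepts_new_services(self) -> bool:
        return self.is_up and self.deactivation_intent is None


node = NodeModel("Node1")
node.deactivate("Restart")
node.restart()                    # OS patch applied, machine is back
print(node.accepts_new_services)  # False: still deactivated
node.activate()
print(node.accepts_new_services)  # True: services can be placed again
```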

    As an example of the deactivate/disable call, check out the PowerShell documentation for Disable-ServiceFabricNode.
