Question
I am designing a system that needs to manage configuration (config files) dynamically across a bunch of application servers. I am using the Consul key-value store to manage the configurations.
I created the node below in the Consul KV store to manage the configurations.
{"remoteConfig":"abc-123.tgz", "...."}
Here remoteConfig contains the config file that all the app servers will use (at least this is the design I have).
Below is what I am trying to do:
- All the app servers keep a watch on the above node in Consul. As soon as the value of the remoteConfig key changes, they are notified, download this config, and store it on disk.
- Only once all the app servers in the cluster have downloaded the new config should we switch to using the new config in memory across all the boxes in the cluster. If a few app servers fail to download it, we should not switch to the latest config on the remaining boxes where the download succeeded.
I can do the first point easily, but I am confused about how to design the second point efficiently so that we switch to the latest config only when all the app servers have downloaded that particular config. I do know how to atomically update a node by acquiring and releasing a lock in Consul, but I am unsure how to design this so that these cases are handled easily.
Questions:
- How should I design my node so that it is easy to see that all the machines have downloaded this particular config successfully, and that it is now time to switch to the latest config on all the boxes?
- If some machines fail to download a particular config, it should be clear from reading the node which app servers failed. Maybe it can also show timestamps, such as when an app server downloaded this config and when it switched to the new config.
I don’t have to keep a history of config status for each machine; just the latest one will be sufficient. Any other improvements to the above design for managing the configuration dynamically are also welcome.
(Note: I can have a bunch of other nodes as well (like a status node) to do this, just FYI. Also, instead of Consul we could use ZooKeeper, because the lock/leader functionality is available in both technologies, but for now I am going to stick with Consul.)
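To make the status-node idea concrete, here is a minimal sketch of one possible layout, not a definitive design. It assumes one status key per app server under a hypothetical prefix config/status/, with each key overwritten in place so that only the latest status is kept. A plain Python dict stands in for the Consul KV store, and the host names, field names, and function names are all made up for illustration.

```python
import json
from datetime import datetime, timezone

STATUS_PREFIX = "config/status/"  # hypothetical key prefix, not from the question


def report_status(kv, host, version, state):
    """Overwrite this host's single status record (no history is kept)."""
    record = {
        "version": version,
        "state": state,  # assumed values: "downloaded", "failed", "switched"
        "updated_at": datetime.now(timezone.utc).isoformat(),
    }
    kv[STATUS_PREFIX + host] = json.dumps(record)


def hosts_not_ready(kv, expected_hosts, version):
    """Return the hosts that have not successfully downloaded `version`."""
    pending = []
    for host in expected_hosts:
        raw = kv.get(STATUS_PREFIX + host)
        record = json.loads(raw) if raw else {}
        if record.get("version") != version or record.get("state") != "downloaded":
            pending.append(host)
    return pending


kv = {}  # in-memory stand-in for the Consul KV store
hosts = ["app1", "app2", "app3"]  # hypothetical cluster membership
report_status(kv, "app1", "abc-123.tgz", "downloaded")
report_status(kv, "app2", "abc-123.tgz", "failed")

# app2 failed and app3 never reported, so the switch must not happen yet
print(hosts_not_ready(kv, hosts, "abc-123.tgz"))  # ['app2', 'app3']
```

With a layout like this, an empty list from hosts_not_ready would be the signal that every expected server holds the new config. The list of expected hosts has to come from somewhere reliable, which this sketch simply takes as given.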
Answer 1:
I can't answer your question, but I am concerned about a potential race condition that might occur if you find a way to achieve your stated goal.
Let's assume you have 5 servers and all are using version 1 of the configuration files. Then the servers are asked to download version 2 of the configuration files. When all 5 servers have done that, you (somehow) send a signal to all 5 servers to tell them to switch from version 1 to version 2 of the configuration files. This is where the race condition can occur. Switching from version 1 to version 2 of the configuration file is not guaranteed to occur at the same point in time in each of the 5 servers. Thus, for a brief period of time (perhaps just a few milliseconds) some servers will still be using version 1 while other servers will be using version 2. During that brief period of time, you will have inconsistent configuration on your servers.
If that brief inconsistency can cause problems for you, then I think you will need a different "switch from version 1 to version 2 of configuration" mechanism, which in essence boils down to: (1) ask all the server processes to terminate; (2) wait for all of them to terminate; and (3) restart them with version 2 of the configuration. Obviously, this approach necessitates a brief period during which servers are not running, which is not ideal, but at least it avoids the race condition.
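The three steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: FakeServer is a hypothetical stand-in for a real server process, and its stop() takes effect immediately, whereas a real coordinator would have to poll or wait in step (2) until every process has actually exited.

```python
class FakeServer:
    """Hypothetical stand-in for an app server process."""

    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.running = True

    def stop(self):
        self.running = False

    def start(self, version):
        self.version = version
        self.running = True


def switch_with_full_restart(servers, new_version):
    # (1) ask every server process to terminate
    for server in servers:
        server.stop()
    # (2) confirm that all of them have terminated before going further
    if any(server.running for server in servers):
        raise RuntimeError("some servers are still running; aborting switch")
    # (3) restart them all with the new configuration
    for server in servers:
        server.start(new_version)


servers = [FakeServer(f"app{i}", "v1") for i in range(1, 6)]
switch_with_full_restart(servers, "v2")
print({server.version for server in servers})  # {'v2'}
```

Because no server is started until all of them are stopped, no two running servers ever disagree on the configuration version, at the cost of the downtime between steps (1) and (3).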
Source: https://stackoverflow.com/questions/63024655/how-to-design-a-system-that-can-manage-configurations-in-a-dynamic-way-efficient