How do I establish clock synchronization in the cloud (AWS, Heroku, etc.) across many nodes?

谎友^ 2021-02-08 02:00

I would like to run a large cluster of nodes in the cloud (AWS, Heroku, or maybe self-managed VMs), whose clocks must be synchronized with a predefined tolerance in mind. I'm l

2 answers
  •  逝去的感伤
    2021-02-08 02:37

    Since the FAQ for NTP specifically states why NTP time sync doesn't work 'right' under virtual machines, it's probably an insurmountable problem.

    Most machines have an RTC (real-time clock) in them. On PCs it's how the time is stored, so that you have a 'rough' guess as to what the time is if ntp is unavailable. Once the system is loaded there's a 'tick' clock that is higher resolution, and that's what NTP sets.

    That tick clock is subject to the drift of the virtual machine since ticks may or may not happen at the correct intervals - any time mechanism you attempt to use is going to be subject to that drift.

    It's probably a suboptimal design to try to enforce ntp synchronization on virtual machines. If machines A and B have a delta of 200 ms, and machines B and C have a delta of 200 ms, C could be 400 ms away from A. You can't control that.
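The arithmetic behind that worst case can be sketched in a few lines. This is only an illustration: the function name `worst_case_offset_ms` and the sample offsets are made up for the example, and it assumes each node is synchronized only against its neighbour in a chain.

```python
def worst_case_offset_ms(pairwise_delta_ms: float, hops: int) -> float:
    """Upper bound on the offset between two nodes that are `hops` links apart,
    when each link is only guaranteed to be within `pairwise_delta_ms`."""
    return pairwise_delta_ms * hops


# Hypothetical offsets relative to node A, in milliseconds.
# A-B and B-C each respect the 200 ms delta, yet A-C does not.
offsets_ms = {"A": 0.0, "B": 200.0, "C": 400.0}
spread_ms = max(offsets_ms.values()) - min(offsets_ms.values())

print(worst_case_offset_ms(200.0, hops=2))  # bound for A to C
print(spread_ms)                            # actual spread in this example
```

The bound grows linearly with the number of hops, which is why a per-pair tolerance says little about the cluster as a whole.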

    You're better off using a centralized messaging system like ZeroMQ to keep everybody in sync with the job queue. It's going to be more overhead, but relying on system tick time is a dodgy affair at best. There are many clustering solutions that track cluster participation using all sorts of reliable mechanisms to ensure that everyone is in sync. Take a look at Corosync or Spread: they've already solved this for things like two-phase commits.
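As a rough illustration of why ordering through a central queue avoids the clock problem, here is a minimal in-process sketch. It does not use the ZeroMQ, Corosync, or Spread APIs; `Coordinator`, `Job`, and the timestamps are hypothetical names and values chosen for the example, and a real deployment would put the coordinator behind a network transport.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    seq: int         # sequence number assigned by the coordinator
    payload: str
    local_ts: float  # timestamp from the submitting node's own (skewed) clock


class Coordinator:
    """Single point that stamps jobs in arrival order, ignoring node clocks."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def submit(self, payload: str, local_ts: float) -> Job:
        return Job(next(self._counter), payload, local_ts)


coordinator = Coordinator()
# Node B's clock runs behind node A's, so its later job carries an earlier timestamp.
jobs = [
    coordinator.submit("from-A", local_ts=100.000),
    coordinator.submit("from-B", local_ts=99.900),
]

by_clock = [j.payload for j in sorted(jobs, key=lambda j: j.local_ts)]
by_seq = [j.payload for j in sorted(jobs, key=lambda j: j.seq)]
print(by_clock)  # order implied by the skewed local clocks
print(by_seq)    # order in which the coordinator actually received the jobs
```

Sorting by the coordinator's sequence number reproduces the arrival order regardless of how far the nodes' clocks have drifted, at the cost of routing every job through one place.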

    Incidentally, ntp 'giving up' when the drift is too high can be circumvented by instructing it to 'slam' (step) the time to the new value rather than 'slew' it. By default ntp incrementally adjusts the system time to account for its drift from 'real time'. I forget how to configure this in ntpd, but if you use ntpdate the flag that forces a step is -b. The related -B flag does the opposite and forces a slew:

    -B      Force the time to always be slewed using the adjtime(2) system call, even if the measured 
    offset is greater than +-128 ms.  The default is to step the time using settimeofday(2) if the offset 
    is greater than +-128 ms.  Note that, if the offset is much greater than +-128 ms in this case, it
    can take a long time (hours) to slew the clock to the correct value.  During this time, the host 
    should not be used to synchronize clients.
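To see why the excerpt warns about hours of slewing, here is a small sketch of the step-versus-slew decision and the resulting correction time. The 128 ms threshold comes from the excerpt above; the slew rate of 500 ppm is an assumption (a commonly cited limit for adjtime-style adjustment, not something stated in the excerpt), and both function names are made up for the example.

```python
STEP_THRESHOLD_S = 0.128      # from the ntpdate excerpt: +-128 ms
ASSUMED_SLEW_RATE = 500e-6    # assumption: clock corrected by 500 microseconds per second


def choose_adjustment(offset_s: float, force_slew: bool = False) -> str:
    """Mimic the documented default: slew small offsets, step large ones,
    unless slewing is forced (the behaviour described for -B)."""
    if force_slew or abs(offset_s) <= STEP_THRESHOLD_S:
        return "slew"
    return "step"


def slew_duration_s(offset_s: float, rate: float = ASSUMED_SLEW_RATE) -> float:
    """Seconds needed to remove `offset_s` when correcting at `rate`."""
    return abs(offset_s) / rate


print(choose_adjustment(0.050))                  # small offset
print(choose_adjustment(10.0))                   # large offset, default behaviour
print(choose_adjustment(10.0, force_slew=True))  # large offset with slewing forced
print(slew_duration_s(10.0) / 3600)              # hours needed to slew away 10 s
```

Under the assumed rate, a 10 second offset takes about 20000 seconds (roughly five and a half hours) to slew away, which matches the order of magnitude the excerpt mentions.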
    
