I would like to run a large cluster of nodes in the cloud (AWS, Heroku, or maybe self-managed VMs), whose clocks must be synchronized within a predefined tolerance. I'm l
After struggling with NTP on VMs for many months, we switched to chrony (https://chrony.tuxfamily.org). I have found it far superior to ntpd in many respects: configuration, control, documentation, and handling of VM clocks that drift often and drastically.
Use chrony and don't look back :)
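To check a node against a predefined tolerance, one option is to read the "System time" line that `chronyc tracking` prints. The sketch below only parses that one line. The function names, the tolerance values, and the sample text are illustrative assumptions, and the line layout should be verified against the chrony version you run.

```python
import re

# Assumed shape of the relevant `chronyc tracking` line, e.g.
#   System time     : 0.000006523 seconds slow of NTP time
_SYSTEM_TIME = re.compile(
    r"^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time",
    re.MULTILINE,
)

def system_offset_seconds(tracking_output: str) -> float:
    """Return the signed offset: positive if the system clock is ahead."""
    match = _SYSTEM_TIME.search(tracking_output)
    if match is None:
        raise ValueError("no 'System time' line found")
    magnitude = float(match.group(1))
    return magnitude if match.group(2) == "fast" else -magnitude

def within_tolerance(tracking_output: str, tolerance_s: float) -> bool:
    """True if the reported offset is inside the allowed tolerance."""
    return abs(system_offset_seconds(tracking_output)) <= tolerance_s

# Illustrative input, not captured from a real host.
sample = (
    "Stratum         : 3\n"
    "System time     : 0.000006523 seconds slow of NTP time\n"
)
```

On a real node the input could come from `subprocess.run(["chronyc", "tracking"], capture_output=True, text=True).stdout`.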
Since the FAQ for NTP specifically states why NTP time sync doesn't work 'right' under virtual machines, it's probably an insurmountable problem.
Most machines have an RTC (real-time clock). On PCs it's how the time is stored, so that you have a 'rough' guess of the time if NTP is unavailable. Once the system is loaded there's a 'tick' clock with higher resolution; that's what NTP sets.
That tick clock is subject to the drift of the virtual machine, since ticks may or may not happen at the correct intervals. Any time mechanism you attempt to use is going to be subject to that drift.
It's probably a suboptimal design to try to enforce NTP synchronization on virtual machines. If machines A and B have a delta of 200 ms, and machines B and C have a delta of 200 ms, C could be 400 ms away from A. You can't control that.
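The arithmetic behind that can be sketched as follows. The function name and the signed offsets are hypothetical; the point is only that pairwise bounds add up along a chain, while the actual gap depends on the direction of each offset.

```python
def worst_case_offset(pairwise_bounds_ms):
    """Upper bound on the offset between the first and last node in a chain,
    given a bound on each adjacent pair (triangle inequality)."""
    return sum(pairwise_bounds_ms)

# A-B and B-C bounds from the example above.
bound_ac = worst_case_offset([200, 200])

# With signed offsets the actual A-C gap depends on direction.
offset_ab, offset_bc = 200, -200  # hypothetical: B ahead of A, C behind B
actual_ac = offset_ab + offset_bc

print(f"worst case A-C: {bound_ac} ms, this particular case: {actual_ac} ms")
```

So a per-pair tolerance alone does not give a cluster-wide guarantee; the bound grows with the length of the chain.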
You're better off using a centralized messaging system like ZeroMQ to keep everybody in sync with the job queue. It's going to be more overhead, but relying on system tick time is a dodgy affair at best. There are many clustering solutions that track cluster participation using all sorts of reliable mechanisms to ensure that everyone is in sync. Take a look at corosync or spread; they've solved this already for things like two-phase commits.
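As a rough illustration of ordering work without trusting node clocks, here is a minimal in-process stand-in for a central coordinator. It is not ZeroMQ, corosync, or spread; the `Sequencer` class and the node names are hypothetical, and in a real deployment `next_id` would be a request/reply round trip over whatever messaging layer you pick.

```python
import itertools
import threading

class Sequencer:
    """Hypothetical central coordinator: hands out strictly increasing
    sequence numbers so ordering never depends on a node's local clock."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # The lock keeps numbers unique when several threads ask at once.
        with self._lock:
            return next(self._counter)

sequencer = Sequencer()

# Each "node" tags its job with a number from the coordinator, not a timestamp.
jobs = [(sequencer.next_id(), node) for node in ("A", "B", "C", "A")]

# Sorting by sequence number recovers the order in which the coordinator saw them.
ordered = sorted(jobs)
```

The trade-off is the one mentioned above: every job costs a round trip to the coordinator, but clock drift on the nodes no longer affects ordering.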
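Incidentally, ntp 'giving up' when drift is too high can be circumvented by instructing it to 'slam' (step) the time to the new value rather than 'slew' it. By default ntpd slews small offsets gradually, steps offsets above 128 ms, and exits if the offset exceeds its panic threshold (1000 s by default). In ntpd, the -g option allows the first adjustment to be of any size, and 'tinker panic 0' in ntp.conf disables the panic threshold. If you use ntpdate, the flag that forces a step is -b; its counterpart -B does the opposite and forces a slew, as the man page describes: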
-B Force the time to always be slewed using the adjtime(2) system call, even if the measured
offset is greater than +-128 ms. The default is to step the time using settimeofday(2) if the offset
is greater than +-128 ms. Note that, if the offset is much greater than +-128 ms in this case, it
can take a long time (hours) to slew the clock to the correct value. During this time, the host
should not be used to synchronize clients.
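To see why slewing a large offset takes hours, here is a back-of-the-envelope sketch. It assumes a maximum slew rate of 500 ppm, which is a commonly cited limit; the constant and the function name are assumptions, so check the actual rate for your kernel and NTP daemon.

```python
MAX_SLEW_PPM = 500  # assumed maximum slew rate; verify for your system

def slew_duration_seconds(offset_s: float, slew_ppm: float = MAX_SLEW_PPM) -> float:
    """Time needed to remove `offset_s` when the clock may only be
    sped up or slowed down by `slew_ppm` parts per million."""
    return abs(offset_s) / (slew_ppm * 1e-6)

for offset in (0.128, 1.0, 10.0):
    hours = slew_duration_seconds(offset) / 3600
    print(f"{offset:>6.3f} s offset -> {hours:.2f} h of slewing")
```

Under that assumption a 10-second offset needs about 20,000 seconds of slewing, which is roughly five and a half hours, whereas a step corrects it immediately at the cost of a visible jump in time.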