Can somebody enlighten me as to how CAS software like http://www.ptlsim.org/ works? How do they achieve cycle accuracy if there is neither information how many cycles are used
This is not an answer to both questions; I'm only going to answer the second one. Feel free to upvote, although Mackie's answer seems better in general.
A hard RTOS is hard to implement on x86. One thing that can break every guarantee an RTOS makes is SMM, or System Management Mode. The CPU enters it after a System Management Interrupt (SMI), which can fire for various reasons: a hardware failure, a write to some special MMIO location, or an `out` instruction to some special port. You cannot disable it, you cannot really predict when an SMI will happen, and SMI handlers can take a very long time to finish.
Essentially, you know nothing about when the CPU is in SMM until something fails in your OS because of the long time the CPU spent handling an SMI. In some special cases this can become a problem even for non-realtime OSes, not to mention hard RTOSes.
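To make the "you only notice afterwards" point concrete, here is a rough sketch of how such stalls can be spotted from the outside: busy-poll a monotonic clock and flag samples that are unexpectedly far apart. The names `sample_clock` and `find_gaps` and the 100 µs threshold are my own assumptions for illustration. Note that in user space a gap can just as well come from the scheduler or an ordinary interrupt, so this only shows that time went missing, not that an SMI caused it.

```python
import time


def find_gaps(timestamps_ns, threshold_ns):
    """Return (index, gap_ns) for consecutive samples further apart than threshold_ns."""
    gaps = []
    for i in range(1, len(timestamps_ns)):
        delta = timestamps_ns[i] - timestamps_ns[i - 1]
        if delta > threshold_ns:
            gaps.append((i, delta))
    return gaps


def sample_clock(duration_s=0.05):
    """Busy-poll the monotonic clock for roughly duration_s seconds."""
    end = time.perf_counter_ns() + int(duration_s * 1e9)
    samples = []
    while True:
        now = time.perf_counter_ns()
        samples.append(now)
        if now >= end:
            break
    return samples


if __name__ == "__main__":
    # Hypothetical threshold: anything above 100 microseconds counts as a stall.
    samples = sample_clock()
    for index, gap_ns in find_gaps(samples, threshold_ns=100_000):
        print(f"sample {index}: clock jumped by {gap_ns / 1000:.1f} us")
```

The detection logic is kept separate from the sampling loop, so `find_gaps` can also be run on timestamps recorded by something more trustworthy than a Python loop.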
There is also this thread that can give you some more points about running an RTOS on x86.