Long latency instruction
问题 I would like a long-latency single-uop x86 1 instruction, in order to create long dependency chains as part of testing microarchitectural features. Currently I'm using fsqrt , but I'm wondering is there is something better. Ideally, the instruction will score well on the following criteria: Long latency Stable/fixed latency One or a few uops (especially: not microcoded) Consumes as few uarch resources as possible (load/store buffers, page walkers, etc) Able to chain (latency-wise) with itself