Why would introducing useless MOV instructions speed up a tight loop in x86_64 assembly?

前端 未结 4 1329
北海茫月
北海茫月 2020-12-12 08:48

Background:

While optimizing some Pascal code with embedded assembly language, I noticed an unnecessary MOV instruction, and removed it

4条回答
  •  时光说笑
    2020-12-12 09:33

    I believe in modern CPUs the assembly instructions, while being the last visible layer to a programmer for providing execution instructions to a CPU, actually are several layers from actual execution by the CPU.

    Modern CPUs are RISC/CISC hybrids that translate CISC x86 instructions into internal instructions that are more RISC in behavior. Additionally there are out-of-order execution analyzers, branch predictors, Intel's "micro-ops fusion" that try to group instructions into larger batches of simultaneous work (kind of like the VLIW/Itanium titanic). There are even cache boundaries that could make the code run faster for god-knows-why if it's bigger (maybe the cache controller slots it more intelligently, or keeps it around longer).

    CISC has always had an assembly-to-microcode translation layer, but the point is that with modern CPUs things are much much much more complicated. With all the extra transistor real estate in modern semiconductor fabrication plants, CPUs can probably apply several optimization approaches in parallel and then select the one at the end that provides the best speedup. The extra instructions may be biasing the CPU to use one optimization path that is better than others.

    The effect of the extra instructions probably depends on the CPU model / generation / manufacturer, and isn't likely to be predictable. Optimizing assembly language this way would require execution against many CPU architecture generations, perhaps using CPU-specific execution paths, and would only be desirable for really really important code sections, although if you're doing assembly, you probably already know that.

提交回复
热议问题