x86_64 - Self-modifying code performance

Submitted by 醉酒当歌 on 2019-12-08 00:32:48

Question


I am reading the Intel architecture documentation, Vol. 3, Section 8.1.3:

Self-modifying code will execute at a lower level of performance than non-self-modifying or normal code. The degree of the performance deterioration will depend upon the frequency of modification and specific characteristics of the code.

So, if I respect the rules:

(* OPTION 1 *)
Store modified code (as data) into code segment;
Jump to new code or an intermediate location;
Execute new code;

(* OPTION 2 *)
Store modified code (as data) into code segment;
Execute a serializing instruction; (* For example, CPUID instruction *)
Execute new code;

AND modify the code only once a week, I should pay the penalty only the next time this code is modified and about to be executed. After that, the performance should be the same as for unmodified code (plus the cost of a jump to that code).

Is my understanding correct?


Answer 1:


There's a difference between code that's simply not yet cached, vs. code that modifies instructions that are already speculatively in-flight (fetched, maybe decoded, maybe even sitting in the scheduler and re-order buffer in the out-of-order core). Writes to memory that's already being looked at as instructions by the CPU cause it to fall back to very slow operation. This is what's usually meant by self-modifying code. Avoiding this slowdown even when JIT-compiling is not too hard. Just don't jump to your buffer until after it's all written.

Modified once a week means you might have a one-microsecond penalty once a week, if you do it wrong. It's true that frequently-used data is less likely to be evicted from the cache (that's why reading something multiple times is more likely to make it "stick"), but the self-modifying-code pipeline flush should only apply the very first time, if you encounter it at all. After that, the cache lines being executed are probably still hot in L1 I-cache (and the uop cache), if the second run happens without much intervening code. They're no longer in a modified state in L1 D-cache.

I forget if http://agner.org/optimize/ talks about self-modifying code and JIT. Even if not, you should read Agner's guides if you're writing anything in asm. Some of the material in the main "Optimizing Assembly" guide is getting out of date, though, and not really relevant for Sandybridge and later Intel CPUs: alignment/decode issues matter less thanks to the uop cache, and alignment considerations can differ for SnB-family microarchitectures.




Answer 2:


"The next time" is probably not the case; caching algorithms take into account accesses beyond the first one (not doing so would be rather naive). However, soon after the first few accesses the penalty should be gone. ("Few" might be two or thousands, but for a computer even a million is nothing.)

Even the code that is currently executing was written into memory at some point (perhaps even recently due to paging), so it experiences similar penalties initially, but that quickly dies down too, so you need not worry.



Source: https://stackoverflow.com/questions/34017361/x86-64-self-modifying-code-performance
