I\'m trying to speed up a variable-bitwidth integer compression scheme and I\'m interested in generating and executing assembly code on-the-fly. Currently a lot of time is spe
I found some better documentation from Intel and this seemed like the best place to put it for future reference:
Software should avoid writing to a code page in the same 1-KByte
subpage that is being executed or fetching code in the same 2-KByte
subpage of that is being written.
Intel® 64 and IA-32 Architectures Optimization Reference Manual
It's only a partial answer to the questions (test, test, test) but firmer numbers than the other sources I had found.
3.6.9 Mixing Code and Data.
Self-modifying code works correctly, according to the Intel architecture processor requirements, but incurs a significant performance penalty. Avoid self-modifying code if possible. • Placing writable data in the code segment might be impossible to distinguish from self-modifying code. Writable data in the code segment might suffer the same performance penalty as self- modifying code.
Assembly/Compiler Coding Rule 57. (M impact, L generality) If (hopefully read-only) data must occur on the same page as code, avoid placing it immediately after an indirect jump. For example, follow an indirect jump with its mostly likely target, and place the data after an unconditional branch. Tuning Suggestion 1. In rare cases, a performance problem may be caused by executing data on a code page as instructions. This is very likely to happen when execution is following an indirect branch that is not resident in the trace cache. If this is clearly causing a performance problem, try moving the data elsewhere, or inserting an illegal opcode or a PAUSE instruction immediately after the indirect branch. Note that the latter two alternatives may degrade performance in some circumstances.
Assembly/Compiler Coding Rule 58. (H impact, L generality) Always put code and data on separate pages. Avoid self-modifying code wherever possible. If code is to be modified, try to do it all at once and make sure the code that performs the modifications and the code being modified are on separate 4-KByte pages or on separate aligned 1-KByte subpages.
3.6.9.1 Self-modifying Code.
Self-modifying code (SMC) that ran correctly on Pentium III processors and prior implementations will run correctly on subsequent implementations. SMC and cross-modifying code (when multiple processors in a multiprocessor system are writing to a code page) should be avoided when high performance is desired.
Software should avoid writing to a code page in the same 1-KByte subpage that is being executed or fetching code in the same 2-KByte subpage of that is being written. In addition, sharing a page containing directly or speculatively executed code with another processor as a data page can trigger an SMC condition that causes the entire pipeline of the machine and the trace cache to be cleared. This is due to the self-modifying code condition. Dynamic code need not cause the SMC condition if the code written fills up a data page before that page is accessed as code.
Dynamically-modified code (for example, from target fix-ups) is likely to suffer from the SMC condition and should be avoided where possible. Avoid the condition by introducing indirect branches and using data tables on data pages (not code pages) using register-indirect calls.