MOVDQU instruction + page boundary

后端 未结 2 1765
臣服心动
臣服心动 2021-01-17 16:48

I have a simple test program that loads an xmm register with the movdqu instruction accessing data across a page boundary (OS = Linux).

If the following page is mapp

相关标签:
2条回答
  • 2021-01-17 17:00

    Facing a similar problem with a library I was writing, I got some information from a very helpful contributor.

    The core of the idea is to align the 16-byte reads to the end of the string, then handle the leftover bytes at the beginning. This works because the end of the string must live in an accessible page, and you are guaranteed that the 16-byte truncated starting address must also live in an accessible page.

    Since we never read past the string we cannot potentially stray into a protected page.

    To handle the initial set of bytes, I chose to use the PCMPxSTRM functions, which return the bitmask of matching bytes. Then it's simply a matter of shifting the result to ignore any mask bits that occur before the true beginning of the string.

    0 讨论(0)
  • 2021-01-17 17:17

    First, any algorithm which tries to access an unmapped address will cause a SegFault. If a non-AVX code flow used a 4 byte load to access the last byte of a page and the first 3 bytes of "the next page" which happened to not be mapped then it would also cause a SegFault. No? I believe that the "issue" is that the AVX(1/2/3) registers are so much bigger than "typical" that algorithms which were unsafe (but got away with it) get caught if they are trivially extended to the larger registers.

    Aligned loads (MOVDQA) can never have this problem since they don't cross any boundaries of their own size or greater. Unaligned loads CAN have this problem (as you've noted) and "often" do. The reason for this is that the instruction is defined to load the full size of the target register. You need to look at the operand types in the instruction definitions quite carefully. It doesn't matter how much of the data you are interested in. It matters what the instruction is defined to do.

    However...

    AVX1 (Sandybridge) added a "masked move" capability which is slower than a movdqa or movdqu but will not (architecturally) access the unmapped page so long as the mask is not enabled for the portion of the access which would have fallen in that page. This is meant to address the issue. In general, moving forward, it appears that masked portions (See AVX512) of loads/stores will not cause access violations on IA either.

    (It is a bummer about PCMPxSTRx behavior. Perhaps you could add 15 bytes of padding to your "string" objects?)

    0 讨论(0)
提交回复
热议问题