Question
I have a question about alignment in the ARM NEON VLD1 instruction. How does the alignment qualifier in the following code work?
DATA .req r0
vld1.16 {d16, d17, d18, d19}, [DATA, :128]!
Does the starting address of this read shift to DATA plus some non-negative offset, so that it becomes the smallest multiple of 16 (16 bytes = 128 bits) that is no less than DATA? Or does DATA itself change to the smallest multiple of 16 that is no less than DATA?
Answer 1:
It is a hint to the CPU. The only thing I have read about the usefulness of this hint is a blog post on ARM's site claiming that it makes the load faster; it doesn't say how or why, however. It is probably because the CPU can issue wider loads.
"You can also specify an alignment for the pointer passed in Rn, using the optional :<align> parameter, which often speeds up memory accesses."
Neither the load address nor DATA is rounded to a multiple of 16. If you provide the hint, you must make sure that DATA is already aligned to 16 bytes; otherwise you'll get a hardware exception.
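As a rough sketch of how the caller can meet that requirement, the C snippet below allocates a 16-byte-aligned buffer and tests a pointer before it is handed to code that uses `[DATA, :128]`. The helper names `alloc_for_128bit_hint` and `is_aligned_to` are hypothetical and only for illustration, and `aligned_alloc` assumes a C11 library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true if p is a multiple of `bytes`.
   This only tests the pointer; it never rounds or changes it. */
bool is_aligned_to(const void *p, uintptr_t bytes)
{
    return ((uintptr_t)p % bytes) == 0;
}

/* Hypothetical helper: allocate n_elems 16-bit elements on a 16-byte
   boundary, suitable for a load that specifies :128.
   C11 aligned_alloc wants the size to be a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 16. */
uint16_t *alloc_for_128bit_hint(size_t n_elems)
{
    size_t bytes = n_elems * sizeof(uint16_t);
    bytes = (bytes + 15u) & ~(size_t)15u;
    return aligned_alloc(16, bytes);
}
```

A caller would check `is_aligned_to(ptr, 16)` before taking the `:128` path and fall back to a load without the qualifier when the check fails.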
This hardware behavior is described in the VLD1 entry of the ARM ARM (Architecture Reference Manual) as:
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n]; if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else ebytes);
        Elem[D[d],index,esize] = MemU[address,ebytes];
The key line is:
if (address MOD alignment) != 0 then GenerateAlignmentException();
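To make the order of operations concrete, here is a minimal C model of the check and writeback steps, not of the load itself. The struct and function names are hypothetical. The quoted pseudocode is the single-element form, which advances by `ebytes`; the sketch takes the writeback amount as a parameter, on the assumption that the four-register form in the question transfers 32 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result record for the model below. */
typedef struct {
    bool     fault;      /* true if an alignment exception would be raised */
    uint32_t load_addr;  /* address the load would use */
    uint32_t rn_after;   /* value of Rn after the instruction */
} vld1_model_result;

/* Models only the address handling: Rn is used as-is, tested against the
   alignment, and advanced by `bytes` if writeback ("!") is requested. */
vld1_model_result model_vld1(uint32_t rn, uint32_t alignment,
                             uint32_t bytes, bool wback)
{
    vld1_model_result r = { false, rn, rn };
    if (rn % alignment != 0) {
        r.fault = true;          /* exception comes before any writeback */
        return r;
    }
    if (wback) {
        r.rn_after = rn + bytes; /* post-increment, not rounding */
    }
    return r;
}
```

With `rn = 0x1000` the model loads from 0x1000 and leaves Rn at 0x1020; with `rn = 0x1008` it reports a fault and Rn is unchanged. In neither case is the address moved to the next multiple of 16.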
I don't actually understand why the CPU can't check the alignment itself and pick the fastest path. Maybe that would cost too many cycles.
Source: https://stackoverflow.com/questions/14708679/alignment-in-vld1