I want to read a memory location without polluting the cache. I am working on an x86 Linux machine. I tried using the MOVNTDQA assembler instruction:
asm("movntdqa %
The problem with the movntdqa instruction with %%xmm as target (loading from memory) is that this insn is only available with SSE4.1 and later. So far that means only the newer Core 2 (45 nm) or i7 processors. The other direction (storing data to memory) is available in earlier SSE versions.
For this instruction, the processor moves the data into one of a very few, very small read buffers (Intel doesn't specify the exact size, but assume it is in the range of 16 bytes), where it is readily available but gets kicked out after a few other loads.
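It also does not pollute the other caches, so if you have streaming data, your approach is viable. One caveat: Intel documents the non-temporal hint of movntdqa as applying to write-combining (WC) memory; on ordinary write-back memory the processor may ignore the hint and treat it like a normal load, so measure before relying on it.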
If you also write with the streaming store instructions, remember that you need an sfence insn afterwards; sfence orders stores, not loads.
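As a rough illustration of the streaming load described above, here is a minimal sketch using the `_mm_stream_load_si128` intrinsic, which compiles to movntdqa. The function name `xor_stream` and the XOR reduction are made up for the example, and the `target("sse4.1")` attribute is a GCC/Clang extension (compiling with `-msse4.1` would be the alternative). It assumes the buffer is 16-byte aligned, which the instruction requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <smmintrin.h>  /* SSE4.1: _mm_stream_load_si128 (movntdqa) */

/* Hypothetical helper: XOR-reduces n_blocks 16-byte blocks read with
   streaming loads. src must be 16-byte aligned. */
__attribute__((target("sse4.1")))
uint64_t xor_stream(const void *src, size_t n_blocks)
{
    /* movntdqa faults on addresses that are not 16-byte aligned */
    assert(((uintptr_t)src & 15) == 0);

    __m128i *p = (__m128i *)src;
    __m128i acc = _mm_setzero_si128();

    for (size_t i = 0; i < n_blocks; i++)
        acc = _mm_xor_si128(acc, _mm_stream_load_si128(p + i));

    /* Fold the two 64-bit halves of the accumulator into one value */
    uint64_t halves[2];
    _mm_storeu_si128((__m128i *)halves, acc);
    return halves[0] ^ halves[1];
}
```

Whether this actually bypasses the caches depends on the memory type and the CPU, so treat it as something to benchmark rather than a guarantee.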
Prefetching comes in two main variants: prefetcht0 (prefetches data into all cache levels) and prefetchnta (prefetches non-temporal data). Usually prefetching into all caches is the right thing to do; for a streaming data loop the latter would be better, provided you make consistent use of the streaming instructions.
You use it with the address of an object you want to use in the near future, usually some iterations ahead if you have a loop. The prefetch insn doesn't wait or block; it just makes the processor start fetching the data at the specified memory location.
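To make the "some iterations ahead" idea concrete, here is a small sketch using the `_mm_prefetch` intrinsic from `<xmmintrin.h>`. The function name `sum_with_prefetch` and the distance `PREFETCH_AHEAD` are assumptions for illustration; a good distance depends on the loop body and the machine and has to be found by measurement.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0, _MM_HINT_NTA */

/* Assumed prefetch distance in elements; tune by measurement. */
#define PREFETCH_AHEAD 16

/* Hypothetical example: sums an array while prefetching ahead. */
long sum_with_prefetch(const int *data, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* Ask for the element we will need PREFETCH_AHEAD iterations
           from now. The call does not block. For a pure streaming loop,
           _MM_HINT_NTA (prefetchnta) could be used instead of
           _MM_HINT_T0 (prefetcht0). */
        if (i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)&data[i + PREFETCH_AHEAD], _MM_HINT_T0);

        sum += data[i];
    }
    return sum;
}
```

For a simple sequential scan like this the hardware prefetcher usually does the job already; explicit prefetching tends to pay off more with irregular access patterns.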