I want to read a memory location without polluting the cache. I am working on an x86 Linux machine. I tried using the MOVNTDQA instruction from inline assembly:
asm(\"movntdqa %
The problem with the movntdqa instruction with %%xmm as target (loading from memory) is that this insn is only available with SSE4.1 and on. This means newer Core 2 (45 nm) or i7 only so far. The other way around (storing data to memory) is available in earlier SSE versions.
For this instruction, the processor moves the data into one of a very few, very small read buffers (Intel doesn't specify the exact size, but assume it is in the range of 16 bytes), where it is readily available, but from which it gets kicked out after a few other loads.
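It does not pollute the other caches, so if you have streaming data, your approach is viable. Be aware that Intel documents the non-temporal hint for write-combining (WC) memory; on ordinary write-back memory an implementation may ignore the hint and treat the instruction as a normal load.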
Remember that if you also write with the streaming store instructions, you need an sfence insn afterwards to order those stores.
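As a rough sketch of the streaming load described above, the same instruction can be reached from C through the SSE4.1 intrinsic `_mm_stream_load_si128` instead of inline asm. The helper names `load16_nt` and `load16_streaming` are made up for illustration, and the sketch assumes GCC or Clang (for the `target` attribute and `__builtin_cpu_supports`). It falls back to a plain copy when SSE4.1 is not available.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>  /* SSE4.1: _mm_stream_load_si128 */

/* Load 16 bytes from src with MOVNTDQA and store them to dst. */
__attribute__((target("sse4.1")))
static void load16_streaming(void *dst, const void *src)
{
    __m128i v = _mm_stream_load_si128((__m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
}
#endif

/* Hypothetical helper: copy 16 bytes from a 16-byte-aligned source,
 * using the streaming load when the CPU reports SSE4.1 support. */
void load16_nt(void *dst, const void *src)
{
    /* MOVNTDQA requires a 16-byte-aligned memory operand. */
    assert(((uintptr_t)src & 15u) == 0);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.1")) {
        load16_streaming(dst, src);
        return;
    }
#endif
    memcpy(dst, src, 16);  /* fallback: ordinary cached load */
}
```

Whether the load actually bypasses the caches depends on the memory type of the source, so treat this as a way to issue the instruction, not as a guarantee about cache behaviour.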
Prefetching exists in two main variants: prefetcht0 (prefetches data into all cache levels) and prefetchnta (prefetches non-temporal data). Usually prefetching into all caches is the right thing to do; for a streaming data loop the latter would be better, if you make consistent use of the streaming instructions.
You use it with the address of an object you want to use in the near future, usually some iterations ahead if you have a loop. The prefetch insn doesn't wait or block, it just makes the processor start getting the data at the specified memory location.
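A minimal sketch of prefetching some iterations ahead in a loop, assuming GCC or Clang and their `__builtin_prefetch(addr, rw, locality)` builtin. The function name `sum_with_prefetch` and the distance of 16 elements are illustrative assumptions, not tuned values.

```c
#include <stddef.h>

/* Hypothetical tuning constant: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting the CPU to fetch data a few iterations ahead. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0: prefetch for reading.
             * locality = 0: non-temporal hint; on x86 this typically
             * becomes prefetchnta. The call does not block. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
        }
        total += data[i];
    }
    return total;
}
```

Passing a locality of 3 instead of 0 asks for the data to be kept in all cache levels, which typically corresponds to prefetcht0 on x86. The right distance depends on the loop body and the memory latency, so it is worth measuring.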
MOVNTDQA is only available with SSE4.1 and later.
Why are you trying to avoid using the cache? CPUs are generally pretty good at deciding what to kick out of the cache and when. If you do genuinely need to, one way would be to arrange for an alias of the memory area you are reading from to be mapped into your address space with caching disabled, and to read from there.
If what you are trying to achieve is actually to minimise your code's impact on another function's working set being held in cache at the time, this should be doable by issuing appropriate prefetch and invalidate instructions.