i am trying to write a memcpy function that does not load the source memory to the cpu cache. The purpose is to avoid cache pollution. The memcpy function below works, but poll
Quoting from Intel:
"The streaming load instruction is intended to accelerate data transfers from the USWC memory type. For other memory types such as cacheable (WB) or Uncacheable (UC), the instruction behaves as a typical 16-byte MOVDQA load instruction. However, future processors may use the streaming load instruction for other memory types (such as WB) as a hint that the intended cache line should be streamed from memory directly to the core while minimizing cache pollution."
That explains why the code does not work — the memory is of type WB.