The task is very simple: writing a sequence of integer values to memory.
Original code:
for (size_t i = 0; i < 1000*1000*1000; ++i)
{
    data[i] = i;
}
Is there any reason why you would expect all of data[] to be in powered-up RAM pages?
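One way to check whether first-touch page faults are part of what is being measured is to run the same loop twice over the same buffer and compare the timings. What follows is only a minimal sketch: the element type (64-bit unsigned) is an assumption, since the question does not show how data is declared, and the names timed_fill, PassTimes and compare_passes are hypothetical helpers invented for this illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <memory>

// Runs the loop from the question once and returns the elapsed seconds.
double timed_fill(std::uint64_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = i;
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical result type for this sketch.
struct PassTimes {
    double first_pass_s;   // includes whatever first-touch cost the OS adds
    double second_pass_s;  // same loop, pages already touched once
    bool contents_ok;      // spot check that the stores really happened
};

PassTimes compare_passes(std::size_t n) {
    // new[] without an initializer leaves the memory untouched, so on most
    // systems a large buffer is typically not backed by physical pages
    // until the first write reaches each page.
    std::unique_ptr<std::uint64_t[]> data(new std::uint64_t[n]);

    PassTimes t{};
    t.first_pass_s = timed_fill(data.get(), n);
    t.second_pass_s = timed_fill(data.get(), n);
    t.contents_ok = (n == 0) || (data[0] == 0 && data[n - 1] == n - 1);
    return t;
}
```

Calling compare_passes(1000 * 1000 * 1000) reproduces the size from the question (about 8 GB with this element type). If the first pass is much slower than the second, page faults are probably a large part of the measurement. An optimizing compiler is allowed to drop stores it can prove are dead, so treat implausibly small numbers with suspicion.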
The CPU's hardware prefetcher will correctly predict most accesses, but the frequent x86-64 page boundaries (every 4 KiB with default pages) might be an issue. You're writing to virtual memory, so at each page boundary there's a potential misprediction by the prefetcher. You can greatly reduce this by using large pages (e.g. MEM_LARGE_PAGES on Windows).
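A rough sketch of what requesting large pages could look like, with a fallback to normal pages when the request is refused. Several things here are assumptions and not part of the suggestion above: the Buffer struct and the alloc_buffer / free_buffer helpers are hypothetical names, the non-Windows branch (mmap with MAP_HUGETLB, a Linux flag) is added only so the sketch builds elsewhere, and the 2 MiB huge page size in that branch is assumed. On Windows, MEM_LARGE_PAGES only succeeds if the account holds the "Lock pages in memory" privilege and the size is a multiple of GetLargePageMinimum().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Hypothetical handle for this sketch: pointer, mapped size, and whether
// the large-page request succeeded.
struct Buffer {
    void* ptr;
    std::size_t bytes;
    bool large_pages;
};

static std::size_t round_up(std::size_t bytes, std::size_t unit) {
    return (bytes + unit - 1) / unit * unit;
}

Buffer alloc_buffer(std::size_t bytes) {
    assert(bytes > 0);
#if defined(_WIN32)
    const SIZE_T large = GetLargePageMinimum();  // 0 if unsupported
    if (large != 0) {
        const SIZE_T rounded = round_up(bytes, large);
        void* p = VirtualAlloc(nullptr, rounded,
                               MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                               PAGE_READWRITE);
        if (p != nullptr) return {p, rounded, true};
    }
    // Fallback: ordinary pages.
    void* p = VirtualAlloc(nullptr, bytes, MEM_RESERVE | MEM_COMMIT,
                           PAGE_READWRITE);
    return {p, bytes, false};
#else
#if defined(MAP_HUGETLB)
    // Assumption: 2 MiB huge pages, the common x86-64 Linux default.
    const std::size_t huge = 2u * 1024u * 1024u;
    const std::size_t rounded = round_up(bytes, huge);
    void* hp = mmap(nullptr, rounded, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (hp != MAP_FAILED) return {hp, rounded, true};
#endif
    // Fallback: ordinary pages.
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return {nullptr, 0, false};
    return {p, bytes, false};
#endif
}

void free_buffer(const Buffer& b) {
    if (b.ptr == nullptr) return;
#if defined(_WIN32)
    VirtualFree(b.ptr, 0, MEM_RELEASE);
#else
    munmap(b.ptr, b.bytes);
#endif
}
```

To try it on the original loop, allocate with alloc_buffer(n * sizeof(std::uint64_t)), cast ptr to std::uint64_t*, run the loop, and check large_pages to see which path was actually taken before comparing timings.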