I need help with the performance of the following code. It does a memcpy on two dynamically allocated arrays of arbitrary size:
int main()
{
double *a, *b;
u
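The first bzero runs longer because of (1) lazy page allocation and (2) lazy page zero-initialization by the kernel. A large malloc typically only reserves address space, so the first write to each page triggers a page fault in which the kernel allocates a physical page and zeroes it. The second cost is unavoidable for security reasons, but the cost of lazy page allocation can be reduced by using larger ("huge") pages: with 2 MiB pages instead of 4 KiB ones (the usual sizes on x86-64), the same buffer needs 512 times fewer page faults.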
There are at least two ways to use huge pages on Linux. The hard way is hugetlbfs; the easy way is Transparent Huge Pages (THP).
Search for khugepaged in the list of processes on your system (for example, ps -e | grep khugepaged). If such a process exists, transparent huge pages are supported, and you can use them in your application by changing malloc to this:
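/* align b to a 2 MiB boundary (the huge page size on x86-64) */
posix_memalign((void **)&b, 2*1024*1024, n*sizeof(double));
/* hint that this range should be backed by transparent huge pages */
madvise((void *)b, n*sizeof(double), MADV_HUGEPAGE);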