Is there a standard, strided version of memcpy?

Question

I have a column vector A which is 10 elements long. I have a matrix B which is 10 by 10. The memory storage for B is column major. I would like to overwrite the first row in B with the column vector A.

Clearly, I can do:

for ( int i=0; i < 10; i++ )
{
    B[0 + 10 * i] = A[i];
}

where I've left the zero in 0 + 10 * i to highlight that B uses column-major storage: the zero is the row index, and 10 * i jumps to the start of column i.
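The loop above generalizes to any row of any column-major matrix. This is a minimal sketch: the helper names cm_index and set_row_colmajor are hypothetical, and ld (the "leading dimension", here the number of rows) is assumed to be the distance between the starts of consecutive columns. For the 10-by-10 case, ld is 10, so row 0 lives at indices 0, 10, 20, and so on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of element (row, col) in a column-major
 * matrix whose columns start ld elements apart (ld = number of rows). */
static inline size_t cm_index(size_t row, size_t col, size_t ld) {
    return row + ld * col;
}

/* Hypothetical helper: overwrite row `row` of the column-major matrix B
 * (ld rows, ncols columns) with the ncols values in A.
 * Successive elements of one row are ld elements apart in memory. */
static void set_row_colmajor(double *B, size_t ld, size_t ncols,
                             size_t row, const double *A) {
    assert(row < ld); /* the row must exist */
    for (size_t i = 0; i < ncols; i++) {
        B[cm_index(row, i, ld)] = A[i];
    }
}
```

Calling set_row_colmajor(B, 10, 10, 0, A) does the same thing as the original loop.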

After some shenanigans in CUDA-land tonight, I wondered whether there might be a CPU function that performs a strided memcpy. I guess that, at a low level, performance would depend on the existence of a strided load/store instruction, which I don't recall x86 assembly having?
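Standard C has no strided memcpy, but one is easy to sketch on top of memcpy itself. The function strided_memcpy below is hypothetical (not part of any library); strides are given in bytes so that it works for any element type, and the source and destination regions are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strided counterpart of memcpy: copies `count` elements of
 * `elem_size` bytes each. Element k is read from src + k * src_stride and
 * written to dst + k * dst_stride (both strides in bytes).
 * The source and destination regions must not overlap. */
static void *strided_memcpy(void *dst, size_t dst_stride,
                            const void *src, size_t src_stride,
                            size_t elem_size, size_t count) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    /* Destination elements must not overlap each other. */
    assert(count <= 1 || dst_stride >= elem_size);
    for (size_t k = 0; k < count; k++) {
        memcpy(d + k * dst_stride, s + k * src_stride, elem_size);
    }
    return dst;
}
```

For the question's case, strided_memcpy(B, 10 * sizeof(double), A, sizeof(double), sizeof(double), 10) writes A into the first row of B. When elem_size is a compile-time constant and the function is inlined, optimizing compilers can often reduce each memcpy call to a single load and store, though that is not guaranteed.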

Answer 1:

Short answer: The code you have written is as fast as it's going to get.

Long answer: memcpy is typically implemented with complicated intrinsics or assembly because it has to handle memory operands of arbitrary size and alignment. If you are overwriting a row of a column-major matrix, each element you copy is a single, naturally aligned value, so you don't need to resort to the same tricks to get decent speed.
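The closest thing to a standard strided copy is arguably the BLAS copy routine, which takes a stride (called an increment) for both the source and the destination; with a CBLAS implementation linked in, cblas_dcopy(10, A, 1, B, 10) would copy A into the first row of B. Where BLAS isn't available, a fallback with the same argument order is just the plain loop recommended above. The name dcopy_fallback is hypothetical, and unlike reference BLAS this sketch only accepts positive increments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fallback mirroring the argument order of BLAS dcopy:
 * y[k * incy] = x[k * incx] for k = 0 .. n-1.
 * Unlike reference BLAS, only positive increments are supported here. */
static void dcopy_fallback(int n, const double *x, int incx,
                           double *y, int incy) {
    assert(incx > 0 && incy > 0);
    for (int k = 0; k < n; k++) {
        y[(size_t)k * incy] = x[(size_t)k * incx];
    }
}
```

Swapping the strides reads a row back out into a contiguous array: dcopy_fallback(10, B, 10, col, 1).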

Source: https://stackoverflow.com/questions/6013779/is-there-a-standard-strided-version-of-memcpy

Tags

memcpy

stride
