Fast ARM NEON memcpy

Posted by 谁说我不能喝 on 2019-12-03 16:34:30

ARM has a great tech note on this.

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html

Your performance will vary with the micro-architecture. ARM's note covers the Cortex-A8, but I think it will give you a decent idea. The summary at the bottom is a great discussion of the pros and cons that go beyond the raw numbers, such as which methods use the fewest registers.
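To make the idea concrete, here is a minimal sketch of a NEON copy loop written with intrinsics rather than hand-written assembly. It is not taken from ARM's note: the function name `copy_bytes`, the `HAVE_NEON` macro, and the choice of 32 bytes per iteration are all assumptions made for illustration, and whether it beats your libc's `memcpy` on a given core is something you would have to measure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0   /* no NEON: fall back to plain memcpy below */
#endif

/* Hypothetical helper for illustration; not the routine from ARM's note. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
#if HAVE_NEON
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    /* Main loop: move 32 bytes per iteration through two 128-bit registers.
     * The block size is an assumption; tune it per core. */
    while (n >= 32) {
        uint8x16_t a = vld1q_u8(s);
        uint8x16_t b = vld1q_u8(s + 16);
        vst1q_u8(d, a);
        vst1q_u8(d + 16, b);
        s += 32;
        d += 32;
        n -= 32;
    }

    /* Tail: fewer than 32 bytes remain, let the library handle them. */
    memcpy(d, s, n);
#else
    memcpy(dst, src, n);
#endif
    return dst;
}
```

Like `memcpy`, this assumes the source and destination do not overlap. On a non-NEON target it compiles down to a plain `memcpy` call, so the same source can be benchmarked on each platform you care about.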

And yes, as another commenter mentions, prefetching is very difficult to get right. It behaves differently on different micro-architectures, depending on how big the caches are, how big each cache line is, and a bunch of other details of the cache design. If you aren't careful, you can end up evicting lines you still need. I would recommend avoiding it in portable code.
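If you do want to experiment with prefetching anyway, one option is to keep it behind a compile-time switch that is off by default, so the portable build never issues a prefetch. The sketch below assumes GCC or Clang (for `__builtin_prefetch`); the names `COPY_PREFETCH_BYTES`, `COPY_CHUNK`, and `copy_with_optional_prefetch` are hypothetical, and the right distance, if any, has to be found by measuring on each target.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning knob: how far ahead of the read pointer to prefetch.
 * 0 (the default) disables prefetching entirely. */
#ifndef COPY_PREFETCH_BYTES
#define COPY_PREFETCH_BYTES 0
#endif

/* Bytes copied per loop iteration; 64 is an assumed value, not a
 * guaranteed cache line size. */
#define COPY_CHUNK 64

static void *copy_with_optional_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= COPY_CHUNK) {
#if COPY_PREFETCH_BYTES > 0 && (defined(__GNUC__) || defined(__clang__))
        /* Only prefetch addresses that are still inside the source buffer.
         * Arguments: address, 0 = read, 0 = no temporal locality. */
        if (n > COPY_PREFETCH_BYTES)
            __builtin_prefetch(s + COPY_PREFETCH_BYTES, 0, 0);
#endif
        memcpy(d, s, COPY_CHUNK);
        s += COPY_CHUNK;
        d += COPY_CHUNK;
        n -= COPY_CHUNK;
    }

    /* Copy whatever is left after the last full chunk. */
    memcpy(d, s, n);
    return dst;
}
```

Building with something like `-DCOPY_PREFETCH_BYTES=256` turns the prefetch on for one target without touching the code path everyone else uses.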
