Fast Gaussian blur on unsigned char image- ARM Neon Intrinsics- iOS Dev

十年热恋 提交于 2019-12-04 20:06:27
Paul R

Before you get into SIMD optimisation with NEON you should first improve your scalar implementation. The biggest problem with your code as it stands is that it has been implemented as if it were a non-separable filter, whereas a Gaussian kernel is separable. By switching to a separable implementation you reduce the number of operations form N^2 to 2N, which in your case of a 5x5 kernel would be a reduction from 25 multiply-adds to 10, i.e. a 2.5x speed up for very little effort.

It may be that a sufficiently optimised scalar implementation will meet your needs without the need to resort to SIMD. If not then you can at least carry these scalar optimisations over into a vectorized implementation.


http://en.wikipedia.org/wiki/Gaussian_blur

http://blogs.mathworks.com/steve/2006/11/28/separable-convolution-part-2/

  1. Separate your kernel, as described by Paul R.
  2. Don't re-invent the wheel. Use vImage, which is part of the Accelerate framework, and implements a vectorized, multi-threaded convolution for you. Specifically, it seems like you want the function vImageConvolve_Planar8.
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!