Checksum code implementation for NEON using intrinsics

不思量自难忘° 2021-01-24 10:06

I'm trying to implement the checksum computation (2's complement addition) for NEON using intrinsics. The checksum is currently computed with plain (scalar) ARM code.
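For reference, a minimal scalar version of such a checksum might look like the sketch below. It assumes the data is an array of 32-bit words and that the checksum is their plain sum modulo 2^32 (two's complement addition with wrap-around). The function name checksum32_scalar is hypothetical, since the original code isn't shown. Any NEON version should return exactly the same value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar baseline. Assumes the checksum is the sum of all
 * 32-bit words modulo 2^32 (two's complement addition). */
uint32_t checksum32_scalar(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];   /* unsigned overflow wraps modulo 2^32 (well-defined in C) */
    return sum;
}
```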

1 answer
  • 2021-01-24 10:41

    A few things you can improve:

    • Get rid of the stores to disp: this looks like debug code that got left in?
    • Don't do horizontal addition within your main loop. Just accumulate partial (vertical) sums in the loop and do one final horizontal addition after the loop (see this answer for an example of how to do this; it's for SSE, but the principle is the same).
    • Make sure you use gcc -O3 ... to get the maximum benefit from compiler optimisation.
    • Don't use goto! (It doesn't affect performance, but it is evil.)
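A minimal sketch of the "vertical sums in the loop, one horizontal add afterwards" approach with NEON intrinsics, assuming the checksum is a plain sum of 32-bit words modulo 2^32. The function name checksum32 is hypothetical. The NEON path uses only intrinsics available on both 32-bit ARM and AArch64 (vld1q_u32, vaddq_u32, vadd_u32, vpadd_u32); on builds without NEON, the same function falls back to the scalar loop. There is no goto and no store to a debug buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical name. Assumes the checksum is a plain sum of 32-bit words
 * modulo 2^32 (two's complement addition). */
uint32_t checksum32(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    size_t i = 0;
    uint32_t sum = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four independent running sums, one per 32-bit lane. */
    uint32x4_t acc = vdupq_n_u32(0);

    /* Main loop: vertical (lane-wise) adds only, no horizontal reduction. */
    for (; i + 4 <= n; i += 4)
        acc = vaddq_u32(acc, vld1q_u32(data + i));

    /* One horizontal reduction after the loop: fold the high half onto
     * the low half, then pairwise-add the remaining two lanes. */
    uint32x2_t half = vadd_u32(vget_low_u32(acc), vget_high_u32(acc));
    half = vpadd_u32(half, half);
    sum = vget_lane_u32(half, 0);
#endif

    /* Scalar tail (0 to 3 leftover words), or the whole array
     * when the build has no NEON support. */
    for (; i < n; i++)
        sum += data[i];

    return sum;
}
```

Because addition modulo 2^32 is associative and commutative, the per-lane partial sums reduce to the same value as the scalar loop, so comparing the two on the same buffer is an easy correctness check. On 32-bit ARM targets, NEON may also need to be enabled explicitly at compile time (e.g. with -mfpu=neon alongside -O3). Unrolling with two or more accumulators may further help hide the add latency, but that is worth measuring rather than assuming.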