I'm trying to implement the checksum computation code (2's complement addition) for NEON, using intrinsics. The current checksum computation is being carried out on ARM.
A few things you can improve:
- Remove disp: this looks like debug code that got left in?
- Compile with gcc -O3 ... to get maximum benefit from compiler optimisation.
- Avoid goto! (It doesn't affect performance but is evil.)
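
To illustrate these points, here is a minimal sketch of a checksum loop with no debug output and no goto. The question doesn't show the actual data format, so this assumes the checksum is a plain sum of 16-bit words that wraps modulo 2^32. The function name sum16 is hypothetical. The NEON path is only compiled when the compiler defines __ARM_NEON; elsewhere the scalar loop does all the work.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the original post): adds up n 16-bit
 * words, wrapping modulo 2^32. The real checksum format is an assumption. */
uint32_t sum16(const uint16_t *data, size_t n)
{
    uint32_t sum = 0;
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four 32-bit partial sums, all starting at zero. */
    uint32x4_t acc = vdupq_n_u32(0);

    /* Process 8 words per iteration: vpadalq_u16 adds adjacent pairs of
     * 16-bit lanes and accumulates the results into the 32-bit lanes. */
    for (; i + 8 <= n; i += 8) {
        uint16x8_t v = vld1q_u16(data + i);
        acc = vpadalq_u16(acc, v);
    }

    /* Horizontal add of the four partial sums. */
    sum = vgetq_lane_u32(acc, 0) + vgetq_lane_u32(acc, 1)
        + vgetq_lane_u32(acc, 2) + vgetq_lane_u32(acc, 3);
#endif

    /* Scalar tail (or the whole buffer when NEON is not available). */
    for (; i < n; i++)
        sum += data[i];

    return sum;
}
```

The loop condition handles the leftover words, so no goto is needed to jump to a cleanup path. On 32-bit ARM you would typically build this with something like gcc -O3 -mfpu=neon; on AArch64 NEON is enabled by default.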