Optimizing Cortex-A8 color conversion using NEON
Question

I am currently writing a color conversion routine to convert from YUY2 to NV12. I have a function that is quite fast, but not as fast as I would expect, mainly because of cache misses.

    void convert_hd(uint8_t *orig, uint8_t *result)
    {
        uint32_t width = 1280;
        uint32_t height = 720;
        uint8_t *lineOdd = orig;
        uint8_t *lineEven = orig + width*2;
        uint8_t *resultYOdd = result;
        uint8_t *resultYEven = result + width;
        uint8_t *resultUV = result + height*width;
        uint32_t totalLoop = height/2;
        while