I wrote an algorithm to convert an RGB image to YUV420. I spent a long time trying to make it faster, but I haven't found any other way to boost its efficiency, so now I turn to
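Do not dereference the same pointer more than once. Copy the value into a local variable (which the compiler can keep in a register) and use that local instead. This is about aliasing: the compiler cannot assume that a store through one pointer, such as *y, leaves *r, *g and *b unchanged, so every later read of *r, *g or *b has to be reloaded from memory.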
...
int v_r = *r;
int v_g = *g;
int v_b = *b;
*y = ((lookup66[v_r] + lookup129[v_g] + lookup25[v_b]) >> 8) + 16;
...
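A minimal sketch of the same idea applied to a whole pixel. It assumes BT.601 studio-swing integer coefficients for U and V (only the Y formula appears above) and that lookupK[x] simply holds K * x, so the tables are written out as multiplications. The function name rgb_to_yuv_pixel is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: converts one pixel. U/V use BT.601 integer
   coefficients (an assumption; the snippet only shows Y). Each channel is
   read through its pointer exactly once into a local, so the stores through
   y, u and v cannot force the compiler to reload *r, *g or *b, even though
   uint8_t pointers are allowed to alias each other. */
static void rgb_to_yuv_pixel(const uint8_t *r, const uint8_t *g,
                             const uint8_t *b,
                             uint8_t *y, uint8_t *u, uint8_t *v)
{
    int v_r = *r;  /* single load per channel */
    int v_g = *g;
    int v_b = *b;

    /* Same Y formula as the snippet, with lookupK[x] written as K * x. */
    *y = (uint8_t)(((66 * v_r + 129 * v_g + 25 * v_b) >> 8) + 16);

    /* Adding 128 << 8 before the shift keeps the operand non-negative, so
       >> is well defined; it is equivalent to adding 128 after the shift. */
    *u = (uint8_t)((-38 * v_r - 74 * v_g + 112 * v_b + 128 + (128 << 8)) >> 8);
    *v = (uint8_t)((112 * v_r - 94 * v_g - 18 * v_b + 128 + (128 << 8)) >> 8);
}
```

Marking the pointer parameters with C99's restrict qualifier is another way to promise the compiler that the buffers don't overlap; copying into locals works without that promise. Note that this computes full-resolution U and V. For YUV420 they still have to be subsampled over 2x2 blocks.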
On the other hand, you can do this with SSE without any lookup tables and process 8 pixels at once (eight 16-bit lanes per 128-bit register).
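A rough SSE2 sketch of the luma part, assuming the three channels are stored in separate planar 8-bit buffers (the question doesn't show the real layout) and keeping the snippet's Y formula. The function name rgb_to_y_8px_sse2 is hypothetical. Each pixel is widened to a 16-bit lane; the largest possible sum, 66*255 + 129*255 + 25*255 = 56100, still fits in an unsigned 16-bit lane, so nothing overflows.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: Y for 8 pixels from planar R, G, B buffers. */
static void rgb_to_y_8px_sse2(const uint8_t *r, const uint8_t *g,
                              const uint8_t *b, uint8_t *y)
{
    const __m128i zero = _mm_setzero_si128();

    /* Load 8 bytes per channel and zero-extend them to 8 x 16-bit lanes. */
    __m128i vr = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)r), zero);
    __m128i vg = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)g), zero);
    __m128i vb = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)b), zero);

    /* 66*R + 129*G + 25*B in all 8 lanes, no lookup tables needed. */
    __m128i sum = _mm_add_epi16(
        _mm_add_epi16(_mm_mullo_epi16(vr, _mm_set1_epi16(66)),
                      _mm_mullo_epi16(vg, _mm_set1_epi16(129))),
        _mm_mullo_epi16(vb, _mm_set1_epi16(25)));

    /* Logical shift treats each lane as unsigned, then add the +16 offset. */
    __m128i luma = _mm_add_epi16(_mm_srli_epi16(sum, 8), _mm_set1_epi16(16));

    /* Narrow back to 8 bytes (every value is at most 235) and store them. */
    _mm_storel_epi64((__m128i *)y, _mm_packus_epi16(luma, zero));
}
```

A full conversion would call this in steps of 8 pixels with a scalar loop for the leftover tail, and would handle U and V the same way before subsampling them for YUV420.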