I wrote an algorithm to convert an RGB image to YUV420. I spent a long time trying to make it faster, but I haven't found any other way to boost its efficiency, so now I turn to
The only obvious point I can see is that you're computing 3 * i three times. You could store that result in a local variable, but the compiler may well already be doing that. So:
r = rgb + 3 * i;
g = rgb + 3 * i + 1;
b = rgb + 3 * i + 2;
...becomes:
r = rgb + 3 * i;
g = r + 1;
b = g + 1;
...although I doubt it would have much impact.
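For what it's worth, here is a minimal sketch of how that might look inside a loop. The original function isn't reproduced here, so split_rgb, its parameters, and the uint8_t buffer types are all hypothetical; the sketch also goes one step further and advances r by 3 on each iteration instead of multiplying at all. A decent optimizer will likely produce the same code either way.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original code): copies interleaved
 * RGB bytes into three separate planes. It only illustrates the pointer
 * arithmetic; no colour conversion is done here. */
void split_rgb(const uint8_t *rgb, size_t pixels,
               uint8_t *r_out, uint8_t *g_out, uint8_t *b_out)
{
    const uint8_t *r = rgb;               /* points at the R byte of pixel 0 */

    for (size_t i = 0; i < pixels; ++i, r += 3) {
        const uint8_t *g = r + 1;         /* derived from r, no 3 * i needed */
        const uint8_t *b = g + 1;

        r_out[i] = *r;
        g_out[i] = *g;
        b_out[i] = *b;
    }
}
```

If you try something like this, compare the generated assembly or benchmark it before and after; there may well be no measurable difference.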
As ciphor suggests, I think assembly is the only way you're likely to improve upon what you've got there.