In RenderScript, I am using bound pointers to iterate over a large image. The problem is array-access performance.
...
for(int i=0; i < channels; i++
Understanding the performance implications of what you write in RenderScript (or OpenCL) is complex.
Just writing it in RenderScript does not guarantee performance. You often run into cache-miss (locality) problems when your memory accesses hop around.
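To make "hopping around" concrete, here is a minimal plain C sketch (not RenderScript), assuming a hypothetical interleaved layout where channel c of pixel p lives at index p*channels + c. Both functions compute the same sum; only the loop order differs. Putting the channel loop outermost strides through the whole buffer once per channel, while putting it innermost reads memory once, strictly in order.

```c
#include <stddef.h>
#include <stdint.h>

/* Strided: channel loop outermost. Each outer iteration walks the entire
   buffer, skipping `channels` bytes per step, so the image is streamed
   through the cache `channels` times. */
uint64_t sum_channel_major(const uint8_t *img, size_t pixels, int channels) {
    uint64_t total = 0;
    for (int c = 0; c < channels; c++)
        for (size_t p = 0; p < pixels; p++)
            total += img[p * (size_t)channels + (size_t)c];
    return total;
}

/* Sequential: pixel loop outermost, channel loop innermost. Consecutive
   iterations touch consecutive bytes, so each cache line is used fully
   before moving on. */
uint64_t sum_pixel_major(const uint8_t *img, size_t pixels, int channels) {
    uint64_t total = 0;
    for (size_t p = 0; p < pixels; p++)
        for (int c = 0; c < channels; c++)
            total += img[p * (size_t)channels + (size_t)c];
    return total;
}
```

On an image too large to fit in cache, the second version is typically the faster of the two, although the actual gap depends on the device and its cache sizes.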
Quite often it is better to structure the code as a series of kernels, each of which walks through memory in a cache-friendly order.
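As an illustration of splitting work into passes, the following plain C sketch implements a separable box blur on a single-channel float image as two separate "kernels" run back to back through a temporary buffer. The blur, the function names, and the edge clamping are hypothetical choices for the example, not something taken from the question. The point is that each pass reads and writes whole rows in order; the vertical pass accumulates entire source rows instead of walking down a column for every output pixel.

```c
#include <stddef.h>

/* Clamp an index into [0, n-1] so edge pixels are repeated. */
static int clampi(int v, int n) { return v < 0 ? 0 : (v >= n ? n - 1 : v); }

/* Pass 1: horizontal box filter of radius r. All reads and writes stay
   inside one contiguous row. */
void box_blur_rows(const float *src, float *dst, int w, int h, int r) {
    for (int y = 0; y < h; y++) {
        const float *in = src + (size_t)y * (size_t)w;
        float *out = dst + (size_t)y * (size_t)w;
        for (int x = 0; x < w; x++) {
            float sum = 0.0f;
            for (int dx = -r; dx <= r; dx++)
                sum += in[clampi(x + dx, w)];
            out[x] = sum / (float)(2 * r + 1);
        }
    }
}

/* Pass 2: vertical box filter of radius r. For each output row, whole
   source rows are added into it left to right, so memory is still read
   row by row rather than column by column. */
void box_blur_cols(const float *src, float *dst, int w, int h, int r) {
    for (int y = 0; y < h; y++) {
        float *out = dst + (size_t)y * (size_t)w;
        for (int x = 0; x < w; x++)
            out[x] = 0.0f;
        for (int dy = -r; dy <= r; dy++) {
            const float *in = src + (size_t)clampi(y + dy, h) * (size_t)w;
            for (int x = 0; x < w; x++)
                out[x] += in[x];
        }
        for (int x = 0; x < w; x++)
            out[x] /= (float)(2 * r + 1);
    }
}

/* Run the two passes in sequence; tmp and dst must each hold w*h floats. */
void box_blur(const float *src, float *tmp, float *dst, int w, int h, int r) {
    box_blur_rows(src, tmp, w, h, r);
    box_blur_cols(tmp, dst, w, h, r);
}
```

In actual RenderScript each pass would be its own kernel over an allocation and the runtime would decide the iteration order, so treat this only as a model of the access pattern you are aiming for.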
Sorry if this is vague. Your question does not have enough detail to say more.