Optimizing data types to increase speed

Question

In order to write efficient code, you should use the simplest possible data types. This is even more true for RenderScript, where the same calculation is repeated many times in a kernel. I wanted to write a very simple kernel that takes a (color) bitmap as input and produces an int[] array as output:

#pragma version(1)
#pragma rs java_package_name(com.example.xxx)
#pragma rs_fp_relaxed

uint __attribute__((kernel)) grauInt(uchar4 in) {
    uint gr = (uint) (0.21*in.r + 0.72*in.g + 0.07*in.b);
    return gr;
}

Java side:

int[] data1 = new int[width*height];
ScriptC_gray graysc;
graysc=new ScriptC_gray(rs);
Type.Builder TypeOut = new Type.Builder(rs, Element.U32(rs));
TypeOut.setX(width).setY(height);
Allocation outAlloc = Allocation.createTyped(rs, TypeOut.create());

Allocation inAlloc = Allocation.createFromBitmap(rs, bmpfoto1, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
graysc.forEach_grauInt(inAlloc,outAlloc);
outAlloc.copyTo(data1);

This takes 40 ms on my Samsung S5 (5.0) and 180 ms on my Samsung Tab2 (4.2) for a 600k-pixel bitmap. Next I tried to optimize it. Since the output is actually an 8-bit unsigned integer (0-255), I tried the following:

uchar __attribute__((kernel)) grauInt(uchar4 in) {
    uchar gr = 0.2125*in.r + 0.7154*in.g + 0.0721*in.b;
    return gr;
}

and on the Java side I changed the 4th line to:

Type.Builder TypeOut = new Type.Builder(rs, Element.U8(rs));

However, this produces the error "32 bit integer source does not match allocation type UNSIGNED_8". My explanation is that the forEach_grauInt(inAlloc,outAlloc) call expects the same Element type on the input and output sides. So I tried to "disconnect" the input and output allocations and treat the input allocation (the bitmap) as a global variable bmpAllocIn, as follows:

#pragma version(1)
#pragma rs java_package_name(com.example.dani.oldgauss)
#pragma rs_fp_relaxed

rs_allocation bmpAllocIn;
int32_t width;
int32_t height;

uchar __attribute__((kernel)) grauInt(uint32_t x, uint32_t y) {
    uchar4 c = rsGetElementAt_uchar4(bmpAllocIn, x, y);
    // Parenthesize the sum so the cast applies to the result, not to 0.2125 alone
    uchar gr = (uchar) (0.2125*c.r + 0.7154*c.g + 0.0721*c.b);
    return gr;
}

With the Java side:

int[] data1 = new int[width*height];
ScriptC_gray graysc;
graysc=new ScriptC_gray(rs);

graysc.set_bmpAllocIn(Allocation.createFromBitmap(rs,bmpfoto1));
Type.Builder TypeOut = new Type.Builder(rs, Element.U8(rs));
TypeOut.setX(width).setY(height);
Allocation outAlloc = Allocation.createTyped(rs, TypeOut.create());

graysc.forEach_grauInt(outAlloc);
outAlloc.copyTo(data1);

The surprising thing is that I get the same error message again: "32 bit integer source does not match allocation type UNSIGNED_8". I cannot understand this. What am I doing wrong here?

Answer 1:

The reason is the

int[] data1 = new int[width * height];

line. You are attempting to use the int[] array it creates as the target for copyTo(), but the allocation holds U8 elements rather than 32-bit integers, so copyTo() throws an exception. Change it to

byte[] data1 = new byte[width * height];

and it will work. By the way, input and output allocations can be of different types.
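One consequence of switching to byte[] is that Java bytes are signed, so gray levels above 127 read back as negative numbers. Here is a minimal sketch of recovering the 0-255 values, assuming data1 has already been filled by copyTo(). The class and method names are hypothetical and only for illustration:

```java
final class UnsignedBytes {
    // Converts signed Java bytes (for example, the contents of data1 after
    // copyTo) into ints in the range 0..255.
    public static int[] toUnsignedInts(byte[] src) {
        int[] out = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = src[i] & 0xFF; // mask removes the sign extension
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the array filled by outAlloc.copyTo(data1)
        byte[] data1 = { 0, 127, (byte) 128, (byte) 255 };
        int[] gray = toUnsignedInts(data1);
        System.out.println(java.util.Arrays.toString(gray)); // prints [0, 127, 128, 255]
    }
}
```

If only a few pixels are inspected, applying `data1[i] & 0xFF` at the point of use avoids allocating the second array.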

As a side note, you could also eliminate floating-point computation from your RS filter entirely; this will improve performance on some architectures.
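As a rough illustration of the floating-point-free idea, the weights can be scaled by 256 and applied with integer multiplies and a shift. The sketch below is plain Java showing the arithmetic only, not the RenderScript kernel itself. The integer weights 54, 183, and 19 are an assumption derived from the question's coefficients (19 rather than 18 so that they sum to 256), and the class and method names are hypothetical. The result can differ from the float version by one gray level.

```java
final class IntGray {
    // Fixed-point luminance: weights are the question's coefficients scaled
    // by 256 (0.2125 -> 54, 0.7154 -> 183, 0.0721 -> 19 so the sum is 256).
    public static int gray(int r, int g, int b) {
        return (54 * r + 183 * g + 19 * b) >> 8; // divide by 256
    }

    public static void main(String[] args) {
        System.out.println(gray(255, 255, 255)); // prints 255
        System.out.println(gray(0, 0, 0));       // prints 0
    }
}
```

The same expression should carry over to the kernel body with c.r, c.g, and c.b as inputs. Whether it is actually faster depends on the device, so it is worth measuring on each target.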

Source: https://stackoverflow.com/questions/32015660/optimizing-data-types-to-increase-speed

Tags

types

allocation

renderscript