How to manipulate on the fly YUV Camera frame efficiently in Android?

既然无缘 2021-01-27 03:26

I'm adding black (0) padding around the region of interest (center) of an NV21 frame obtained from Android CameraPreview callbacks, in a thread.

To avoid overhead of

2 Answers
  • 2021-01-27 03:59

    Obviously, the most efficient way to pass an image for detection would be to pass the ROI rectangle to the detector. All our image processing functions accept a bounding box as a parameter.
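
    A minimal sketch of what a bounding-box-aware routine can look like, assuming the Y plane occupies the first width * height bytes of the NV21 buffer. RoiLuma and meanLuma are hypothetical names used only for illustration, not part of any real detector API. The point is that the routine walks the ROI rows in place, so the frame needs neither padding nor a copy:

```java
/** Hypothetical example: a routine that takes the ROI as parameters instead of a pre-masked frame. */
final class RoiLuma {
    /**
     * Mean luma of the ROI, read directly from the Y plane of an NV21 frame.
     * Assumes the Y plane is the first width * height bytes, one byte per pixel, row-major.
     */
    static double meanLuma(byte[] nv21, int width, int height,
                           int left, int top, int roiWidth, int roiHeight) {
        if (left < 0 || top < 0 || roiWidth <= 0 || roiHeight <= 0
                || left + roiWidth > width || top + roiHeight > height) {
            throw new IllegalArgumentException("ROI outside frame");
        }
        long sum = 0;
        for (int y = top; y < top + roiHeight; y++) {
            int rowStart = y * width + left;       // computed once per row
            for (int i = rowStart; i < rowStart + roiWidth; i++) {
                sum += nv21[i] & 0xFF;             // bytes are signed in Java
            }
        }
        return (double) sum / ((long) roiWidth * roiHeight);
    }
}
```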

    If the black margin is only needed for display, consider using a black overlay mask in the preview layout instead of pixel manipulation.

    If pixel manipulation is inevitable, check if you can limit it to the Y plane. OK, you already do this!

    If your detector works on a downscaled image (as my face recognition engine does), it may be wise to apply the blackout to the resized frame instead.

    At any rate, keep your loops clean and tidy, remove all recurring calculations. Using Arrays.fill() operations may help significantly, but not dramatically.

  • 2021-01-27 04:01

    1. Yes. To understand why, let's take a look at the bytecode Android Studio produces for your "left/right of center" nested loop:

    (Annotated excerpt from a release build of blackNonROI, AS 3.2.1):

    :goto_27
    sub-int v2, p2, p4         ;for(int y=verMargin; y<height-verMargin; y++)
    if-ge v1, v2, :cond_45
    const/4 v2, 0x0
    :goto_2c
    if-ge v2, p3, :cond_36     ;for (int x = 0; x < hozMargin; x++)
    mul-int v3, v1, p1
    add-int/2addr v3, v2
    .line 759
    aput-byte v0, p0, v3
    add-int/lit8 v2, v2, 0x1
    goto :goto_2c
    :cond_36
    sub-int v2, p1, p3 
    :goto_38
    if-ge v2, p1, :cond_42     ;for (int x = width-hozMargin; x < width; x++)
    mul-int v3, v1, p1
    add-int/2addr v3, v2
    .line 761
    aput-byte v0, p0, v3
    add-int/lit8 v2, v2, 0x1
    goto :goto_38
    :cond_42
    add-int/lit8 v1, v1, 0x1
    goto :goto_27
    .line 764
    :cond_45                   ;all done with the for loops!
    

    Without bothering to decipher this whole thing line-by-line, it is clear that each of your small, inner loops is performing:

    • 1 comparison
    • 1 integer multiplication
    • 2 additions (index calculation and loop counter increment)
    • 1 store
    • 1 goto

    That's a lot, when you consider that all that you really need this inner loop to do is set a certain number of successive array elements to 0.

    Moreover, some of these bytecodes require multiple machine instructions to implement, so I wouldn't be surprised if you're looking at over 20 cycles, just to do a single iteration of one of the inner loops. (I haven't tested what this code looks like once it's compiled by the Dalvik VM, but I sincerely doubt it is smart enough to optimize the multiplications out of these loops.)

    POSSIBLE FIXES

    You could improve performance by eliminating some redundant calculations. For example, each inner loop is recalculating y * width each time. Instead, you could pre-calculate that offset, store it in a local variable (in the outer loop), and use that when calculating the indices.
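
    As a rough sketch of that hoisting (not the original blackNonROI, whose full source is not shown here): the class name HoistedOffset and the method signature are assumptions, while yuvData, width, height, hozMargin and verMargin follow the names used in this thread. It assumes 2 * hozMargin <= width and that the Y plane starts at index 0.

```java
final class HoistedOffset {
    /** Blacks out the left and right margins of the Y plane, computing y * width once per row. */
    static void blackSides(byte[] yuvData, int width, int height, int hozMargin, int verMargin) {
        for (int y = verMargin; y < height - verMargin; y++) {
            int rowStart = y * width;                      // hoisted out of both inner loops
            int rightStart = rowStart + width - hozMargin; // first byte of the right margin
            int rowEnd = rowStart + width;                 // one past the last byte of the row
            for (int i = rowStart; i < rowStart + hozMargin; i++) {
                yuvData[i] = 0;                            // left margin
            }
            for (int i = rightStart; i < rowEnd; i++) {
                yuvData[i] = 0;                            // right margin
            }
        }
    }
}
```

    The inner loops now do only a comparison, a store and an increment per pixel; the multiplication happens once per row.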

    When performance is absolutely critical, I will sometimes do this sort of buffer manipulation in native code. If you can be reasonably certain that mPendingFrameData is a DirectByteBuffer, this is an even more attractive option. The disadvantages are 1.) higher complexity, and 2.) less of a "safety net" if something goes wrong/crashes.

    MOST APPROPRIATE FIX

    In your case, the most appropriate solution is probably just to use Arrays.fill(), which is more likely to be implemented in an optimized way.

    Note that the top and bottom blocks are big, contiguous chunks of memory, and can be handled by one Arrays.fill() each:

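    // The value must be cast to byte: an int literal does not match Arrays.fill(byte[], int, int, byte).
    Arrays.fill(yuvData, 0, verMargin * width, (byte) 0);   //top
    Arrays.fill(yuvData, width * height - verMargin * width, width * height, (byte) 0);    //bottom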
    

    And then the sides could be handled something like this:

    for(int y=verMargin; y<height-verMargin; y++){
        int offset = y * width;
        Arrays.fill(yuvData, offset, offset + hozMargin, (byte) 0);  //left
        Arrays.fill(yuvData, offset + width - hozMargin, offset + width, (byte) 0);   //right (fromIndex must not exceed toIndex)
    }
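
    Put together, the top, bottom and side fills might look like the following self-contained sketch. The method name blackNonROI is taken from the excerpt above, but its signature, the class name NonRoiBlackout and the argument checks are assumptions. Only the Y plane (the first width * height bytes) is touched.

```java
import java.util.Arrays;

final class NonRoiBlackout {
    /** Sets every Y-plane pixel outside the centered ROI to 0, using Arrays.fill per contiguous run. */
    static void blackNonROI(byte[] yuvData, int width, int height, int hozMargin, int verMargin) {
        if (hozMargin < 0 || verMargin < 0 || 2 * hozMargin > width || 2 * verMargin > height) {
            throw new IllegalArgumentException("margins do not fit the frame");
        }
        // Top and bottom margins are each one contiguous block of whole rows.
        Arrays.fill(yuvData, 0, verMargin * width, (byte) 0);
        Arrays.fill(yuvData, (height - verMargin) * width, height * width, (byte) 0);

        // Left and right margins: two short runs per ROI row.
        for (int y = verMargin; y < height - verMargin; y++) {
            int offset = y * width;
            Arrays.fill(yuvData, offset, offset + hozMargin, (byte) 0);
            // Arrays.fill requires fromIndex <= toIndex, so the smaller index comes first.
            Arrays.fill(yuvData, offset + width - hozMargin, offset + width, (byte) 0);
        }
    }
}
```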
    

    There are more opportunities for optimization here, but we're already at the point of diminishing returns. For example, since the end of each row is adjacent to the start of the next one (in memory), you could actually combine two smaller fill() calls into a larger one that covers both the right side of row N and the left side of row N + 1. And so forth.
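
    One way to sketch that merging, again with an assumed signature and a hypothetical class name (MergedFill). It assumes the ROI keeps at least one row (2 * verMargin < height) and that 2 * hozMargin <= width. The top block absorbs the left margin of the first ROI row, and the bottom block absorbs the right margin of the last one:

```java
import java.util.Arrays;

final class MergedFill {
    /** Same result as filling top, bottom, left and right separately, with about half the fill() calls. */
    static void blackNonROI(byte[] yuvData, int width, int height, int hozMargin, int verMargin) {
        if (hozMargin < 0 || verMargin < 0 || 2 * hozMargin > width || 2 * verMargin >= height) {
            throw new IllegalArgumentException("margins do not leave a ROI");
        }
        int firstRow = verMargin;              // first row that contains ROI pixels
        int lastRow = height - verMargin - 1;  // last row that contains ROI pixels

        // Top block plus the left margin of the first ROI row.
        Arrays.fill(yuvData, 0, firstRow * width + hozMargin, (byte) 0);

        // Right margin of row y and left margin of row y + 1 are adjacent in memory.
        for (int y = firstRow; y < lastRow; y++) {
            int from = (y + 1) * width - hozMargin;
            Arrays.fill(yuvData, from, from + 2 * hozMargin, (byte) 0);
        }

        // Right margin of the last ROI row plus the bottom block.
        Arrays.fill(yuvData, (lastRow + 1) * width - hozMargin, width * height, (byte) 0);
    }
}
```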

    2. Not sure. If your preview is displaying without any corruption/tearing, then it's probably a safe place to call the function from (from a thread safety standpoint), and is therefore probably as good a place as any.

    3 and 4. There could be libraries for doing this task; I don't know of any offhand for Java-based NV21 frames. You'd have to do some format conversions, and I don't think it'd be worth it. Using a GPU to do this work is excessive over-optimization, in my opinion, but it may be appropriate for some specialized applications. I'd consider going to JNI (native code) before I'd ever consider using the GPU.

    I think your choice to do the manipulation directly to the NV21, instead of converting to a bitmap, is a good one (considering your needs and the fact that the task is simple enough to avoid needing a graphics library).
