I'm looking for the fastest way to decode a local MPEG-4 video's frames on the iPhone. I'm simply interested in the luminance values of the pixels in every 10th frame.
It sounds like vImage might be appropriate, assuming you can use iOS 5. Processing every 10th frame seems well within reason for a framework like vImage. However, any kind of actual real-time processing will almost certainly require OpenGL.
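Purely as an illustration of the vImage route, here is a minimal Swift sketch that reduces a decoded BGRA frame to a planar luminance image. The call it uses, vImageMatrixMultiply_ARGB8888ToPlanar8, appeared in a later iOS release than the iOS 5 APIs discussed in this thread, and the frameBase/width/height/bytesPerRow parameters are placeholders for wherever your decoded frame lives:

import Accelerate

// Sketch only: `frameBase`, `width`, `height`, and `bytesPerRow` are assumed to
// describe a decoded 8-bit BGRA frame already sitting in memory.
func luminancePlane(frameBase: UnsafeMutableRawPointer,
                    width: Int, height: Int, bytesPerRow: Int) -> [UInt8] {
    var source = vImage_Buffer(data: frameBase,
                               height: vImagePixelCount(height),
                               width: vImagePixelCount(width),
                               rowBytes: bytesPerRow)

    var luma = [UInt8](repeating: 0, count: width * height)
    luma.withUnsafeMutableBytes { lumaBytes in
        var destination = vImage_Buffer(data: lumaBytes.baseAddress,
                                        height: vImagePixelCount(height),
                                        width: vImagePixelCount(width),
                                        rowBytes: width)

        // Rec. 601 luma weights in fixed point; coefficient order matches the
        // B, G, R, A channel order of the source buffer in memory.
        let divisor: Int32 = 0x1000
        let matrix: [Int16] = [Int16(0.114 * Float(divisor)),
                               Int16(0.587 * Float(divisor)),
                               Int16(0.299 * Float(divisor)),
                               0]
        vImageMatrixMultiply_ARGB8888ToPlanar8(&source, &destination, matrix,
                                               divisor, nil, 0,
                                               vImage_Flags(kvImageNoFlags))
    }
    return luma
}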
Assuming the bottleneck of your application is in the code that converts the video frames to a displayable format (like RGB), you might be interested in code I shared that converts one .mp4 frame (encoded as YV12) to RGB using Qt and OpenGL. The application uploads the frame to the GPU and runs a GLSL fragment shader to do the YV12-to-RGB conversion, so that the result can be displayed in a QImage:
static const char *p_s_fragment_shader =
"#extension GL_ARB_texture_rectangle : enable\n"
"uniform sampler2DRect tex;"
"uniform float ImgHeight, chromaHeight_Half, chromaWidth;"
"void main()"
"{"
" vec2 t = gl_TexCoord[0].xy;" // get texcoord from fixed-function pipeline
" float CbY = ImgHeight + floor(t.y / 4.0);"
" float CrY = ImgHeight + chromaHeight_Half + floor(t.y / 4.0);"
" float CbCrX = floor(t.x / 2.0) + chromaWidth * floor(mod(t.y, 2.0));"
" float Cb = texture2DRect(tex, vec2(CbCrX, CbY)).x - .5;"
" float Cr = texture2DRect(tex, vec2(CbCrX, CrY)).x - .5;"
" float y = texture2DRect(tex, t).x;" // redundant texture read optimized away by texture cache
" float r = y + 1.28033 * Cr;"
" float g = y - .21482 * Cb - .38059 * Cr;"
" float b = y + 2.12798 * Cb;"
" gl_FragColor = vec4(r, g, b, 1.0);"
"}"
If you are willing to use an iOS 5 only solution, take a look at the sample app ChromaKey from the 2011 WWDC session on AVCaptureSession.
That demo captures 30 FPS of video from the built-in camera and passes each frame to OpenGL as a texture. It then uses OpenGL to manipulate the frame, and optionally writes the result out to an output video file.
The code uses some serious low-level magic to bind a Core Video pixel buffer from an AVCaptureSession to an OpenGL texture, so that the two share memory in the graphics hardware.
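I don't have the sample's code in front of me, but the binding it relies on is the CVOpenGLESTextureCache API that arrived in iOS 5. As a rough Swift sketch (not the sample's actual code; eaglContext is assumed to be the context you render with, and in real code you would create the cache once and reuse it), mapping the luminance plane of a Y/CbCr pixel buffer straight into a GL texture looks something like this:

import CoreVideo
import OpenGLES

// Rough sketch: wrap plane 0 (the luminance plane) of a bi-planar Y/CbCr pixel
// buffer as an OpenGL ES texture without copying the pixels.
func lumaTexture(for pixelBuffer: CVPixelBuffer,
                 in eaglContext: EAGLContext) -> CVOpenGLESTexture? {
    var cacheOut: CVOpenGLESTextureCache?
    guard CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, nil, eaglContext,
                                       nil, &cacheOut) == kCVReturnSuccess,
          let cache = cacheOut else { return nil }

    var textureOut: CVOpenGLESTexture?
    let status = CVOpenGLESTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, pixelBuffer, nil,
        GLenum(GL_TEXTURE_2D), GL_LUMINANCE,
        GLsizei(CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)),
        GLsizei(CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)),
        GLenum(GL_LUMINANCE), GLenum(GL_UNSIGNED_BYTE), 0, &textureOut)

    return status == kCVReturnSuccess ? textureOut : nil
}

// CVOpenGLESTextureGetName(texture) gives the GL texture id to bind when drawing.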
It should be fairly straightforward to read frames from a movie file rather than from the camera, for example by feeding the same processing code from an AVAssetReader instead of the AVCaptureSession (capture sessions only take device input).
You could probably set up the output to deliver frames in Y/UV form rather than RGB, in which case the Y component is exactly the luminance you're after. Failing that, it would be a pretty simple matter to write a shader that converts each pixel's RGB values to a luminance value.
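To make that concrete, here is a hedged Swift sketch (not the WWDC sample's code) that reads a local file with AVAssetReader, asks for a bi-planar Y/CbCr pixel format, and touches only the luminance plane of every 10th frame, as the question asks; url is a placeholder for the local movie file:

import AVFoundation

// Sketch only: decode a local movie with AVAssetReader and read the luminance
// (Y) plane of every 10th frame on the CPU.
func scanLuminance(of url: URL) throws {
    let asset = AVAsset(url: url)
    guard let track = asset.tracks(withMediaType: .video).first else { return }

    let reader = try AVAssetReader(asset: asset)
    // Bi-planar Y/CbCr: plane 0 is the luminance plane.
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: [
        kCVPixelBufferPixelFormatTypeKey as String:
            kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
    ])
    reader.add(output)
    guard reader.startReading() else { return }

    var frameIndex = 0
    while let sample = output.copyNextSampleBuffer() {
        defer { frameIndex += 1 }
        guard frameIndex % 10 == 0,                     // every 10th frame only
              let pixelBuffer = CMSampleBufferGetImageBuffer(sample) else { continue }

        CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
        let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)
        let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)
        let rowBytes = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
        if let base = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0) {
            let luma = base.assumingMemoryBound(to: UInt8.self)
            var total = 0
            for row in 0..<height {
                for col in 0..<width {
                    total += Int(luma[row * rowBytes + col])  // luminance of one pixel
                }
            }
            print("frame \(frameIndex): mean luminance \(total / (width * height))")
        }
        CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
    }
}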
You should be able to do all of this on every frame, not just every 10th frame.