I have a NumPy image array of shape (1000, 50, 100, 3) (class 'numpy.ndarray') containing 1000 RGB images (height = 50, width = 100, channels = 3). I want to convert the RGB values to YUV values and then rescale the result to get the final yuv values. A prototype of a pixel-wise converter is given below.

My question: Is there a simple way to carry out this transformation?

def yuv(_pixel):
    R, G, B = _pixel[0], _pixel[1], _pixel[2]
    Y = 0.299 *  R + 0.587 * G + 0.114 * B
    y = Y / 127.5 - 1
    u = (0.493 * (B - Y)) / 127.5 - 1
    v = (0.887 * (R - Y)) / 127.5 - 1
    return np.array([y, u, v])


Have you looked into numpy.apply_along_axis?

It applies a function to every 1-D slice along a given axis, so with axis -1 it calls yuv once per pixel:

images_yuv = np.apply_along_axis(yuv, -1, images_rgb)
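
As a rough sketch of what that call does, the snippet below runs it on a small random array. The name images_rgb and its size are stand-ins chosen for illustration, not the real data; yuv is the function from the question.

```python
import numpy as np

def yuv(_pixel):
    # Pixel-wise converter from the question
    R, G, B = _pixel[0], _pixel[1], _pixel[2]
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    y = Y / 127.5 - 1
    u = (0.493 * (B - Y)) / 127.5 - 1
    v = (0.887 * (R - Y)) / 127.5 - 1
    return np.array([y, u, v])

# Hypothetical stand-in for the real data: 2 images of 4 x 5 pixels
rng = np.random.default_rng(0)
images_rgb = rng.integers(0, 256, size=(2, 4, 5, 3))

# yuv receives each length-3 slice along the last axis, i.e. one pixel
images_yuv = np.apply_along_axis(yuv, -1, images_rgb)
print(images_yuv.shape)  # (2, 4, 5, 3)
```

Because yuv returns three values per pixel, the output has the same shape as the input, but the Python function is still called once for every pixel.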

You can vectorize the conversion so that the R, G and B channels of all pixels are transformed at once:

def yuv_vec(images):
    # Split the channels; each has shape (n_images, height, width)
    R, G, B = images[:, :, :, 0], images[:, :, :, 1], images[:, :, :, 2]
    # Unscaled luma: u and v must use this, not the rescaled y
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    y = Y / 127.5 - 1
    u = (0.493 * (B - Y)) / 127.5 - 1
    v = (0.887 * (R - Y)) / 127.5 - 1
    yuv_img = np.empty(images.shape)
    yuv_img[:, :, :, 0] = y
    yuv_img[:, :, :, 1] = u
    yuv_img[:, :, :, 2] = v
    return yuv_img
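
Since the formulas in the question are linear in R, G and B before the final rescaling, the conversion could also be written as one matrix product. This is only a sketch under that reading of the question's formulas; the names LUMA, RGB_TO_YUV and yuv_matmul are made up here.

```python
import numpy as np

# Weights of (R, G, B) in the unscaled luma Y
LUMA = np.array([0.299, 0.587, 0.114])

# Rows give Y, 0.493 * (B - Y) and 0.887 * (R - Y) as weights on (R, G, B)
RGB_TO_YUV = np.stack([
    LUMA,
    0.493 * (np.array([0.0, 0.0, 1.0]) - LUMA),
    0.887 * (np.array([1.0, 0.0, 0.0]) - LUMA),
])

def yuv_matmul(images):
    # (..., 3) @ (3, 3) is applied over all leading axes, then rescaled
    return images @ RGB_TO_YUV.T / 127.5 - 1

# Small random stand-in for the image array
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(2, 4, 5, 3))
out = yuv_matmul(images)
print(out.shape)  # (2, 4, 5, 3)
```

This form works for any number of leading axes, so a single image of shape (50, 100, 3) can be passed in unchanged.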

For a timing comparison, here is a nested-loop implementation that calls the pixel-wise yuv function from the question once per pixel:

def yuv(_pixel):
    R, G, B = _pixel[0], _pixel[1], _pixel[2]
    # Keep the unscaled luma Y; u and v are computed from it
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    y = Y / 127.5 - 1
    u = (0.493 * (B - Y)) / 127.5 - 1
    v = (0.887 * (R - Y)) / 127.5 - 1
    return np.array([y, u, v])

def yuvloop(imgs):
    yuvimg = np.empty(imgs.shape)
    for n in range(imgs.shape[0]):
        for i in range(imgs.shape[1]):
            for j in range(imgs.shape[2]):
                yuvimg[n, i, j] = yuv(imgs[n, i, j])
    return yuvimg

Some speed comparisons:

imgs = np.random.randint(0, 256, size=(100, 50, 100, 3))
%timeit yuvloop(imgs)
# Out: 8.79 s ± 265 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit np.apply_along_axis(yuv, -1, imgs)
# Out: 9.92 s ± 360 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit yuv_vec(imgs)
# Out: 34.4 ms ± 385 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
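
The %timeit lines above are IPython magics. Outside IPython, a similar measurement can be sketched with the standard timeit module. The function to_luma below is a hypothetical, smaller workload used only to keep the example short, and the array is reduced to 10 images.

```python
import timeit
import numpy as np

def to_luma(images):
    # Hypothetical workload: only the rescaled luma channel
    return (images @ np.array([0.299, 0.587, 0.114])) / 127.5 - 1

imgs = np.random.randint(0, 256, size=(10, 50, 100, 3))

# 7 runs of 10 calls each, similar to the "7 runs, 10 loops each" output above
runs = timeit.repeat(lambda: to_luma(imgs), repeat=7, number=10)
best_ms = min(runs) / 10 * 1e3
print(f"best of 7 runs: {best_ms:.2f} ms per call")
```

Note that %timeit reports mean and standard deviation, while this sketch prints the fastest run, so the numbers are not directly comparable.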

So the vectorized version is about 256 times faster than looping over the pixels (8.79 s vs. 34.4 ms). Using np.apply_along_axis seems to be even slower than the explicit loop. The result of all three is the same.
I reduced the size of the test sample to 100 images; otherwise the timing runs would have taken too long.
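
One way to check that a per-pixel and a whole-array implementation agree is np.allclose on a small sample. The sketch below uses compact re-implementations of the question's formulas; the names yuv_pixel and yuv_channels are made up for this check.

```python
import numpy as np

def yuv_pixel(p):
    # One pixel: p is a length-3 array (R, G, B)
    R, G, B = p
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    return np.array([Y, 0.493 * (B - Y), 0.887 * (R - Y)]) / 127.5 - 1

def yuv_channels(images):
    # Whole array: the last axis holds (R, G, B)
    R, G, B = images[..., 0], images[..., 1], images[..., 2]
    Y = 0.299 * R + 0.587 * G + 0.114 * B
    return np.stack([Y, 0.493 * (B - Y), 0.887 * (R - Y)], axis=-1) / 127.5 - 1

# Small random stand-in so the per-pixel reference stays fast
rng = np.random.default_rng(1)
small = rng.integers(0, 256, size=(3, 4, 5, 3))

reference = np.apply_along_axis(yuv_pixel, -1, small)
same = np.allclose(reference, yuv_channels(small))
print(same)
```

np.allclose is used instead of exact equality because the floating-point results may differ in the last bits.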

