I wonder whether it is possible to optimize element-wise operations on buffers. In my case there are three input buffers (containing source, mean and variance values) and th