In vision, we usually process multiple images at once. I believe this is possible because most images are the same size, or we can easily pad them with zeros if they are not (and