For instance, consider the DFT or DCT. Precisely, what would be the differences between an image transformed by sub-blocks, and an image transformed whole? Is the resulting
They are designed so they can be implemented using parallel hardware. Each block is independent, and can be calculated on a different computing node, or shared out to as many nodes as you have.
Also as noted in an answer to Why JPEG compression processes image by 8x8 blocks? the computational complexity is high. I think (block_y_size × block_y_size)2