For instance, consider the DFT or DCT. Precisely, what would be the differences between an image transformed by sub-blocks, and an image transformed whole? Is the resulting
Block-based transforms are designed so that they can be implemented on parallel hardware. Each block is independent and can be computed on a different computing node, or shared out across as many nodes as you have.
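To make the independence concrete, here is a minimal sketch in plain Python, not a production implementation. It assumes an orthonormal DCT-II computed by direct summation, and the helper names (`dct_1d`, `dct_2d`, `blockwise_dct`, `count_changed`) and the 16×16 test image are made up for illustration. It perturbs one pixel and counts how many coefficients change in the blockwise transform versus the whole-image transform.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a sequence, by direct summation."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def blockwise_dct(img, b):
    """Transform each b x b tile on its own (image sides must be multiples of b)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y0 in range(0, h, b):
        for x0 in range(0, w, b):
            # Each tile only reads its own pixels, so tiles could be
            # handed to separate workers without any communication.
            tile = [row[x0:x0 + b] for row in img[y0:y0 + b]]
            coeffs = dct_2d(tile)
            for dy in range(b):
                out[y0 + dy][x0:x0 + b] = coeffs[dy]
    return out

def count_changed(a, c, tol=1e-9):
    """Number of coefficients that differ between two coefficient arrays."""
    return sum(abs(p - q) > tol for ra, rc in zip(a, c) for p, q in zip(ra, rc))

# Hypothetical 16x16 test image and a copy with one pixel perturbed.
img = [[(3 * x + 5 * y) % 17 for x in range(16)] for y in range(16)]
img2 = [row[:] for row in img]
img2[0][0] += 10

changed_block = count_changed(blockwise_dct(img, 8), blockwise_dct(img2, 8))
changed_whole = count_changed(dct_2d(img), dct_2d(img2))
print(changed_block, changed_whole)
```

With 8×8 tiles, only the 64 coefficients of the tile containing the perturbed pixel change, whereas in the whole-image transform every coefficient depends on every pixel.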
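Also, as noted in an answer to "Why JPEG compression processes image by 8x8 blocks?", the computational complexity is high. I think it is (block_x_size × block_y_size)² for a direct evaluation, so it grows very quickly with block size and small blocks are much cheaper than transforming the whole image at once.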
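As a rough back-of-the-envelope check of that estimate, the sketch below counts multiply-adds for a direct (non-fast) 2D transform, where each of the width × height outputs sums over all width × height inputs. The function names and the 512×512 image size are assumptions for illustration; fast algorithms reduce both figures, so treat the ratio as indicative only.

```python
def direct_dct_ops(width, height):
    """Multiply-adds for a direct 2D transform: (width * height) squared."""
    return (width * height) ** 2

def blockwise_ops(width, height, b):
    """Total multiply-adds when the image is split into b x b blocks."""
    n_blocks = (width // b) * (height // b)
    return n_blocks * direct_dct_ops(b, b)

whole = direct_dct_ops(512, 512)      # 68_719_476_736
blocks = blockwise_ops(512, 512, 8)   # 16_777_216
print(whole, blocks, whole // blocks) # the ratio is 4096
```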
It's to make the image smaller. There are many ways to subdivide an image into blocks. The simplest is by complete rows. A more advanced approach orders the blocks along a fractal space-filling curve, e.g. a Hilbert curve, which keeps spatially close blocks close together in the ordering; it's also used in mapping applications. JPEG 2000 itself divides the image into rectangular tiles.
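For what a Hilbert-curve ordering of blocks might look like, here is a small sketch using the well-known index-to-coordinate conversion. The function name `hilbert_d2xy` and the 8×8 grid of blocks are assumptions for illustration; this is not taken from any particular codec.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so the sub-curves join up.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Visit an 8x8 grid of blocks (order 3) in Hilbert order.
path = [hilbert_d2xy(3, d) for d in range(64)]
print(path[:4])  # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Consecutive entries in `path` are always neighbouring blocks, which is the spatial-locality property that a plain row-by-row ordering loses at the end of each row.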