Why do computers have byte-addressable memory, and not 4-byte-addressable memory (or 8-byte-addressable memory for 64bit)? Yeah, I see how it could be useful sometimes, it
We needed to settle down on some size for standardization. People chose 8-bit size for the reasons mentioned by Shane above. since then we are stuck with byte addressable memory. now it is impossible to change due to various compatibility issues and the fact that OPCODES are a byte long only. but using a trick, memory is easily made word-addressable to fetch/store data/addresses!
Processors actually do access memory in quantities of 64-bit (x86 did since Pentium or so); 64-bit processors often have a 128-bit bus. Plus, in accessing main memory, you have bursts that fill an entire cache line, which is even larger units of memory.
It's only the addressing that is byte-based; this adds little overhead and is not excessive at all.
Today, you absolutely need byte-based addressing for networking protocols. Implementing TCP with word-based addressing would be difficult: what do you want read() to return if what you received where 17 bytes? Likewise, higher layers are byte-based: HTTP would be fairly difficult to implement if you get a request line like "GET / HTTP/1.0" be presented in units of four bytes. You essentially would have to split the words back into bytes with shift operations and such (which now the processors do in hardware, thanks to byte-based addressing).
Largely historical reasons - it has become the standard that CPUs understand. Here is a good discussion on it:
Generally, a size has to be chosen to be convenient for both data and machine instructions. 8 bits (256 values) is enough to accommodate common characters in English and some other languages. Designers of 8-bit processors presumably found that being able to encode 256 common instructions as one byte was a "reasonable tradeoff". And at the time, 8 bits was also generally enough to encode other things such as a pixel colour or screen coordinate. Having a byte size that is a power of 2 may also have been felt to be a "neater" design. It is interesting to note that, for example, Marxer, E. (1974), Elements of Data Processing, describes a byte as being either 6-bit and 8-bit depending on whether the computer was of the "octal" or "hexadecimal" type.
Certainly, other sizes were used in the early days.