We have 8-bit, 16-bit, 32-bit and 64-bit hardware architectures and operating systems. But not, say, 42-bit or 69-bit ones.
Why? Is there something fundamental that makes 2ⁿ-bit widths a better choice, or is it just convention?
Related, but possibly not the reason: I've heard that the convention of 8 bits in a byte comes from how IBM rigged up the IBM System/360 architecture.
A common reason is that you can number your bits in binary. This comes in useful in quite a few situations. For instance, in bitshift or rotate operations. You can rotate a 16 bit value over 0 to 15 bits. An attempt to rotate over 16 bits is also trivial: that's equivalent to a rotation over 0 bits. And a rotation over 1027 bits is equal to a rotation over 3 bits. In general, a rotation of a register of width W over N bits equals a rotation over N modulo W, and the operation "modulo W" is trivial when W is a power of 2.
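To make that concrete, here is a minimal C sketch (the helper name rotl16 is mine, purely for illustration): when the register width is a power of two, reducing the rotate count modulo the width is just a single bitwise AND.

```c
#include <stdint.h>
#include <stdio.h>

/* Rotate a 16-bit value left by n bits. Because the width (16) is a
 * power of two, "n modulo 16" reduces to a bitwise AND with 15 --
 * no division needed. */
static uint16_t rotl16(uint16_t x, unsigned n)
{
    n &= 15;                      /* n mod 16, done with a single AND */
    if (n == 0)
        return x;                 /* avoid an undefined shift by 16 */
    return (uint16_t)((x << n) | (x >> (16 - n)));
}

int main(void)
{
    uint16_t v = 0x1234;
    /* Rotating by 1027 gives the same result as rotating by 3,
     * since 1027 & 15 == 3. */
    printf("%04x %04x\n", rotl16(v, 1027), rotl16(v, 3));
    return 0;
}
```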
The byte size is mostly related to character encoding for Western scripts, hence 8 bits. The word size is not related to encoding; it is related to the width of addresses and data paths, which is why it has varied from 4 bits to 80 bits and beyond.
Many (most?) early pre-microprocessor CPUs had word sizes that were not a power of two.
In particular, Seymour Cray and his team built many highly influential machines with non-power-of-two word sizes and address sizes -- 12 bit, 48 bit, 60 bit, etc.
A surprisingly large number of early computers had 36-bit words, ultimately because humans have 10 fingers: the decimal desk calculators those machines were meant to match carried ten decimal digits, and holding a signed ten-digit decimal number takes 35 bits, rounded up to 36. The Wikipedia "36-bit" article has more details on the relationship between 10 fingers and 36 bits, and links to articles on many other historically important but no longer popular word sizes, most of them not a power of two.
I speculate that:
(a) 8-bit addressable memory became popular because it was slightly more convenient for storing 7-bit ASCII characters and 4-bit BCD digits without either awkward packing or wasting multiple bits per character (see the packing sketch after this list), and no other memory width had any great advantage.
(b) As Stephen C. Steel points out, that slight advantage is multiplied by economies of scale and market forces -- more 8-bit-wide memories are used, and so economies of scale make them slightly cheaper, leading to even more 8-bit-wide memories being used in new designs, etc.
(c) Wider buses in theory made a CPU faster, but putting the entire CPU on a single chip made it vastly cheaper, and perhaps slightly faster, than any previous multi-part CPU system of any bus width. At first there were barely enough transistors for a 4-bit CPU, then for an 8-bit CPU. Later, there were barely enough transistors for a 16-bit CPU, launched to huge fanfare and a "16 bit" marketing campaign. Right around the time one would expect a 24-bit CPU ...
(d) the RISC revolution struck. The first two RISC chips were 32 bits, for whatever reason, and people had been conditioned to think that "more bits are better", so every manufacturer jumped on the 32 bit bandwagon. Also, IEEE 754-1985 was standardized with 32-bit and 64-bit floating point numbers. There were some 24 bit CPUs, but most people have never heard of them.
(e) For software-compatibility reasons, manufacturers maintained the illusion of a 32-bit data bus even on processors with a 64-bit front-side bus (such as the Intel Pentium and the AMD K5) or on motherboards with a 4-bit-wide bus (the LPC bus).
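The packing point in (a) above, as a minimal C sketch (the function name pack_bcd is mine, just for illustration): one 7-bit ASCII character fits in an 8-bit byte with a single spare bit, and two 4-bit BCD digits fill a byte exactly; a memory width that wasn't 8 bits would force either wasted bits or characters straddling memory units.

```c
#include <stdint.h>
#include <stdio.h>

/* Pack two decimal digits (0-9) into one 8-bit byte: each BCD digit
 * needs 4 bits, so two of them fill a byte exactly. */
static uint8_t pack_bcd(unsigned hi, unsigned lo)
{
    return (uint8_t)((hi << 4) | (lo & 0x0F));
}

int main(void)
{
    char    c   = 'A';              /* 7-bit ASCII: fits a byte with 1 bit to spare */
    uint8_t bcd = pack_bcd(4, 2);   /* digits 4 and 2 -> 0x42 */

    printf("ASCII 'A' = 0x%02X, packed BCD \"42\" = 0x%02X\n",
           (unsigned char)c, bcd);
    return 0;
}
```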
My trusty old HP 32S calculator was 12-bit.
That's mostly a matter of tradition, and it is not even always true. For example, the floating-point units in processors (even contemporary ones) have 80-bit registers. And there's nothing that would force us to have 8-bit bytes instead of, say, 13-bit bytes.
Sometimes there is mathematical reasoning behind it. For example, if you decide on an N-bit byte and want to do integer multiplication, you need up to 2N bits to store the result. Then you also want to add/subtract/multiply those 2N-bit integers, and now you need 2N-bit general-purpose registers for the addition/subtraction results and 4N-bit registers for the multiplication results.
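As a concrete check (a minimal sketch using only standard C): two N-bit operands can multiply to at most (2^N − 1)^2, which fits in 2N bits but not in N, so the product needs a register twice as wide. The same pattern repeats at every level: 8×8 → 16, 16×16 → 32, and so on.

```c
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    /* Worst case for N = 8: 255 * 255 = 65025, which fits in
     * 2N = 16 bits but not in N = 8 bits. */
    uint8_t  a = 255, b = 255;
    uint16_t p8 = (uint16_t)a * b;          /* 8 x 8  -> 16-bit product */

    /* Same pattern one level up: 16 x 16 -> 32-bit product. */
    uint16_t c = 65535, d = 65535;
    uint32_t p16 = (uint32_t)c * d;

    printf("255 * 255     = %" PRIu16 "\n", p8);
    printf("65535 * 65535 = %" PRIu32 "\n", p16);
    return 0;
}
```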