Implementations might differ in the actual sizes of types, but on most, types like unsigned int and float are always 4 bytes. But why does a type always occupy a certain amount of memory regardless of its value?
For example, given `int myInt = 255;`, `myInt` would occupy 4 bytes with my compiler. However, the actual value, `255`, can be represented with only 1 byte, so why would `myInt` not just occupy 1 byte of memory?
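To make the question concrete, here is a minimal sketch (the variable names are just illustrative) showing that `sizeof` depends on the type, not on the value currently stored:

```cpp
#include <iostream>

int main() {
    int myInt  = 255;        // fits in 1 byte...
    int bigInt = 2000000000; // ...needs all 4 bytes

    // Both objects occupy the same storage: sizeof is a property
    // of the type, not of the value stored in it.
    std::cout << sizeof(myInt)  << '\n'; // typically 4
    std::cout << sizeof(bigInt) << '\n'; // same
}
```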
Storing a value in only as many bytes as it needs is known as variable-length encoding; various such encodings exist, for example VLQ (Variable-Length Quantity). One of the most famous, however, is probably UTF-8: UTF-8 encodes each code point on a variable number of bytes, from 1 to 4.
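For instance, a UTF-8 decoder reads the length of a sequence straight off the high bits of its first byte. A minimal sketch (the function name is just illustrative):

```cpp
#include <cstdint>

// Length in bytes of the UTF-8 sequence whose lead byte is `lead`,
// derived from the lead byte's high bits.
int utf8_sequence_length(std::uint8_t lead) {
    if (lead < 0x80)            return 1; // 0xxxxxxx: 1 byte (ASCII)
    if ((lead >> 5) == 0b110)   return 2; // 110xxxxx: 2-byte sequence
    if ((lead >> 4) == 0b1110)  return 3; // 1110xxxx: 3-byte sequence
    if ((lead >> 3) == 0b11110) return 4; // 11110xxx: 4-byte sequence
    return 0;                             // 10xxxxxx (or invalid): not a lead byte
}
```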
Or, to ask the more general question: Why does a type have only one size associated with it when the space required to represent the value might be smaller than that size?
As always in engineering, it's all about trade-offs. There is no solution which has only advantages, so you have to balance advantages and trade-offs when designing your solution.
The design which was settled on was to use fixed-size fundamental types, and the hardware and languages just flowed from there.
So, what is the fundamental weakness of variable-length encoding, which caused it to be rejected in favor of more memory-hungry schemes? No Random Addressing.
What is the index of the byte at which the 4th code point starts in a UTF-8 string?
It depends on the values of the previous code points; a linear scan is required.
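A sketch of that linear scan, assuming a well-formed UTF-8 string (the function name is hypothetical); note that every byte before the target has to be inspected:

```cpp
#include <cstddef>
#include <string>

// Byte index at which the n-th code point (0-based) starts, or npos.
// There is no arithmetic shortcut: we must walk all preceding bytes.
std::size_t byte_index_of_code_point(const std::string& utf8, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        // A byte starts a code point unless it is a continuation byte (10xxxxxx).
        if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
            if (seen == n) return i;
            ++seen;
        }
    }
    return std::string::npos;
}
```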
Surely there are variable-length encoding schemes which are better at random addressing?
Yes, but they are also more complicated. If an ideal one exists, I have yet to see it.
Does Random Addressing really matter anyway?
Oh YES!
The thing is, any kind of aggregate or array relies on fixed-size types:

- Accessing a field of a `struct`? Random Addressing!
- Accessing an element of an array? Random Addressing!

Which means you essentially have the following trade-off:
Fixed-size types OR linear memory scans
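To see what fixed sizes buy, consider a sketch (the Record type is a made-up example): because every element and field sits at a compile-time-known offset, both accesses below compile down to constant address arithmetic rather than a scan:

```cpp
#include <cstddef>

struct Record {
    int   id;    // first member: fixed offset 0 within the struct
    float score; // fixed offset, queryable with offsetof(Record, score)
};

// The address of records[i] is base + i * sizeof(Record): pure arithmetic,
// with no need to examine elements 0..i-1. That is Random Addressing.
float score_of(const Record* records, std::size_t i) {
    return records[i].score;
}
```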