Why are types always a certain size, no matter their value?

谎友^ 2021-01-30 15:22

Implementations might differ in the actual sizes of types, but on most, types like unsigned int and float are always 4 bytes. But why does a type always occupy a certai

19 answers
  •  生来不讨喜
    2021-01-30 16:00

    Because in a language like C++, a design goal is that simple operations compile down to simple machine instructions.

    All mainstream CPU instruction sets work with fixed-width types, and if you want variable-width types, you need multiple machine instructions to handle them.

    As for why the underlying computer hardware is that way: It's because it's simpler, and more efficient for many cases (but not all).

    Imagine the computer as a piece of tape:

    | xx | xx | xx | xx | xx | xx | xx | xx | xx | xx | xx | xx | xx | ...
    

    If you simply tell the computer to look at the first byte on the tape, xx, how does it know whether or not the type stops there, or proceeds on to the next byte? If you have a number like 255 (hexadecimal FF) or a number like 65535 (hexadecimal FFFF) the first byte is always FF.
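The ambiguity can be sketched in a few lines of C++. The helpers `minimal_bytes` and `make_tape` are hypothetical names invented for this illustration: they write each value using only as many bytes as it needs, most significant byte first, with no length marker at all.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: the bytes of `value` with leading zero bytes dropped,
// most significant byte first. Nothing records how many bytes were written.
std::vector<std::uint8_t> minimal_bytes(std::uint32_t value) {
    std::vector<std::uint8_t> out;
    bool started = false;
    for (int shift = 24; shift >= 0; shift -= 8) {
        std::uint8_t byte = static_cast<std::uint8_t>((value >> shift) & 0xFF);
        // Always emit the last byte so that zero still occupies one byte.
        if (byte != 0 || started || shift == 0) {
            out.push_back(byte);
            started = true;
        }
    }
    return out;
}

// Hypothetical helper: lay several values end to end on one "tape".
std::vector<std::uint8_t> make_tape(const std::vector<std::uint32_t>& values) {
    std::vector<std::uint8_t> tape;
    for (std::uint32_t v : values) {
        std::vector<std::uint8_t> bytes = minimal_bytes(v);
        tape.insert(tape.end(), bytes.begin(), bytes.end());
    }
    return tape;
}
```

Under these assumptions, `make_tape({255, 65535})` and `make_tape({65535, 255})` both produce the three bytes FF FF FF, so a reader of the tape cannot tell which pair of numbers was stored.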

    So how do you know? You have to add additional logic, and "overload" the meaning of at least one bit or byte value to indicate that the value continues to the next byte. That logic is never "free": either you emulate it in software, or you add a bunch of additional transistors to the CPU to do it.
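To give a rough idea of what that extra logic looks like in software, here is a minimal sketch of one common scheme (LEB128-style: 7 payload bits per byte, least significant group first). It is only one of several possible layouts, and the function names `encode_varint` and `decode_varint` are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `value` to `tape`, 7 payload bits per byte, lowest group first.
// The top bit of each byte is "overloaded": 1 means "more bytes follow".
void encode_varint(std::uint32_t value, std::vector<std::uint8_t>& tape) {
    while (value >= 0x80) {
        tape.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    tape.push_back(static_cast<std::uint8_t>(value));  // top bit 0: last byte
}

// Read one value starting at `pos` and advance `pos` past it.
// Assumes a well-formed encoding: no bounds or overflow checks are done.
std::uint32_t decode_varint(const std::vector<std::uint8_t>& tape,
                            std::size_t& pos) {
    std::uint32_t value = 0;
    int shift = 0;
    while (true) {
        std::uint8_t byte = tape[pos++];
        value |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            break;  // continuation bit clear: the value ends here
        }
        shift += 7;
    }
    return value;
}
```

Even this small version needs a loop, a mask, a shift, and a branch per byte, where a fixed-width load is a single instruction. With this layout, 255 takes two bytes (FF 01) and 65535 takes three (FF FF 03), so the two values can now be told apart on the tape.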

    The fixed-width types of languages like C and C++ reflect that.
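A small check makes this concrete: in C++, `sizeof` is determined by the type alone, so the storage for a variable is the same whatever value ends up in it. The function name `storage_after_assigning` is invented for this sketch.

```cpp
#include <cstddef>

// sizeof is a compile-time property of the type, so this holds
// before any particular value exists.
static_assert(sizeof(unsigned int) == sizeof(0u),
              "an unsigned int literal has the size of unsigned int");

// Hypothetical helper: store `value` and report how much storage `x` uses.
std::size_t storage_after_assigning(unsigned int value) {
    unsigned int x = value;
    (void)x;          // the value is irrelevant to the result
    return sizeof x;  // depends only on the declared type of x
}
```

Calling it with `0u` and with `~0u` (the largest `unsigned int`) returns the same number, which is 4 on most current platforms but is ultimately implementation-defined.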

    It doesn't have to be this way, and more abstract languages which are less concerned with mapping to maximally efficient code are free to use variable-width encodings (also known as "Variable Length Quantities" or VLQ) for numeric types.

    Further reading: If you search for "variable length quantity" you can find some examples of where that kind of encoding is actually efficient and worth the additional logic. It's usually when you need to store a huge amount of values which might be anywhere within a large range, but most values tend towards some small sub-range.
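As a rough illustration of when the trade-off pays off, the sketch below counts the bytes needed for a list of values under an assumed layout of 7 payload bits per byte, and compares that with storing each value in a fixed 4-byte `std::uint32_t`. The function names and the sample data are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed for one value at 7 payload bits per byte (assumed layout).
std::size_t varint_length(std::uint32_t value) {
    std::size_t bytes = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++bytes;
    }
    return bytes;
}

// Total size of all values in the variable-length encoding.
std::size_t total_varint_bytes(const std::vector<std::uint32_t>& values) {
    std::size_t total = 0;
    for (std::uint32_t v : values) {
        total += varint_length(v);
    }
    return total;
}

// Total size of the same values stored as fixed-width 32-bit integers.
std::size_t total_fixed_bytes(const std::vector<std::uint32_t>& values) {
    return values.size() * sizeof(std::uint32_t);
}
```

For the sample `{3, 17, 90, 5, 1000000, 42}`, the variable-length total is 8 bytes against 24 bytes fixed, because five of the six values fit in one byte. The saving is not guaranteed: a value near the top of the 32-bit range needs 5 bytes in this layout, one more than the fixed-width form.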


    Note that if a compiler can prove that it can get away with storing the value in a smaller amount of space without breaking any code (for example it's a variable only visible internally within a single translation unit), and its optimization heuristics suggest that it'll be more efficient on the target hardware, it's entirely allowed to optimize it accordingly and store it in a smaller amount of space, so long as the rest of the code works "as if" it did the standard thing.

    But when the code has to inter-operate with other code that might be compiled separately, sizes have to stay consistent, or something has to ensure that every piece of code follows the same convention.

    Because if it's not consistent, there's this complication: What if I have int x = 255; but then later in the code I do x = y? If int could be variable-width, the compiler would have to know ahead of time to pre-allocate the maximum amount of space it'll need. That's not always possible, because what if y is an argument passed in from another piece of code that's compiled separately?
