Implementations might differ in the actual sizes of types, but on most, types like unsigned int and float are always 4 bytes. But why does a type always occupy a certain amount of memory regardless of value?
It is an optimization and simplification.
You can either have fixed-size objects, storing only the value.
Or you can have variable-size objects, storing the value and its size.
With fixed-size objects, the code that manipulates numbers does not need to worry about size. You assume that you always use 4 bytes, which keeps the code very simple.
With variable-size objects, the code that manipulates numbers must understand, when reading a variable, that it has to read both the value and the size, and use the size to make sure all the high bits are zeroed out in the register.
When placing the value back in memory, if the value has not exceeded its current size, then you simply place it back. But if the value has shrunk or grown, you need to move the object to another location in memory to make sure it does not overflow. Now you have to track the position of that number (as it can move if it grows too large for its size). You also need to track all the unused locations so they can potentially be reused.
The code generated for fixed-size objects is a lot simpler.
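To make that concrete, here is a rough sketch of what such a variable-size object could look like next to a plain fixed-size integer (the VarUInt type and its load/store helpers are invented purely for illustration; the layout is only one of many possibilities):

    #include <cstdint>

    // Fixed-size: the value is the whole object, no bookkeeping needed.
    std::uint32_t fixed = 255;

    // Hypothetical variable-size object: every access must consult `size`
    // before the value can even be loaded into a register.
    struct VarUInt {
        std::uint8_t size;     // how many of the bytes below are in use (1..4)
        std::uint8_t bytes[4]; // payload, least significant byte first
    };

    // Reading: widen the stored bytes into a full register; the high bits stay zero.
    std::uint32_t load(const VarUInt& v) {
        std::uint32_t out = 0;
        for (int i = 0; i < v.size; ++i)
            out |= std::uint32_t(v.bytes[i]) << (8 * i);
        return out;
    }

    // Writing: recompute how many bytes the new value needs before storing it.
    void store(VarUInt& v, std::uint32_t value) {
        v.size = value <= 0xFF ? 1 : value <= 0xFFFF ? 2 : value <= 0xFFFFFF ? 3 : 4;
        for (int i = 0; i < v.size; ++i)
            v.bytes[i] = std::uint8_t(value >> (8 * i));
    }

Note that sizeof(VarUInt) is already bigger than sizeof(std::uint32_t): the size field itself costs memory, and every read and write pays for the extra bookkeeping.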
Compression uses the fact that 255 will fit into one byte. There are compression schemes for storing large data sets that will actively use different-size values for different numbers. But since this is not live data, you don't have the complexities described above: you use less space to store the data at the cost of compressing/decompressing it for storage.
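As a rough illustration of such a storage scheme, here is a generic base-128 "varint" encoder/decoder of the kind many serialization formats use; it is a sketch of the general idea, not any particular compression format:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Base-128 "varint": 7 payload bits per byte, the high bit marks whether
    // another byte follows. Small numbers take fewer bytes (100 -> 1 byte,
    // 255 -> 2 bytes, 1'000'000 -> 3 bytes).
    void encode(std::uint32_t value, std::vector<std::uint8_t>& out) {
        while (value >= 0x80) {
            out.push_back(std::uint8_t(value) | 0x80); // low 7 bits + continuation flag
            value >>= 7;
        }
        out.push_back(std::uint8_t(value));            // final byte, flag clear
    }

    std::uint32_t decode(const std::uint8_t* p, std::size_t& consumed) {
        std::uint32_t value = 0;
        int shift = 0;
        consumed = 0;
        std::uint8_t byte;
        do {
            byte = p[consumed++];
            value |= std::uint32_t(byte & 0x7F) << shift;
            shift += 7;
        } while (byte & 0x80);
        return value;
    }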
Something simple which most answers seem to miss:
Being able to work out a type's size at compile time allows a huge number of simplifying assumptions to be made by the compiler and the programmer, which bring a lot of benefits, particularly with regards to performance. Of course, fixed-size types have concomitant pitfalls like integer overflow. This is why different languages make different design decisions. (For instance, Python integers are essentially variable-size.)
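For instance, knowing every size at compile time lets the compiler turn member access and array indexing into constant-offset arithmetic (the Record struct and the asserted sizes below are assumptions about a typical platform, not guarantees of the standard):

    #include <cstddef>
    #include <cstdint>

    struct Record {
        std::uint32_t id;
        float         score;
    };

    // Because every member has a fixed, compile-time size, the layout of Record
    // and the stride of an array of Records are constants, not runtime data.
    static_assert(sizeof(std::uint32_t) == 4, "4-byte int assumed for this example");
    static_assert(sizeof(Record) == 8, "typical layout assumed for this example");

    // Indexing compiles down to: address = table + i * sizeof(Record) + offset of score.
    float score_of(const Record* table, std::size_t i) {
        return table[i].score;
    }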
Probably the main reason C++ leans so strongly to fixed-size types is its goal of C compatibility. However, since C++ is a statically-typed language which tries to generate very efficient code, and avoids adding things not explicitly specified by the programmer, fixed-size types still make a lot of sense.
So why did C opt for fixed-size types in the first place? Simple. It was designed to write '70s-era operating systems, server software, and utilities; things which provided infrastructure (such as memory management) for other software. At such a low level, performance is critical, and so is the compiler doing precisely what you tell it to.
I like Sergey's house analogy, but I think a car analogy would be better.
Imagine variable types as types of cars and people as data. When we're looking for a new car, we choose the one that fits our purpose best. Do we want a small smart car that can only fit one or two people? Or a limousine to carry more people? Both have their benefits and drawbacks like speed and gas mileage (think speed and memory usage).
If you have a limousine and you're driving alone, it's not going to shrink to fit only you. To do that, you'd have to sell the car (read: deallocate) and buy a new smaller one for yourself.
Continuing the analogy, you can think of memory as a huge parking lot filled with cars, and when you go to read, a specialized chauffeur trained solely for your type of car goes to fetch it for you. If your car could change types depending on the people inside it, you would need to bring a whole host of chauffeurs every time you wanted to get your car, since they would never know what kind of car would be sitting in the spot.
In other words, trying to determine how much memory you need to read at run time would be hugely inefficient and outweigh the fact that you could maybe fit a few more cars in your parking lot.
Because it would be very complicated and computationally heavy to have simple types with dynamic sizes. I'm not sure whether this would even be possible.
The computer would have to check how many bits the number takes after every change of its value. That would be quite a lot of additional operations.
And it would be much harder to perform calculations when you don't know the sizes of variables at compile time.
To support dynamic sizes of variables, the computer would actually have to remember how many bytes a variable currently has, which ... would require additional memory to store that information. And this information would have to be analyzed before every operation on the variable to choose the right processor instruction.
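As a sketch of what that per-operation bookkeeping would look like, imagine a hypothetical DynInt type that carries its current width around with it (the type and its fields are invented purely for illustration):

    #include <cstdint>

    // Hypothetical dynamically sized integer: the current width travels with the value.
    struct DynInt {
        std::uint8_t  width; // 1, 2 or 4 bytes currently in use
        std::uint32_t value; // payload; only the low `width` bytes are meaningful
    };

    // Even a plain addition now has to inspect the widths and recompute the
    // width of the result; work that a fixed-size int simply never does.
    DynInt add(DynInt a, DynInt b) {
        std::uint32_t sum = a.value + b.value;              // the actual arithmetic
        std::uint8_t  w   = sum <= 0xFF ? 1
                          : sum <= 0xFFFF ? 2 : 4;          // everything else is bookkeeping
        return DynInt{w, sum};
    }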
To better understand how the computer works and why variables have constant sizes, learn the basics of assembly language.
Although, I suppose it would be possible to achieve something like that with constexpr values. However, this would make the code less predictable for a programmer. I suppose that some compiler optimizations may do something like that, but they hide it from the programmer to keep things simple.
I have described here only the problems that concern the performance of a program. I omitted the problems that would have to be solved to save memory by reducing the sizes of variables. Honestly, I don't think it is even possible.
In conclusion, using smaller variables than declared makes sense only if their values are known at compile time. It is quite probable that modern compilers do that. In other cases it would cause too many hard or even unsolvable problems.
The short answer is: Because the C++ standard says so.
The long answer is: What you can do on a computer is ultimately limited by hardware. It is, of course, possible to encode an integer into a variable number of bytes for storage, but then reading it would either require special CPU instructions to be performant, or you could implement it in software, but then it would be awfully slow. Fixed-size operations are available in the CPU for loading values of predefined widths; there are none for variable widths.
Another point to consider is how computer memory works. Let's say your integer type could take up anywhere between 1 to 4 bytes of storage. Suppose you store the value 42 into your integer: it takes up 1 byte, and you place it at memory address X. Then you store your next variable at location X+1 (I'm not considering alignment at this point) and so on. Later you decide to change your value to 6424.
But this doesn't fit into a single byte! So what do you do? Where do you put the rest? You already have something at X+1, so you can't place it there. Somewhere else? How will you know where to find it later? Computer memory does not support insert semantics: you can't just place something at a location and push everything after it aside to make room!
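A toy demonstration of that problem, using a raw byte buffer in place of real memory and the same numbers as above:

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    int main() {
        // Two variables packed back to back, as a "use only the bytes you need" scheme would do.
        unsigned char memory[8] = {};
        std::size_t x = 0;      // "address X": 42 needs only 1 byte
        memory[x] = 42;
        std::size_t y = x + 1;  // the next variable goes straight after it
        memory[y] = 7;

        // Later the first value becomes 6424, which needs 2 bytes...
        std::uint16_t grown = 6424;
        std::memcpy(&memory[x], &grown, sizeof grown); // ...and clobbers the neighbour at X+1

        std::printf("second variable is now %d, not 7\n", int(memory[y]));
    }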
Aside: What you're talking about is really the area of data compression. Compression algorithms exist to pack everything tighter, so at least some of them will consider not using more space for your integer than it needs. However, compressed data is not easy to modify (if possible at all) and just ends up being recompressed every time you make any changes to it.
Then myInt would occupy 4 bytes with my compiler. However, the actual value, 255, can be represented with only 1 byte, so why would myInt not just occupy 1 byte of memory?
This is known as variable-length encoding; there are various such encodings defined, for example VLQ. One of the most famous, however, is probably UTF-8: UTF-8 encodes code points on a variable number of bytes, from 1 to 4.
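For instance, the very first byte of a UTF-8 sequence is enough to tell how many bytes the code point occupies; a small sketch:

    #include <cstdint>

    // Length in bytes of a UTF-8 encoded code point, determined from its leading byte.
    // Returns 0 for a continuation byte or an invalid leading byte.
    int utf8_sequence_length(std::uint8_t lead) {
        if (lead < 0x80)           return 1; // 0xxxxxxx: ASCII, one byte
        if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
        return 0;                            // 10xxxxxx or invalid
    }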
Or the more generalized way of asking: Why does a type have only one size associated with it when the space required to represent the value might be smaller than that size?
As always in engineering, it's all about trade-offs. There is no solution which has only advantages, so you have to balance advantages and trade-offs when designing your solution.
The design which was settled on was to use fixed-size fundamental types, and the hardware and languages followed from there.
So, what is the fundamental weakness of variable encoding, which caused it to be rejected in favor of more memory hungry schemes? No Random Addressing.
What is the index of the byte at which the 4th code point starts in a UTF-8 string?
It depends on the values of the previous code points, a linear scan is required.
Surely there are variable-length encoding schemes which are better at random-addressing?
Yes, but they are also more complicated. If there's an ideal one, I've never seen it yet.
Does Random Addressing really matter anyway?
Oh YES!
The thing is, any kind of aggregate/array relies on fixed-size types:
Accessing a field of a struct, or an element of an array? Random Addressing!
Which means you essentially have the following trade-off:
Fixed-size types OR Linear memory scans
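To put the trade-off in code: with fixed-size elements the nth item is plain address arithmetic, while with a variable-length encoding such as UTF-8 you have to walk over everything before it (the helper below assumes well-formed UTF-8 and does no validation):

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <vector>

    // Fixed-size elements: the address of element i is pure arithmetic, O(1).
    std::uint32_t nth_fixed(const std::vector<std::uint32_t>& v, std::size_t i) {
        return v[i]; // base + i * sizeof(std::uint32_t)
    }

    // Variable-length elements (UTF-8): finding where code point n starts
    // requires scanning every earlier code point, O(n).
    std::size_t nth_code_point_offset(const std::string& utf8, std::size_t n) {
        std::size_t byte = 0;
        for (std::size_t seen = 0; seen < n && byte < utf8.size(); ++seen) {
            std::uint8_t lead = std::uint8_t(utf8[byte]);
            if      (lead < 0x80)           byte += 1;
            else if ((lead & 0xE0) == 0xC0) byte += 2;
            else if ((lead & 0xF0) == 0xE0) byte += 3;
            else                            byte += 4;
        }
        return byte; // byte index at which code point n starts
    }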