On what exactly does the size of a primitive data type like int depend?
I think there are two parts to this question:
What sizes primitive types are allowed to be.
This is specified by the C and C++ standards: the types must support certain minimum value ranges, which implicitly places a lower bound on their size in bits (e.g. long must be at least 32 bits to comply with the standard).
The standards do not specify the size in bytes, because the definition of a byte is up to the implementation: char is one byte, but the byte size (the CHAR_BIT macro) may be, say, 16 bits.
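As a quick illustration (a minimal sketch; the printed values are whatever this particular implementation chose, the standard only constrains the minimums):

```cpp
#include <climits>
#include <iostream>

int main() {
    // Number of bits per byte on this implementation (at least 8,
    // but e.g. 16 on some DSP toolchains).
    std::cout << "CHAR_BIT:     " << CHAR_BIT << '\n';

    // The standard guarantees LONG_MAX >= 2147483647, i.e. long must
    // cover at least a 32-bit range; the actual size may be larger.
    std::cout << "LONG_MAX:     " << LONG_MAX << '\n';
    std::cout << "sizeof(long): " << sizeof(long) << '\n';
}
```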
The actual size as defined by the implementation.
This, as other answers have already pointed out, depends on the implementation: the compiler. And the compiler implementation, in turn, is heavily influenced by the target architecture. So it's plausible to have two compilers running on the same OS and architecture but producing different sizes of int. The only assumption you can make is the one stated by the standard (given that the compiler implements it).
There may also be additional ABI requirements (e.g. a fixed size for enums).
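One way to see this in practice is to compile the same little program with two different compilers (or the same compiler targeting different data models) on one machine and compare the output; a sketch:

```cpp
#include <iostream>

int main() {
    // The standard only guarantees minimum ranges; the concrete sizes
    // below may legitimately differ between implementations.
    std::cout << "short:     " << sizeof(short)     << '\n'
              << "int:       " << sizeof(int)       << '\n'
              << "long:      " << sizeof(long)      << '\n'
              << "long long: " << sizeof(long long) << '\n'
              << "void*:     " << sizeof(void*)     << '\n';
}
```

On x86_64 Linux with GCC this typically prints 2 4 8 8 8 (LP64), while a 64-bit MSVC build typically prints 2 4 4 8 8 (LLP64).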
First of all, it depends on the compiler. The compiler, in turn, usually depends on the architecture, processor, development environment, etc., because it takes them into account. So you may say it's a combination of all of them. But I would NOT say that. I would say: the compiler, since on the same machine you may get different sizes of POD and built-in types if you use different compilers. Also note that your source code is input to the compiler, so it's the compiler that makes the final decision about the sizes of POD and built-in types. However, it's also true that this decision is influenced by the underlying architecture of the target machine; after all, a genuinely useful compiler has to emit efficient code that eventually runs on the machine you target.
Compilers provide options too. A few of them might affect sizes as well!
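For example (a hedged sketch; -fshort-enums and -m32/-m64 are GCC/Clang-specific options, other compilers have their own equivalents):

```cpp
#include <iostream>

enum Color { Red, Green, Blue };

int main() {
    // With GCC/Clang defaults this enum is usually as big as an int;
    // building with -fshort-enums may shrink it to a single byte.
    std::cout << "sizeof(Color): " << sizeof(Color) << '\n';

    // Likewise, -m32 vs -m64 changes sizeof(long) and sizeof(void*)
    // on typical Linux toolchains.
    std::cout << "sizeof(long):  " << sizeof(long)  << '\n';
    std::cout << "sizeof(void*): " << sizeof(void*) << '\n';
}
```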
The sizes of char, signed char and unsigned char are defined by the C++ Standard itself! The sizes of all other types are defined by the compiler.
The C++03 Standard, §5.3.3/1, says:
sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1; the result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined. [Note: in particular, sizeof(bool) and sizeof(wchar_t) are implementation-defined.]
The C99 Standard (§6.5.3.4) likewise defines the sizes of char, signed char and unsigned char to be 1, but leaves the sizes of other types to be defined by the compiler!
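So the only sizes you can assert unconditionally are those of the character types; anything else is an assumption about one particular implementation. A small sketch (using C++11 static_assert for brevity):

```cpp
// Guaranteed by the standards quoted above:
static_assert(sizeof(char) == 1, "always holds");
static_assert(sizeof(signed char) == 1, "always holds");
static_assert(sizeof(unsigned char) == 1, "always holds");

// NOT guaranteed -- this would document an assumption about one
// implementation and may legitimately fail elsewhere:
// static_assert(sizeof(int) == 4, "implementation-defined");

int main() {}
```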
EDIT:
I found this C++ FAQ chapter really good. The entire chapter. It's a very tiny chapter, though. :-)
http://www.parashift.com/c++-faq-lite/intrinsic-types.html
Also read the comments below; there are some good arguments!
If you're asking about the size of a primitive type like int, I'd say it depends on the factors you cited.
The compiler/environment couple (where environment often means OS) is surely a part of it, since the compiler can map the various "sensible" sizes onto the built-in types in different ways for various reasons: for example, compilers on x86_64 Windows will usually have a 32-bit long and a 64-bit long long to avoid breaking code written for plain x86; on x86_64 Linux, instead, long is usually 64 bits because it's a more "natural" choice and applications developed for Linux are generally more architecture-neutral (because Linux runs on a much greater variety of architectures).
The processor surely matters in the decision: int should be the "natural size" of the processor, usually the size of its general-purpose registers. This means that it's the type that will work fastest on the current architecture. long, instead, is often thought of as a type that trades performance for an extended range (this is rarely true on regular PCs, but on microcontrollers it's normal).
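You can check whether long actually buys you extra range on your platform by comparing the limits (a sketch; the numbers are whatever your implementation chose):

```cpp
#include <climits>
#include <iostream>

int main() {
    // On LP64 Linux, long is usually wider than int (8 vs 4 bytes);
    // on LLP64 Windows both are 4 bytes, so the extra range is on paper only.
    std::cout << "INT_MAX:  " << INT_MAX
              << "  (sizeof(int)  = " << sizeof(int)  << ")\n";
    std::cout << "LONG_MAX: " << LONG_MAX
              << "  (sizeof(long) = " << sizeof(long) << ")\n";
}
```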
If instead you're also talking about structs & co. (which, if they respect certain rules, are PODs), again the compiler and the processor influence their size, since they are made of built-in types plus the appropriate padding chosen by the compiler to achieve the best performance on the target architecture.
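A quick way to observe that compiler-chosen padding, as a minimal sketch (the concrete numbers assume a typical desktop ABI with a 4-byte int):

```cpp
#include <cstddef>
#include <iostream>

// A POD made of built-in types; the compiler inserts padding so that
// `i` (and every element of a Sample array) stays naturally aligned.
struct Sample {
    char c;   // 1 byte
    int  i;   // typically 4 bytes, aligned to 4
    char d;   // 1 byte, followed by trailing padding
};

int main() {
    // Commonly prints 12 rather than 6 on mainstream desktop compilers.
    std::cout << "sizeof(Sample):      " << sizeof(Sample)      << '\n';
    std::cout << "offsetof(Sample, i): " << offsetof(Sample, i) << '\n';
    std::cout << "offsetof(Sample, d): " << offsetof(Sample, d) << '\n';
}
```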
It depends on the implementation (compiler).
Implementation-defined behavior means unspecified behavior where each implementation documents how the choice is made.
As I commented under @Nawaz's answer, it technically depends solely on the compiler.
The compiler is just tasked with taking valid C++ code, and outputting valid machine code (or whatever language it targets).
So a C++ compiler could decide to make an int have a size of 15, require it to be aligned on 5-byte boundaries, and insert arbitrary padding between the variables in a POD. Nothing in the standard prohibits this, and it could still generate working code.
It'd just be much slower.
So in practice, compilers take some hints from the system they're running on, in two ways:
- the CPU has certain preferences: for example, it may have 32-bit wide registers, so making an int 32 bits wide would be a good idea, and it usually requires variables to be naturally aligned (a 4-byte wide variable must be aligned on an address divisible by 4, for example), so a sensible compiler respects these preferences because it yields faster code.
- the OS may have some influence too: if it uses a different ABI than the compiler, making system calls is going to be needlessly difficult.
But those are just practical considerations to make life a bit easier for the programmer or to generate faster code. They're not required.
The compiler has the final word, and it can choose to completely ignore both the CPU and the OS. As long as it generates a working executable with the semantics specified in the C++ standard.
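You can query the choices the compiler actually made with sizeof and alignof (C++11); on mainstream x86/x86_64 toolchains they follow the CPU's preferences, but that is a quality-of-implementation matter, not a standard requirement:

```cpp
#include <iostream>

int main() {
    // alignof reports the alignment the compiler chose for each type;
    // it usually matches what the target CPU handles most efficiently.
    std::cout << "sizeof(int):    " << sizeof(int)
              << ", alignof(int):    " << alignof(int)    << '\n';
    std::cout << "sizeof(double): " << sizeof(double)
              << ", alignof(double): " << alignof(double) << '\n';
    std::cout << "sizeof(void*):  " << sizeof(void*)
              << ", alignof(void*):  " << alignof(void*)  << '\n';
}
```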
A struct can also be POD, in which case you can explicitly control potential padding between members with #pragma pack on some compilers.
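For instance (a sketch; #pragma pack is a widely supported but non-standard extension, and the exact numbers depend on the ABI):

```cpp
#include <iostream>

struct Normal {        // compiler-chosen padding between c and i
    char c;
    int  i;
};

#pragma pack(push, 1)  // ask the compiler to drop the padding
struct Packed {
    char c;
    int  i;
};
#pragma pack(pop)

int main() {
    // Typically prints 8 and 5 on mainstream desktop compilers.
    std::cout << "sizeof(Normal): " << sizeof(Normal) << '\n';
    std::cout << "sizeof(Packed): " << sizeof(Packed) << '\n';
}
```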