The purpose of a pointer is to save the address of a specific variable. So the memory layout of the following code should look like this:
int a = 5;
int *b = &a;
Why does this type of code generate a warning?
int a = 5;
int *b = &a;
int *c = &b;
The & operator yields a pointer to its operand, so &a has type int *. Assigning it (through initialization) to b, which also has type int *, is valid. &b yields a pointer to the object b, so &b has type pointer to int *, i.e., int **.
C says in the constraints of the assignment operator (which also hold for initialization) that (C11, 6.5.16.1p1): "both operands are pointers to qualified or unqualified versions of compatible types". Here the left operand points to int and the right operand points to int *. Under C's definition of compatible types those two are not compatible, so int * and int ** are not compatible either.
So the initialization int *c = &b; is a constraint violation, which means the compiler is required to issue a diagnostic.
One rationale for the rule is that the Standard does not guarantee that two different pointer types have the same size (except for void * and the character pointer types); that is, sizeof (int *) and sizeof (int **) can be different values.
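Assuming the intent was for c to point at b, a minimal sketch of the version that needs no diagnostic is to give c the type that &b actually has. The helper name read_through_c is made up for illustration:

```c
#include <assert.h>

/* Hypothetical helper: builds the a -> b -> c chain and reads a through c. */
int read_through_c(void)
{
    int a = 5;
    int *b = &a;    /* &a has type int *, which matches b */
    int **c = &b;   /* &b has type int **, which matches c */

    assert(*c == b);   /* one dereference yields the int * stored in b */
    return **c;        /* two dereferences yield the int stored in a */
}
```

Calling read_through_c() returns the value of a, because each * in the declaration of c is paired with one dereference.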
If the purpose of pointer is just to save the memory address, I think there should be no hierarchy if the address we are going to save refers variable, pointer, double pointer, ... etc. so below type of code should be valid.
Well, that's true for the machine (after all, roughly everything is a number). But in many languages variables are typed, which means the compiler can ensure that you use them correctly (types impose a correct context on variables).
It is true that a pointer to pointer and a pointer (probably) use the same amount of memory to store their value (beware: this is not true for int and pointer to int; the size of an address is not related to the size of a house).
So if you have the address of an address, you should use it as such and not as a simple address. If you access a pointer to pointer as a simple pointer, you can manipulate the address of an int as if it were an int, which it is not (replace int with any other type and you should see the danger). You may be confused because all of these are numbers, but in everyday life you are not: I personally see a big difference between $1 and 1 dog. Dog and $ are types, and you know what you can do with each of them.
You can program in assembly and do whatever you want, but you will see how dangerous that is, because you can do almost anything, especially weird things. Yes, modifying an address value is dangerous. Suppose you have an autonomous car that should deliver something at an address expressed as a distance: 1200 Memory Street (the address). Suppose the houses on that street are 100 ft apart (so 1221 is not a valid address). If you can manipulate addresses as plain integers, you can try to deliver at 1223 and leave the package in the middle of the pavement.
Another example could be: a house, the address of the house, and the entry number of that address in an address book. All three are different concepts, different types...
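To tie the delivery analogy back to C: because a pointer carries its pointed-to type, adding 1 to an int * moves to the next int, not to the next byte, so you stay on "valid house numbers". Here is a small sketch; the array and the function name are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: distance in bytes between two neighbouring "houses". */
ptrdiff_t bytes_between_neighbours(void)
{
    int street[4] = {10, 20, 30, 40};
    int *house = &street[1];
    int *next = house + 1;   /* steps by one int, not by one byte */

    assert(*next == 30);     /* we landed exactly on the next element */

    /* Viewed as raw bytes, the two addresses are sizeof(int) apart. */
    return (char *)next - (char *)house;
}
```

The result equals sizeof(int) on whatever platform you compile for. The compiler can only pick that step size because it knows the type behind the address.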
The C language is statically typed. This means that every object has a type known to the compiler, which tells it how to interpret the value stored at that object's address.
In your example:
int a = 5;
int *b = &a;
The type of a is int, and the type of b is int * (read as "pointer to int"). Using your example, the memory would contain:
name  memory address    value         type
a     0x00000002        5             int
b     0x00000010        0x00000002    int*
The type is not actually stored in memory; the compiler simply knows that when you read a you'll find an int, and when you read b you'll find the address of a place where you can find an int.
In your second example:
int a = 5;
int *b = &a;
int **c = &b;
The type of c is int **, read as "pointer to pointer to int". It means that, for the compiler:
- c is a pointer;
- when you read c, you get the address of another pointer;
- when you read that other pointer, you get the address of an int.

That is,

- c is a pointer (int **);
- *c is also a pointer (int *);
- **c is an int.

And the memory would contain:
name  memory address    value         type
a     0x00000002        5             int
b     0x00000010        0x00000002    int*
c     0x00000020        0x00000010    int**
Since the "type" is not stored together with the value, and a pointer can point to any memory address, the way the compiler knows the type of the value at an address is basically by taking the pointer's type and removing the rightmost *.
By the way, that's for a common 32-bit architecture. For most 64-bit architectures, you'll have:
name  memory address        value                 type
a     0x0000000000000002    5                     int
b     0x0000000000000010    0x0000000000000002    int*
c     0x0000000000000020    0x0000000000000010    int**
Addresses are now 8 bytes each, while an int is still only 4 bytes. Since the compiler knows the type of each variable, it can easily deal with this difference, reading 8 bytes for a pointer and 4 bytes for the int.
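If you want to see the numbers for your own platform rather than take the tables on trust, something like the following sketch reports them. The function name describe_sizes and the message format are my own choices, not anything standard:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes this platform's sizes into buf.
   Returns what snprintf returns (the length of the formatted text). */
int describe_sizes(char *buf, size_t len)
{
    assert(buf != NULL);
    return snprintf(buf, len, "int: %zu, int *: %zu, int **: %zu",
                    sizeof(int), sizeof(int *), sizeof(int **));
}
```

Pass it a buffer of, say, 64 bytes and print the result. On a typical 64-bit desktop you will probably see 4, 8 and 8, but the Standard does not promise those values.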
That is because any pointer T* is actually of type "pointer to a T" (or "address of a T"), where T is the pointed-to type. Here, * can be read as "pointer to a(n)".
int x;      // Holds an integer.
            // Is type "int".
            // Not a pointer; T is nonexistent.
int *px;    // Holds the address of an integer.
            // Is type "pointer to an int".
            // T is: int
int **pxx;  // Holds the address of a pointer to an integer.
            // Is type "pointer to a pointer to an int".
            // T is: int*
This is used for dereferencing, where the dereference operator takes a T* and returns a value whose type is T. The result type can be seen as dropping the leftmost "pointer to a(n)" and keeping whatever is left over.
*x;    // Invalid: x isn't a pointer.
       // Even if a compiler allows it, this is a bad idea.
*px;   // Valid: px is "pointer to int".
       // Return type is: int
       // Truncates leftmost "pointer to" part, and returns an "int".
*pxx;  // Valid: pxx is "pointer to pointer to int".
       // Return type is: int*
       // Truncates leftmost "pointer to" part, and returns a "pointer to int".
Note how, for each of the above operations, the dereference operator's result type matches the T in the original T* declaration.
This greatly aids both primitive compilers and programmers in parsing a pointer's type. For a compiler, the address-of operator adds a * to the type, the dereference operator removes a * from the type, and any mismatch is an error. For a programmer, the number of *s is a direct indication of how many levels of indirection you're dealing with (int* always points to int, float** always points to float*, which in turn always points to float, etc.).
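The "& adds a star, * removes a star" bookkeeping can be made visible with C11's _Generic, which selects a value based on the static type of an expression. This is only a sketch: the STARS macro is a made-up name and it handles just the three types used here.

```c
#include <assert.h>

/* Hypothetical macro: number of * in the static type of an expression.
   Only int, int * and int ** are handled. */
#define STARS(expr) _Generic((expr), int: 0, int *: 1, int **: 2)

/* Returns 1 if every & added one star and every * removed one. */
int stars_add_up(void)
{
    int x = 0;
    int *px = &x;
    int **pxx = &px;

    return STARS(x) == 0 && STARS(&x) == 1
        && STARS(px) == 1 && STARS(&px) == 2 && STARS(*px) == 0
        && STARS(pxx) == 2 && STARS(*pxx) == 1 && STARS(**pxx) == 0;
}
```

Nothing is looked up at run time here; the compiler resolves every STARS(...) from the declared types alone, which is exactly the information that would be lost if every level used a single *.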
Now, taking this into consideration, there are two major issues with using only a single * regardless of the number of levels of indirection:
In both cases, the only way to determine the value's actual type would be to backtrack it, forcing you to look somewhere else to find it.
void f(int* pi);

int main() {
    int x;
    int *px = &x;
    int *ppx = &px;
    int *pppx = &ppx;
    f(pppx);
}

// Ten million lines later...

void f(int* pi) {
    int i = *pi; // Well, we're boned.
                 // To see what's wrong, see main().
}
This... is a very dangerous problem, and one that is easily solved by having the number of *s directly represent the level of indirection.
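For comparison, here is a sketch of the same program with the real types written out. The initial value 7 and the wrapper call_f are my additions so that the result can be checked; the original left x uninitialized.

```c
#include <assert.h>

/* With the full type in the signature, the body can be checked on its own:
   three stars declared, three dereferences to reach the int. */
int f(int ***pi)
{
    return ***pi;
}

/* Hypothetical caller mirroring main() above. */
int call_f(void)
{
    int x = 7;            /* assumed value, for illustration only */
    int *px = &x;
    int **ppx = &px;
    int ***pppx = &ppx;
    return f(pppx);
}
```

Anyone reading f, even ten million lines away from the caller, can tell from the parameter type alone how many dereferences are needed, and the compiler rejects a call that passes the wrong level.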
If the purpose of pointer is just to save the memory address, I think there should be no hierarchy if the address we are going to save refers variable, pointer, double pointer, ... etc
At runtime, yes, a pointer just holds an address. But at compile time there is also a type associated with every variable. As the others have said, int* and int** are two different, incompatible types.
There is one type, void*, that does what you want: it stores only an address, and you can assign any object address to it:
int a = 5;
int *b = &a;
void *c = &b;
But when you want to dereference a void*, you need to supply the 'missing' type information yourself:
int a2 = **((int**)c);
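The same idea wrapped in a function, as a rough sketch. The name read_int_behind is invented, and the function simply trusts the caller that the void* really came from an int**; the compiler cannot check that for you.

```c
#include <assert.h>

/* Hypothetical helper: the caller promises that p really holds an int **. */
int read_int_behind(void *p)
{
    int **pp = p;   /* in C, void * converts back without a cast */

    assert(pp != 0 && *pp != 0);
    return **pp;    /* the type we supplied decides how many * are legal */
}
```

With the a, b and c from above, read_int_behind(c) gives back the value of a. Pass it a void* that came from anything other than an int** and the behavior is undefined, with no warning at compile time.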
I think there should be no hierarchy if the address we are going to save refers variable, pointer, double pointer
Without the "hierarchy" it would be very easy to cause UB (undefined behavior) all over the place without any warnings. That would be horrible.
Consider this:
char c = 'a';
char* pc = &c;
char** ppc = &pc;
printf("%c\n", **ppc); // compiles ok and is valid
printf("%c\n", **pc); // error: invalid type argument of unary ‘*’
The compiler gives me an error and thereby it helps me to know that I have done something wrong and I can correct the bug.
But without "hierarchy", like:
char c = 'a';
char* pc = &c;
char* ppc = &pc;
printf("%c\n", **ppc); // compiles ok and is valid
printf("%c\n", **pc); // compiles ok but is invalid
The compiler can't give any error, as there is no "hierarchy".
But when the line:
printf("%c\n", **pc);
executes, it is UB (undefined behavior).
First, *pc reads the char as if it were a pointer, i.e. it probably reads 4 or 8 bytes even though we only reserved 1 byte. That is UB.
If the program didn't crash due to the UB above but just returned some garbage value, the second step would be to dereference that garbage value. Once again, UB.
Conclusion
The type system helps us detect bugs by treating int*, int**, int***, etc. as different types.