I've recently decided that I just have to finally learn C/C++, and there is one thing I do not really understand about pointers or, more precisely, their definition.
Use the "Clockwise Spiral Rule" to help parse C/C++ declarations.
There are three simple steps to follow:
- Starting with the unknown element, move in a spiral/clockwise direction; when encountering the following elements, replace them with the corresponding English statements:
  - [X] or []: Array X size of... or Array undefined size of...
  - (type1, type2): function passing type1 and type2 returning...
  - *: pointer(s) to...
- Keep doing this in a spiral/clockwise direction until all tokens have been covered.
- Always resolve anything in parentheses first!
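As a rough sketch of the rule in action, the C++ fragment below declares two objects and lets the compiler confirm the English reading you get from the spiral. The names str and fp and the alias Pointee are made up for illustration; nothing here is specific to any library beyond the standard type traits.

```cpp
#include <type_traits>

// Spiral outward from "str": [10] -> "array 10 of", * -> "pointers to", char.
char *str[10];

static_assert(std::is_array<decltype(str)>::value, "str is an array");
static_assert(std::extent<decltype(str)>::value == 10, "of 10 elements");
static_assert(std::is_same<std::remove_extent<decltype(str)>::type, char *>::value,
              "each element is a pointer to char");

// Parentheses first: (*fp) -> "pointer to", (int, float *) -> "function
// passing an int and a pointer to float returning", * -> "pointer to", char.
char *(*fp)(int, float *);

using Pointee = std::remove_pointer<decltype(fp)>::type;  // hypothetical alias
static_assert(std::is_pointer<decltype(fp)>::value, "fp is a pointer");
static_assert(std::is_function<Pointee>::value, "to a function");
static_assert(std::is_same<Pointee, char *(int, float *)>::value,
              "taking (int, float *) and returning pointer to char");
```

If the fragment compiles, every step of the reading matched the actual type.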
Also, declarations should be in separate statements when possible (which is true the vast majority of the time).
In 4, 5 and 6, test
is always a pointer and test2
is not a pointer. White space is (almost) never significant in C++.
There are three pieces to this puzzle.
The first piece is that whitespace in C and C++ is normally not significant beyond separating adjacent tokens that are otherwise indistinguishable.
During the preprocessing stage, the source text is broken up into a sequence of tokens - identifiers, punctuators, numeric literals, string literals, etc. That sequence of tokens is later analyzed for syntax and meaning. The tokenizer is "greedy" and will build the longest valid token that's possible. If you write something like
inttest;
the tokenizer only sees two tokens - the identifier inttest
followed by the punctuator ;
. It doesn't recognize int
as a separate keyword at this stage (that happens later in the process). So, for the line to be read as a declaration of an integer named test
, we have to use whitespace to separate the identifier tokens:
int test;
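A different, compilable illustration of the same greedy behaviour (inttest; itself cannot be shown in working code) is the classic a+++b, which the tokenizer splits as a, ++, +, b. The sketch below assumes C++14 or later, and the names MunchResult and greedy_tokens are invented for the example.

```cpp
// Holds the values observed after evaluating a+++b (hypothetical helper type).
struct MunchResult {
    int a;
    int sum;
};

// The tokenizer reads "a+++b" as the tokens a, ++, +, b, so it means (a++) + b.
// Local variables in a constexpr function need C++14 or later.
constexpr MunchResult greedy_tokens() {
    int a = 1, b = 2;
    int sum = a+++b;
    return MunchResult{a, sum};
}

static_assert(greedy_tokens().sum == 3, "(a++) + b uses the old value of a");
static_assert(greedy_tokens().a == 2, "a was incremented, b was left alone");
```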
The *
character is not part of any identifier; it's a separate token (punctuator) on its own. So if you write
int*test;
the compiler sees 4 separate tokens - int
, *
, test
, and ;
. Thus, whitespace is not significant in pointer declarations, and all of
int *test;
int* test;
int*test;
int * test;
are interpreted the same way.
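To check this rather than take it on faith, here is a small C++ sketch that declares one variable per spacing style and asks the compiler to compare the resulting types. The variable names test_a through test_d are made up so that all four can coexist in one file.

```cpp
#include <type_traits>

int *test_a;
int* test_b;
int*test_c;
int * test_d;

// All four spellings produce exactly the same type, "pointer to int".
static_assert(std::is_same<decltype(test_a), int *>::value, "int *test");
static_assert(std::is_same<decltype(test_b), int *>::value, "int* test");
static_assert(std::is_same<decltype(test_c), int *>::value, "int*test");
static_assert(std::is_same<decltype(test_d), int *>::value, "int * test");
```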
The second piece to the puzzle is how declarations actually work in C and C++. Declarations are broken up into two main pieces - a sequence of declaration specifiers (storage class specifiers, type specifiers, type qualifiers, etc.) followed by a comma-separated list of (possibly initialized) declarators. In the declaration
unsigned long int a[10]={0}, *p=NULL, f(void);
the declaration specifiers are unsigned long int
and the declarators are a[10]={0}
, *p=NULL
, and f(void)
. The declarator introduces the name of the thing being declared (a
, p
, and f
) along with information about that thing's array-ness, pointer-ness, and function-ness. A declarator may also have an associated initializer.
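As a quick sanity check, the declaration above can be compiled as C++ and each name's type inspected. This is only a sketch: it assumes the declaration sits at namespace scope, and f is declared but never defined or called.

```cpp
#include <cstddef>      // NULL
#include <type_traits>

// One set of declaration specifiers, three declarators.
unsigned long int a[10] = {0}, *p = NULL, f(void);

static_assert(std::is_same<decltype(a), unsigned long[10]>::value,
              "a: 10-element array of unsigned long int");
static_assert(std::is_same<decltype(p), unsigned long *>::value,
              "p: pointer to unsigned long int");
static_assert(std::is_same<decltype(f), unsigned long(void)>::value,
              "f: function returning unsigned long int");
```

The specifiers unsigned long int are shared; everything that differs between a, p, and f comes from the declarators.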
The type of a
is "10-element array of unsigned long int
". That type is fully specified by the combination of the declaration specifiers and the declarator, and the initial value is specified with the initializer ={0}
. Similarly, the type of p
is "pointer to unsigned long int
", and again that type is specified by the combination of the declaration specifiers and the declarator, and is initialized to NULL
. And the type of f
is "function returning unsigned long int
" by the same reasoning.
This is key - there is no "pointer-to" type specifier, just like there is no "array-of" type specifier, just like there is no "function-returning" type specifier. We can't declare an array as
int[10] a;
because the operand of the []
operator is a
, not int
. Similarly, in the declaration
int* p;
the operand of *
is p
, not int
. But because the indirection operator is unary and whitespace is not significant, the compiler won't complain if we write it this way. However, it is always interpreted as int (*p);
.
Therefore, if you write
int* p, q;
the operand of *
is p
, so it will be interpreted as
int (*p), q;
Thus, all of
int *test1, test2;
int* test1, test2;
int * test1, test2;
do the same thing - in all three cases, test1
is the operand of *
and thus has type "pointer to int
", while test2
has type int
.
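A minimal way to confirm this with a C++ compiler, using the same names as above:

```cpp
#include <type_traits>

int* test1, test2;  // the * binds to test1 only

static_assert(std::is_same<decltype(test1), int *>::value,
              "test1 is a pointer to int");
static_assert(std::is_same<decltype(test2), int>::value,
              "test2 is a plain int");
```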
Declarators can get arbitrarily complex. You can have arrays of pointers:
T *a[N];
you can have pointers to arrays:
T (*a)[N];
you can have functions returning pointers:
T *f(void);
you can have pointers to functions:
T (*f)(void);
you can have arrays of pointers to functions:
T (*a[N])(void);
you can have functions returning pointers to arrays:
T (*f(void))[N];
you can have functions returning pointers to arrays of pointers to functions returning pointers to T
:
T *(*(*f(void))[N])(void); // yes, it's eye-stabby. Welcome to C and C++.
and then you have signal
:
void (*signal(int, void (*)(int)))(int);
which reads as
       signal                             -- signal
       signal(                  )         -- is a function taking
       signal(                  )         -- unnamed parameter
       signal(int               )         -- is an int
       signal(int,              )         -- unnamed parameter
       signal(int,      (*)     )         -- is a pointer to
       signal(int,      (*)(   ))         -- a function taking
       signal(int,      (*)(   ))         -- unnamed parameter
       signal(int,      (*)(int))         -- is an int
       signal(int, void (*)(int))         -- returning void
     (*signal(int, void (*)(int)))        -- returning a pointer to
     (*signal(int, void (*)(int)))(   )   -- a function taking
     (*signal(int, void (*)(int)))(   )   -- unnamed parameter
     (*signal(int, void (*)(int)))(int)   -- is an int
void (*signal(int, void (*)(int)))(int);  -- returning void
and this just barely scratches the surface of what's possible. But notice that array-ness, pointer-ness, and function-ness are always part of the declarator, not the type specifier.
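Here is a hedged C++ sketch that checks a couple of these readings and shows how an alias can tame the signal declaration. T, N, Handler, my_signal, and my_signal_alias are all names chosen for the example (my_signal deliberately avoids clashing with the real signal), and the functions are only declared, never defined.

```cpp
#include <type_traits>

using T = int;        // any object type would do
constexpr int N = 4;  // arbitrary size

T *arr_of_ptr[N];     // array of N pointers to T
T (*ptr_to_arr)[N];   // pointer to an array of N T

static_assert(std::is_array<decltype(arr_of_ptr)>::value, "array first");
static_assert(std::is_pointer<decltype(ptr_to_arr)>::value, "pointer first");
static_assert(std::is_same<std::remove_pointer<decltype(ptr_to_arr)>::type, T[N]>::value,
              "which points to T[N]");

// Same shape as signal, spelled out by hand...
void (*my_signal(int, void (*)(int)))(int);

// ...and again with an alias for "pointer to function taking int returning void".
using Handler = void (*)(int);
Handler my_signal_alias(int, Handler);

static_assert(std::is_same<decltype(my_signal), decltype(my_signal_alias)>::value,
              "both declarations name the same function type");
```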
One thing to watch out for - const
can modify both the pointer type and the pointed-to type:
const int *p;
int const *p;
Both of the above declare p
as a pointer to a const int
object. You can write a new value to p
setting it to point to a different object:
const int x = 1;
const int y = 2;
const int *p = &x;
p = &y;
but you cannot write to the pointed-to object:
*p = 3; // constraint violation, the pointed-to object is const
However,
int * const p;
declares p
as a const
pointer to a non-const int
; you can write to the thing p
points to
int x = 1;
int y = 2;
int * const p = &x;
*p = 3;
but you can't set p
to point to a different object:
p = &y; // constraint violation, p is const
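The two placements of const can also be told apart mechanically. The sketch below uses p1 and p2 (made-up names, so both pointers can live in one file) and the standard type traits to ask which part is const.

```cpp
#include <type_traits>

const int x = 1;
int z = 0;

const int *p1 = &x;  // pointer to const int: p1 may change, *p1 may not
int *const p2 = &z;  // const pointer to int: *p2 may change, p2 may not

// "const int *" and "int const *" are two spellings of one type.
static_assert(std::is_same<const int *, int const *>::value, "same type");

static_assert(!std::is_const<decltype(p1)>::value, "p1 itself is not const");
static_assert(std::is_const<std::remove_pointer<decltype(p1)>::type>::value,
              "what p1 points to is const");

static_assert(std::is_const<decltype(p2)>::value, "p2 itself is const");
static_assert(!std::is_const<std::remove_pointer<decltype(p2)>::type>::value,
              "what p2 points to is not const");
```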
Which brings us to the third piece of the puzzle - why declarations are structured this way.
The intent is that the structure of a declaration should closely mirror the structure of an expression in the code ("declaration mimics use"). For example, let's suppose we have an array of pointers to int
named ap
, and we want to access the int
value pointed to by the i
'th element. We would access that value as follows:
printf( "%d", *ap[i] );
The expression *ap[i]
has type int
; thus, the declaration of ap
is written as
int *ap[N]; // ap is an array of pointer to int, fully specified by the combination
// of the type specifier and declarator
The declarator *ap[N]
has the same structure as the expression *ap[i]
. The operators *
and []
behave the same way in a declaration that they do in an expression - []
has higher precedence than unary *
, so the operand of *
is ap[N]
(it's parsed as *(ap[N])
).
As another example, suppose we have a pointer to an array of int
named pa
and we want to access the value of the i
'th element. We'd write that as
printf( "%d", (*pa)[i] );
The type of the expression (*pa)[i]
is int
, so the declaration is written as
int (*pa)[N];
Again, the same rules of precedence and associativity apply. In this case, we don't want to dereference the i
'th element of pa
, we want to access the i
'th element of what pa
points to, so we have to explicitly group the *
operator with pa
.
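Putting both cases side by side, as a sketch: values and sum_both_ways are hypothetical names, and the array size 3 is arbitrary. Note that decltype reports int & for these expressions because each one is an lvalue of type int.

```cpp
#include <type_traits>

int values[3] = {10, 20, 30};

int *ap[3] = {&values[0], &values[1], &values[2]};  // array of pointers to int
int (*pa)[3] = &values;                             // pointer to array of int

// The expressions mirror the declarators and both designate an int object.
static_assert(std::is_same<decltype(*ap[0]), int &>::value, "*ap[i] is an int");
static_assert(std::is_same<decltype((*pa)[0]), int &>::value, "(*pa)[i] is an int");

// Both routes reach the same element here, so this returns 2 * values[i].
int sum_both_ways(int i) {
    return *ap[i] + (*pa)[i];
}
```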
The *
, []
and ()
operators are all part of the expression in the code, so they are all part of the declarator in the declaration. The declarator tells you how to use the object in an expression. If you have a declaration like int *p;
, that tells you that the expression *p
in your code will yield an int
value. By extension, it tells you that the expression p
yields a value of type "pointer to int
", or int *
.
So, what about things like cast and sizeof
expressions, where we use things like (int *)
or sizeof (int [10])
or things like that? How do I read something like
void foo( int *, int (*)[10] );
There's no declarator, aren't the *
and []
operators modifying the type directly?
Well, no - there is still a declarator, just with an empty identifier (known as an abstract declarator). If we represent an empty identifier with the symbol λ, then we can read those things as (int *λ)
, sizeof (int λ[10])
, and
void foo( int *λ, int (*λ)[10] );
and they behave exactly like any other declaration. int *[10]
represents an array of 10 pointers, while int (*)[10]
represents a pointer to an array.
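A short C++ check of those abstract declarators; foo_named and its parameter names are invented purely for comparison with foo.

```cpp
#include <type_traits>

static_assert(std::is_array<int *[10]>::value,
              "int *[10]: array of 10 pointers to int");
static_assert(std::is_pointer<int (*)[10]>::value,
              "int (*)[10]: pointer to an array of 10 int");
static_assert(sizeof(int *[10]) == 10 * sizeof(int *),
              "ten pointers' worth of storage");

// The unnamed and named forms declare the same function type.
void foo(int *, int (*)[10]);
void foo_named(int *first, int (*second)[10]);
static_assert(std::is_same<decltype(foo), decltype(foo_named)>::value,
              "parameter names do not affect the type");
```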
And now the opinionated portion of this answer. I am not fond of the C++ convention of declaring simple pointers as
T* p;
and consider it bad practice for the following reasons:
- ... T* p, q;, all the duplicates to those questions, etc.);
- ... T* a[N] is asymmetrical with use (unless you're in the habit of writing * a[i]);
- ... T* p convention cleanly, which...no);
In the end, it just indicates confused thinking about how the two languages' type systems work.
There are good reasons to declare items separately; working around a bad practice (T* p, q;
) isn't one of them. If you write your declarators correctly (T *p, q;
) you are less likely to cause confusion.
I consider it akin to deliberately writing all your simple for
loops as
i = 0;
for( ; i < N; )
{
...
i++;
}
Syntactically valid, but confusing, and the intent is likely to be misinterpreted. However, the T* p;
convention is entrenched in the C++ community, and I use it in my own C++ code because consistency across the code base is a good thing, but it makes me itch every time I do it.
A good rule of thumb by which a lot of people seem to grasp these concepts: in C++ a lot of semantic meaning is derived from the left-binding of keywords or identifiers.
Take for example:
int const bla;
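The const applies to the "int" word. The same goes for pointers' asterisks: they apply to the keyword to the left of them. And the actual variable name? Yup, that's declared by what's left of it. Treat this as a reading aid only: in a declaration with several names, such as int* p, q;, the asterisk still belongs to p alone.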
I would say that the initial convention was to put the star on the pointer name side (the right side of the declaration).
In The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie, the stars are on the right side of the declaration.
Looking at the Linux source code at https://github.com/torvalds/linux/blob/master/init/main.c, we can see that the star is also on the right side.
You can follow the same rule, but it's not a big deal if you put the stars on the type side. Remember that consistency is important, so always put the star on the same side, regardless of which side you have chosen.
#include <type_traits>
std::add_pointer<int>::type test, test2;
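A small check, offered as a sketch, of how this differs from int* test, test2;. Here the pointer-ness lives inside the type specifier (std::add_pointer<int>::type names the type int *), so it applies to every declarator. The names test3 and test4 are added only for contrast.

```cpp
#include <type_traits>

std::add_pointer<int>::type test, test2;  // both are int *
int* test3, test4;                        // only test3 is int *

static_assert(std::is_same<decltype(test), int *>::value, "test is a pointer");
static_assert(std::is_same<decltype(test2), int *>::value, "test2 is a pointer too");
static_assert(std::is_same<decltype(test3), int *>::value, "test3 is a pointer");
static_assert(std::is_same<decltype(test4), int>::value, "test4 is a plain int");
```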