Consider the following code snippet:
#include <stdio.h>
#define A -B
#define B -C
#define C 5
int main()
{
printf("The value of A is %d\n", A);
return 0;
}
Pass it the -E option (e.g. gcc -E a.c). This will output the preprocessed source code:
int main()
{
printf("The value of A is %d\n", - -5);
return 0;
}
So the preprocessor introduces a space between - and -5. The two minus signs are therefore not treated as the decrement operator --, and printf prints 5.
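The same result can be checked without reading the preprocessed output. This is only a minimal sketch: the helper names value_of_A and check_A are hypothetical and not part of the original snippet.

```c
#include <assert.h>

#define A -B
#define B -C
#define C 5

/* Hypothetical helper: evaluates the expression that A expands to. */
int value_of_A(void)
{
    /* A expands to the three tokens  -  -  5, i.e. -(-5), never to --5. */
    return A;
}

/* Hypothetical self-check: two unary minus operators cancel out. */
void check_A(void)
{
    assert(value_of_A() == 5);
}
```

Calling check_A() from main() should pass silently. If the two minus signs had been merged into --, the code would not compile at all, because 5 is not a modifiable lvalue.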
The GCC documentation on token spacing explains why the extra space is produced:
First, consider an issue that only concerns the stand-alone preprocessor: there needs to be a guarantee that re-reading its preprocessed output results in an identical token stream. Without taking special measures, this might not be the case because of macro substitution. For example:
#define PLUS +
#define EMPTY
#define f(x) =x=
+PLUS -EMPTY- PLUS+ f(=)
==> + + - - + + = = =
not
==> ++ -- ++ ===
One solution would be to simply insert a space between all adjacent tokens. However, we would like to keep space insertion to a minimum, both for aesthetic reasons and because it causes problems for people who still try to abuse the preprocessor for things like Fortran source and Makefiles.
For now, just notice that when tokens are added (or removed, as shown by the EMPTY example) from the original lexed token stream, we need to check for accidental token pasting. We call this paste avoidance. Token addition and removal can only occur because of macro expansion, but accidental pasting can occur in many places: both before and after each macro replacement, each argument replacement, and additionally each token created by the # and ## operators.
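The EMPTY case from the quoted passage can be observed from inside a program. The following is only a sketch; the function names negate_twice, decrement and check_empty are made up for illustration.

```c
#include <assert.h>

#define EMPTY

/* Hypothetical helper: EMPTY disappears, but the two '-' stay separate tokens. */
int negate_twice(int n)
{
    return -EMPTY- n;   /* tokens: -  -  n, i.e. -(-n) */
}

/* For contrast: a decrement token typed directly in the source. */
int decrement(int n)
{
    return --n;         /* one token: -- */
}

/* Hypothetical self-check showing that the two forms differ. */
void check_empty(void)
{
    assert(negate_twice(4) == 4);
    assert(decrement(4) == 3);
}
```

Removing EMPTY leaves the two minus signs adjacent in the token stream, yet they are still evaluated as two unary operators. That is the situation the paste avoidance logic has to preserve when the stand-alone preprocessor writes its output as text.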
The preprocessor introduces a space between the - that comes from A and the expansion of B:
#define A -B
#define B -C
#define C 5
A
with output (generated via cpp < test.c):
# 1 "test.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 329 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.c" 2
- -5
In the C language, program source code is split into so-called preprocessing tokens at a very early stage of translation (phase 3), before macro substitution takes place (phase 4). Later (at phase 7), preprocessing tokens are converted into regular tokens, which are fed into the syntactic and semantic analyzer of the compiler proper (see "5.1.1.2 Translation phases" in the language specification).
Phase 3 is the stage when the preprocessing tokens for future C language operators and other lexical elements are formed (identifiers, numbers, punctuators, string literals, etc.). Multi-character punctuators like --, >>= and so on are formed at that early stage. In order to eventually obtain a token for the -- operator at phase 7, you need to have that -- as a complete punctuator already at phase 3. No additional punctuator concatenation occurs when transitioning from preprocessing tokens to regular tokens at phase 7, which means that two adjacent - punctuators detected at phase 3 will NOT become a single token -- at phase 7. The compiler proper will never have a chance to see these two adjacent - punctuators as a single token --.
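The way phase 3 fixes punctuator boundaries can be illustrated with the longest-match rule applied to characters typed directly in the source. This is a sketch only; the names munch, spaced and check_munch are hypothetical.

```c
#include <assert.h>

/* Phase 3 takes the longest punctuator first: a---b is lexed as a  --  -  b. */
int munch(int a, int b)
{
    return a---b;       /* (a--) - b */
}

/* Whitespace moves the token boundary: a  -  --  b. */
int spaced(int a, int b)
{
    return a - --b;     /* a - (--b) */
}

/* Hypothetical self-check: the same characters, grouped differently. */
void check_munch(void)
{
    assert(munch(5, 2) == 3);   /* 5 - 2 */
    assert(spaced(5, 2) == 4);  /* 5 - 1 */
}
```

In both functions the token boundaries are decided once, when the source is lexed. Nothing in a later phase regroups the minus signs, which is why two - tokens produced by separate macro expansions stay separate.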
In other words, in C you cannot use the preprocessor to concatenate things by placing them next to each other. This is why the preprocessor has dedicated features like ## to facilitate concatenation. And ## is what you have to use to concatenate two tokens into a single token.
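A small sketch of the difference between pasting with ## and mere adjacency follows. The macro names PASTE and ADJACENT and the function names are assumptions made for this example.

```c
#include <assert.h>

#define PASTE(a, b) a##b    /* joins its two arguments into one token */
#define ADJACENT(a, b) a b  /* leaves its two arguments as two tokens */

/* Hypothetical helper: ## builds a real decrement operator from - and -. */
int pasted(int x)
{
    return PASTE(-, -) x;     /* one token: --x */
}

/* Hypothetical helper: adjacency gives two unary minus operators. */
int adjacent(int x)
{
    return ADJACENT(-, -) x;  /* two tokens: -(-x) */
}

/* Hypothetical self-check contrasting the two macros. */
void check_paste(void)
{
    assert(pasted(5) == 4);
    assert(adjacent(5) == 5);
}
```

Only the ## version decrements its argument. Pasting - and - is well defined here because the result, --, is itself a valid preprocessing token.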
BTW, it is not correct to explain this behavior by claiming that the preprocessor places a space character between your - characters. Nothing like that is present in the language specification. What really happens is that, in the internal structures of the compiler, your - tokens forever remain two separate tokens. How the preprocessor and compiler achieve that is their internal implementation detail. In implementations with a loosely coupled preprocessor and compiler proper (e.g. completely independent modules that communicate through an intermediate textual representation), injecting a space between adjacent punctuators is definitely a natural way to implement the required separation of tokens.
I do not think so. Even though macro expansion is text processing, it is impossible to create a token from across macro boundaries. Therefore it is treated as -(-5), not --5, because -- is a single token.