I have this fragment of code that reads arithmetic expressions like 1 + 2 * 3
into integers and characters:
int main() {
int d, flag = 0;
When you enter `1 + 2`, the first number is scanned. When `scanf` tries to scan the second number, it starts by scanning `+` (which is a valid start for a number with unary `+`), but after the `+` it stumbles on the space: failure. The space isn't consumed, but the `+` is, even though the scan failed. That explains why the `+` and `-` chars alone are consumed but not seen.
With `*`, the `*` isn't consumed because a number cannot start with `*`.
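To check what your own C library does, here is a minimal sketch. The helper `probe_scan` and the struct `probe_result` are names made up for this example, and it uses a `tmpfile()` stream instead of `stdin` only so that it is self-contained. It runs two `%d` conversions and then reports which character the next `getc` sees.

```c
#include <stdio.h>

/* Hypothetical helper type: what two "%d" conversions did to the stream. */
struct probe_result {
    int first_rc;   /* return value of the first fscanf */
    int first;      /* value stored by the first fscanf */
    int second_rc;  /* return value of the second fscanf */
    int next_char;  /* character seen by the next getc, or EOF */
};

/* Writes `input` to a temporary stream, scans two integers with "%d",
   then records the next unread character. Returns 0 on success and
   -1 if the temporary file could not be created. */
int probe_scan(const char *input, struct probe_result *out)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs(input, fp);
    rewind(fp);

    int second = 0;
    out->first = 0;
    out->first_rc = fscanf(fp, "%d", &out->first);
    out->second_rc = fscanf(fp, "%d", &second);
    out->next_char = getc(fp);  /* what a following getchar would read */

    fclose(fp);
    return 0;
}
```

With the input `1 * 2`, the second conversion fails and `next_char` is `'*'`. With `1 + 2`, the second conversion also fails, but whether `next_char` is `' '` (sign consumed) or `'+'` (sign put back) depends on the library, which is exactly the difference discussed here.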
The fact that you get something different on MinGW is a mystery to me: I'm on MinGW too, and I get the "wrong" behaviour you described. My hypothesis is that the standard library you're using is "smarter" than the usual implementations and puts back the `+` or `-` when it finds it, so that it can be properly read by `getchar` afterwards.
I suggest that you try compiling your code with `-D__USE_MINGW_ANSI_STDIO=1` to make sure that `gcc` doesn't use Microsoft's `scanf` implementation, and you should get the "buggy" behaviour again (I'm not sure that there's a standard for parsing botched numbers, BTW).
Your `scanf` approach is indeed doomed, because if you enter `1 +2` (without a space after the sign), then the sign won't be read either: it's part of the second number. The best way here is to use a custom lexer reading char by char.
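A char-by-char lexer could look roughly like the sketch below. It is not a complete solution. The names `token_kind`, `token` and `next_token` are invented for this example, and the operator set (`+ - * /`) is an assumption. Numbers are unsigned and not checked for overflow. Newlines are skipped like any other whitespace, which you may want to change if a newline should end the expression.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical token kinds for this sketch. */
enum token_kind { TOK_NUMBER, TOK_OPERATOR, TOK_END, TOK_ERROR };

struct token {
    enum token_kind kind;
    int value;  /* meaningful when kind == TOK_NUMBER */
    char op;    /* operator, or the offending char for TOK_ERROR */
};

/* Reads one token from `in`, one character at a time. */
struct token next_token(FILE *in)
{
    struct token tok = { TOK_END, 0, '\0' };
    int c;

    /* Skip whitespace between tokens. */
    do {
        c = getc(in);
    } while (c != EOF && isspace(c));

    if (c == EOF)
        return tok;  /* kind is already TOK_END */

    if (isdigit(c)) {
        /* Accumulate decimal digits; a sign is never part of a number. */
        tok.kind = TOK_NUMBER;
        do {
            tok.value = tok.value * 10 + (c - '0');
            c = getc(in);
        } while (c != EOF && isdigit(c));
        if (c != EOF)
            ungetc(c, in);  /* first non-digit belongs to the next token */
        return tok;
    }

    if (c == '+' || c == '-' || c == '*' || c == '/') {
        tok.kind = TOK_OPERATOR;
        tok.op = (char)c;
        return tok;
    }

    tok.kind = TOK_ERROR;
    tok.op = (char)c;
    return tok;
}
```

Because the sign is always returned as its own token, `1 + 2`, `1 +2` and `1+2` all produce the same sequence: number, operator, number. Calling `next_token(stdin)` in a loop until it returns `TOK_END` gives you every operator, which the `%d` approach cannot guarantee.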
`+2` was scanned into `d` because of `scanf("%d", &d)`. You can try with `1-2` and see that the `-` isn't displayed either.
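A small sketch makes the sign absorption easy to check. The wrapper `scan_two_ints` is a made-up name for this example, and it uses `sscanf` on a string rather than `scanf` on `stdin` so that it is self-contained.

```c
#include <stdio.h>

/* Hypothetical helper: parses two integers from `text` with "%d%d" and
   returns the number of successful conversions. */
int scan_two_ints(const char *text, int *a, int *b)
{
    /* The second %d takes a leading '+' or '-' as part of the number. */
    return sscanf(text, "%d%d", a, b);
}
```

For `"1+2"` this stores 1 and 2, and for `"1-2"` it stores 1 and -2. In both cases the operator has become the sign of the second integer, so no separate character is left to read.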
Jean-François Fabre correctly explained what actually happens, but my opinion is that it is simply unspecified by the standard what should happen in that case.
Draft n1570 for C11 says at 7.21.6.2 The fscanf function
12 The conversion specifiers and their meanings are:
d Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtol function with the value 10 for the base argument. The corresponding argument shall be a pointer to signed integer.
...
and `strtol` is described in 7.22.1.4 The strtol, strtoll, strtoul, and strtoull functions (emphasis mine)
Description
2 The strtol, strtoll, strtoul, and strtoull functions convert the initial portion of the string pointed to by nptr to long int, long long int, unsigned long int, and unsigned long long int representation, respectively. First, they decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as specified by the isspace function), a subject sequence resembling an integer represented in some radix determined by the value of base, and a final string of one or more unrecognized characters, including the terminating null character of the input string. Then, they attempt to convert the subject sequence to an integer, and return the result.
I could not find anywhere in the standard what exactly could resemble a decimal integer. It is clear that positive and negative numbers do, and that numbers prefixed with a plus sign (`+`) also do. But it is not specified whether the plus and minus signs (`+` and `-`) alone resemble a decimal integer or not.
If an implementation decides that they do, a `%d` specifier will eat lone `+` and `-` signs; if it decides that they do not, it will leave them in the stream.
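For comparison, `strtol` itself can be probed on in-memory strings. The sketch below reports how many characters it consumed; `strtol_consumed` is a name made up for this example. Note that this only shows what `strtol` does with a string and does not by itself settle how a given `scanf` handles a stream.

```c
#include <stdlib.h>

/* Hypothetical helper: converts `text` with strtol in base 10, stores the
   result in *value, and returns the number of characters consumed
   (0 means no conversion was performed). */
size_t strtol_consumed(const char *text, long *value)
{
    char *end;
    *value = strtol(text, &end, 10);
    return (size_t)(end - text);
}
```

With `"+2"` it consumes two characters and yields 2. With `"+ 2"` or a lone `"-"` it consumes nothing and yields 0, because `strtol` performs no conversion when the sign is not followed by a digit.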