I read this in the C11 standard:
Input white-space characters (as specified by the isspace function) are skipped, unless the specification includes a [, c, or n specifier.
%s discards leading whitespace characters and then consumes everything up to the next whitespace character; it does not consume trailing whitespace. The %[ conversion specifier in the second scanf does not skip leading whitespace characters and therefore fails to scan because of the newline character (which is a whitespace character) left over by the first scanf.
To fix the issue, either use
int c;
while((c=getchar())!='\n' && c!=EOF);
after the first scanf to discard the rest of the line from stdin, or add a space before the conversion specifier (%[) in the second scanf.
Your excerpt from the standard omits important context. The preceding text specifies that skipping whitespace is the first step in processing a conversion specifier for a type other than c, [, or n.
The next step, other than for an n specifier, is to read an input item, which is defined as "the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence" (quoted from C99, but the equivalent text applies to C2011).
An s item "[m]atches a sequence of non-white-space characters", so with the input you specify, the first scanf() reads everything up to, but not including, the newline.
The standard explicitly specifies
Trailing white space (including new-line characters) is left unread unless matched by a directive.
so the newline definitely remains unscanned at this point.
The format given to the next scanf() starts with a %[ conversion specifier, which, as you already observed, does not cause whitespace (leading or otherwise) to be skipped, though it can include whitespace in the item that is scanned. However, since the next character available from the input is a newline, and the given scan set for your %[ does not include that character, zero characters are scanned for that item. Going back to the standard (C99, again):
If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.
There are easier ways to read free-form input line by line, but you can do it with scanf() if you must. For example:
#include <stdio.h>

int main(void)
{
    char buff[10 + 1] = {0};
    char buff_2[5 + 1] = {0};

    printf("Input: ");
    /*
     * Ignore leading whitespace and scan a string of up to 10 non-whitespace
     * characters. If end of input arrives before any non-whitespace
     * character, the call reports an input failure and leaves the buffer
     * unchanged (still an empty string); that failure is ignored here.
     */
    scanf("%10s", buff);
    /*
     * Scan and ignore anything else up to a newline. There will be an
     * (ignorable) matching failure if the next available character is a
     * newline. Any input failure generated by this call is also ignored.
     */
    scanf("%*[^\n]");
    /*
     * Consume the next character, if any. If there is one, it will be a
     * newline. An input failure will occur if we're already at the end of
     * stdin; a careful program would test for that (by comparing the return
     * value to EOF) but this one doesn't.
     */
    scanf("%*c");
    printf("Input: ");
    /* Scan the second string; again, matching and input failures are ignored. */
    scanf("%5[abcde]", buff_2);
    return 0;
}
If you're exclusively using scanf() for such a job then it is essential to read each line in three separate calls, as shown, because each step can produce a matching failure, and within a single call that failure would prevent any attempt to match the subsequent items.
Note, too, how maximum field widths are matched to buffer sizes in that example, which your original code did not do correctly.
So %s also doesn't discard a newline?
%s tells scanf to discard any leading whitespace, including newlines. It will then read any non-whitespace characters, leaving any trailing whitespace in the input buffer.
So assuming your input stream looks like "\n\ntest\n", scanf("%s", buf) will discard the two leading newlines, consume the string "test", and leave the trailing newline in the input stream, so after the call the input stream looks like "\n".
Edit
Responding to xdevel2000's comment here.
Let's talk about how conversion specifiers work. Here are some relevant paragraphs from the online C 2011 standard:
7.21.6.2 The fscanf function
...
9 An input item is read from the stream, unless the specification includes an n specifier. An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence.285) The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.
10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.
12 The conversion specifiers and their meanings are:
...
c
Matches a sequence of characters of exactly the number specified by the field width (1 if no field width is present in the directive).286)
...
s
Matches a sequence of non-white-space characters.286)
...
[
Matches a nonempty sequence of characters from a set of expected characters (the scanset).286)
...
285) fscanf pushes back at most one input character onto the input stream. Therefore, some sequences that are acceptable to strtod, strtol, etc., are unacceptable to fscanf.
286) No special provisions are made for multibyte characters in the matching rules used by the c, s, and [
conversion specifiers — the extent of the input field is determined on a byte-by-byte basis. The resulting field is nevertheless a sequence of multibyte characters that begins in the initial shift state.
%s matches a sequence of non-whitespace characters. Here's a basic algorithm describing how it works (not taking into account end of file or other exceptional conditions):
c <- next character from input stream
while c is whitespace
    c <- next character from input stream
while c is not whitespace
    append c to target buffer
    c <- next character from input stream
push c back onto input stream
append 0 terminator to target buffer
The first whitespace character after the non-whitespace characters (if any) is pushed back onto the input stream for the next input operation to read.
By contrast, the algorithm for the %c specifier is dead simple (unless you're using a field width greater than 1, which I've never done and won't get into here):
c <- next character from input stream
write c to target
The algorithm for the %[ conversion specifier is a little different:
c <- next character from input stream
while c is in the list of characters in the scan set
    append c to target buffer
    c <- next character from input stream
append 0 to target buffer
push c back onto input stream
So, it's a mistake to describe any conversion specifier as "retaining" trailing whitespace (which would imply that the trailing whitespace is saved to the target buffer); that's not the case. Trailing whitespace is left in the input stream for the next input operation to read.