The scanf function, the specifier %s and the newline

终归单人心 2020-12-20 02:00

I read this in the C11 standard:

Input white-space characters (as specified by the isspace function) are skipped, unless the specification includes a [, c, or n specifier.

3 answers
  • 2020-12-20 02:28

    %s skips leading whitespace and then consumes characters up to, but not including, the next whitespace character, so trailing whitespace is left in the input. The [ conversion specifier in the second scanf does not skip leading whitespace, and therefore it fails to scan because of the newline character (itself whitespace) left over by the first scanf.
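    The question's own code is not preserved here, so the sketch below rests on assumptions: it feeds the hypothetical input `hello\nworld\n` through `%s` followed by `%[^\n]`, and uses a `tmpfile()` stream as a stand-in for stdin so the behavior can be checked without a terminal. The second call returns 0 (no items assigned) because the leftover newline makes the `%[` input item zero-length.

```c
#include <stdio.h>

/* Hypothetical demo: returns the result of the second fscanf call.
 * The first call uses %s; the second uses %[ with no leading space. */
int second_scan_result(void)
{
    FILE *in = tmpfile();               /* stands in for stdin */
    char word[16], line[16];
    int rc;

    if (in == NULL)
        return -2;
    fputs("hello\nworld\n", in);
    rewind(in);

    fscanf(in, "%15s", word);           /* reads "hello", leaves '\n' unread */
    rc = fscanf(in, "%15[^\n]", line);  /* next char is '\n': matching failure */

    fclose(in);
    return rc;                          /* 0: no items assigned */
}
```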

    To fix the issue, either use

    int c;
    /* discard everything up to and including the next newline (or EOF) */
    while ((c = getchar()) != '\n' && c != EOF)
        ;
    

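    after the first scanf to discard the rest of the line from stdin, or add a space before the format specifier (%[) in the second scanf. A space in a format string matches any amount of whitespace, including none, so it consumes the leftover newline.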
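    Both fixes can be sketched against the same hypothetical input `hello\nworld\n`. `read_word_then_line` is an illustrative name, and `getc` on a `tmpfile()` stream stands in for `getchar` on stdin; `word` and `line` must each hold at least 16 characters.

```c
#include <stdio.h>

/* Hypothetical demo: reads a word, then the next line, using one of the
 * two fixes. fix == 1 drains the rest of the line with getc;
 * any other value puts a space before %[ in the second format. */
int read_word_then_line(int fix, char *word, char *line)
{
    FILE *in = tmpfile();   /* stands in for stdin */
    int c, rc;

    if (in == NULL)
        return -2;
    fputs("hello\nworld\n", in);
    rewind(in);

    fscanf(in, "%15s", word);

    if (fix == 1) {
        /* discard everything up to and including the newline */
        while ((c = getc(in)) != '\n' && c != EOF)
            ;
        rc = fscanf(in, "%15[^\n]", line);
    } else {
        /* the leading space skips any amount of whitespace, '\n' included */
        rc = fscanf(in, " %15[^\n]", line);
    }

    fclose(in);
    return rc;
}
```

    One behavioral difference: the leading space also skips blank lines and leading blanks on the next line, while the `getc` loop discards only the remainder of the current line.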

  • 2020-12-20 02:37

    Your excerpt from the standard omits important context. The preceding text specifies that skipping whitespace is the first step in processing a conversion specifier for a type other than c, [, or n.

    The next step, other than for an n specifier, is to read an input item, which is defined as "the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence" (quoted from C99, but equivalent applies to C2011).
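    A small illustration of "the longest sequence ... which is, or is a prefix of, a matching input sequence", using `%d` (a sketch; `scan_int_prefix` is a hypothetical helper). `%n` reports how many characters have been consumed so far, including skipped leading whitespace, so for `"  42abc"` it reports 4: the `%d` item stops at `a`, which remains unread.

```c
#include <stdio.h>

/* Hypothetical helper: scans an int from s and reports, via *consumed,
 * how many characters were used up to the end of the %d item. */
int scan_int_prefix(const char *s, int *value, int *consumed)
{
    *consumed = -1;                       /* stays -1 if %d fails */
    return sscanf(s, "%d%n", value, consumed);
}
```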

    An s item "[m]atches a sequence of non-white-space characters", so with the input you specify, the first scanf() reads everything up to, but not including, the newline.

    The standard explicitly specifies

    Trailing white space (including new-line characters) is left unread unless matched by a directive.

    so the newline definitely remains unscanned at this point.

    The format given to the next scanf() starts with a %[ conversion specifier, which, as you already observed, does not cause whitespace (leading or otherwise) to be skipped, though it can include whitespace in the item that is scanned. Since the next character available from the input is a newline, however, and the given scan set for your %[ does not include that character, zero characters are scanned for that item. Going back to the standard (C99, again):

    If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.
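    The two kinds of failure show up differently in the return value. The sketch below (hypothetical helper `scan_set_result`, with `tmpfile()` standing in for stdin) reuses the `[abcde]` scan set from the example code in this answer: a leading newline gives a matching failure and a return of 0, while an empty stream gives an input failure, which `fscanf` reports as `EOF` because it happens before any conversion.

```c
#include <stdio.h>

/* Hypothetical demo: what fscanf returns for %5[abcde] when the
 * stream holds `contents`. */
int scan_set_result(const char *contents)
{
    FILE *in = tmpfile();     /* stands in for stdin */
    char buf[5 + 1];
    int rc;

    if (in == NULL)
        return -2;
    fputs(contents, in);
    rewind(in);
    rc = fscanf(in, "%5[abcde]", buf);
    fclose(in);
    return rc;
}
```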

    There are easier ways to read free-form input line by line, but you can do it with scanf() if you must. For example:

    #include <stdio.h>

    int main(void)
    {
        char buff[10 + 1] = {0};
        printf("Input: ");
        fflush(stdout);  /* make sure the prompt appears before reading */
        /*
         * Skip leading whitespace (including blank lines) and scan a string of
         * up to 10 non-whitespace characters.  Because whitespace is skipped
         * first, %s never sees a zero-length item; it keeps reading until a
         * non-whitespace character arrives.  End of input produces an input
         * failure, which is ignored, leaving the buffer an empty string.
         */
        scanf("%10s", buff);

        /* Scan and ignore anything else up to a newline.  There will
         * be an (ignorable) matching failure if the next available character is a
         * newline.  Any input error generated by this call is also ignored.
         */
        scanf("%*[^\n]");

        /*
         * Consume the next character, if any.  If there is one, it will be a
         * newline.  An input error will occur if we're already at the end of stdin;
         * a careful program would test for that (by comparing the return value to
         * EOF) but this one doesn't.
         */
        scanf("%*c");

        printf("Input: ");
        fflush(stdout);

        /* scan the second string; again, we're ignoring matching and input errors */
        char buff_2[5 + 1] = {0};
        scanf("%5[abcde]", buff_2);

        printf("first: \"%s\", second: \"%s\"\n", buff, buff_2);
        return 0;
    }
    

    If you're using only scanf() for such a job, then it is essential to read each line in three steps, as shown, because each step can produce a matching failure that would prevent any attempt to match subsequent items in the same call.
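    The three steps can also be packaged as a reusable function. `read_token_line` is a hypothetical name, and the sketch assumes a 10-character word limit, so `buf` needs 11 characters. Since `%s` skips whitespace, including newlines, blank lines are passed over rather than reported, and the function returns either 1 or `EOF`.

```c
#include <stdio.h>

/* Hypothetical helper wrapping the three scanf steps.
 * buf must hold at least 11 chars. Returns 1 if a word was stored,
 * EOF at end of input (buf is then an empty string). */
int read_token_line(FILE *in, char *buf)
{
    int rc;

    buf[0] = '\0';
    rc = fscanf(in, "%10s", buf);  /* step 1: skip whitespace, read a word */
    if (rc == EOF)
        return EOF;
    fscanf(in, "%*[^\n]");         /* step 2: drop the rest of the line */
    fscanf(in, "%*c");             /* step 3: drop the newline itself */
    return rc;
}
```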

    Note, too, how maximum field widths are matched to buffer sizes in that example, which your original code did not do correctly.
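    The width/size rule in a small, hedged example: a field width of N needs N + 1 bytes, because `%s` and `%[` store up to N characters plus a terminating null. `scan_bounded` is a hypothetical helper that uses `%n` to show where scanning stopped.

```c
#include <stdio.h>

/* Hypothetical helper: %5[abcde] stores at most 5 chars plus '\0',
 * so out needs 5 + 1 bytes. Returns the number of characters
 * consumed, or -1 if the directive failed. */
int scan_bounded(const char *s, char out[5 + 1])
{
    int used = -1;

    if (sscanf(s, "%5[abcde]%n", out, &used) != 1)
        return -1;
    return used;
}
```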

  • 2020-12-20 02:46

    So %s doesn't discard a newline either?

    %s tells scanf to discard any leading whitespace, including newlines. It will then read any non-whitespace characters, leaving any trailing whitespace in the input buffer.

    So assuming your input stream looks like "\n\ntest\n", scanf("%s", buf) will discard the two leading newlines, consume the string "test", and leave the trailing newline in the input stream, so after the call the input stream looks like "\n".
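    The `"\n\ntest\n"` case can be checked directly. This sketch (hypothetical `after_percent_s`, a `tmpfile()` stream standing in for stdin, and a width of 4 so the word fits `buf[5]`) returns the character that the next read would see.

```c
#include <stdio.h>

/* Hypothetical demo: feeds "\n\ntest\n" through %s, then returns the
 * next unread character. buf needs at least 5 chars. */
int after_percent_s(char buf[5])
{
    FILE *in = tmpfile();     /* stands in for stdin */
    int next;

    if (in == NULL)
        return -2;
    fputs("\n\ntest\n", in);
    rewind(in);
    fscanf(in, "%4s", buf);   /* skips both '\n', stores "test" */
    next = getc(in);          /* the trailing '\n' is still there */
    fclose(in);
    return next;
}
```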

    Edit

    Responding to xdevel2000's comment here.

    Let's talk about how conversion specifiers work. Here are some relevant paragraphs from the online C 2011 standard:

    7.21.6.2 The fscanf function
    ...
    9 An input item is read from the stream, unless the specification includes an n specifier. An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence.285) The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.

    10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

    12 The conversion specifiers and their meanings are:
    ...
    c Matches a sequence of characters of exactly the number specified by the field width (1 if no field width is present in the directive).286)
    ...
    s Matches a sequence of non-white-space characters.286)
    ...
    [ Matches a nonempty sequence of characters from a set of expected characters (the scanset).286)
    ...
    285) fscanf pushes back at most one input character onto the input stream. Therefore, some sequences that are acceptable to strtod, strtol, etc., are unacceptable to fscanf.

    286) No special provisions are made for multibyte characters in the matching rules used by the c, s, and [ conversion specifiers — the extent of the input field is determined on a byte-by-byte basis. The resulting field is nevertheless a sequence of multibyte characters that begins in the initial shift state.

    %s matches a sequence of non-whitespace characters. Here's a basic algorithm describing how it works (not taking into account end of file or other exceptional conditions):

    c <- next character from input stream
    while c is whitespace
      c <- next character from input stream
    while c is not whitespace
      append c to target buffer
      c <- next character from input stream
    push c back onto input stream
    append 0 terminator to target buffer
    

    The first whitespace character after the non-whitespace characters (if any) is pushed back onto the input stream for the next input operation to read.
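    The pseudocode translates fairly directly into C with `getc` and `ungetc`. The sketch below is hypothetical (`scan_word` is not a library function) and adds a field width so the buffer cannot overflow. It pushes back exactly one character, which fits footnote 285's note that fscanf pushes back at most one character, and one character of pushback is what `ungetc` guarantees.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical re-implementation of the %s pseudocode, plus a field
 * width. buf needs width + 1 chars; width must be > 0.
 * Returns 1 on success, EOF if input ran out before any non-whitespace. */
int scan_word(FILE *in, char *buf, size_t width)
{
    size_t n = 0;
    int c = getc(in);

    while (c != EOF && isspace(c))     /* skip leading whitespace */
        c = getc(in);
    if (c == EOF)
        return EOF;                    /* input failure */

    while (c != EOF && !isspace(c) && n < width) {
        buf[n++] = (char)c;            /* store non-whitespace */
        c = getc(in);
    }
    if (c != EOF)
        ungetc(c, in);                 /* first unused char stays unread */
    buf[n] = '\0';
    return 1;
}
```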

    By contrast, the algorithm for the %c specifier is dead simple (unless you're using a field width greater than 1, which I've never done and won't get into here):

    c <- next character from input stream
    write c to target
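    Because `%c` skips nothing, it is the specifier most easily tripped up by the newline that `%s` leaves behind. A sketch with hypothetical input `abc\nx\n`, a `tmpfile()` stream in place of stdin, and an illustrative function name `char_after_word`:

```c
#include <stdio.h>

/* Hypothetical demo: reads a word with %s, then one char.
 * skip_space == 0 uses "%c"; otherwise " %c". */
int char_after_word(int skip_space)
{
    FILE *in = tmpfile();     /* stands in for stdin */
    char word[8];
    char ch = '?';

    if (in == NULL)
        return -2;
    fputs("abc\nx\n", in);
    rewind(in);

    fscanf(in, "%7s", word);          /* "abc"; '\n' stays unread */
    if (skip_space)
        fscanf(in, " %c", &ch);       /* space skips '\n', %c gets 'x' */
    else
        fscanf(in, "%c", &ch);        /* %c takes the '\n' itself */

    fclose(in);
    return (unsigned char)ch;
}
```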
    

    The algorithm for the %[ conversion specifier is a little different:

    c <- next character from input stream
    while c is in the list of characters in the scan set
      append c to target buffer
      c <- next character from input stream
    append 0 to target buffer
    push c back onto input stream
    

    So, it's a mistake to describe any conversion specifier as "retaining" trailing whitespace (which would imply that the trailing whitespace is saved to the target buffer); that's not the case. Trailing whitespace is left in the input stream for the next input operation to read.
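    Here is a hedged C rendering of the %[ algorithm. `scan_set` is a hypothetical name; it accepts a plain list of characters (no `^` negation or ranges), skips no whitespace, and treats a zero-length match as a matching failure, as paragraph 9 of the quoted standard requires. On failure it leaves the buffer untouched, and the first unmatched character stays in the stream.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of %[ with a plain scan set (no ^, no ranges).
 * buf needs width + 1 chars; width must be > 0.
 * Returns 1 on success, 0 on matching failure, EOF on input failure. */
int scan_set(FILE *in, const char *set, char *buf, size_t width)
{
    size_t n = 0;
    int c = getc(in);

    if (c == EOF)
        return EOF;                       /* input failure */
    while (c != EOF && c != '\0' && strchr(set, c) != NULL && n < width) {
        buf[n++] = (char)c;
        c = getc(in);
    }
    if (c != EOF)
        ungetc(c, in);                    /* leave the first unmatched char */
    if (n == 0)
        return 0;                         /* matching failure, buf untouched */
    buf[n] = '\0';
    return 1;
}
```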
