Is scanf's “regex” support a standard?

抹茶落季 2021-01-01 15:58

Is scanf's "regex" support a standard? I can't find the answer anywhere.

This code works in gcc but not in Visual Studio:

scanf("%[^\n]", a);
2 Answers
  • 2021-01-01 16:37

    The "%[" format spec for scanf() is standard and has been since C90.

    MSVC does support it.

    You can also provide a field width in the format spec to provide safety against buffer overruns:

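    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char buf[9];   /* 8 characters plus the terminating null */

        /* %8[^\n] stores at most 8 non-newline characters in buf */
        if (scanf("%8[^\n]", buf) != 1)
            return 1;  /* empty line or end of input: buf was not written */

        printf("%s\n", buf);
        printf("strlen(buf) == %zu\n", strlen(buf));  /* size_t needs %zu, not %u */

        return 0;
    }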
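
    To see what the field width actually does with an over-long line, here is a small sketch using sscanf() on a string instead of stdin. The helper name split_at_width and the buffer sizes are made up for illustration; the point is only that the characters past the width are left unread for the next conversion.

    ```c
    #include <stdio.h>

    /* Hypothetical helper: `first` must hold at least 9 bytes,
       `rest` at least 64 bytes. Returns the number of fields stored. */
    int split_at_width(const char *input, char first[9], char rest[64])
    {
        /* %8[^\n] stops after 8 characters even if the line is longer;
           the second conversion picks up whatever was left unread. */
        return sscanf(input, "%8[^\n]%63[^\n]", first, rest);
    }
    ```

    Called with "0123456789ab\n", this should store "01234567" in first and "89ab" in rest. With a short line such as "short\n" only the first conversion succeeds, so the return value is 1 and rest is not written.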
    

    Also note that the "%[" format spec doesn't mean that scanf() supports regular expressions. That format spec resembles a regex character class (and was no doubt influenced by regexes), but it's far more limited than regular expressions.
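
    As a rough illustration of how far a scanset goes, the sketch below splits a "key=value" line with two scansets and a literal '='. The helper name split_key_value and the 32-byte buffers are assumptions for the example. There is no alternation, grouping, or optional part here: each scanset only means "one or more characters from this set".

    ```c
    #include <stdio.h>

    /* Hypothetical helper: key and value must each hold at least 32 bytes.
       Returns 1 if both parts were read, 0 otherwise. */
    int split_key_value(const char *line, char key[32], char value[32])
    {
        /* %31[^=]  : up to 31 characters that are not '='
           =        : a literal '=' that must appear in the input
           %31[^\n] : up to 31 characters that are not a newline */
        return sscanf(line, "%31[^=]=%31[^\n]", key, value) == 2;
    }
    ```

    A line like "name=scanf\n" should give key "name" and value "scanf". A line that starts with '=' fails immediately, because a scanset never matches an empty sequence.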

  • 2021-01-01 16:49

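    That particular format string should work fine in a conforming implementation. The [ character introduces a scanset for matching a non-empty sequence of characters (with the ^ meaning that the scanset is an inversion of the characters supplied). In other words, the format specifier %[^\n] matches a run of one or more characters up to, but not including, the next newline. The newline itself stays in the input, and if the very next character is a newline the conversion fails without storing anything.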
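
    Because a scanset never matches an empty sequence, a blank line makes %[^\n] fail and leave the buffer untouched. One way to cope with that is sketched below. The helper name read_line, the FILE * parameter, and the 64-byte buffer are assumptions for the example, not something from the question.

    ```c
    #include <stdio.h>

    /* Hypothetical helper: reads one line (at most 63 characters) from `in`.
       Returns 1 if a line was read (possibly empty), 0 at end of input. */
    int read_line(FILE *in, char buf[64])
    {
        int c;
        int rc = fscanf(in, "%63[^\n]", buf);

        if (rc == EOF)
            return 0;          /* nothing left to read */
        if (rc == 0)
            buf[0] = '\0';     /* blank line: the scanset matched nothing */

        /* Discard the rest of an over-long line and the newline itself. */
        while ((c = getc(in)) != EOF && c != '\n')
            ;
        return 1;
    }
    ```

    Checking the return value matters here: without the rc == 0 branch, buf would still hold whatever was in it before the call.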

    From C99 7.19.6.2, slightly paraphrased:

    The [ format specifier matches a nonempty sequence of characters from a set of expected characters (the scanset). If no l length modifier is present, the corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence and a terminating null character, which will be added automatically.

    If an l length modifier is present, the input shall be a sequence of multibyte characters that begins in the initial shift state. Each multibyte character is converted to a wide character as if by a call to the mbrtowc function, with the conversion state described by an mbstate_t object initialized to zero before the first multibyte character is converted. The corresponding argument shall be a pointer to the initial element of an array of wchar_t large enough to accept the sequence and the terminating null wide character, which will be added automatically.

    The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket ]. The characters between the brackets (the scanlist) compose the scanset, unless the character after the left bracket is a circumflex ^, in which case the scanset contains all characters that do not appear in the scanlist between the circumflex and the right bracket. If the conversion specifier begins with [] or [^], the right bracket character is in the scanlist and the next following right bracket character is the matching right bracket that ends the specification; otherwise the first following right bracket character is the one that ends the specification. If a - character is in the scanlist and is not the first, nor the second where the first character is a ^, nor the last character, the behavior is implementation-defined.
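
    The rules about ] and - in the scanlist are easier to see in a format string. The sketch below uses two made-up helpers (extract_tag and leading_digits, with assumed buffer sizes). The first puts ] right after ^ so that it counts as part of the scanlist; the second spells out the digits instead of writing 0-9, since the meaning of a - in the middle of a scanlist is implementation-defined.

    ```c
    #include <stdio.h>

    /* Hypothetical helper: copies the text between a leading '[' and the
       next ']' into tag (at least 32 bytes). Returns 1 on success. */
    int extract_tag(const char *s, char tag[32])
    {
        int end = 0;

        /* [       : literal '['
           %31[^]] : scanset of everything except ']' (the ']' right after
                     '^' belongs to the scanlist, the next ']' closes it)
           ]       : literal ']'
           %n      : stores the count of characters consumed so far; it is
                     only reached if the closing ']' was actually matched */
        if (sscanf(s, "[%31[^]]]%n", tag, &end) != 1 || end == 0)
            return 0;
        return 1;
    }

    /* Hypothetical helper: copies a leading run of decimal digits into out
       (at least 16 bytes). The digits are listed explicitly to avoid '-'. */
    int leading_digits(const char *s, char out[16])
    {
        return sscanf(s, "%15[0123456789]", out) == 1;
    }
    ```

    With "[info] started" the first helper should yield "info", and it rejects "[unclosed" because the closing bracket never matches. With "2021-01-01" the second helper should stop at the first '-' and yield "2021".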

    It's possible, if MSVC isn't working correctly, that this is just one of the many examples where Microsoft either don't conform to the latest standard, or think they know better :-)
