Disadvantages of scanf

傲寒 2020-11-22 00:34

I want to know the disadvantages of scanf().

In many sites, I have read that using scanf might cause buffer overflows. What is the reason f

9 Answers
  • 2020-11-22 00:50

    Many answers here discuss the potential overflow issues of using scanf("%s", buf), but the latest POSIX specification more-or-less resolves this issue by providing an m assignment-allocation character that can be used in format specifiers for c, s, and [ formats. This will allow scanf to allocate as much memory as necessary with malloc (so it must be freed later with free).

    An example of its use:

    char *buf = NULL;
    // With 'm', scanf expects a pointer to pointer to char.
    if (scanf("%ms", &buf) == 1) {
        // use buf
    
        free(buf); // only free when the conversion actually succeeded
    }
    

    The disadvantage of this approach is that it is a relatively recent addition to the POSIX specification and is not specified in the C standard at all, so it remains rather unportable for now.

  • 2020-11-22 00:57

    The problems with scanf are (at a minimum):

    • using %s to get a string from the user, which leads to the possibility that the string may be longer than your buffer, causing overflow.
    • the possibility of a failed scan leaving your file pointer in an indeterminate location.

    I very much prefer using fgets to read whole lines in so that you can limit the amount of data read. If you've got a 1K buffer, and you read a line into it with fgets you can tell if the line was too long by the fact there's no terminating newline character (last line of a file without a newline notwithstanding).

    Then you can complain to the user, or allocate more space for the rest of the line (continuously if necessary until you have enough space). In either case, there's no risk of buffer overflow.

    Once you've read the line in, you know that you're positioned at the next line so there's no problem there. You can then sscanf your string to your heart's content without having to save and restore the file pointer for re-reading.

    Here's a snippet of code which I frequently use to ensure no buffer overflow when asking the user for information.

    It could be easily adjusted to use a file other than standard input if necessary and you could also have it allocate its own buffer (and keep increasing it until it's big enough) before giving that back to the caller (although the caller would then be responsible for freeing it, of course).

    #include <stdio.h>
    #include <string.h>
    
    #define OK         0
    #define NO_INPUT   1
    #define TOO_LONG   2
    #define SMALL_BUFF 3
    static int getLine (char *prmpt, char *buff, size_t sz) {
        int ch, extra;
    
        // Size zero or one cannot store enough, so don't even
        // try - we need space for at least newline and terminator.
    
        if (sz < 2)
            return SMALL_BUFF;
    
        // Output prompt.
    
        if (prmpt != NULL) {
            printf ("%s", prmpt);
            fflush (stdout);
        }
    
        // Get line with buffer overrun protection.
    
        if (fgets (buff, sz, stdin) == NULL)
            return NO_INPUT;
    
        // Catch possibility of `\0` in the input stream.
    
        size_t len = strlen(buff);
        if (len < 1)
            return NO_INPUT;
    
        // If it was too long, there'll be no newline. In that case, we flush
        // to end of line so that excess doesn't affect the next call.
    
        if (buff[len - 1] != '\n') {
            extra = 0;
            while (((ch = getchar()) != '\n') && (ch != EOF))
                extra = 1;
            return (extra == 1) ? TOO_LONG : OK;
        }
    
        // Otherwise remove newline and give string back to caller.
        buff[len - 1] = '\0';
        return OK;
    }
    

    And, a test driver for it:

    // Test program for getLine().
    
    int main (void) {
        int rc;
        char buff[10];
    
        rc = getLine ("Enter string> ", buff, sizeof(buff));
        if (rc == NO_INPUT) {
            // Extra NL since my system doesn't output that on EOF.
            printf ("\nNo input\n");
            return 1;
        }
    
        if (rc == TOO_LONG) {
            printf ("Input too long [%s]\n", buff);
            return 1;
        }
    
        printf ("OK [%s]\n", buff);
    
        return 0;
    }
    

    Finally, a test run to show it in action:

    $ printf "\0" | ./tstprg     # Singular NUL in input stream.
    Enter string>
    No input
    
    $ ./tstprg < /dev/null       # EOF in input stream.
    Enter string>
    No input
    
    $ ./tstprg                   # A one-character string.
    Enter string> a
    OK [a]
    
    $ ./tstprg                   # Longer string but still able to fit.
    Enter string> hello
    OK [hello]
    
    $ ./tstprg                   # Too long for buffer.
    Enter string> hello there
    Input too long [hello the]
    
    $ ./tstprg                   # Test limit of buffer.
    Enter string> 123456789
    OK [123456789]
    
    $ ./tstprg                   # Test just over limit.
    Enter string> 1234567890
    Input too long [123456789]
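
    The self-allocating variant mentioned above could look roughly like the following. This is only a sketch: read_line_alloc is a hypothetical name, the starting capacity of 16 is arbitrary, and it assumes lines stay well below INT_MAX bytes (fgets takes an int size).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length from fp into a
   malloc'd buffer that grows as needed. Returns NULL on EOF with no
   data or on allocation failure; the caller frees the result. */
static char *read_line_alloc(FILE *fp) {
    size_t cap = 16, len = 0;
    char *buf;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* fgets never writes more than (cap - len) bytes, terminator included. */
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';   /* complete line: strip the newline */
            return buf;
        }
        /* No newline yet: double the buffer and keep reading. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (len == 0) {                /* EOF (or error) before any data */
        free(buf);
        return NULL;
    }
    return buf;                    /* last line had no trailing newline */
}
```

    A caller would loop on read_line_alloc(stdin) until it returns NULL, freeing each line after use.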
    
  • 2020-11-22 01:02

    It is very hard to get scanf to do the thing you want. Sure, you can, but things like scanf("%s", buf); are as dangerous as gets(buf);, as everyone has said.

    As an example, what paxdiablo is doing in his function to read can be done with something like:

    scanf("%10[^\n]%*[^\n]", buf); /* buf needs room for 10 chars plus '\0' */
    getchar();
    

    The above will read a line, store the first 10 non-newline characters in buf, and then discard everything till (and including) a newline. So, paxdiablo's function could be written using scanf the following way:

    #include <stdio.h>
    
    enum read_status {
        OK,
        NO_INPUT,
        TOO_LONG
    };
    
    static int get_line(const char *prompt, char *buf, size_t sz)
    {
        char fmt[40];
        int i;
        int nscanned = 0; /* stays 0 when the line fits and %n is never reached */
    
        printf("%s", prompt);
        fflush(stdout);
    
        sprintf(fmt, "%%%zu[^\n]%%*[^\n]%%n", sz-1);
        /* read at most sz-1 characters, discarding the rest of the line */
        i = scanf(fmt, buf, &nscanned);
        if (i > 0) {
            getchar();
            if ((size_t)nscanned >= sz) {
                return TOO_LONG;
            } else {
                return OK;
            }
        } else {
            return NO_INPUT;
        }
    }
    
    int main(void)
    {
        char buf[10+1];
        int rc;
    
        while ((rc = get_line("Enter string> ", buf, sizeof buf)) != NO_INPUT) {
            if (rc == TOO_LONG) {
                printf("Input too long: ");
            }
            printf("->%s<-\n", buf);
        }
        return 0;
    }
    

    One of the other problems with scanf is its behavior in case of overflow. For example, when reading an int:

    int i;
    scanf("%d", &i);
    

    the above cannot be used safely: if the input value does not fit in an int, the behavior is undefined, and scanf gives you no way to detect it. Even for the first case, reading a string is much simpler with fgets than with scanf.
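
    One common alternative for the integer case is to read the text first and convert it with strtol, which does report range errors. A minimal sketch, assuming the text has already been read (for example with fgets); parse_int is a hypothetical name:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to int, returning 1 on success and
   0 if no digits were found, trailing junk remains, or the value does
   not fit in an int. */
static int parse_int(const char *text, int *out) {
    char *end;
    long val;

    assert(text != NULL && out != NULL);
    errno = 0;
    val = strtol(text, &end, 10);
    if (end == text)                      /* no digits at all */
        return 0;
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return 0;                         /* out of range for int */
    if (*end != '\0' && *end != '\n')     /* trailing junk */
        return 0;
    *out = (int)val;
    return 1;
}
```

    Unlike scanf("%d", ...), an out-of-range value here is a reported failure rather than undefined behavior.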

  • 2020-11-22 01:09

    Yes, you are right. There is a major security flaw in the scanf family (scanf, sscanf, fscanf, etc.), especially when reading a string, because these functions do not take the length of the destination buffer into account unless you give an explicit field width.

    Example:

    char buf[3];
    sscanf("abcdef","%s",buf);
    

    Clearly, buf can hold at most 3 chars (2 characters plus the terminating null). But sscanf will try to put "abcdef" into it, causing a buffer overflow.
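
    For comparison, a field width keeps the same call inside the buffer. This is a sketch only; read_word_bounded and BUF_SIZE are hypothetical names, and the width in the format string has to be kept in sync with the buffer size by hand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 3

/* Hypothetical helper: copies the first word of src into dst, which
   must have room for BUF_SIZE bytes. Returns 1 if a word was stored. */
static int read_word_bounded(const char *src, char dst[BUF_SIZE]) {
    assert(src != NULL && dst != NULL);
    /* "%2s": at most 2 characters are stored, plus the terminator,
       so a 3-byte buffer is never overrun. */
    return sscanf(src, "%2s", dst) == 1;
}
```

    With the input "abcdef" this stores "ab" and leaves the rest unread.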

  • 2020-11-22 01:10

    From the comp.lang.c FAQ: Why does everyone say not to use scanf? What should I use instead?

    scanf has a number of problems; see questions 12.17, 12.18a, and 12.19. Also, its %s format has the same problem that gets() has (see question 12.23): it’s hard to guarantee that the receiving buffer won’t overflow.

    More generally, scanf is designed for relatively structured, formatted input (its name is in fact derived from “scan formatted”). If you pay attention, it will tell you whether it succeeded or failed, but it can tell you only approximately where it failed, and not at all how or why. You have very little opportunity to do any error recovery.

    Yet interactive user input is the least structured input there is. A well-designed user interface will allow for the possibility of the user typing just about anything—not just letters or punctuation when digits were expected, but also more or fewer characters than were expected, or no characters at all (i.e., just the RETURN key), or premature EOF, or anything. It’s nearly impossible to deal gracefully with all of these potential problems when using scanf; it’s far easier to read entire lines (with fgets or the like), then interpret them, either using sscanf or some other techniques. (Functions like strtol, strtok, and atoi are often useful; see also questions 12.16 and 13.6.) If you do use any scanf variant, be sure to check the return value to make sure that the expected number of items were found. Also, if you use %s, be sure to guard against buffer overflow.

    Note, by the way, that criticisms of scanf are not necessarily indictments of fscanf and sscanf. scanf reads from stdin, which is usually an interactive keyboard and is therefore the least constrained, leading to the most problems. When a data file has a known format, on the other hand, it may be appropriate to read it with fscanf. It’s perfectly appropriate to parse strings with sscanf (as long as the return value is checked), because it’s so easy to regain control, restart the scan, discard the input if it didn’t match, etc.
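
    The "read a line, then sscanf it" pattern the FAQ recommends might be sketched as follows. read_two_ints is a hypothetical helper, and the 128-byte line buffer is an assumption (longer lines would be split across calls).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads one line from fp and expects two ints on it.
   Returns 1 on success, 0 if the line did not match, -1 on EOF. */
static int read_two_ints(FILE *fp, int *a, int *b) {
    char line[128];

    assert(fp != NULL && a != NULL && b != NULL);
    if (fgets(line, sizeof line, fp) == NULL)
        return -1;
    /* The whole line has already left the stream, so a failed match
       leaves nothing behind to trip up the next call. */
    return sscanf(line, "%d %d", a, b) == 2 ? 1 : 0;
}
```

    After a 0 return the caller can simply report the bad line and call again; with plain scanf the unmatched text would still be sitting in stdin.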

    Additional links:

    • longer explanation by Chris Torek
    • longer explanation by yours truly

    References: K&R2 Sec. 7.4 p. 159

  • 2020-11-22 01:11

    Problems I have with the *scanf() family:

    • Potential for buffer overflow with %s and %[ conversion specifiers. Yes, you can specify a maximum field width, but unlike with printf(), you can't make it an argument in the scanf() call; it must be hardcoded in the conversion specifier.
    • Potential for arithmetic overflow with %d, %i, etc.
    • Limited ability to detect and reject badly formed input. For example, "12w4" is not a valid integer, but scanf("%d", &value); will successfully convert and assign 12 to value, leaving the "w4" stuck in the input stream to foul up a future read. Ideally the entire input string should be rejected, but scanf() doesn't give you an easy mechanism to do that.

    If you know your input is always going to be well-formed with fixed-length strings and numerical values that don't flirt with overflow, then scanf() is a great tool. If you're dealing with interactive input or input that isn't guaranteed to be well-formed, then use something else.
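
    If you do stay with the scanf family, one way to reject input like "12w4" is to run sscanf on a line you have already read and use %n to see where the conversion stopped. A rough sketch; is_whole_int is a hypothetical name, and it does nothing about the arithmetic overflow problem listed above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 only if the whole of text is one
   integer (optionally surrounded by whitespace), 0 otherwise.
   Note that *value may still be written on a rejected input. */
static int is_whole_int(const char *text, int *value) {
    int consumed = 0;

    assert(text != NULL && value != NULL);
    /* %n records how many characters were consumed so far; the space
       before it skips any trailing whitespace, including a newline. */
    if (sscanf(text, "%d %n", value, &consumed) != 1)
        return 0;
    return text[consumed] == '\0';
}
```

    Here "124\n" is accepted, while "12w4" is rejected because the character at the stopping point is 'w' rather than the end of the string.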
