How to read / parse input in C? The FAQ

孤城傲影 2020-11-21 11:38

I have problems with my C program when I try to read / parse input.

Help?


This is a FAQ entry.

StackOverflow has many questions relate

1 answer
  •  日久生厌
    2020-11-21 12:04

    The Beginner's C Input Primer

    • Text mode vs. Binary mode
    • Check fopen() for failure
    • Pitfalls
      • Check any functions you call for success
      • EOF, or "why does the last line print twice"
      • Do not use gets(), ever
      • Do not use fflush() on stdin or any other stream open for reading, ever
      • Do not use *scanf() for potentially malformed input
      • When *scanf() does not work as expected
    • Read, then parse
      • Read (part of) a line of input via fgets()
      • Parse the line in-memory
    • Clean Up

    Text mode vs. Binary mode

    A "binary mode" stream is read in exactly as it has been written. However, there might (or might not) be an implementation-defined number of null characters ('\0') appended at the end of the stream.

    A "text mode" stream may do a number of transformations, including (but not limited to):

    • removal of spaces immediately before a line-end;
    • changing newlines ('\n') to something else on output (e.g. "\r\n" on Windows) and back to '\n' on input;
    • adding, altering, or deleting characters that are neither printing characters (isprint(c) is true), horizontal tabs, nor new-lines.

    It should be obvious that text and binary mode do not mix. Open text files in text mode, and binary files in binary mode.

    Check fopen() for failure

    The attempt to open a file may fail for various reasons -- lack of permissions and file not found being the most common ones. In this case, fopen() returns a NULL pointer. Always check whether fopen() returned a NULL pointer before attempting to read from or write to the file.

    When fopen fails, it usually sets the global errno variable to indicate why it failed. (This is technically not a requirement of the C language, but both POSIX and Windows guarantee to do it.) errno is a code number which can be compared against constants in errno.h, but in simple programs, usually all you need to do is turn it into an error message and print that, using perror() or strerror(). The error message should also include the filename you passed to fopen; if you don't do that, you will be very confused when the problem is that the filename isn't what you thought it was.

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    
    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s file\n", argv[0]);
            return 1;
        }
    
        FILE *fp = fopen(argv[1], "rb");
        if (!fp) {
            // alternatively, just `perror(argv[1])`
            fprintf(stderr, "cannot open %s: %s\n", argv[1], strerror(errno));
            return 1;
        }
    
        // read from fp here
    
        fclose(fp);
        return 0;
    }
    

    Pitfalls

    Check any functions you call for success

    This should be obvious. But do check the documentation of every function you call for its return value and error handling, and check for those conditions.

    These are errors that are easy to fix when you catch the condition early, but lead to lots of head-scratching if you do not.
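
    As a small illustration of that habit, here is a sketch that checks the return value of fread(). The helper name read_exact is made up for this example; it is not a standard function.

```c
#include <stdio.h>

// Hypothetical helper: read exactly n bytes from fp into buf.
// Returns 0 on success, -1 on a short read (end of file or read error).
int read_exact(FILE *fp, void *buf, size_t n)
{
    size_t got = fread(buf, 1, n, fp);

    if (got != n) {
        if (ferror(fp))
            fprintf(stderr, "read_exact: read error\n");
        else
            fprintf(stderr, "read_exact: wanted %zu bytes, got %zu\n", n, got);
        return -1;
    }
    return 0;
}
```

    The caller gets a clear failure right where the short read happened, instead of working on a half-filled buffer later.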

    EOF, or "why does the last line print twice"

    The function feof() returns true if EOF has been reached. A misunderstanding of what "reaching" EOF actually means makes many beginners write something like this:

    // BROKEN CODE
    while (!feof(fp)) {
        fgets(buffer, BUFFER_SIZE, fp);
        printf("%s", buffer);
    }
    

    This makes the last line of the input print twice, because when the last line is read (up to the final newline, the last character in the input stream), EOF is not set.

    EOF only gets set when you attempt to read past the last character!

    So the code above loops once more, fgets() fails to read another line, sets EOF and leaves the contents of buffer untouched, which then gets printed again.

    Instead, check whether fgets failed directly:

    // GOOD CODE
    while (fgets(buffer, BUFFER_SIZE, fp)) {
        printf("%s", buffer);
    }
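
    Once the loop has ended, feof() and ferror() are the right tools to find out why it ended. A minimal sketch, assuming a BUFFER_SIZE of 256 (the original does not define it) and a made-up function name copy_lines:

```c
#include <stdio.h>

#define BUFFER_SIZE 256   // assumed value, chosen for this sketch only

// Copies in to out, one fgets() chunk at a time.
// Returns 0 if the loop ended at end of file, -1 on a read or write error.
int copy_lines(FILE *in, FILE *out)
{
    char buffer[BUFFER_SIZE];

    while (fgets(buffer, sizeof buffer, in)) {
        if (fputs(buffer, out) == EOF)
            return -1;          // write error
    }

    // fgets() returned NULL: either end of file or a read error.
    if (ferror(in))
        return -1;              // read error
    return 0;                   // end of file; feof(in) is true here
}
```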
    

    Do not use gets(), ever

    There is no way to use this function safely. Because of this, it has been removed from the language with the advent of C11.

    Do not use fflush() on stdin or any other stream open for reading, ever

    Many people expect fflush(stdin) to discard user input that has not yet been read. It does not do that. In plain ISO C, calling fflush() on an input stream has undefined behaviour. It does have well-defined behaviour in POSIX and in MSVC, but neither of those makes it discard user input that has not yet been read.

    Usually, the right way to clear pending input is to read and discard characters up to and including a newline, but not beyond:

    int c;
    do c = getchar(); while (c != EOF && c != '\n');
    

    Do not use *scanf() for potentially malformed input

    Many tutorials teach you to use *scanf() for reading any kind of input, because it is so versatile.

    But the purpose of *scanf() is really to read bulk data that can be relied upon to be in a predefined format (such as data written by another program).

    Even then *scanf() can trip the unobservant:

    • Using a format string that in some way can be influenced by the user is a gaping security hole.
    • If the input does not match the expected format, *scanf() immediately stops parsing, leaving any remaining arguments uninitialized.
    • It will tell you how many assignments it has successfully done -- which is why you should check its return code (see above) -- but not where exactly it stopped parsing the input, making graceful error recovery difficult.
    • It skips any leading whitespace in the input, except when it does not ([, c, and n conversions). (See the next section.)
    • It has somewhat peculiar behaviour in some corner cases.
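
    To make the point about the return code concrete, here is a sketch that applies sscanf() to a line already held in memory and only hands back values when every conversion succeeded. The function name parse_point and the "two integers" input format are assumptions made for the example.

```c
#include <stdio.h>

// Hypothetical parser for input such as "3 4".
// Returns 1 if both integers were converted, 0 otherwise.
int parse_point(const char *line, int *x, int *y)
{
    int tx, ty;

    // sscanf() returns the number of successful assignments, or EOF.
    if (sscanf(line, "%d %d", &tx, &ty) != 2)
        return 0;   // *x and *y are left untouched on failure

    *x = tx;
    *y = ty;
    return 1;
}
```

    Converting into temporaries first means the caller never sees a half-filled result.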

    When *scanf() does not work as expected

    A frequent problem with *scanf() is when there is an unread whitespace (' ', '\n', ...) in the input stream that the user did not account for.

    Reading a number ("%d" et al.) or a string ("%s") stops at any whitespace. And while most *scanf() conversion specifiers skip leading whitespace in the input, [, c and n do not. So the newline is still the first pending input character: %c reads that newline instead of the character you wanted, and %[ fails to match unless its set includes the newline.

    You can skip over the newline in the input by explicitly reading it, e.g. via fgetc(), or by adding a whitespace character to your *scanf() format string. (A single whitespace character in the format string matches any amount of whitespace in the input, including none.)
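
    The difference a single space makes can be seen with sscanf() on a fixed string. This is only a sketch; the function name int_then_char and its skip_ws flag are invented for the demonstration.

```c
#include <stdio.h>

// Reads an integer followed by one character from s.
// If skip_ws is nonzero, the format has a space before %c, so any
// whitespace between the number and the character is skipped.
// Returns whatever sscanf() returns.
int int_then_char(const char *s, int skip_ws, int *n, char *c)
{
    return skip_ws ? sscanf(s, "%d %c", n, c)
                   : sscanf(s, "%d%c", n, c);
}
```

    For the input "12\nq", the version without the space stores '\n' in *c, while the version with the space stores 'q'.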

    Read, then parse

    We just advised against using *scanf() except when you really, positively, know what you are doing. So, what to use as a replacement?

    Instead of reading and parsing the input in one go, as *scanf() attempts to do, separate the steps.

    Read (part of) a line of input via fgets()

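    fgets() takes the size of your buffer as a parameter and stores at most one byte fewer than that, followed by a terminating null character, so it cannot overflow your buffer. If the input line fit into your buffer completely, the last character before the terminator will be the newline ('\n'), unless the input ended without a final newline. If the line did not fit, you are looking at a partially-read line, and the next call to fgets() continues where this one stopped.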
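
    One possible way to tell a complete line from a partial one is sketched below. The helper read_chunk and its complete flag are assumptions for this example, not part of the standard library.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: reads one fgets() chunk into buf (size bytes).
// Returns 0 at end of file or on error, 1 otherwise.
// *complete is set to 1 if the chunk ends the line (newline seen, or
// end of file reached), 0 if more of the same line is still pending.
// A trailing newline, if present, is removed from buf.
int read_chunk(FILE *fp, char *buf, int size, int *complete)
{
    if (!fgets(buf, size, fp))
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';            // strip the newline
        *complete = 1;
    } else {
        // No newline: either the buffer was too small, or the input
        // ended without a final newline.
        *complete = feof(fp) ? 1 : 0;
    }
    return 1;
}
```

    A caller that gets *complete == 0 can keep calling to collect the rest of the line, or discard it and report that the line was too long.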

    Parse the line in-memory

    Especially useful for in-memory parsing are the strtol() and strtod() function families, which provide similar functionality to the *scanf() conversion specifiers d, i, u, o, x, a, e, f, and g.

    But they also tell you exactly where they stopped parsing, and have meaningful handling of numbers too large for the target type.
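
    A minimal sketch of what that looks like with strtol(), assuming you want the whole line to be one decimal number with optional trailing whitespace. The wrapper name parse_long is made up for this example.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper: converts the whole of s to a long.
// Returns 0 on success, -1 if s holds no number, has junk after the
// number, or the value does not fit into a long.
int parse_long(const char *s, long *out)
{
    char *end;

    errno = 0;
    long value = strtol(s, &end, 10);

    if (end == s)
        return -1;                      // no digits at all
    if (errno == ERANGE)
        return -1;                      // out of range for long
    while (isspace((unsigned char)*end))
        end++;                          // allow trailing whitespace, e.g. '\n'
    if (*end != '\0')
        return -1;                      // junk after the number

    *out = value;
    return 0;
}
```

    The end pointer is what *scanf() cannot give you: it marks exactly where parsing stopped, so the caller decides whether what follows is acceptable.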

    Beyond those, C offers a wide range of string processing functions. Since you have the input in memory, and always know exactly how far you have parsed it already, you can walk back as many times as you like while trying to make sense of the input.

    And if all else fails, you have the whole line available to print a helpful error message for the user.

    Clean Up

    Make sure you explicitly close any stream you have (successfully) opened. This flushes any as-yet unwritten buffers, and avoids resource leaks.

    fclose(fp);
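
    Because fclose() flushes pending output, it can fail too, and it reports that by returning EOF. A sketch of a checked close follows; the helper close_checked and its name parameter are assumptions for the example.

```c
#include <stdio.h>

// Hypothetical helper: closes fp and reports a failure.
// name is only used in the error message.
// Returns 0 on success, -1 if fclose() reported an error.
int close_checked(FILE *fp, const char *name)
{
    if (fclose(fp) == EOF) {
        perror(name);
        return -1;
    }
    return 0;   // either way, fp must not be used after this point
}
```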
    
