C file size discrepancy

没有蜡笔的小新 2021-01-23 00:30

I am trying to learn C, and am currently working on a toy script. Right now, it simply opens a text file, reads it char by char, and spits it out onto the command line.

2 Answers
  • 2021-01-23 01:02

    If you're running Windows:

    FILE * fp = fopen("test.txt", "r");
    

    opens the file in text mode, which means every \r\n in the file is converted to a single \n as you read it.

    So if your file has 7 lines, the conversion removes 7 chars (that is, if the file uses Windows-style line termination), and you read 7 fewer chars than the size ftell reported.

    The fix is to open it in binary mode

    FILE * fp = fopen("test.txt", "rb");
    

    so that the size reported by ftell and the number of chars read one by one match.
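
    To see the discrepancy for yourself, here is a minimal sketch that compares the two numbers. The helper names size_by_ftell and count_by_getc are made up for illustration and are not from the question's code:

```c
#include <stdio.h>

/* Size in bytes as reported by fseek/ftell on a binary-mode stream.
   Returns -1 on error. (Hypothetical helper, for illustration only.) */
long size_by_ftell(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0)
        size = ftell(fp);

    fclose(fp);
    return size;
}

/* Number of chars getc actually delivers when the file is opened
   with the given mode ("r" or "rb"). Returns -1 on error. */
long count_by_getc(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;

    long n = 0;
    int c; /* int, so that EOF can be told apart from real data */
    while ((c = getc(fp)) != EOF)
        n++;

    fclose(fp);
    return n;
}
```

    With a file that uses \r\n line endings, count_by_getc(path, "rb") should equal size_by_ftell(path) everywhere, while on Windows count_by_getc(path, "r") should come out smaller by one per line. On Unix-like systems "r" and "rb" behave the same, so you won't see any difference there.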

    Of course, keeping the \r chars in your text wastes space and isn't very convenient. So you could instead stay in text mode, allocate the way you're doing now, and at the end call realloc to shrink the block down to the actual number of chars read (since that number is smaller, it's ok)

    stringOfFile = realloc(stringOfFile,i+1);
    

    Note that I've added 1 to the number of chars to leave room for the nul-terminator, so if there aren't any \r chars in the file, the realloc could actually increase the size of the block by 1.

    So, as I was hinting at, don't forget to nul-terminate your string, or printf won't know where to stop:

    stringOfFile[i] = '\0';
    

    (unless you don't care about creating a C-string, since storing the string size + display char-by-char is also correct)
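
    Put together, the ftell-based approach could look roughly like the sketch below. The function name read_text_file is hypothetical, and unlike the snippets above it allocates size + 1 up front so that the final realloc can only shrink the block. It also assumes that ftell on a text stream gives an upper bound on the number of chars, which holds on common platforms but is not promised by the C standard:

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole text file into a nul-terminated, malloc'ed string.
   Returns NULL on error; the caller must free() the result.
   (Hypothetical helper, for illustration only.) */
char *read_text_file(const char *path)
{
    FILE *fp = fopen(path, "r"); /* text mode: \r\n becomes \n on Windows */
    if (fp == NULL)
        return NULL;

    /* Ask for the size; in text mode this may overestimate. */
    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0)
        size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }

    char *text = malloc((size_t)size + 1); /* +1 for the nul-terminator */
    if (text == NULL) {
        fclose(fp);
        return NULL;
    }

    long i = 0;
    int c;
    while (i < size && (c = getc(fp)) != EOF)
        text[i++] = (char)c;
    fclose(fp);

    text[i] = '\0'; /* i is the number of chars actually read */

    /* Shrink to the real length; if realloc fails, the old block is still valid. */
    char *shrunk = realloc(text, (size_t)i + 1);
    return shrunk != NULL ? shrunk : text;
}
```

    A caller would then do something like char *s = read_text_file("test.txt"); print it with printf("%s", s); and free(s) when done.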

    We've seen that the ftell method is tricky. In some cases it cannot be applied at all: when the stream is, for instance, the output of a command (popen returns a FILE * but you cannot fseek it) or a socket, we don't know the size of the data in advance.

    In the general case, it would be better to:

    • allocate a small buffer
    • read char by char and store
    • if buffer is full, call realloc to increase the size by some step (not at every char, performance would be bad)
    • in the end, call realloc again to adjust the size more precisely

    (that solves the binary/text issue transparently as well)
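
    The steps above can be sketched as follows. The name read_stream, the initial capacity of 64 and the choice to double the buffer each time (rather than grow by a fixed step) are my own assumptions for the example:

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads everything from fp into a nul-terminated, malloc'ed buffer
   without knowing the size in advance. Stores the length in *out_len
   if out_len is not NULL. Returns NULL if allocation fails.
   (Hypothetical helper, for illustration only.) */
char *read_stream(FILE *fp, size_t *out_len)
{
    size_t cap = 64; /* small initial buffer */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = getc(fp)) != EOF) {
        /* Keep one spare byte for the nul-terminator. */
        if (len + 1 >= cap) {
            size_t new_cap = cap * 2; /* grow in big steps, not per char */
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';

    /* Final realloc to trim the block to what was actually used. */
    char *fit = realloc(buf, len + 1);
    if (fit != NULL)
        buf = fit;

    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

    Since it never calls fseek or ftell, the same function should work on a regular file, on stdin, or on a stream returned by popen.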

    Note that if you're working with large files (>4GB) you have to use 64-bit integers for positions and the 64-bit flavours of the I/O functions (fopen64 and friends, or fseeko/ftello with a 64-bit off_t on POSIX systems). All offset variables like i should conform to the return type of ftell, which is long; where long is only 32 bits you'll start having problems at 2GB. Well, I suppose it doesn't matter much when processing moderately small text files.

    Also, check David's answer. With text files, putting the result of getc in a char should work, but not in the general case with binary files.

  • 2021-01-23 01:13
        char tmp = getc(fp); //current letter in file
        int i=0;
        while (tmp != EOF) //End-Of-File (defined in stdio.h)
    

    You need to compare the value returned by getc against EOF directly. Instead, you convert it to a char first and only then compare it with EOF. But what if the file actually contains a char whose value compares equal to EOF after that conversion? Then your loop stops in the middle of the file. Check the docs: getc returns an int.
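
    A minimal sketch of the corrected loop, keeping the result of getc in an int (the function name count_until_eof is made up for the example):

```c
#include <stdio.h>

/* Counts the chars in fp up to the real end of file.
   (Hypothetical helper, for illustration only.) */
long count_until_eof(FILE *fp)
{
    long n = 0;
    int c; /* int, not char: getc returns an unsigned char value or EOF */

    while ((c = getc(fp)) != EOF)
        n++; /* a data byte such as 0xFF arrives as 255, never as EOF */

    return n;
}
```

    With char tmp instead of int c, a 0xFF byte in the file would typically compare equal to EOF where char is signed and end the loop early, and where char is unsigned the comparison could never be true at all.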

    You have other mistakes as well.
