Question
I am trying to learn C, and am currently working on a toy script. Right now, it simply opens a text file, reads it char by char, and spits it out onto the command line.
I looked up how to see the size of a file (using fseek() and then ftell()), but the result it returns doesn't match up with the number I get from counting the characters in a while loop as I iterate through the file.
I'm wondering if the discrepancy is due to Windows using \r\n and not just \n, since the discrepancy seems to be #newlines+1.
Below is the script I am working on:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    FILE * fp = fopen("test.txt", "r");
    fseek(fp, 0, SEEK_END);
    char * stringOfFile = malloc(ftell(fp));
    printf("allocated %d characters for file\n", ftell(fp));
    fseek(fp,0,SEEK_SET);//reset pointer
    char tmp = getc(fp); //current letter in file
    int i=0;
    while (tmp != EOF) //End-Of-File (defined in stdio.h)
    {
        *(stringOfFile+i) = tmp;
        tmp = getc(fp);
        i++;
    }
    fclose(fp);
    printf("Turns out we had %d characters to store.\nThe file was as follows:\n", i);
    printf("%s", stringOfFile);
}
And the output I get (with a simple test file you can see from the output) is:
allocated 67 characters for file
Turns out we had 60 characters to store.
The file was as follows:
line1
line2
line3
line4
line5
(last)line6
lmnopqrstuvw▬$YL Æ
where the tail bits of the printing seem to be garbage from allocating too much memory to the string.
Thanks in advance for any help/answer you can provide!
Answer 1:
If you're running Windows:
FILE * fp = fopen("test.txt", "r");
opens the file in text mode, which implies conversion of \r\n to \n.
So if your file has 7 lines, the conversion removes 7 chars (that is, if the file was using Windows-style line termination).
The fix is to open it in binary mode:
FILE * fp = fopen("test.txt", "rb");
so that the ftell result and the count from reading chars one by one match.
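To check this on a given file, the two measurements can be compared directly. The following is a minimal sketch, not part of the original answer; the helper names size_by_ftell and size_by_counting are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: end-of-file offset as reported by fseek()/ftell().
 * Returns -1 if the file cannot be opened or sought. */
long size_by_ftell(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0)
        size = ftell(fp);
    fclose(fp);
    return size;
}

/* Hypothetical helper: number of chars actually delivered by getc().
 * Returns -1 if the file cannot be opened. */
long size_by_counting(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    long n = 0;
    int c;                      /* int, so EOF is distinguishable from data */
    while ((c = getc(fp)) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

With mode "rb" both functions should return the same number. On Windows, passing "r" to size_by_counting should give a result smaller by one for each \r\n pair in the file; on POSIX systems the two modes behave identically.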
Of course, keeping the \r chars in your text wastes space and isn't very convenient, so you could instead allocate like you're doing now and, at the end, call realloc to shrink the allocated memory to the actual number of chars (since the new size is smaller, that's OK):
stringOfFile = realloc(stringOfFile,i+1);
Note that since I've taken the need to add the nul-terminator into account, I've added 1 to the number of chars, so if there aren't any \r chars in the file, the realloc could increase the size of the block by 1.
So, as I was hinting at, don't forget to nul-terminate your string, or printf won't stop properly:
stringOfFile[i] = '\0';
(unless you don't care about creating a C-string, since storing the string size + display char-by-char is also correct)
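Put together, the text-mode approach described above might look like the sketch below. It is an illustration rather than the original poster's final code: the function name read_text_file and its out_len parameter are hypothetical, and it assumes (as holds on Windows and POSIX in practice, though the C standard does not promise it for text streams) that the ftell value at end of file is at least the number of chars that will be read.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a text file into a nul-terminated buffer.
 * Returns NULL on failure; on success *out_len receives the char count.
 * The caller frees the returned buffer. */
char *read_text_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "r");            /* text mode, as in the question */
    if (fp == NULL)
        return NULL;

    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0)
        size = ftell(fp);                   /* treated as an upper bound */
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }

    char *buf = malloc((size_t)size + 1);   /* +1 for the nul-terminator */
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t i = 0;
    int c;                                  /* int, not char: see getc docs */
    while (i < (size_t)size && (c = getc(fp)) != EOF)
        buf[i++] = (char)c;
    fclose(fp);
    buf[i] = '\0';                          /* so printf("%s") stops here */

    char *shrunk = realloc(buf, i + 1);     /* trim to what was really read */
    if (shrunk != NULL)
        buf = shrunk;                       /* on failure the old block is still valid */

    *out_len = i;
    return buf;
}
```

A caller would print the result with printf("%s", buf) and release it with free(buf). The loop is bounded by size so that a wrong size estimate cannot overrun the buffer.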
We've seen that the ftell method is tricky. In some cases, for instance when the stream is the output of a command (popen returns a FILE * but you cannot fseek it) or a socket, this principle cannot be applied at all, since we don't know the size of the data in advance.
In the general case, it would be better to:
- allocate a small buffer
- read char by char and store
- if the buffer is full, call realloc to increase the size by some step (not at every char, performance would be bad)
- in the end, call realloc again to adjust the size more precisely
(that solves the binary/text issue transparently as well)
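The steps in that list can be sketched as follows. This is only one possible reading of them: the name read_stream, the out_len parameter, and the 4096-byte CHUNK step are arbitrary choices made for the example, not something the answer specifies.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4096  /* growth step; arbitrary value chosen for illustration */

/* Hypothetical helper: reads everything from fp (file, pipe, ...) into a
 * nul-terminated buffer without knowing the size in advance.
 * Returns NULL on allocation failure; the caller frees the result. */
char *read_stream(FILE *fp, size_t *out_len)
{
    size_t cap = CHUNK;                 /* step 1: small initial buffer */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = getc(fp)) != EOF) {     /* step 2: read char by char */
        if (len + 1 >= cap) {           /* step 3: full (one byte kept for '\0') */
            char *bigger = realloc(buf, cap + CHUNK);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap += CHUNK;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';

    char *exact = realloc(buf, len + 1); /* step 4: trim to the exact size */
    if (exact != NULL)
        buf = exact;

    *out_len = len;
    return buf;
}
```

Because the function never calls fseek or ftell, it should work the same way on a stream opened in text mode, in binary mode, or returned by popen. Growing by a fixed step keeps the number of realloc calls proportional to the data size divided by CHUNK; doubling the capacity instead is a common alternative.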
Note that if you're working with large files (>4GB), you have to use 64-bit unsigned integers for positions and the fopen64 flavours of the I/O functions (and all offset variables like i should be unsigned / conform to the return type of ftell, or you'll start having problems at 2GB). Well, I suppose it doesn't matter much when processing moderately small text files.
Also, check David's answer. With text files, putting the result of getc in a char should work, but not in the general case with binary files.
Answer 2:
char tmp = getc(fp); //current letter in file
int i=0;
while (tmp != EOF) //End-Of-File (defined in stdio.h)
You need to check the value returned by getc for EOF. Instead, you convert it to a char and then check whether that's equal to EOF converted to a char. But what if the value of char that converts to EOF is actually in the file? Check the docs: getc returns an int.
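As a small illustration of that point, here is a sketch of a read loop that keeps the result of getc in an int. The function name count_bytes is hypothetical. On typical implementations EOF is -1, so with a signed char a data byte of 0xFF would compare equal to EOF and end the loop early, while with an unsigned char the comparison would never succeed and the loop would not terminate.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in fp.
 * c is an int, so every byte value (0..255) stays distinct from EOF. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

For a file containing the three bytes 'a', 0xFF, 'b' opened in binary mode, this loop counts all three, because getc returns 255 (not -1) for the middle byte.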
You have other mistakes as well.
Source: https://stackoverflow.com/questions/48157920/c-file-size-discrepency