Use ftell to find the file size

三世轮回 提交于 2020-07-21 07:20:07

问题


  fseek(f, 0, SEEK_END); 
  size = ftell(f);

If ftell(f) tells us the current file position, the size here should be the offset from the end of the file to the beginning. Why is the size not ftell(f)+1? Should not ftell(f) only give us the position of the end of the file?


回答1:


File positions are like the cursor in a text entry widget: they are in between the bytes of the file. This is maybe easiest to understand if I draw a picture:

This is a hypothetical file. It contains four characters: a, b, c, and d. Each character gets a little box to itself, which we call a "byte". (This file is ASCII.) The fifth box has been crossed out because it's not part of the file yet, but but if you appended a fifth character to the file it would spring into existence.

The valid file positions in this file are 0, 1, 2, 3, and 4. There are five of them, not four; they correspond to the vertical lines before, after, and in between the boxes. When you open the file (assuming you don't use "a"), you start out on position 0, the line before the first byte in the file. When you seek to the end, you arrive at position 4, the line after the last byte in the file. Because we start counting from zero, this is also the number of bytes in the file. (This is one of the several reasons why we start counting from zero, rather than one.)

I am obliged to warn you that there are several reasons why

fseek(fp, 0, SEEK_END);
long int nbytes = ftell(fp);

might not give you the number you actually want, depending on what you mean by "file size" and on the contents of the file. In no particular order:

  • On Windows, if you open a file in text mode, the numbers you get from ftell on that file are not byte offsets from the beginning of the file; they are more like fgetpos cookies, that can only be used in a subsequent call to fseek. If you need to seek around in a text file on Windows you may be better off opening the file in binary mode and dealing with both DOS and Unix line endings yourself — this is actually my recommendation for production code in general, because it's perfectly possible to have a file with DOS line endings on a Unix system, or vice versa.

  • On systems where long int is 32 bits, files can easily be bigger than that, in which case ftell will fail, return −1 and set errno to EOVERFLOW. POSIX.1-2001-compliant systems provide a function called ftello that returns an off_t quantity that can represent larger file sizes, provided you put #define _FILE_OFFSET_BITS 64 at the very top of all your source files (before any #includes). I don't know what the Windows equivalent is.

  • If your file contains characters that are beyond ASCII, then the number of bytes in the file is very likely to be different from the number of characters in the file. (For instance, if the file is encoded in UTF-8, the character will take up three bytes, Ä will take up either two or three bytes depending on whether it's "composed", and జ్ఞా will take up twelve bytes because, despite being a single grapheme, it's a string of four Unicode code points.) ftell(o) will still tell you the correct number to pass to malloc, if your goal is to read the entire file into memory, but iterating over "characters" will not be so simple as for (i = 0; i < len; i++).

  • If you are using C's "wide streams" and "wide characters", then, just like text streams on Windows, the numbers you get from ftell on that file are not byte offsets and may not be useful for anything other than subsequent calls to fseek. But wide streams and characters are a bad design anyway; you're actually more likely to be able to handle all the world's languages correctly if you stick to processing UTF-8 by hand in narrow streams and characters.




回答2:


I'm not sure why fseek()/ftell() is taught as a generic way to get the size of a file. It only works because an implementation defines it to work. POSIX does, for one. Windows does, also, for binary streams - but not for text streams.

It's wrong to not add a caveat or warning to, "This is how you get the number of bytes in a file." Because when a programmer first gets on a system that doesn't define fseek()/ftell() as byte offsets, they're going to have problems. I've seen it.

"But I was told this is how you can always do it."

"Well, no. Whoever taught you was wrong."

Because it is impossible to use fseek()/ftell() to get the size of a file in strictly-conforming C code.

For a binary stream, 7.21.9.2 The fseek function, paragraph 3 of the C standard:

For a binary stream, the new position, measured in characters from the beginning of the file, is obtained by adding offset to the position specified by whence. The specified position is the beginning of the file if whence is SEEK_SET, the current value of the file position indicator if SEEK_CUR , or end-of-file if SEEK_END. A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.

Footnote 268 specifically states:

Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.

So you can't seek the the end of a binary stream to get a file's size in bytes.

And for a text stream, 7.21.9.4 The ftell function, paragraph 2 states:

The ftell function obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.

So you can't use ftell() on a text stream to get a byte count.

The only strictly-conformant approach that I'm aware of to get the number of bytes in a file is to read them one-by-one with fgetc() and count them.



来源:https://stackoverflow.com/questions/49121638/use-ftell-to-find-the-file-size

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!