How to find number of characters in a file without traversing the contents

Posted by 我是研究僧i on 2019-12-23 09:25:39

Question


In a project, I have to read a file and work with the number of characters in it. Is there a way to get the number of characters without reading the file character by character? (Otherwise I will have to read the file twice, once just to count the characters.)

Is it even possible?


Answer 1:


You can try this:

FILE *fp = ... /*open as usual*/;
fseek(fp, 0L, SEEK_END);
long fileSize = ftell(fp);   /* ftell returns long; -1L signals an error */

However, this returns the number of bytes in the file, not the number of characters. It is not the same unless the encoding is known to be one byte per character (e.g. ASCII).

You'd need to "rewind" the file back to the beginning after you've learned the size:

fseek(fp, 0L, SEEK_SET);
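
The two fragments above can be combined into one helper. This is only a sketch: the name `file_size_in_bytes` is made up for illustration, and it assumes the stream was opened in binary mode so that the `ftell` value is a plain byte offset.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: returns the size of an already-open stream in bytes,
// or -1 on failure. The stream is repositioned to the beginning before returning.
long file_size_in_bytes(std::FILE* fp) {
    assert(fp != nullptr);                               // precondition: stream is open
    if (std::fseek(fp, 0L, SEEK_END) != 0) return -1L;   // jump to the end
    const long size = std::ftell(fp);                    // -1L if the position is unavailable
    if (std::fseek(fp, 0L, SEEK_SET) != 0) return -1L;   // "rewind" to the start
    return size;
}
```

The result is still a byte count, so it equals the character count only for a one-byte-per-character encoding.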



Answer 2:


Yes.

Seek to the end and get the position there; that position is the size.

FILE*  file = fopen("Plop", "rb"); // fopen needs a mode; "rb" avoids text-mode translation.
fseek(file, 0, SEEK_END);
long   size = ftell(file);       // This is the size of the file (ftell returns long).
                                 // But note it is in bytes.
                                 // Also note that if you are reading it into memory, this
                                 // is the value you want unless you plan to dynamically
                                 // convert the character encoding as you read.

fseek(file, 0, SEEK_SET);        // Move the position back to the start.

In C++, streams have the same functionality:

std::ifstream   file("Plop");
file.seekg(0, std::ios_base::end);
size_t size = file.tellg();

file.seekg(0, std::ios_base::beg);
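
As a rough illustration of "reading it into memory", the stream version can size a buffer and fill it in one call. The function name `read_whole_file` and its parameters are hypothetical, and the sketch assumes the file is opened in binary mode and fits in memory.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: reads the whole file into `out`, sizing the buffer from tellg().
// Returns false if the file cannot be opened or its size cannot be determined.
bool read_whole_file(const std::string& path, std::string& out) {
    std::ifstream file(path, std::ios::binary);
    if (!file) return false;
    file.seekg(0, std::ios_base::end);
    const std::streamoff size = file.tellg();   // pos_type(-1) converts to -1 on failure
    if (size < 0) return false;
    file.seekg(0, std::ios_base::beg);          // move back to the start
    out.resize(static_cast<std::string::size_type>(size));
    if (size > 0) file.read(&out[0], size);
    assert(file.gcount() <= size);              // never more than the measured size
    out.resize(static_cast<std::string::size_type>(file.gcount()));
    return true;
}
```

After the call, `out.size()` is the number of bytes read, which is the value to use for any later processing.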



Answer 3:


The simple answer is no. More precisely, it's system dependent: under Unix, it's possible (e.g. using stat); under Windows, it's not possible for a text file, but if you're reading the file in binary, there's a function GetFileSize which can be used.

Although not guaranteed, under all of the implementations I know (for these two platforms), seeking to the end of the file, then doing an ftell, will return something which, when converted to a sufficiently large integral type, will give the same results as the above (with the same restrictions).

Finally: why do you need this information? If it's just to allocate an appropriately sized buffer, even with a text file, GetFileSize (and ftell after seeking to the end) will return a value slightly larger than the number of bytes you can read. Your buffer will be slightly oversized, but this is generally not a problem.
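
One way to act on this is to treat the seek-and-tell value as an upper bound and trust the count returned by the read itself. The sketch below is not from the original answer; the name `read_text_stream` is invented, and it relies on the behaviour described above (the reported size is at least the number of bytes readable), which the standard does not guarantee for text streams.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads a text-mode stream into `buffer`, using the
// fseek/ftell value only as an upper bound. Returns the bytes actually read.
std::size_t read_text_stream(std::FILE* fp, std::vector<char>& buffer) {
    assert(fp != nullptr);                      // precondition: stream is open
    buffer.clear();
    if (std::fseek(fp, 0L, SEEK_END) != 0) return 0;
    const long upper_bound = std::ftell(fp);
    if (upper_bound <= 0 || std::fseek(fp, 0L, SEEK_SET) != 0) return 0;
    buffer.resize(static_cast<std::size_t>(upper_bound));   // possibly oversized
    const std::size_t got = std::fread(buffer.data(), 1, buffer.size(), fp);
    buffer.resize(got);   // on Windows, "\r\n" -> "\n" translation can make got smaller
    return got;
}
```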




Answer 4:


I think you are likely looking for a dynamic memory solution. What you actually asked is "is there a way to get the number of characters in a file without reading it?". The answer (assuming one byte per character) is yes, you can use the stat call to get the file size, and the file size in bytes is the number of characters. With UTF-8 the answer is no, but let's put that aside for the moment since just-learning computer scientists usually don't worry about internationalization.
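
For reference, a minimal sketch of the `stat` approach on a POSIX system might look like this. The wrapper name `size_from_stat` is made up for illustration, and `<sys/stat.h>` is a POSIX header, not part of standard C++.

```cpp
#include <cassert>
#include <sys/stat.h>   // POSIX only; assumption: a Unix-like system

// Hypothetical wrapper: returns the file size in bytes as reported by stat(),
// or -1 if the call fails (for example, when the file does not exist).
long long size_from_stat(const char* path) {
    assert(path != nullptr);                 // precondition: a valid C string
    struct stat info;
    if (stat(path, &info) != 0) return -1;
    return static_cast<long long>(info.st_size);
}
```

This never opens the file, but the value is again a byte count.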

I think the reason you want to know how many characters there are is so that you can have storage big enough to hold them all. You don't need to know how big the file is to store the whole thing.

If you have an std::vector<char>, it can start out able to hold ten characters, then grow to hold twenty, then ten thousand... And when you're done reading the file, it will hold them all, even though you never knew how many there would be.
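
A minimal sketch of that idea, assuming the file is read through a C `FILE*` in fixed-size chunks (the name `read_all` and the 4096-byte chunk size are arbitrary choices for illustration):

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads everything from `fp` without knowing the size up front.
std::vector<char> read_all(std::FILE* fp) {
    assert(fp != nullptr);                   // precondition: stream is open
    std::vector<char> data;                  // starts empty and grows as needed
    char chunk[4096];
    std::size_t got;
    while ((got = std::fread(chunk, 1, sizeof chunk, fp)) > 0) {
        data.insert(data.end(), chunk, chunk + got);   // the vector reallocates itself
    }
    return data;                             // data.size() is known only after reading
}
```

The file is read exactly once, and the byte count falls out as `data.size()` at the end.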




Answer 5:


Off the top of my head: look at the file size and divide it by the number of bytes in a single character?

Problems arise when dealing with whitespace, line endings, etc.
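
To make the distinction concrete, here is a sketch of both cases. The function names are invented for illustration. The division only works for a truly fixed-width encoding (UTF-32, for example, ignoring any byte-order mark), while a variable-width encoding such as UTF-8 forces a pass over the bytes. The UTF-8 counter assumes valid input and counts code points, which is only one possible meaning of "character".

```cpp
#include <cassert>
#include <cstddef>

// Fixed-width encoding: the character count follows from the byte count alone.
// Returns false if the size is not a whole number of characters.
bool count_fixed_width(std::size_t file_bytes, std::size_t bytes_per_char,
                       std::size_t& count) {
    assert(bytes_per_char > 0);
    if (file_bytes % bytes_per_char != 0) return false;
    count = file_bytes / bytes_per_char;
    return true;
}

// UTF-8 (variable width): every byte has to be inspected.
// Counts code points by skipping continuation bytes (bit pattern 10xxxxxx).
std::size_t count_utf8_code_points(const char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if ((static_cast<unsigned char>(data[i]) & 0xC0) != 0x80) ++count;
    }
    return count;
}
```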



Source: https://stackoverflow.com/questions/9132151/how-to-find-number-of-characters-in-a-file-without-traversing-the-contents
