Question
I am learning C++ and I have to read a file in binary mode. Here's how I do it (following the C++ reference):
unsigned values[255];
unsigned total;
ifstream in ("test.txt", ifstream::binary);
while(in.good()){
    unsigned val = in.get();
    if(in.good()){
        values[val]++;
        total++;
        cout << val <<endl;
    }
}
in.close();
So I am reading the file byte by byte for as long as in.good() is true. I put a cout at the end of the while loop in order to understand what's happening, and here is the output:
marco@iceland:~/workspace/huffman$ ./main
97
97
97
97
10
98
98
10
99
99
99
99
10
100
100
10
101
101
10
221497852
marco@iceland:~/workspace/huffman$
Now, the input file "test.txt" is just:
aaaa
bb
cccc
dd
ee
So everything works perfectly till the end, where there's that 221497852. I guess it's something about the end of file, but I can't figure the problem out.
I am using gedit and g++ on a Debian machine (64-bit). Any help will be appreciated.
Many thanks,
Marco
Answer 1:
fstream::get returns an int value. This is one of the problems.
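Secondly, you are reading in binary, so rather than going byte by byte you can read the data as a block with fstream::read (get() is already an unformatted input function, so this is about convenience rather than correctness):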
// read a file into memory
#include <iostream> // std::cout
#include <fstream> // std::ifstream
int main () {
    std::ifstream is ("test.txt", std::ifstream::binary);
    if (is) {
        // get length of file:
        is.seekg (0, is.end);
        int length = is.tellg();
        is.seekg (0, is.beg);
        char * buffer = new char [length];
        std::cout << "Reading " << length << " characters... ";
        // read data as a block:
        is.read (buffer,length);
        if (is)
            std::cout << "all characters read successfully.";
        else
            std::cout << "error: only " << is.gcount() << " could be read";
        is.close();
        // ...buffer contains the entire file...
        delete[] buffer;
    }
    return 0;
}
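For what it's worth, here is a minimal sketch of the same block-read approach that lets std::vector own the buffer instead of new[]/delete[]. The helper name read_all is hypothetical, and it assumes the stream is seekable (a regular file or a string stream) and that the file fits in memory.

```cpp
#include <cstddef>   // std::size_t
#include <ios>       // std::ios_base, std::streamoff
#include <istream>   // std::istream
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <vector>    // std::vector

// Hypothetical helper: reads a whole seekable stream into a vector.
// For a file, pass a std::ifstream opened with std::ios_base::binary.
std::vector<char> read_all(std::istream& is) {
    // Find the length by seeking to the end and asking for the position.
    is.seekg(0, std::ios_base::end);
    const std::streamoff length = is.tellg();   // -1 if the stream failed
    is.seekg(0, std::ios_base::beg);
    if (length <= 0) {
        return {};                               // empty or unreadable
    }

    // Read the data as one block.
    std::vector<char> buffer(static_cast<std::size_t>(length));
    is.read(buffer.data(), length);

    // Keep only what was actually read, in case the read came up short.
    buffer.resize(static_cast<std::size_t>(is.gcount()));
    return buffer;
}
```

Called on a stream holding the question's first two lines ("aaaa\nbb\n"), this should return a vector of 8 bytes, and the buffer is released automatically when the vector goes out of scope.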
Answer 2:
This isn't the way istream::get() was designed to be used.
The classical idiom for using this function would be:
for ( int val = in.get(); val != EOF; val = in.get() ) {
// ...
}
or even more idiomatic:
char ch;
while ( in.get( ch ) ) {
// ...
}
The first loop is really inherited from C, where in.get() is the equivalent of fgetc().
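As a minimal sketch, the first idiom could be applied to the byte-counting task from the question roughly like this. The helper name count_bytes is hypothetical, the table has 256 entries (one per possible byte value, assuming 8-bit bytes) and starts at zero, and the assert spells out the assumption that every non-EOF result of get() lies in [0...UCHAR_MAX].

```cpp
#include <array>     // std::array
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <istream>   // std::istream
#include <sstream>   // std::istringstream, handy for trying this without a file

// Hypothetical helper: counts how often each byte value occurs in a stream.
std::array<std::size_t, 256> count_bytes(std::istream& in) {
    // Value-initialization sets every counter to zero.
    std::array<std::size_t, 256> counts{};
    using traits = std::istream::traits_type;

    // Keep the result of get() in an int so that EOF stays distinguishable
    // from every valid byte value.
    for (int val = in.get(); val != traits::eof(); val = in.get()) {
        // Assumption: a non-EOF result is always a valid index.
        assert(val >= 0 && val <= 255);
        ++counts[static_cast<std::size_t>(val)];
    }
    return counts;
}
```

Passing a std::ifstream opened with std::ios_base::binary works the same way as passing a std::istringstream. For the input "aaaa\nbb\n" the counts at 'a', 'b' and '\n' should come out as 4, 2 and 2.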
Still, as far as I can tell, the code you give should work, even though it's not idiomatic.
The C++ standard is unclear about what it should return if the character value read is negative. fgetc() requires a value in the range [0...UCHAR_MAX], and I think it safe to assume that this is the intent here. It is, at least, what every implementation I've used does. But this doesn't affect your input. Depending on how the implementation interprets the standard, the return value of in.get() must be in the range [0...UCHAR_MAX] or [CHAR_MIN...CHAR_MAX], or it must be EOF (typically -1). (The reason I'm fairly sure that the intent is to require [0...UCHAR_MAX] is because otherwise, you may not be able to distinguish end of file from a valid character.)
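A small sketch of that last point, assuming 8-bit bytes and an implementation that follows the [0...UCHAR_MAX] reading: the byte 0xFF comes back from get() as 255, so it cannot be confused with EOF. The helper name first_get_result is hypothetical.

```cpp
#include <sstream>   // std::istringstream
#include <string>    // std::string, std::char_traits

// Hypothetical helper: returns whatever get() yields for the first byte of
// data, or traits_type::eof() if data is empty.
int first_get_result(const std::string& data) {
    std::istringstream in(data);
    return in.get();
}
```

Under those assumptions, first_get_result("\xFF") gives 255, while first_get_result("") gives std::char_traits<char>::eof(). If the result were squeezed into a plain char first, the two could collide on platforms where char is signed.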
And if the return value is EOF (almost always -1), failbit should be set, so in.good() would return false. There is no case where in.get() would be allowed to return 221497852. The only explanation I can possibly think of for your results is that your file has some character with bit 7 set at the end of the file, that the implementation is returning a negative number for this (but not end of file, because it is a character), which results in an out of bounds index in values[val], and that this out of bounds index somehow ends up modifying val. Or that your implementation is broken, and is not setting failbit when it returns end of file.
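One way to check the second possibility is to read a stream to the end and then look at its state flags. The sketch below does that on a string stream; the names StreamState and state_after_reading_all are hypothetical, and the expectation that both eofbit and failbit are set is what a conforming implementation should show, not something verified against the asker's setup.

```cpp
#include <sstream>   // std::istringstream
#include <string>    // std::string

// Hypothetical record of the stream state once get() has returned EOF.
struct StreamState {
    bool eofbit;   // in.eof()
    bool failbit;  // in.fail()
    bool good;     // in.good()
};

// Hypothetical helper: consumes every byte of data, then reports the flags.
StreamState state_after_reading_all(const std::string& data) {
    std::istringstream in(data);
    using traits = std::istringstream::traits_type;

    int val = in.get();
    while (val != traits::eof()) {
        val = in.get();
    }
    // At this point the last get() found no character to extract.
    return StreamState{in.eof(), in.fail(), in.good()};
}
```

If good were still true here, that would point to a broken implementation; otherwise the inner if(in.good()) in the question's loop should already filter out the EOF value.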
To be certain, I'd be interested in knowing what you get from the following:
std::ifstream in( "test.txt", std::ios_base::binary );
int ch = in.get();
while ( ch != std::istream::traits_type::eof() ) {
    std::cout << ch << std::endl;
    ch = in.get();
}
This avoids any issues of a possibly invalid index, and any type conversions (although the conversion from int to unsigned is well defined). Also, out of curiosity (since I can only access VC++ here), you might try replacing in as follows:
std::istringstream in( "\n\xE5" );
I would expect to get:
10
229
(Assuming 8-bit bytes and an ASCII-based code set, both of which are almost, but not quite, universal today.)
Answer 3:
I've eventually figured this out. The problem wasn't due to any code; the problem was gedit. It always appends a newline character at the end of the file. This also happens with other editors, such as vim. Some editors can be configured not to append anything, but in gedit this is apparently not possible. https://askubuntu.com/questions/13317/how-to-stop-gedit-gvim-vim-nano-from-adding-end-of-file-newline-char
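For anyone who wants to confirm this on their own file, here is a minimal sketch that checks whether the last byte of a stream is a newline. The helper name ends_with_newline is hypothetical; for a real file you would pass a std::ifstream opened with std::ios_base::binary.

```cpp
#include <istream>   // std::istream
#include <sstream>   // std::istringstream, handy for trying this without a file

// Hypothetical helper: reports whether the last byte of the stream is '\n'.
// An empty stream counts as "no trailing newline".
bool ends_with_newline(std::istream& in) {
    bool read_any = false;
    char last = '\0';
    char ch;
    // in.get(ch) returns the stream, which converts to false at end of file.
    while (in.get(ch)) {
        last = ch;
        read_any = true;
    }
    return read_any && last == '\n';
}
```

A stream holding "ee\n" should report true and one holding just "ee" should report false, which matches the extra 10 printed after the last 101 in the output above.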
Cheers to everyone who answered me,
Marco
Source: https://stackoverflow.com/questions/16435180/c-reading-a-file-in-binary-mode-problems-with-end-of-file