When does `ifstream::readsome` set `eofbit`?

Posted by 痞子三分冷 on 2019-12-21 04:08:34

Question


This code loops forever:

#include <iostream>
#include <fstream>
#include <sstream>

int main(int argc, char *argv[])
{
    std::ifstream f(argv[1]);
    std::ostringstream ostr;

    while(f && !f.eof())
    {
        char b[5000];
        std::size_t read = f.readsome(b, sizeof b);
        std::cerr << "Read: " << read << " bytes" << std::endl;
        ostr.write(b, read);
    }
}

It's because readsome is never setting eofbit.

cplusplus.com says:

Errors are signaled by modifying the internal state flags:

eofbit The get pointer is at the end of the stream buffer's internal input array when the function is called, meaning that there are no positions to be read in the internal buffer (which may or not be the end of the input sequence). This happens when rdbuf()->in_avail() would return -1 before the first character is extracted.

failbit The stream was at the end of the source of characters before the function was called.

badbit An error other than the above happened.

Almost the same, the standard says:

[C++11: 27.7.2.3]: streamsize readsome(char_type* s, streamsize n);

32. Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1). After constructing a sentry object, if !good() calls setstate(failbit) which may throw an exception, and return. Otherwise extracts characters and stores them into successive locations of an array whose first element is designated by s.

  • If rdbuf()->in_avail() == -1, calls setstate(eofbit) (which may throw ios_base::failure (27.5.5.4)), and extracts no characters;
  • If rdbuf()->in_avail() == 0, extracts no characters
  • If rdbuf()->in_avail() > 0, extracts min(rdbuf()->in_avail(),n)).

33. Returns: The number of characters extracted.

That the in_avail() == 0 condition is a no-op implies that ifstream::readsome itself is a no-op if the stream buffer is empty, but the in_avail() == -1 condition implies that it will set eofbit when some other operation has led to in_avail() == -1.

This seems like an inconsistency, even despite the "some" nature of readsome.

So what are the semantics of readsome and eof? Have I interpreted them correctly? Are they an example of poor design in the streams library?


(Stolen from the [IMO] invalid libstdc++ bug 52169.)


Answer 1:


I think this is a customization point, not really used by the default stream implementations.

in_avail() returns the number of chars it can see in the internal buffer, if any. Otherwise it calls showmanyc() to try to detect if chars are known to be available elsewhere, so a buffer fill request is guaranteed to succeed.

In turn, showmanyc() will return the number of chars it knows about, if any, or -1 if it knows that a read will fail, or 0 if it doesn't have a clue.

The default implementation (basic_streambuf) always returns 0, so that is what you get unless you have a stream with some other streambuf overriding showmanyc.

Your loop is essentially read-as-many-chars-as-you-know-is-safe, and it gets stuck when that is zero (meaning "not sure").
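
To make the role of showmanyc() concrete, here is a minimal sketch of a stream buffer that overrides it. `HintedBuf` and `drain_with_readsome` are hypothetical names invented for this illustration; the buffer simply serves a string from memory and returns a caller-chosen value from showmanyc() once that string has been consumed.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical read-only buffer over an in-memory string. Once the get area
// is exhausted, in_avail() falls back to showmanyc(), which returns `hint`.
class HintedBuf : public std::streambuf {
public:
    HintedBuf(std::string data, std::streamsize hint)
        : data_(std::move(data)), hint_(hint)
    {
        char* p = &data_[0];
        setg(p, p, p + data_.size());
    }

protected:
    // 0 mimics the basic_streambuf default ("no clue");
    // -1 says that further reads are known to fail.
    std::streamsize showmanyc() override { return hint_; }

private:
    std::string data_;
    std::streamsize hint_;
};

// Calls readsome() until it returns 0, appends everything read to `out`,
// and reports whether the stream ended up with eofbit set.
bool drain_with_readsome(std::istream& in, std::string& out)
{
    char chunk[4];
    std::streamsize n;
    while ((n = in.readsome(chunk, sizeof chunk)) > 0)
        out.append(chunk, static_cast<std::size_t>(n));
    return in.eof();
}
```

With a hint of 0 the helper reads all the data and then returns false, because eofbit is never set; that is the state in which the question's loop spins. With a hint of -1 the final readsome() call sets eofbit and the helper returns true.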




Answer 2:


If no character is available (i.e. gptr() == egptr() for the std::streambuf), the virtual member function showmanyc() is called. I could have an implementation of showmanyc() which returns an error code. Why that may be useful is a different question. However, this could cause eofbit to be set. Of course, in_avail() is meant not to fail and not to block, and just to return the number of characters known to be available. That is, the loop you have above is essentially guaranteed to be an infinite loop unless you have a rather odd stream buffer.




Answer 3:


I don't think that readsome() is meant for what you're trying to do (read from a file on disk)... from cplusplus.com:

The function is intended to be used to read binary data from certain types of asynchronic sources that may wait for more characters, since it stops reading when the local buffer exhausts, avoiding potential unexpected delays.

So it sounds like readsome() is intended for streams from a network socket or something like that, and you probably want to just use read().




Answer 4:


Others have answered why readsome won't set eofbit by design. I will suggest a way to read some bytes until EOF without setting failbit, in an intuitive way, the same way you were trying to use readsome. This is the result of research in another question.

#include <iostream>
#include <fstream>
#include <sstream>

using namespace std;

streamsize Read(istream &stream, char *buffer, streamsize count)
{
    // This consistently fails on gcc (linux) 4.8.1 with failbit set on read
    // failure. This apparently never fails on VS2010 and VS2013 (Windows 7)
    streamsize reads = stream.rdbuf()->sgetn(buffer, count);

    // This rarely sets failbit on VS2010 and VS2013 (Windows 7) on read
    // failure of the previous sgetn()
    stream.rdstate();

    // On gcc (linux) 4.8.1 and VS2010/VS2013 (Windows 7) this consistently
    // sets eofbit when the stream is at EOF as a consequence of sgetn(). It
    // should also throw if exceptions are set, or return on the contrary,
    // and previous rdstate() restored a failbit on Windows. On Windows, most
    // of the time it sets eofbit even on a real read failure
    stream.peek();

    return reads;
}

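int main(int argc, char *argv[])
{
    ifstream instream("filepath", ios_base::in | ios_base::binary);
    // Also test the stream state, so a file that failed to open cannot loop forever
    while (instream && !instream.eof())
    {
        char buffer[0x4000];
        streamsize read = Read(instream, buffer, sizeof(buffer));
        // Do something with the first `read` bytes of buffer
    }
}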


Source: https://stackoverflow.com/questions/9191876/when-does-ifstreamreadsome-set-eofbit
