Question
I have the following code:
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main(int argc, char* argv[]) {
    stringstream buffer("1234567890 ");
    cout << "pos-before: " << buffer.tellg() << endl;
    buffer.ignore(10, ' ');
    cout << "pos-after: " << buffer.tellg() << endl;
    cout << "eof: " << buffer.eof() << endl;
}
And it produces this output:
pos-before: 0
pos-after: 11
eof: 0
I would expect pos-after to be 10 and not 11. According to the specification, the ignore method should stop when any one of the following conditions is met:
- count characters were extracted. This test is disabled in the special case when count equals std::numeric_limits<std::streamsize>::max()
- end of file conditions occurs in the input sequence, in which case the function calls setstate(eofbit)
- the next available character c in the input sequence is delim, as determined by Traits::eq_int_type(Traits::to_int_type(c), delim). The delimiter character is extracted and discarded. This test is disabled if delim is Traits::eof()
In this case I expect rule 1 to trigger before all the other rules and to stop when the stream position is 10.
Execution shows that this is not the case. What did I misunderstand?
I also tried a variation of the code where I ignore only 9 characters. In this case the output is the expected one:
pos-before: 0
pos-after: 9
eof: 0
So it looks like, once ignore() has extracted count characters, it still checks whether the next character is the delimiter and, if it is, extracts it too.
I can reproduce this with g++ and clang++.
I also tried this variation of the code:
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main(int argc, char* argv[]) {
    cout << "--- 10x get\n";
    stringstream buffer("1234567890");
    cout << "pos-before: " << buffer.tellg() << '\n';
    for (int i = 0; i < 10; ++i)
        buffer.get();
    cout << "pos-after: " << buffer.tellg() << '\n';
    cout << "eof: " << buffer.eof() << '\n';
    cout << "--- ignore(10)\n";
    stringstream buffer2("1234567890");
    cout << "pos-before: " << buffer2.tellg() << '\n';
    buffer2.ignore(10);
    cout << "pos-after: " << buffer2.tellg() << '\n';
    cout << "eof: " << buffer2.eof() << '\n';
}
And the result is:
--- 10x get
pos-before: 0
pos-after: 10
eof: 0
--- ignore(10)
pos-before: 0
pos-after: -1
eof: 1
We see that using ignore() produces an end-of-file condition on the stream, indicating that ignore() did try to extract a character after having extracted 10 characters. But in this case the third condition is disabled, and ignore() should not have tried to look at the next character.
Answer 1:
The specification of std::basic_istream::ignore in [istream.unformatted] paragraph 25 is a bit unclear: it states "Characters are extracted until any of the following occurs:" without any indication of order. Paragraph 25.1 states that at most n characters are extracted (unless n is std::numeric_limits<std::streamsize>::max()) and paragraph 25.3 states that extraction stops when the next character matches the delimiter. However, even if the conditions can be applied in any order, there is no conflict here: the nth character is not yet the expected character, and ignore() is supposed to stop.
As was pointed out in a comment, there was/is a bug in libstdc++ which seems to still be present in the library shipping with gcc-10.2.0. Using clang++ with libc++ (if necessary, use -stdlib=libc++ when invoking clang++) doesn't show the same behavior.
As an aside: the unformatted input operations set a count of characters read, which can be accessed using gcount(). Seeking within a stream is a far more expensive operation than accessing this count. Using gcount() also shows the problem (and speaking of expensive operations, I also replaced std::endl with '\n'; see this video or this article for more details):
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
int main() {
    std::istringstream buffer("1234567890 ");
    buffer.ignore(10, ' ');
    std::cout << "gcount: " << buffer.gcount() << '\n';
    std::cout << "eof: " << std::boolalpha << buffer.eof() << '\n';
}
Answer 2:
cppreference is notorious -- you should generally not rely on it for corner cases in the language, and refer to the spec instead, which says:
Effects: Behaves as an unformatted input function (as described above). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:
- n != numeric_limits<streamsize>::max() (18.3.2) and n characters have been extracted so far
- end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
- traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).
Using "any of" here instead of "one of" makes it clear that ignore will stop if more than one of the conditions applies. That's essentially the issue here -- both the first and third conditions apply, which brings up an underspecified corner case -- the third condition states that the next available character (that matches the delimiter) will also be extracted.
So this is exactly what the library is doing in this case -- the third condition applies, so it extracts the character. The fact that the first condition also applies is immaterial.
Source: https://stackoverflow.com/questions/64204443/why-does-stdbasic-istreamignore-extract-more-characters-than-specified