Question
The C++ standard library supports a few ways to introduce custom delimiters for input streams. As I understand it, the recommended way is to imbue the stream with a new locale that carries a custom ctype facet:
first way (deriving from ctype and overriding do_is; note that only std::ctype<wchar_t> has do_is, std::ctype<char> is table-driven):
struct csv_whitespace : std::ctype<wchar_t>
{
    bool do_is(mask m, char_type c) const override
    {
        if ((m & space) && c == L' ') {
            return false; // space will NOT be classified as whitespace
        }
        if ((m & space) && c == L',') {
            return true; // comma will be classified as whitespace
        }
        return ctype::do_is(m, c); // leave the rest to the parent class
    }
};
// for the wcin stream:
std::wcin.imbue(std::locale(std::wcin.getloc(), new csv_whitespace));
second way (table-based std::ctype<char> specialization):
// get the classic table of the ctype<char> specialization
const auto temp = std::ctype<char>::classic_table();
// copy the table into a vector so it can be modified
std::vector<std::ctype<char>::mask> new_table_vector(temp, temp + std::ctype<char>::table_size);
// add/remove stream separators using bitwise arithmetic;
// character literals serve as indices because the table is indexed by character code
new_table_vector[' '] &= ~std::ctype_base::space;
new_table_vector['\t'] &= ~(std::ctype_base::space | std::ctype_base::cntrl);
new_table_vector[':'] |= std::ctype_base::space;
// A ctype initialized with new_table_vector delimits on '\n' and ':' but not on ' ' or '\t'.
// ....
// usage of the table above; new_table_vector must outlive the imbued locale,
// because the facet only stores a pointer to the table
std::cin.imbue(std::locale(std::cin.getloc(), new std::ctype<char>(new_table_vector.data())));
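For reference, the table-based approach can be put together into something compilable. This is only a minimal sketch: the helper names make_delimiter_table and split_with_facet are hypothetical, and a std::istringstream stands in for cin so that the table, the facet and the stream all live in one scope.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: copy the classic table, stop treating ' ' and '\t'
// as whitespace, and mark every character of `delimiters` as whitespace.
std::vector<std::ctype<char>::mask> make_delimiter_table(const std::string& delimiters)
{
    const std::ctype<char>::mask* classic = std::ctype<char>::classic_table();
    std::vector<std::ctype<char>::mask> table(classic, classic + std::ctype<char>::table_size);
    table[' '] &= ~std::ctype_base::space;
    table['\t'] &= ~std::ctype_base::space;
    for (unsigned char c : delimiters) {
        table[c] |= std::ctype_base::space;
    }
    return table;
}

// Hypothetical helper: split `text` with operator>> after imbuing the facet.
std::vector<std::string> split_with_facet(const std::string& text, const std::string& delimiters)
{
    // Declared before the stream so the table outlives the locale that points at it.
    const std::vector<std::ctype<char>::mask> table = make_delimiter_table(delimiters);
    std::istringstream in(text);
    in.imbue(std::locale(in.getloc(), new std::ctype<char>(table.data())));

    std::vector<std::string> tokens;
    for (std::string token; in >> token;) {
        tokens.push_back(token); // the delimiters themselves are skipped by operator>>
    }
    return tokens;
}
```

Calling split_with_facet("aaa&bbb*ccc%ddd&eee", "&*%") yields aaa, bbb, ccc, ddd and eee: the delimiters are consumed as whitespace and never reach the tokens.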
But is there a way to include the delimiters in the resulting tokens? E.g.
aaa&bbb*ccc%ddd&eee
where
& * %
are delimiters defined using one of the methods above, and the resulting strings would be:
aaa
&bbb
*ccc
%ddd
&eee
So the delimiters are included in the resulting strings. My question is: how can an input stream be configured for that, and is it possible at all?
Thank you
Answer 1:
The short answer is no: istreams do not provide an innate method for extracting and retaining separators. istreams provide the following extraction methods:
- operator>> - discards the delimiter
- get - does not extract a delimiter at all
- getline - discards the delimiter
- read - doesn't respect delimiters
- readsome - doesn't respect delimiters
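The difference between getline and get in the list above can be checked with a small sketch. The function names here are hypothetical and the 64-byte buffer is an arbitrary choice for illustration; each function reads one token and reports which character the stream would deliver next.

```cpp
#include <sstream>
#include <string>

// getline extracts the delimiter and throws it away,
// so the next character is the one after the delimiter.
char next_char_after_getline(const std::string& text, char delim)
{
    std::istringstream in(text);
    std::string token;
    std::getline(in, token, delim);
    return static_cast<char>(in.peek());
}

// get stops in front of the delimiter and leaves it in the stream,
// so the next character is the delimiter itself.
char next_char_after_get(const std::string& text, char delim)
{
    std::istringstream in(text);
    char buffer[64]; // arbitrary size, large enough for the short demo input
    in.get(buffer, sizeof buffer, delim);
    return static_cast<char>(in.peek());
}
```

For the input "aaa&bbb" with '&' as the delimiter, the getline variant reports 'b' and the get variant reports '&'. Neither puts the delimiter into the extracted token.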
However, let's assume that you slurped your istream into string foo; then you could use a regex like this to tokenize:
((?:^|[&*%])[^&*%]*)
This could be used with a regex_token_iterator like this:
const regex re{ "((?:^|[&*%])[^&*%]*)" };
const vector<string> bar{ sregex_token_iterator(cbegin(foo), cend(foo), re, 1), sregex_token_iterator() };
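Put together with the slurping step, the regex approach might look like the sketch below. The function name tokenize_keeping_delimiters is hypothetical, and reading the stream through std::istreambuf_iterator is just one way to slurp it; the delimiter set & * % is hard-coded in the regex, as in the snippet above.

```cpp
#include <istream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read the whole stream, then split it so that each
// token starts with the delimiter that preceded it.
std::vector<std::string> tokenize_keeping_delimiters(std::istream& in)
{
    // Slurp the stream into a string (extra parentheses avoid the most vexing parse).
    const std::string foo((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());

    // Either the start of the input or one delimiter, followed by non-delimiters.
    static const std::regex re{ "((?:^|[&*%])[^&*%]*)" };

    return std::vector<std::string>(
        std::sregex_token_iterator(foo.cbegin(), foo.cend(), re, 1),
        std::sregex_token_iterator());
}
```

Feeding it a std::istringstream holding aaa&bbb*ccc%ddd&eee returns aaa, &bbb, *ccc, %ddd and &eee. If the input starts with a delimiter, the first token is empty, because ^ matches before the delimiter is seen.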
Source: https://stackoverflow.com/questions/50154766/how-to-include-c-input-stream-delimiters-into-result-tokens