Question
I am using the code below to try to match symbols with a regex. As an example, I am trying to match the circled star symbol (http://graphemica.com/%E2%9C%AA).
#include <boost/regex.hpp>
//...
std::wstring text = L"a✪c";
auto re = L"(\\p{S}|\\p{L})+?";
boost::wregex r(re);
boost::regex_token_iterator<std::wstring::const_iterator>
    i(boost::make_regex_token_iterator(text, r, 1)), j;
while (i != j)
{
    std::wstring x = *i;
    ++i;
}
//...
The character values of text are {97, 10026, 99} (or {0x61, 0x272A, 0x63}), so it is a valid symbol.
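One way to confirm those values is to dump each wchar_t of the string as a number. This is only a minimal sketch: code_units is a hypothetical helper name that is not part of the original code, and the literal is written with the \u272A escape so the result does not depend on the source file encoding.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the numeric value of every wchar_t in the string.
std::vector<unsigned long> code_units(const std::wstring& text)
{
    std::vector<unsigned long> values;
    for (wchar_t ch : text)
        values.push_back(static_cast<unsigned long>(ch));
    return values;
}
```

Calling code_units(L"a\u272Ac") should give the three values listed above, because U+272A fits in a single wchar_t whether wchar_t is 16 or 32 bits wide.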
The code matches the two letters, 'a' (0x61) and 'c' (0x63), but not the symbol ✪ (0x272A).
I have tried a couple of other symbols (© for example) and none of them match either.
What am I missing here?
Answer 1:
The Boost.Regex documentation explicitly states that there is no support for Unicode-specific character classes when using boost::wregex.
If you want this functionality, you need to build Boost.Regex with ICU support enabled and then use the boost::u32regex type instead of boost::wregex.
Source: https://stackoverflow.com/questions/38525120/common-symbols-ps-not-been-matched-using-boost-wregex