Question
I want to match the word "février", or any other month name, using a regular expression.
Regular expression:
^(JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[Ff]évrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre)$
Problem
The problem is that I cannot match words that contain non-ASCII letters such as à, é, and è.
I found on a Unicode reference site that the Unicode value of é is \u00E9. Can I integrate this value into the regular expression, and more generally, how can I use Unicode values in regular expressions?
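#include <boost/regex.hpp>
#include <iostream>
#include <string>

// Prints "found" if the input text contains "février".
void returnValue(const std::string& pattern)
{
    const boost::regex e("février");
    if (boost::regex_search(pattern, e)) {
        std::cout << "found" << std::endl;
    }
}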
Answer 1:
You can match Unicode text with boost::regex. There are two ways to do it:
1. Rely on wchar_t, if your platform's wchar_t can hold Unicode characters and your platform's C/C++ runtime correctly handles wide character constants. This approach has a few pitfalls and is not recommended; the linked page below describes them.
2. Use a Unicode-aware regular expression type (boost::u32regex). Boost has to be built with this enabled; see "Building With Unicode and ICU Support" on the linked page:
http://www.boost.org/doc/libs/1_42_0/libs/regex/doc/html/boost_regex/unicode.html
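As a minimal sketch of the first (wchar_t) option, the snippet below matches a few accented month names using wide strings and \u escapes. It uses std::wregex from the C++ standard library as a stand-in so that it compiles without Boost; this is an assumption of the sketch, and with Boost the counterparts would be boost::wregex and boost::regex_match. The function name isAccentedMonth and the shortened month list are hypothetical, chosen only for illustration.

```cpp
#include <regex>
#include <string>

// Returns true if `word` is exactly one of the accented month names below.
// \u00E9 is é and \u00FB is û; inside a wide string literal each of them
// becomes a single wchar_t, so the regex engine sees whole characters
// rather than the individual bytes of a UTF-8 encoding.
bool isAccentedMonth(const std::wstring& word)
{
    // Assumption: only three months are listed to keep the sketch short.
    static const std::wregex months(
        L"^([fF]\u00E9vrier|[aA]o\u00FBt|[dD]\u00E9cembre)$");
    return std::regex_match(word, months);
}
```

Calling isAccentedMonth(L"f\u00E9vrier") is expected to succeed, while the unaccented L"fevrier" is not matched. Input read as UTF-8 bytes has to be converted to a wide string first. For the second option, an ICU-enabled Boost build provides boost::make_u32regex and boost::u32regex_match in <boost/regex/icu.hpp>, which accept UTF-8 input directly.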
Source: https://stackoverflow.com/questions/23932970/unicode-regular-expressions-c