I have to check whether a particular string begins with another one. The strings are encoded in UTF-8, and the comparison should be case-insensitive.
I know that this is ve
The only way I know of that is UTF-8, internationalization and culture aware is the excellent and well-maintained IBM ICU: International Components for Unicode. It's a C/C++ library for *nix or Windows into which a ton of research has gone to provide a culture-aware string library, including case-insensitive string comparison that's both fast and accurate.
IMHO, the two things you should never write yourself unless you're doing a thesis paper are encryption and culture-sensitive string libraries.
Are there any restrictions on what can be in the string you're looking for? If it's user input, and can be any UTF-8 string, the problem is extremely complex. As others have mentioned, one character can have several different representations, so you'd probably have to normalize the strings first. Then: what counts as equal? Should 'E' compare equal to 'é' (as is usual in some circles in French), or not (which would conform to the "official" rules of the Imprimerie nationale)?
For all but the most trivial definitions, rolling your own will represent a significant effort. For this sort of thing, the ICU library is the reference. It contains all that you'll need. Note, however, that it works on UTF-16, not UTF-8, so you'll have to convert the strings first, as well as normalize them. (ICU has support for both.)
Using the standard library regex classes, you could do something like the following snippet. Unfortunately, it's not UTF-8 aware. Changing str2 to std::wstring str2 = L"hello World" results in a lot of conversion warnings. Making str1 a std::wstring doesn't work at all, since std::regex doesn't accept wchar_t input (as far as I can see).
#include <regex>
#include <iostream>
#include <string>

int main()
{
    // The input strings
    std::string str1 = "Hello";
    std::string str2 = "hello World";

    // Define the regular expression using case-insensitivity.
    // Note: str1 is interpreted as a pattern, so regex metacharacters
    // in it are not matched literally.
    std::regex regx(str1, std::regex_constants::icase);

    // Only search at the beginning
    std::regex_constants::match_flag_type fl = std::regex_constants::match_continuous;

    // Display some output
    std::cout << std::boolalpha << std::regex_search(str2.begin(), str2.end(), regx, fl) << std::endl;
    return 0;
}