C++11 case-insensitive comparison of the beginning of a string (Unicode)

青春惊慌失措 2021-01-05 01:45

I have to check whether a particular string begins with another one. The strings are encoded as UTF-8, and the comparison should be case-insensitive.

I know that this is ve

3 Answers
  • 2021-01-05 02:17

    The only approach I know of that is UTF-8-, internationalization- and culture-aware is the excellent and well-maintained IBM ICU: International Components for Unicode. It's a C/C++ library for *nix or Windows into which a ton of research has gone to provide a culture-aware string library, including case-insensitive string comparison that's both fast and accurate.

    IMHO, the two things you should never write yourself unless you're doing a thesis paper are encryption and culture-sensitive string libraries.

  • 2021-01-05 02:23

    Are there any restrictions on what can be in the string you're looking for? If it's user input, and can be any UTF-8 string, the problem is extremely complex. As others have mentioned, one character can have several different representations, so you'd probably have to normalize the strings first. Then: what counts as equal? Should 'E' compare equal to 'é' (as is usual in some circles in French), or not (which would conform to the "official" rules of the Imprimerie nationale)?

    For all but the most trivial definitions, rolling your own will represent a significant effort. For this sort of thing, the ICU library is the reference. It contains all that you'll need. Note, however, that it works on UTF-16, not UTF-8, so you'll have to convert the strings first, as well as normalizing them. (ICU has support for both.)
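
    To make the point about multiple representations concrete, here is a small sketch using only the standard library. It builds the two usual UTF-8 encodings of "é" (precomposed U+00E9, and 'e' followed by the combining acute accent U+0301) and shows that a plain byte comparison treats them as different strings. The helper names are made up for this illustration, and nothing here normalizes anything; it only demonstrates why normalization would be needed before comparing.

    ```cpp
    #include <iostream>
    #include <string>

    // Two UTF-8 encodings of the same visible character "é".
    // Precomposed: U+00E9 -> bytes C3 A9.
    // Decomposed:  U+0065 'e' followed by U+0301 (combining acute) -> bytes 65 CC 81.
    // (Both function names are hypothetical, chosen for this example only.)
    std::string precomposed_e_acute() { return "\xC3\xA9"; }
    std::string decomposed_e_acute()  { return "e\xCC\x81"; }

    int main()
    {
        const std::string a = precomposed_e_acute();
        const std::string b = decomposed_e_acute();

        // std::string compares raw bytes, so the two forms are not equal.
        std::cout << std::boolalpha
                  << "same bytes: " << (a == b) << '\n'
                  << "sizes: " << a.size() << " vs " << b.size() << '\n';
        return 0;
    }
    ```

    A prefix test on raw bytes would therefore miss a match whenever the two inputs use different forms, which is the case a normalization step is meant to handle.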

  • 2021-01-05 02:34

    Using the standard library regex classes, you could do something like the following snippet. Unfortunately, it's not UTF-8 aware. Changing str2 to std::wstring str2 = L"hello World" results in a lot of conversion warnings. Making str1 a wide string doesn't work at all, since std::regex doesn't accept wchar_t input (as far as I can see).

    #include <regex>
    #include <iostream>
    #include <string>
    
    int main()
    {
        //The input strings
        std::string str1 = "Hello";
        std::string str2 = "hello World";
    
        //Define the regular expression using case-insensitivity
        std::regex regx(str1, std::regex_constants::icase);
    
        //Only search at the beginning 
        std::regex_constants::match_flag_type fl = std::regex_constants::match_continuous;
    
        //display some output
        std::cout << std::boolalpha << std::regex_search(str2.begin(), str2.end(), regx, fl) << std::endl;
    
        return 0;
    }
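
    One thing to keep in mind with the snippet above is that str1 is used as a regular expression, so a prefix containing characters such as '.' or '(' would be interpreted as a pattern. A minimal sketch of the same ASCII-only check without regex could look like the following. The function names are hypothetical, and the big assumption is that only the letters A-Z need case folding: bytes of multi-byte UTF-8 sequences are compared exactly, so 'É' and 'é' do not match. It is not a substitute for a Unicode-aware library.

    ```cpp
    #include <algorithm>
    #include <iostream>
    #include <string>

    // Lowercase only the ASCII letters A-Z; every other byte (including all
    // bytes of multi-byte UTF-8 sequences) is returned unchanged.
    char ascii_lower(char c)
    {
        return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
    }

    // Hypothetical helper: true if `text` begins with `prefix`, ignoring ASCII case.
    bool starts_with_ascii_icase(const std::string& text, const std::string& prefix)
    {
        if (prefix.size() > text.size())
            return false;
        // Compare prefix against the first prefix.size() bytes of text.
        return std::equal(prefix.begin(), prefix.end(), text.begin(),
                          [](char a, char b) { return ascii_lower(a) == ascii_lower(b); });
    }

    int main()
    {
        std::cout << std::boolalpha
                  << starts_with_ascii_icase("hello World", "Hello") << '\n'  // true
                  << starts_with_ascii_icase("hello World", "World") << '\n'  // false
                  << starts_with_ascii_icase("a.c", "A.") << '\n';            // true, '.' is literal
        return 0;
    }
    ```

    Because it works byte by byte, this never produces a false match inside a multi-byte sequence, but it also never matches accented letters that differ in case.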
    