I am working on a C++ project that needs to read data from Unicode text.
My problem is that I can't convert some Unicode characters to lowercase.
I use wchar_t to store the Unicode characters read from a Unicode file. After that, I use _wcslwr to lowercase the wchar_t string. Many characters are still not converted, such as:
Đ Â Ă Ê Ô Ơ Ư Ấ Ắ Ế Ố Ớ Ứ Ầ Ằ Ề Ồ Ờ Ừ Ậ Ặ Ệ Ộ Ợ Ự
whose lowercase forms are:
đ â ă ê ô ơ ư ấ ắ ế ố ớ ứ ầ ằ ề ồ ờ ừ ậ ặ ệ ộ ợ ự
I have tried tolower, and it still does not work.
If you call plain tolower, you get the single-argument std::tolower from the header <cctype>, which handles single-byte (ANSI) characters only.
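To illustrate the limitation, here is a minimal sketch. The helper name lower_bytes is made up for this example, and it assumes the text is held as UTF-8 in a narrow std::string, which is not the wchar_t setup from the question. In the default "C" locale, the single-argument std::tolower only maps A to Z, so the two bytes that encode Đ in UTF-8 pass through unchanged:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: applies the single-argument std::tolower to every byte.
std::string lower_bytes(std::string text) {
    for (char& ch : text) {
        // Cast to unsigned char first; passing a negative char value is undefined behavior.
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    return text;
}
```

With this helper, lower_bytes("ABC") gives "abc", while lower_bytes("\xC4\x90") (the UTF-8 bytes of Đ) returns the same two bytes in the "C" locale.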
The overload you want is the locale-aware one, declared in <locale>:
template< class charT >
charT tolower( charT ch, const locale& loc );
Below are two versions that work well:
#include <iostream>
#include <string>
#include <cwctype>   // std::towlower
#include <clocale>   // std::setlocale
#include <locale>    // std::locale, std::tolower(charT, const std::locale&)

int main() {
    // Switch the C library from the default "C" locale to the user's environment locale.
    std::setlocale(LC_ALL, "");
    std::wstring data = L"Đ Â Ă Ê Ô Ơ Ư Ấ Ắ Ế Ố Ớ Ứ Ầ Ằ Ề Ồ Ờ Ừ Ậ Ặ Ệ Ộ Ợ Ự";
    std::wcout << data << std::endl;

    // C std::towlower
    for (auto c : data) {
        std::wcout << static_cast<wchar_t>(std::towlower(c));
    }
    std::wcout << std::endl;

    // C++ std::tolower(charT, std::locale)
    std::locale loc("");
    for (auto c : data) {
        // This is recommended
        std::wcout << std::tolower(c, loc);
    }
    std::wcout << std::endl;
    return 0;
}
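Since the original goal was to lowercase a whole string the way _wcslwr does, the per-character loop can also be wrapped in a small function. This is only a sketch and not part of the original answer: to_lower_copy is a hypothetical name, and it relies on std::ctype<wchar_t>::tolower, the facet member that converts a whole character range in place. Which non-ASCII characters actually get converted still depends on the locale you pass in and on the platform's locale data.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: returns a lowercase copy of `text` according to `loc`.
std::wstring to_lower_copy(std::wstring text, const std::locale& loc) {
    if (text.empty()) {
        return text;
    }
    // The ctype facet converts the whole range [begin, end) in place.
    const auto& facet = std::use_facet<std::ctype<wchar_t>>(loc);
    facet.tolower(&text[0], &text[0] + text.size());
    return text;
}
```

Calling to_lower_copy(data, std::locale("")) would use the environment locale, as the program above does. Note that std::locale("") can throw std::runtime_error if the environment names a locale that is not installed.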
Reference:
Source: https://stackoverflow.com/questions/34433380/lowercase-of-unicode-character