How can we convert a multi-language (Unicode) string to upper or lower case in C or C++?
I found two solutions to this problem:
1. setlocale(LC_CTYPE, "en_US.UTF-8"); // the locale will be the UTF-8 enabled English
std::wstring str = L"Zoë Saldaña played in La maldición del padre Cardona.ëèñ";
std::wcout << str << std::endl;
for (wstring::iterator it = str.begin(); it != str.end(); ++it)
*it = towupper(*it);
std::wcout << "toUpper_onGCC_LLVM_1 :: "<< str << std::endl;
This works with the LLVM GCC 4.2 compiler.
2. std::locale::global(std::locale("en_US.UTF-8")); // the locale will be the UTF-8 enabled English
std::wcout.imbue(std::locale());
const std::ctype<wchar_t>& f = std::use_facet< std::ctype<wchar_t> >(std::locale());
std::wstring str = L"Chloëè";//"Zoë Saldaña played in La maldición del padre Cardona.";
f.toupper(&str[0], &str[0] + str.size());
std::wcout << str << std::endl;
This works with Apple LLVM 4.2.
I ran both cases in Xcode, but I am looking for a way to run this code in Eclipse with the g++ compiler.
You can iterate through a wstring and use towupper / towlower:
for (wstring::iterator it = a.begin(); it != a.end(); ++it)
*it = towupper(*it);
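The loop above can be wrapped in a small function. This is only a sketch: the name to_upper_wide is made up for illustration, and the result for non-ASCII characters depends on the C locale that is active when it is called.

```cpp
#include <cwctype>   // std::towupper, std::wint_t
#include <string>

// Hypothetical helper: returns an upper-cased copy of `text`.
// Each wide character is mapped on its own with std::towupper, which
// consults the LC_CTYPE category of the current C locale.
std::wstring to_upper_wide(std::wstring text) {
    for (wchar_t& ch : text) {
        ch = static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(ch)));
    }
    return text;
}
```

Call setlocale(LC_CTYPE, "en_US.UTF-8") once before using it and check the return value: it is a null pointer when that locale is not installed, in which case only ASCII letters can be relied on to change.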
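For C I would use toupper after adjusting the C locale. Note that setlocale changes the locale for the whole process, not just the current thread, and that toupper works on single bytes, so multi-byte UTF-8 characters need towupper on wide characters instead.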
setlocale(LC_CTYPE, "en_US.UTF8");
For C++ I would use the toupper method of std::ctype<char>:
std::locale loc;
auto& f = std::use_facet<std::ctype<char>>(loc);
char str[80] = "Hello World";
f.toupper(str, str+strlen(str));
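The same facet approach can be applied to wide strings. A minimal sketch, assuming you want to fall back to the classic "C" locale when the named locale is not installed; the helper name to_upper_in_locale and the fallback policy are my own choices, not something the standard library prescribes.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: upper-cases `text` with the std::ctype<wchar_t> facet
// of the locale called `locale_name`. If that locale is not available,
// std::locale's constructor throws std::runtime_error and the classic "C"
// locale is used instead (an assumption made for this sketch).
std::wstring to_upper_in_locale(std::wstring text, const char* locale_name) {
    std::locale loc = std::locale::classic();
    try {
        loc = std::locale(locale_name);
    } catch (const std::runtime_error&) {
        // Keep the classic locale; only ASCII letters are sure to change.
    }
    const std::ctype<wchar_t>& facet =
        std::use_facet<std::ctype<wchar_t> >(loc);
    if (!text.empty()) {
        // Converts the range [begin, end) in place.
        facet.toupper(&text[0], &text[0] + text.size());
    }
    return text;
}
```

For example, to_upper_in_locale(L"Chloë", "en_US.UTF-8") uses the English UTF-8 locale where it exists. Passing the locale explicitly avoids changing the global locale, which may matter if other code depends on it.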
On Windows, consider CharUpperBuffW and CharLowerBuffW for mixed-language applications where the locale is unknown. These functions handle diacritics where toupper() does not.
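A rough sketch of how CharUpperBuffW might be called on a std::wstring. The wrapper name to_upper_portable is hypothetical, and the non-Windows branch is only a stand-in so the snippet compiles elsewhere; it is not equivalent to the Windows function.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>   // CharUpperBuffW (link against User32)
#else
#include <cwctype>     // std::towupper, used only as a non-Windows stand-in
#endif

// Hypothetical wrapper: returns an upper-cased copy of `text`.
std::wstring to_upper_portable(std::wstring text) {
    if (text.empty()) {
        return text;
    }
#ifdef _WIN32
    // Converts the buffer in place; the length is given in characters.
    CharUpperBuffW(&text[0], static_cast<DWORD>(text.size()));
#else
    // Stand-in for other platforms: depends on the current C locale.
    for (wchar_t& ch : text) {
        ch = static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(ch)));
    }
#endif
    return text;
}
```

CharLowerBuffW takes the same arguments, so a lower-casing variant would look the same apart from the function name.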
With quite a lot of difficulty if you're going to do it right.
The usual use-case for this is for comparison purposes, but the problem is more general than that.
There is a fairly detailed paper from C++ Report circa 2000 from Matt Austern here (PDF)
If you want a sane and mature solution, look at IBM's ICU. Here's an example:
#include <iostream>
#include <unicode/unistr.h>
#include <string>
int main() {
    icu::UnicodeString us("óóßChloë");
    us.toUpper(); // convert to uppercase in place
    std::string s;
    us.toUTF8String(s);
    std::cout << "Upper: " << s << "\n";
    us.toLower(); // convert to lowercase in place
    s.clear();
    us.toUTF8String(s);
    std::cout << "Lower: " << s << "\n";
    return 0;
}
Output:
Upper: ÓÓSSCHLOË
Lower: óósschloë
Note: in the second step, SS is not converted back to the German ß. Upper-casing ß to SS loses information, so lower-casing the result gives ss.
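The ß case also shows why the per-character functions (towupper, std::ctype::toupper) cannot do full case mapping: one character becomes two, so the string changes length. A minimal sketch that special-cases only ß; the function name is made up, and real full case mapping, as done by ICU, covers many more special cases than this.

```cpp
#include <cwctype>   // std::towupper, std::wint_t
#include <string>

// Hypothetical helper: upper-cases `text`, expanding the German sharp s
// (U+00DF) to "SS". Every other character is mapped one-to-one with
// std::towupper, so non-ASCII results still depend on the current C locale.
std::wstring to_upper_expanding_sharp_s(const std::wstring& text) {
    const wchar_t sharp_s = L'\u00DF';
    std::wstring result;
    result.reserve(text.size());
    for (wchar_t ch : text) {
        if (ch == sharp_s) {
            result += L"SS";   // one input character, two output characters
        } else {
            result += static_cast<wchar_t>(
                std::towupper(static_cast<std::wint_t>(ch)));
        }
    }
    return result;
}
```

Because the output can be longer than the input, this version builds a new string instead of converting in place.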