wchar-t

swprintf chokes on characters outside 8-bit range

偶尔善良 posted on 2019-12-06 08:19:00
Question: This happens on OS X, though I suspect it applies to any UNIX-like OS. I have two strings that look like this: const wchar_t *test1 = (const wchar_t *)"\x44\x00\x00\x00\x73\x00\x00\x00\x00\x00\x00\x00"; const wchar_t *test2 = (const wchar_t *)"\x44\x00\x00\x00\x19\x20\x00\x00\x73\x00\x00\x00\x00\x00\x00\x00"; In the debugger, test1 looks like "Ds" and test2 looks like "D's" (with the curly apostrophe). I then call this code: wchar_t buf1[100], buf2[100]; int ret1 = swprintf(buf1, 100, L"%ls", …
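
The widely reported cause, and a minimal sketch of the usual fix (my illustration, not the thread's full answer): in the default "C" locale, the OS X swprintf rejects wide characters outside the 8-bit range and fails with EILSEQ, while selecting a real locale with setlocale() first lets the same call succeed.

```cpp
// A minimal sketch, assuming an OS X / BSD libc where swprintf fails
// with EILSEQ in the default "C" locale for characters above 0xFF.
#include <cstdio>
#include <cwchar>
#include <clocale>
#include <cerrno>

int main() {
    std::setlocale(LC_ALL, "en_US.UTF-8");   // or "" to use the environment

    const wchar_t *test2 = L"D\u2019s";      // "D's" with the curly apostrophe
    wchar_t buf[100];

    errno = 0;
    int ret = std::swprintf(buf, 100, L"%ls", test2);
    if (ret < 0)
        std::perror("swprintf");             // EILSEQ without the setlocale call
    else
        std::wprintf(L"%ls\n", buf);
    return 0;
}
```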

Assigning non-ASCII characters to wide char and printing with printf

浪子不回头ぞ posted on 2019-12-06 02:32:09
How can I assign non-ASCII characters to a wide char and print them to the console? The code below doesn't work: #include <stdio.h> int main(void) { wchar_t wc = L'ć'; printf("%lc\n", wc); printf("%ld\n", wc); return 0; } Output: 263 Press [Enter] to close the terminal ... I'm using MinGW GCC on Windows 7. I think your calls to printf() fail with an «Illegal byte sequence» error returned in errno, at least that is what happens here on MacOS X with the above example code (and also if using wprintf() instead of printf()). For me it works when I call setlocale(LC_ALL, ""); before the call to…
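
A minimal sketch of the setlocale() fix the answer describes, with two small corrections: <wchar.h> is included so wchar_t is declared in plain C, and the second printf no longer passes an int-promoted wchar_t to %ld, which is undefined behavior. Note that on Windows the console code page can still garble the output even after the locale is set.

```cpp
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    setlocale(LC_ALL, "");      /* pick up the user's (UTF-8) locale */

    wchar_t wc = L'\u0107';     /* 'ć', written portably as U+0107   */
    printf("%lc\n", wc);        /* prints ć once the locale is set   */
    printf("%d\n", (int)wc);    /* prints 263, the code point value  */
    return 0;
}
```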

QChar to wchar_t

廉价感情. posted on 2019-12-06 01:48:01
Question: I need to convert a QChar to a wchar_t. I've tried the following: #include <cstdlib> #include <QtGui/QApplication> #include <iostream> using namespace std; int main(int argc, char** argv) { QString mystring = "Hello World\n"; wchar_t myArray[mystring.size()]; for (int x=0; x<mystring.size(); x++) { myArray[x] = mystring.at(x).toLatin1(); cout << mystring.at(x).toLatin1(); // checks the char at index x (fine) } cout << "myArray : " << myArray << "\n"; // doesn't give me correct value return 0; …
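
A minimal sketch of the usual Qt-side answer: for a whole string, QString::toWCharArray() performs the conversion (handling the 16-bit vs. 32-bit wchar_t difference between platforms), and for a single QChar, QChar::unicode() returns its UTF-16 code unit. In the code above, toLatin1() discards everything outside Latin-1, the array is never null-terminated, and std::cout has no wchar_t* overload, so it prints a pointer value instead of the text.

```cpp
// QString::toWCharArray() returns the number of wchar_t written and
// does NOT append the terminating null, so add it by hand.
#include <QString>
#include <iostream>
#include <vector>

int main() {
    QString mystring = "Hello World";

    std::vector<wchar_t> buf(mystring.size() + 1);
    int len = mystring.toWCharArray(buf.data());
    buf[len] = L'\0';

    std::wcout << buf.data() << L"\n";
    return 0;
}
```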

wchar_t for UTF-16 on Linux?

╄→гoц情女王★ posted on 2019-12-05 11:28:50
Does it make any sense to store UTF-16 encoded text using wchar_t* on Linux? The obvious problem is that wchar_t is four bytes on Linux, while UTF-16 usually takes two (or sometimes two groups of two) bytes per character. I'm trying to use a third-party library that does exactly that, and it seems very confusing. It looks like things are messed up because on Windows wchar_t is two bytes, but I just want to double-check, since it's a pretty expensive commercial library and maybe I just don't understand something. While it's possible to store UTF-16 in wchar_t, such wchar_t values (or arrays of them…
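
One common way around the mismatch, as a minimal sketch (my addition, not from the thread itself): since C++11 there is a dedicated 16-bit code-unit type, char16_t, so UTF-16 data does not have to be squeezed into (or padded out to) wchar_t at all.

```cpp
#include <cstdio>
#include <string>

int main() {
    std::printf("sizeof(wchar_t)  = %zu\n", sizeof(wchar_t));  // 4 on Linux, 2 on Windows
    std::printf("sizeof(char16_t) = %zu\n", sizeof(char16_t)); // 2 in practice, everywhere

    // UTF-16 code units; the non-BMP character needs a surrogate pair:
    std::u16string s = u"ab\U0002008A";
    std::printf("code units in \"ab\" + U+2008A: %zu\n", s.size()); // 4
    return 0;
}
```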

Conversion of wchar_t* to string [duplicate]

感情迁移 posted on 2019-12-05 10:00:29
Question: This question already has answers here: How do I convert wchar_t* to std::string? (6 answers). Closed 4 years ago. How can I convert a wchar_t* array to a std::string varStr in a Win32 console application? Answer 1: Use wstring, see this code: // Your wchar_t* const wchar_t* txt = L"Hello World"; wstring ws(txt); // your new String string str(ws.begin(), ws.end()); // Show String cout << str << endl; Answer 2: You should use the wstring class belonging to the namespace std. It has a constructor which accepts a…
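
One caveat worth adding: the begin()/end() constructor in Answer 1 narrows each wchar_t to a single char, so it is only safe for plain ASCII. A minimal sketch of a conversion that survives non-ASCII text on Win32, using WideCharToMultiByte to produce UTF-8 (the helper name toUtf8 is mine, not from the thread):

```cpp
#include <windows.h>
#include <string>

std::string toUtf8(const wchar_t* txt) {
    // First call computes the required size, including the '\0'.
    int len = WideCharToMultiByte(CP_UTF8, 0, txt, -1, nullptr, 0, nullptr, nullptr);
    if (len <= 0) return {};
    std::string out(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, txt, -1, &out[0], len, nullptr, nullptr);
    out.pop_back();   // drop the terminator that the API wrote
    return out;
}
```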

How do you efficiently copy BSTR to wchar_t[]?

北城以北 posted on 2019-12-05 08:31:55
I have a BSTR object that I would like to copy to a wchar_t buffer. The tricky thing is that the length of the BSTR object could be anywhere from a few kilobytes to a few hundred kilobytes. Is there an efficient way of copying the data across? I know I could just declare a wchar_t array and always allocate the maximum possible data it would ever need to hold. However, this would mean allocating hundreds of kilobytes of data for something that potentially might only require a few kilobytes. Any suggestions? Euro Micelli: First, you might not actually have to do anything at all, if all you…
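
If a copy really is needed, here is a minimal sketch of a size-exact one (my illustration, not Euro Micelli's full answer): a BSTR stores its length in a prefix, so SysStringLen() returns the character count without scanning, and the destination can be allocated to match instead of a worst-case maximum.

```cpp
#include <windows.h>
#include <oleauto.h>
#include <cwchar>
#include <vector>

std::vector<wchar_t> copyBstr(BSTR src) {
    UINT len = SysStringLen(src);       // character count, O(1), no wcslen scan
    std::vector<wchar_t> buf(len + 1);
    wmemcpy(buf.data(), src, len);      // BSTR data is wchar_t underneath
    buf[len] = L'\0';
    return buf;
}
```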

Lowercase of Unicode character

旧巷老猫 posted on 2019-12-05 07:47:45
I am working on a C++ project that needs to read data from Unicode text. The problem is that I cannot lowercase some Unicode characters. I use wchar_t to store the Unicode characters read from a Unicode file, and then use _wcslwr to lowercase the wchar_t string. Many characters are still not lowercased, such as: Đ Â Ă Ê Ô Ơ Ư Ấ Ắ Ế Ố Ớ Ứ Ầ Ằ Ề Ồ Ờ Ừ Ậ Ặ Ệ Ộ Ợ Ự whose lowercase forms are: đ â ă ê ô ơ ư ấ ắ ế ố ớ ứ ầ ằ ề ồ ờ ừ ậ ặ ệ ộ ợ ự I have tried tolower and it is still not working. If you call plain tolower, you get std::tolower from the <cctype> header, which handles single-byte characters in the current locale only.
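
A minimal sketch of the locale-aware approach, assuming a UTF-8 locale such as en_US.UTF-8 or vi_VN.UTF-8 is installed: select the locale first, then lowercase each character with towlower() from <cwctype>, which, unlike plain tolower(), operates on wide characters. Whether letters like Đ map to đ still depends on the C library's Unicode tables.

```cpp
#include <cwctype>
#include <clocale>
#include <cstdio>

int main() {
    std::setlocale(LC_ALL, "en_US.UTF-8");       // or "" for the environment's locale

    wchar_t text[] = L"Đ Â Ă Ê Ô Ơ Ư";
    for (wchar_t* p = text; *p; ++p)
        *p = (wchar_t)std::towlower(*p);         // locale-aware lowercasing

    std::printf("%ls\n", text);                  // đ â ê ô ơ ư (with glibc)
    return 0;
}
```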

How does Windows wchar_t handle Unicode characters outside the Basic Multilingual Plane?

自闭症网瘾萝莉.ら posted on 2019-12-05 00:03:40
I've looked at a number of other posts here and elsewhere (see below), but I still don't have a clear answer to this question: how does Windows wchar_t handle Unicode characters outside the Basic Multilingual Plane? That is: many programmers seem to feel that UTF-16 is harmful because it is a variable-length code; wchar_t is 16 bits wide on Windows, but 32 bits wide on Unix/macOS; and the Windows APIs use wide characters, not Unicode. So what does Windows do when you want to encode something like 𠂊 (U+2008A, a Han character) on Windows? The implementation of wchar_t under the Windows stdlib is UTF-16…
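
The short answer, with a minimal sketch: Windows stores such a character as a UTF-16 surrogate pair, that is, two consecutive wchar_t code units, and it is up to the code to treat the pair as one character. The arithmetic below derives the pair for U+2008A by hand.

```cpp
#include <cstdio>

int main() {
    unsigned cp   = 0x2008A;              // 𠂊, outside the BMP
    unsigned v    = cp - 0x10000;         // 0x1008A: 20 bits to split
    unsigned high = 0xD800 + (v >> 10);   // 0xD840, high (lead) surrogate
    unsigned low  = 0xDC00 + (v & 0x3FF); // 0xDC8A, low (trail) surrogate

    std::printf("U+%05X -> 0x%04X 0x%04X\n", cp, high, low);

    // The same pair, spelled out as a Windows-style 16-bit string:
    const wchar_t pair[] = { 0xD840, 0xDC8A, 0 };
    std::printf("code units: %zu\n", sizeof pair / sizeof *pair - 1); // 2
    return 0;
}
```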