wchar-t

swprintf chokes on characters outside 8-bit range

偶尔善良 posted on 2019-12-06 08:19:00
Question: This happens on OS X, though I suspect it applies to any UNIX-like OS. I have two strings that look like this: const wchar_t *test1 = (const wchar_t *)"\x44\x00\x00\x00\x73\x00\x00\x00\x00\x00\x00\x00"; const wchar_t *test2 = (const wchar_t *)"\x44\x00\x00\x00\x19\x20\x00\x00\x73\x00\x00\x00\x00\x00\x00\x00"; In the debugger, test1 looks like "Ds" and test2 looks like "D's" (with the curly apostrophe). I then call this code: wchar_t buf1[100], buf2[100]; int ret1 = swprintf(buf1, 100, L"%ls", …
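
The widely reported cause, and a minimal sketch of the usual fix (my illustration, not the thread's full answer): in the default "C" locale, the OS X swprintf rejects wide characters outside the 8-bit range and fails with EILSEQ, while selecting a real locale with setlocale() first lets the same call succeed.

```cpp
// A minimal sketch, assuming an OS X / BSD libc where swprintf fails
// with EILSEQ in the default "C" locale for characters above 0xFF.
#include <cstdio>
#include <cwchar>
#include <clocale>
#include <cerrno>

int main() {
    std::setlocale(LC_ALL, "en_US.UTF-8");   // or "" to use the environment

    const wchar_t *test2 = L"D\u2019s";      // "D's" with the curly apostrophe
    wchar_t buf[100];

    errno = 0;
    int ret = std::swprintf(buf, 100, L"%ls", test2);
    if (ret < 0)
        std::perror("swprintf");             // EILSEQ without the setlocale call
    else
        std::wprintf(L"%ls\n", buf);
    return 0;
}
```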

Assigning non-ASCII characters to wide char and printing with printf

浪子不回头ぞ posted on 2019-12-06 02:32:09
How can I assign non-ASCII characters to a wide char and print them to the console? The code below doesn't work: #include <stdio.h> int main(void) { wchar_t wc = L'ć'; printf("%lc\n", wc); printf("%ld\n", wc); return 0; } Output: 263 Press [Enter] to close the terminal ... I'm using MinGW GCC on Windows 7. I think your calls to printf() fail with an «Illegal byte sequence» error returned in errno, at least that is what happens here on MacOS X with the above example code (and also if using wprintf() instead of printf()). For me it works when I call setlocale(LC_ALL, ""); before the call to…
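
A minimal sketch of the setlocale() fix the answer describes, with two small corrections: <wchar.h> is included so wchar_t is declared in plain C, and the second printf no longer passes an int-promoted wchar_t to %ld, which is undefined behavior. Note that on Windows the console code page can still garble the output even after the locale is set.

```cpp
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    setlocale(LC_ALL, "");      /* pick up the user's (UTF-8) locale */

    wchar_t wc = L'\u0107';     /* 'ć', written portably as U+0107   */
    printf("%lc\n", wc);        /* prints ć once the locale is set   */
    printf("%d\n", (int)wc);    /* prints 263, the code point value  */
    return 0;
}
```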

QChar to wchar_t

廉价感情. posted on 2019-12-06 01:48:01
Question: I need to convert a QChar to a wchar_t. I've tried the following: #include <cstdlib> #include <QtGui/QApplication> #include <iostream> using namespace std; int main(int argc, char** argv) { QString mystring = "Hello World\n"; wchar_t myArray[mystring.size()]; for (int x=0; x<mystring.size(); x++) { myArray[x] = mystring.at(x).toLatin1(); cout << mystring.at(x).toLatin1(); // checks the char at index x (fine) } cout << "myArray : " << myArray << "\n"; // doesn't give me correct value return 0; …
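
A minimal sketch of the usual Qt-side answer: for a whole string, QString::toWCharArray() performs the conversion (handling the 16-bit vs. 32-bit wchar_t difference between platforms), and for a single QChar, QChar::unicode() returns its UTF-16 code unit. In the code above, toLatin1() discards everything outside Latin-1, the array is never null-terminated, and std::cout has no wchar_t* overload, so it prints a pointer value instead of the text.

```cpp
// QString::toWCharArray() returns the number of wchar_t written and
// does NOT append the terminating null, so add it by hand.
#include <QString>
#include <iostream>
#include <vector>

int main() {
    QString mystring = "Hello World";

    std::vector<wchar_t> buf(mystring.size() + 1);
    int len = mystring.toWCharArray(buf.data());
    buf[len] = L'\0';

    std::wcout << buf.data() << L"\n";
    return 0;
}
```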

wchar_t for UTF-16 on Linux?

╄→гoц情女王★ posted on 2019-12-05 11:28:50
Does it make any sense to store UTF-16 encoded text using wchar_t* on Linux? The obvious problem is that wchar_t is four bytes on Linux, while UTF-16 usually takes two (or sometimes two groups of two) bytes per character. I'm trying to use a third-party library that does exactly that, and it seems very confusing. It looks like things are messed up because on Windows wchar_t is two bytes, but I just want to double-check, since it's a pretty expensive commercial library and maybe I just don't understand something. While it's possible to store UTF-16 in wchar_t, such wchar_t values (or arrays of them…
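
One common way around the mismatch, as a minimal sketch (my addition, not from the thread itself): since C++11 there is a dedicated 16-bit code-unit type, char16_t, so UTF-16 data does not have to be squeezed into (or padded out to) wchar_t at all.

```cpp
#include <cstdio>
#include <string>

int main() {
    std::printf("sizeof(wchar_t)  = %zu\n", sizeof(wchar_t));  // 4 on Linux, 2 on Windows
    std::printf("sizeof(char16_t) = %zu\n", sizeof(char16_t)); // 2 in practice, everywhere

    // UTF-16 code units; the non-BMP character needs a surrogate pair:
    std::u16string s = u"ab\U0002008A";
    std::printf("code units in \"ab\" + U+2008A: %zu\n", s.size()); // 4
    return 0;
}
```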

Conversion of wchar_t* to string [duplicate]

感情迁移 posted on 2019-12-05 10:00:29
Question: This question already has answers here: How do I convert wchar_t* to std::string? (6 answers). Closed 4 years ago. How can I convert a wchar_t* array to a std::string varStr in a Win32 console application? Answer 1: Use wstring, see this code: // Your wchar_t* const wchar_t* txt = L"Hello World"; wstring ws(txt); // your new String string str(ws.begin(), ws.end()); // Show String cout << str << endl; Answer 2: You should use the wstring class belonging to the namespace std. It has a constructor which accepts a…
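
One caveat worth adding: the begin()/end() constructor in Answer 1 narrows each wchar_t to a single char, so it is only safe for plain ASCII. A minimal sketch of a conversion that survives non-ASCII text on Win32, using WideCharToMultiByte to produce UTF-8 (the helper name toUtf8 is mine, not from the thread):

```cpp
#include <windows.h>
#include <string>

std::string toUtf8(const wchar_t* txt) {
    // First call computes the required size, including the '\0'.
    int len = WideCharToMultiByte(CP_UTF8, 0, txt, -1, nullptr, 0, nullptr, nullptr);
    if (len <= 0) return {};
    std::string out(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, txt, -1, &out[0], len, nullptr, nullptr);
    out.pop_back();   // drop the terminator that the API wrote
    return out;
}
```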

How do you efficiently copy BSTR to wchar_t[]?

北城以北 posted on 2019-12-05 08:31:55
I have a BSTR object that I would like to copy to a wchar_t buffer. The tricky thing is that the length of the BSTR object could be anywhere from a few kilobytes to a few hundred kilobytes. Is there an efficient way of copying the data across? I know I could just declare a wchar_t array and always allocate the maximum possible data it would ever need to hold. However, this would mean allocating hundreds of kilobytes of data for something that potentially might only require a few kilobytes. Any suggestions? Euro Micelli: First, you might not actually have to do anything at all, if all you…
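
If a copy really is needed, here is a minimal sketch of a size-exact one (my illustration, not Euro Micelli's full answer): a BSTR stores its length in a prefix, so SysStringLen() returns the character count without scanning, and the destination can be allocated to match instead of a worst-case maximum.

```cpp
#include <windows.h>
#include <oleauto.h>
#include <cwchar>
#include <vector>

std::vector<wchar_t> copyBstr(BSTR src) {
    UINT len = SysStringLen(src);       // character count, O(1), no wcslen scan
    std::vector<wchar_t> buf(len + 1);
    wmemcpy(buf.data(), src, len);      // BSTR data is wchar_t underneath
    buf[len] = L'\0';
    return buf;
}
```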

Lowercase of Unicode character

旧巷老猫 posted on 2019-12-05 07:47:45
I am working on a C++ project that needs to read data from Unicode text. The problem is that I cannot lowercase some Unicode characters. I use wchar_t to store the Unicode characters read from a Unicode file, and then use _wcslwr to lowercase the wchar_t string. Many characters are still not lowercased, such as: Đ Â Ă Ê Ô Ơ Ư Ấ Ắ Ế Ố Ớ Ứ Ầ Ằ Ề Ồ Ờ Ừ Ậ Ặ Ệ Ộ Ợ Ự whose lowercase forms are: đ â ă ê ô ơ ư ấ ắ ế ố ớ ứ ầ ằ ề ồ ờ ừ ậ ặ ệ ộ ợ ự I have tried tolower and it is still not working. If you call plain tolower, you get std::tolower from the <cctype> header, which handles single-byte characters in the current locale only.
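
A minimal sketch of the locale-aware approach, assuming a UTF-8 locale such as en_US.UTF-8 or vi_VN.UTF-8 is installed: select the locale first, then lowercase each character with towlower() from <cwctype>, which, unlike plain tolower(), operates on wide characters. Whether letters like Đ map to đ still depends on the C library's Unicode tables.

```cpp
#include <cwctype>
#include <clocale>
#include <cstdio>

int main() {
    std::setlocale(LC_ALL, "en_US.UTF-8");       // or "" for the environment's locale

    wchar_t text[] = L"Đ Â Ă Ê Ô Ơ Ư";
    for (wchar_t* p = text; *p; ++p)
        *p = (wchar_t)std::towlower(*p);         // locale-aware lowercasing

    std::printf("%ls\n", text);                  // đ â ê ô ơ ư (with glibc)
    return 0;
}
```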

How does Windows wchar_t handle Unicode characters outside the Basic Multilingual Plane?

自闭症网瘾萝莉.ら posted on 2019-12-05 00:03:40
I've looked at a number of other posts here and elsewhere (see below), but I still don't have a clear answer to this question: how does Windows wchar_t handle Unicode characters outside the Basic Multilingual Plane? That is: many programmers seem to feel that UTF-16 is harmful because it is a variable-length code; wchar_t is 16 bits wide on Windows, but 32 bits wide on Unix/macOS; and the Windows APIs use wide characters, not Unicode. So what does Windows do when you want to encode something like 𠂊 (U+2008A, a Han character) on Windows? The implementation of wchar_t under the Windows stdlib is UTF-16…
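
The short answer, with a minimal sketch: Windows stores such a character as a UTF-16 surrogate pair, that is, two consecutive wchar_t code units, and it is up to the code to treat the pair as one character. The arithmetic below derives the pair for U+2008A by hand.

```cpp
#include <cstdio>

int main() {
    unsigned cp   = 0x2008A;              // 𠂊, outside the BMP
    unsigned v    = cp - 0x10000;         // 0x1008A: 20 bits to split
    unsigned high = 0xD800 + (v >> 10);   // 0xD840, high (lead) surrogate
    unsigned low  = 0xDC00 + (v & 0x3FF); // 0xDC8A, low (trail) surrogate

    std::printf("U+%05X -> 0x%04X 0x%04X\n", cp, high, low);

    // The same pair, spelled out as a Windows-style 16-bit string:
    const wchar_t pair[] = { 0xD840, 0xDC8A, 0 };
    std::printf("code units: %zu\n", sizeof pair / sizeof *pair - 1); // 2
    return 0;
}
```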