Convert ASCII string to Unicode? Windows, pure C

滥情空心 2021-01-13 06:08

I've found answers to this question for many programming languages, except for C, using the Windows API. No C++ answers please. Consider the following:

#inc         


        
6 answers
  • 2021-01-13 06:46

    If you KNOW that the input is pure ASCII and there are no extended character sets involved, there's no need to call any fancy conversion function. All the character codes in ASCII are the same in Unicode, so all you need to do is copy from one array to the other.

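    #include <windows.h>
    #include <string.h>   /* strlen */
    
    const char *string = "The quick brown fox jumps over the lazy dog";
    size_t len = strlen(string);
    WCHAR unistring[len + 1];   /* C99 VLA; MSVC needs a fixed size or malloc */
    size_t i;
    /* i <= len so the terminating NUL is copied as well */
    for (i = 0; i <= len; ++i)
        unistring[i] = (unsigned char)string[i];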
    
  • 2021-01-13 06:53

    You can use mbstowcs to convert from "multibyte" to wide character strings.
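
    As a minimal sketch of that suggestion: the helper name widen is hypothetical, and the code relies on the behaviour, documented by both POSIX and the Microsoft CRT, that a NULL destination makes mbstowcs return the required length. The conversion follows the current C locale, so this is only a safe bet for plain ASCII input unless setlocale has been called appropriately.

```c
#include <stdlib.h>   /* mbstowcs, malloc, free */
#include <wchar.h>    /* wchar_t */

/* Hypothetical helper: returns a newly allocated wide copy of src,
   or NULL on failure. The caller frees the result. */
wchar_t *widen(const char *src)
{
    /* With a NULL destination, mbstowcs only counts the wide characters
       needed (terminator excluded); (size_t)-1 signals invalid input. */
    size_t n = mbstowcs(NULL, src, 0);
    if (n == (size_t)-1)
        return NULL;

    wchar_t *dst = malloc((n + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    /* The second call performs the conversion and writes the terminator. */
    if (mbstowcs(dst, src, n + 1) == (size_t)-1) {
        free(dst);
        return NULL;
    }
    return dst;
}
```

    A call such as widen("The quick brown fox") would then yield a wchar_t buffer that can be passed to the W variants of the Windows API and released with free.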

  • 2021-01-13 07:01

    If you are really serious about Unicode, you should refer to International Components for Unicode, which is a cross-platform solution for handling Unicode conversions and storage in either C or C++.

    Your WCHAR, for example, is not Unicode to begin with, because Microsoft somewhat prematurely defined wchar_t to be 16-bit (UCS-2), and got stuck in backward-compatibility hell when Unicode grew beyond 16 bits: UCS-2 is almost, but not quite, identical to UTF-16, the latter being in fact a variable-length encoding just like UTF-8. "Wide" format in Unicode means 32-bit (UTF-32), and even then you don't have a 1:1 relationship between code points (i.e. 32-bit values) and abstract characters (i.e. what the reader sees as one printable glyph).
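
    To make the UCS-2 versus UTF-16 difference concrete, here is a small sketch of how a single code point maps to 16-bit units. The function name utf16_encode is hypothetical and not part of any Windows or ICU API; it only illustrates the surrogate-pair arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out[] and returns how many were written,
   or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                       /* out of range, or a lone surrogate */

    if (cp < 0x10000) {                 /* Basic Multilingual Plane: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }

    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate: low 10 bits */
    return 2;
}
```

    Every ASCII character takes the one-unit branch, which is why a 16-bit WCHAR looks like "one character per element" until a code point above U+FFFF shows up; U+1F600, for instance, needs the two units 0xD83D 0xDE00.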

    Gratuitous, loosely related list of links:

    • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
    • The UTF-8 Everywhere Manifesto
    • Commonly confused characters by Greg Baker
  • 2021-01-13 07:04

    MultiByteToWideChar:

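    #include <windows.h>
    #include <string.h>   /* strlen */
    
    const char *string = "The quick brown fox jumps over the lazy dog";
    size_t len = strlen(string);
    WCHAR unistring[len + 1];   /* C99 VLA; MSVC needs a fixed size or malloc */
    /* A source length of -1 converts up to and including the terminating NUL.
       The result is the number of WCHARs written, or 0 on failure. */
    int result = MultiByteToWideChar(CP_OEMCP, 0, string, -1, unistring, (int)(len + 1));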
    
  • 2021-01-13 07:05

    You should look into the MultiByteToWideChar function.

  • 2021-01-13 07:07

    This is another way to do it. It's not as direct, but when you don't feel like typing in 6 arguments in a very specific order, and remembering codepage numbers/macros for MultiByteToWideChar, it does the job. It takes 16 microseconds on this laptop, most of it (9 microseconds) spent in AddAtomW. Note that the example goes from wide to ANSI; for the other direction, add the atom with AddAtomA and read it back with GetAtomNameW.

    For reference, MultiByteToWideChar takes between 0 and 1 microseconds.

    #include <Windows.h>
    
    const wchar_t msg[] = L"We did it!";
    
    int main(int argc, char **argv)
    {
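        /* one byte per wide character, terminator included, plus one spare */
        char result[(sizeof(msg) / sizeof(msg[0])) + 1];
        ATOM tmp;
    
        tmp = AddAtomW(msg);                        /* store the wide string */
        GetAtomNameA(tmp, result, sizeof(result));  /* read it back as ANSI */
        MessageBoxA(NULL, result, "it says", MB_OK | MB_ICONINFORMATION);
        DeleteAtom(tmp);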
    
        return 0;
    }
    