UTF-8 in Windows


别那么骄傲

How do I set the code page to UTF-8 in a C Windows program?

I have a third party library that uses fopen to open files. I can use wcstombs to convert my Unicode fi


4 Answers

北恋

2020-12-09 08:56

Unfortunately, there is no way to make Unicode the current codepage in Windows. The CP_UTF7 and CP_UTF8 constants are pseudo-codepages, used only in MultiByteToWideChar and WideCharToMultiByte conversion functions, like Ben mentioned.

Your problem is similar to that of the fstream C++ classes. The fstream constructors accept only char* names, making it impossible to open a file with a true Unicode name. The only solution offered by VC was a hack: open the file separately and then attach the handle to the stream object. I'm afraid this isn't an option for you, of course, since the third-party library probably doesn't accept handles.

The only solution I can think of is to create a temporary file with a non-Unicode name, which is hard-linked to the original, and use that as a parameter.
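The hard-link workaround could look roughly like the sketch below. It assumes C++17's std::filesystem and a file system that supports hard links (NTFS does, FAT32 does not). `third_party_file_size` is a hypothetical stand-in for the library routine that calls fopen, and `size_via_ascii_alias` is a made-up helper name. The link is created in the same directory because hard links cannot cross volumes; on Windows the directory part of the path must itself be representable in the ANSI code page for `alias.string()` to work.

```cpp
#include <cassert>
#include <cstdio>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical stand-in for the third-party routine: it only takes a
// narrow file name and opens it with fopen.
long third_party_file_size(const char* name)
{
    std::FILE* f = std::fopen(name, "rb");
    if (!f) return -1;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fclose(f);
    return size;
}

// Creates an ASCII-named hard link next to `original`, hands that name to
// the library, then removes the link again.
long size_via_ascii_alias(const fs::path& original, const std::string& ascii_name)
{
    assert(!ascii_name.empty());
    // Hard links cannot cross volumes, so keep the link in the same directory.
    const fs::path alias = original.parent_path() / ascii_name;
    fs::create_hard_link(original, alias);   // throws fs::filesystem_error on failure
    const long size = third_party_file_size(alias.string().c_str());
    fs::remove(alias);                       // the original file is left untouched
    return size;
}
```

Removing the link afterwards only drops the extra name; the data stays reachable through the original Unicode name.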

广开言路

2020-12-09 09:01
2018 update: Windows 10 has made the "65001" code page less "pseudo" in two steps:
1. conhost changes: Windows Subsystem for Linux uses code page 65001 for its consoles. It is also possible to run chcp 65001 in cmd.exe since WSL. (It has caused some pretty dumb Python bugs.)
2. full-featured locale: Windows since build 17035 allows setting UTF-8 as the locale codepage. This is available from the April 2018 update.
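Whether the second change is in effect for a given process can be checked by asking for the active ANSI code page. A minimal sketch: `GetACP` and `CP_UTF8` (65001) come from the Windows headers, while `is_utf8_code_page` and `narrow_apis_use_utf8` are hypothetical names, and the non-Windows branch exists only so the snippet compiles elsewhere. How the C runtime's fopen treats narrow names may additionally depend on the CRT version and locale settings, so treat the result as a hint rather than a guarantee.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// 65001 is the identifier Windows assigns to UTF-8 (CP_UTF8).
constexpr unsigned kUtf8CodePage = 65001;

constexpr bool is_utf8_code_page(unsigned code_page)
{
    return code_page == kUtf8CodePage;
}

// True when the ANSI ("A") Win32 functions interpret char* arguments as UTF-8.
bool narrow_apis_use_utf8()
{
#ifdef _WIN32
    return is_utf8_code_page(GetACP());
#else
    return false;  // not applicable outside Windows
#endif
}
```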
梦毁少年i

2020-12-09 09:10
All Windows APIs think in UTF-16, so you're better off writing a wrapper around your library that converts at the boundaries.

Oddly enough, Windows thinks UTF-8 is a codepage for the purposes of conversion, so you use the same APIs as you would to convert between codepages:
```
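#include <windows.h>
#include <string>

std::wstring Utf8ToUtf16(const char* u8string)
{
    // First call: ask how many wchar_t units are needed, terminator included.
    int wcharcount = MultiByteToWideChar(CP_UTF8, 0, u8string, -1, nullptr, 0);
    if (wcharcount <= 0)
        return std::wstring();  // conversion failed
    std::wstring w(wcharcount, L'\0');
    // Second call: convert into the buffer.
    MultiByteToWideChar(CP_UTF8, 0, u8string, -1, &w[0], wcharcount);
    w.resize(wcharcount - 1);  // drop the terminator written by the API
    return w;
}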
```
And something of similar form to convert back.
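The reverse direction might look like the following sketch, which uses `WideCharToMultiByte` with a size query followed by the actual conversion. The name `Utf16ToUtf8` is just the obvious counterpart and is an assumption, not something from the answer. The snippet is Windows-only, so it is wrapped in `#ifdef _WIN32` and compiles to nothing elsewhere.

```cpp
#ifdef _WIN32  // Windows-only: WideCharToMultiByte comes from <windows.h>
#include <windows.h>
#include <cassert>
#include <string>

std::string Utf16ToUtf8(const wchar_t* u16string)
{
    assert(u16string != nullptr);
    // First call: ask for the required size in bytes, terminator included.
    // For CP_UTF8 the last two arguments must be null.
    int bytecount = WideCharToMultiByte(CP_UTF8, 0, u16string, -1,
                                        nullptr, 0, nullptr, nullptr);
    if (bytecount <= 0)
        return std::string();  // conversion failed
    std::string s(bytecount, '\0');
    // Second call: convert into the buffer.
    WideCharToMultiByte(CP_UTF8, 0, u16string, -1,
                        &s[0], bytecount, nullptr, nullptr);
    s.resize(bytecount - 1);  // drop the terminator written by the API
    return s;
}
#endif
```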
死守一世寂寞

2020-12-09 09:19

Use cygwin (which provides a UTF-8 locale by default), or write your own libc hack for Windows that does the necessary UTF-8 to UTF-16 translations and wraps the nonstandard _wfopen etc. functions.
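A wrapper of the kind described here could be sketched as below. `fopen_utf8` and `widen_utf8` are hypothetical names. On Windows the sketch converts both arguments to UTF-16 and calls `_wfopen`; elsewhere it simply forwards to `fopen`, since POSIX file names are byte strings.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>

// Converts a NUL-terminated UTF-8 string to UTF-16 for the wide CRT functions.
static std::wstring widen_utf8(const char* s)
{
    int n = MultiByteToWideChar(CP_UTF8, 0, s, -1, nullptr, 0);
    if (n <= 0) return std::wstring();
    std::wstring w(n, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, s, -1, &w[0], n);
    w.resize(n - 1);  // drop the terminator written by the API
    return w;
}
#endif

// Hypothetical drop-in for fopen that always treats `name` as UTF-8.
std::FILE* fopen_utf8(const char* name, const char* mode)
{
    assert(name != nullptr && mode != nullptr);
#ifdef _WIN32
    return _wfopen(widen_utf8(name).c_str(), widen_utf8(mode).c_str());
#else
    return std::fopen(name, mode);  // POSIX file names are plain bytes
#endif
}
```

This only helps where the call site can be redirected to the wrapper, for example when the library is built from source; a prebuilt library that calls fopen internally would still need one of the other approaches.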
