Text on MFC Controls - Unicode Characters such as Japanese get cut off

Question

Background

I'm working on a C++/MFC application and we've been converting it to display unicode characters to support foreign languages. For the most part this has been successful and unicode characters are displayed correctly. But I've encountered an issue where certain text on certain controls gets cut off.

Example

Here you can see a button that should display "ログアウト／終了" but gets cut off and displays an unknown character in its place.

But if I pad the string with spaces it displays fine. The number of spaces needed varies by string. This string needed 4 spaces to display correctly, whereas another string with one less character needed 5 spaces; there doesn't seem to be any correlation between string length and the number of spaces needed. I also don't want to pad strings arbitrarily throughout the code, especially when other languages don't need this at all.

What I've tried (doesn't work)

Shrinking the font size
Resizing the control
Changing the font facename
Changing the font character set
Copying the control properties from another control in the application that does not have this issue
Adding extra null terminators
Padding with zero-width characters
Using SetWindowTextW
Changing source and execution character sets
Changing system locale

The only thing I've found that works is padding with an arbitrary amount of spaces which is certainly not an ideal solution.

Other info

I've only noticed this issue for Japanese characters, but have only tested English, German, and Japanese.
Japanese characters use 3 bytes of data each (in UTF-8), which I suspect has something to do with this, but I don't know what or why. English characters use 1 byte and certain German characters use 2 bytes.
A control (button, label, etc.) in one place may have the issue whereas a control in a different place that contains the same text does not, even if they're both buttons.
When the text is cut off, it typically displays either a question mark box (like the first image) or a random character/letter at the end. This character changes each time I run the application, but the question mark box is the most common.
For my padding "fix", it doesn't matter if the spaces are at the beginning or end of the string, as long as the number of spaces is enough. It also doesn't need to be spaces, any non-zero-width character works.
Compiled using MBCS (Multibyte Character Set) with the Windows 10 UTF-8 Unicode support setting enabled, as opposed to compiling with UNICODE defined, which isn't an option for this large, old codebase.
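
The byte counts mentioned above are what UTF-8 produces: ASCII letters take 1 byte, characters such as German umlauts take 2, and kana or kanji in the Basic Multilingual Plane take 3. A minimal sketch, assuming the string really holds UTF-8, that counts characters (code points) as opposed to bytes; `Utf8CodePointCount` is a hypothetical helper name, not part of MFC or the original code:

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes,
// which always have the bit pattern 10xxxxxx. Assumes well-formed input.
std::size_t Utf8CodePointCount(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}
```

For the button label in the question, `size()` on the UTF-8 string reports 24 bytes while this helper reports 8 characters, so any code that treats the byte count as a character count (or the reverse) will disagree with the text by a factor of three.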

EDIT: Here is an example on how the text is set

GetDlgItem(IDC_SOME_CTRL_ID)->SetWindowText(GetTranslation("Some String"));

Where GetTranslation() is our own function to look up the translation of "Some String" (basically a lookup table) and return a CString. Using a debugger I can see the returned CString always has the correct string value. I can replace GetTranslation with a hardcoded Japanese string and the issue will still happen.

EDIT 2: I got complaints that this code wasn't enough.

myapp.rc

// Microsoft Visual C++ generated resource script.
//
#include "resource.h"

#define APSTUDIO_READONLY_SYMBOLS
#include "afxres.h"
#undef APSTUDIO_READONLY_SYMBOLS

IDD_VIEW_MENU DIALOGEX 0, 0, 50, 232
STYLE DS_SETFONT | WS_CHILD
FONT 14, "Verdana", 0, 0, 0x1
BEGIN
    CONTROL         "btn0",IDC_BUTTON_MENU_0,"Button",BS_3STATE | BS_PUSHLIKE,12,38,25,13
END
#endif

resource.h

#define IDC_BUTTON_MENU_0             6040

ViewMenu.cpp

#include "stdafx.h"
#include "ViewMenu.h"

CViewMenu::CViewMenu() : CFormView(CViewMenu::IDD)
{
}

void CViewMenu::DoDataExchange(CDataExchange* pDX)
{
    CFormView::DoDataExchange(pDX);
    DDX_Control(pDX, IDC_BUTTON_MENU_0, m_ctrlMenuButton0);
}

void CViewMenu::OnInitialUpdate()
{
    CFormView::OnInitialUpdate();
}

void CViewMenu::OnDraw(CDC* pDC)
{
    CFormView::OnDraw(pDC);

    GetDlgItem(IDC_BUTTON_MENU_0)->SetWindowText("ログアウト／終了");

    return;
}

ViewMenu.h

#include "resource.h"

class CViewMenu : public CFormView
{
    protected:
        CViewMenu();

    public:
        enum { IDD = IDD_VIEW_MENU };
        CButton m_ctrlMenuButton0;
};

Answer 1:

The following should work in Windows 10 versions 1903 and later, regardless of the default system locale, and fulfills OP's requirements (string literals, MBCS build, no Unicode windows etc). It was verified to work in version 2004 set to En-US locale, without "Beta: Use Unicode UTF-8 for worldwide language support" checked, using VS 2019 16.7.5 to build.

Save source files containing characters outside the active codepage in UTF-8 encoding, with or without BOM.
Compile with _MBCS defined (in the IDE: Properties / Advanced / Character Set = MBCS).
Compile with the /utf-8 switch (C/C++ / Command Line / Additional Options = /utf-8).

Create a manifest file declaring UTF-8 as the target codepage for the process (per the activeCodePage documentation).

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
  <asmv3:application>
    <asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">
      <activeCodePage>UTF-8</activeCodePage>
    </asmv3:windowsSettings>
  </asmv3:application>
</assembly>

Add the manifest file to the project (in the IDE: Manifest Tool / General / Input and Output / Additional Manifest Files = manifest file created at the previous step).
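
To confirm the manifest actually took effect, the process can check its ANSI code page at startup. The following is a minimal sketch, not part of the original answer: `GetACP` is the real Win32 call and 65001 is the value of `CP_UTF8`, while `kUtf8CodePage`, `IsUtf8CodePage` and `ProcessAnsiCodePageIsUtf8` are hypothetical names introduced here for illustration.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Numeric value of the UTF-8 code page (CP_UTF8 in the Windows headers).
constexpr unsigned kUtf8CodePage = 65001;

// True when the given ANSI code page identifier denotes UTF-8.
constexpr bool IsUtf8CodePage(unsigned codePage) {
    return codePage == kUtf8CodePage;
}

#ifdef _WIN32
// Windows only: reports whether the "A" APIs (SetWindowTextA etc.)
// will interpret narrow strings as UTF-8 in this process.
inline bool ProcessAnsiCodePageIsUtf8() {
    return IsUtf8CodePage(::GetACP());
}
#endif
```

Calling `ProcessAnsiCodePageIsUtf8()` early, for example from the application's `InitInstance`, and logging or asserting on the result makes a missing or ignored manifest visible immediately instead of showing up later as garbled control text.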

Answer 2:

This ain't Python. With C++ you need to know why your code works. Otherwise it doesn't.

GetDlgItem(IDC_BUTTON_MENU_0)->SetWindowText("ログアウト／終了");

That's where you and your compiler start to disagree. You think this should be UTF-8. Your compiler, on the other hand, trusts you, and assumes that you are using the source character set.

While you are unaware of a concept called source character set, you get all confused about something that should be the norm: Garbage in, garbage out.

If you feel like fixing the "Garbage in" part (now, clearly, that is your job), read up on C++ string literals. In case you don't make it to the end, the quickest way to fix your ungodly workaround is to use a u8 prefix.
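
As a small illustration of the u8 suggestion (a sketch, not the answerer's code): a u8 literal is encoded as UTF-8 regardless of the source or execution character set, which can be checked at compile time. `kRoBytes` is a name made up for this example, and `\u30ED` is the universal character name for "ロ", used so the snippet does not depend on how the file itself is saved.

```cpp
#include <cstddef>

// sizeof counts the terminating NUL, so subtract one code unit.
constexpr std::size_t kRoBytes = sizeof(u8"\u30ED") - 1;

// One BMP kana occupies three UTF-8 code units.
static_assert(kRoBytes == 3, "u8 literal is not UTF-8 encoded as expected");
```

Note that from C++20 onward a u8 literal has type `const char8_t[]` rather than `const char[]`, so passing it to a narrow API such as `SetWindowTextA` needs a cast there; before C++20 it converts to `const char*` directly.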

Seriously, though, the real solution is to use Windows' native character encoding. Which, oddly, you seem to reject, even though you could use it, given a string literal. I mean, it's not like you have to change anything global. Just call SetWindowTextW and use an L prefix.
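
When the text is not a literal but arrives as UTF-8 bytes (for example from a translation table), it has to be converted to UTF-16 before it can go to a W function. On Windows that is normally done with `MultiByteToWideChar` and `CP_UTF8`; the portable sketch below only illustrates what that conversion does. `Utf8ToUtf16` is a hypothetical helper, and it assumes well-formed input with no validation.

```cpp
#include <cstddef>
#include <string>

// Decodes well-formed UTF-8 into UTF-16 code units.
// Error handling for malformed sequences is deliberately omitted.
std::u16string Utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b0 < 0x80)      { cp = b0;        len = 1; }  // ASCII
        else if (b0 < 0xE0) { cp = b0 & 0x1F; len = 2; }  // 110xxxxx
        else if (b0 < 0xF0) { cp = b0 & 0x0F; len = 3; }  // 1110xxxx
        else                { cp = b0 & 0x07; len = 4; }  // 11110xxx
        // Fold in the low six bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The three UTF-8 bytes of "ロ" come out as the single UTF-16 code unit 0x30ED, which is the form `SetWindowTextW` expects; at that point the process code page no longer plays any part in how the text is interpreted.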

Just saying, you know...

Source: https://stackoverflow.com/questions/65240035/text-on-mfc-controls-unicode-characters-such-as-japanese-get-cut-off

Tags

c++

unicode

mfc