Convert Latin characters from Shift JIS to Latin characters in Unicode

浪子不回头ぞ 提交于 2021-02-11 17:29:43

问题


I'm working on parsing files with Shift-JIS encoded strings within the binary data. My current code is this:

public static string DecodeShiftJISString(this byte[] data, int index, int length)
{
    byte[] utf8Bytes = Encoding.Convert(Encoding.GetEncoding(932), Encoding.UTF8, data);
    return Encoding.UTF8.GetString(utf8Bytes);
}

It works fine and I am able to get usable strings from this method, although when I display strings with Latin characters into my WinForms application, I see that the characters are wider than normal.

Latin characters in Shift-JIS string

I'm not sure if this is an issue with my encoding logic, or the way I'm supposed to display the strings (I just pass them directly into my controls). Any help would be appreciated!


回答1:


These aren't normal ASCII characters, they're ‘fullwidth variants’ in the range U+FF01 fullwidth exclamation mark upwards. They're for lining up formatting when setting a mixture of Latin and CJK characters.

Unicode would prefer weird characters like this, which are just semantically-identical stylistic variants of existing characters, not to exist. But it has to include them to round-trip to legacy encodings like Shift-JIS. For this reason they are called Compatibility characters.

You can convert compatibility characters to their basic variants by using Unicode normalisation with a ‘K’ format such as NFKC. In Win32 you can do this using NormalizeString().



来源:https://stackoverflow.com/questions/33454529/convert-latin-characters-from-shift-jis-to-latin-characters-in-unicode

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!