What is QString::toUtf8 doing?

梦毁少年i 2021-01-17 09:13

This may sound like an obvious question, but I'm missing something about either how UTF-8 is encoded or how the toUtf8 function works.

Let's look at a very simple

2 answers
  • 2021-01-17 10:04

    Running your code, I get the expected result:

    "4dc3bc6c6c6572"

    I think the problem is with your input, not your output. Check the encoding of your source file, and have a look at void QTextCodec::setCodecForCStrings ( QTextCodec * codec ) [static].

  • 2021-01-17 10:15

    It depends on the encoding of your source code.

    I tend to think that your file is already encoded in UTF-8, the character ü being encoded as C3 BC.

    You're calling the QString::QString ( const char * str ) constructor which, according to http://doc.qt.io/qt-4.8/qstring.html#QString-8, converts your string to Unicode using the QString::fromAscii() method, which by default treats the input as Latin-1.

    As C3 and BC are both valid Latin-1 bytes, representing Ã and ¼ respectively, converting them to UTF-8 produces the following bytes:

    Ã (C3) -> C3 83

    ¼ (BC) -> C2 BC

    which leads to the string you get: "4d c3 83 c2 bc 6c 6c 65 72"

    To sum things up, it's double UTF-8 encoding.
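The double encoding can be reproduced without Qt. What follows is a minimal sketch in plain C++, not Qt's actual implementation: latin1ToUtf8 and toHex are hypothetical helper names, and the input bytes are written with explicit escapes so that the result does not depend on the encoding of the file.

```cpp
#include <string>

// Hypothetical helper: treat every input byte as one Latin-1 character
// (code point U+0000..U+00FF) and encode that code point as UTF-8.
std::string latin1ToUtf8(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));                 // ASCII: one byte
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));   // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F))); // continuation byte
        }
    }
    return out;
}

// Hypothetical helper: lowercase hex dump, similar in spirit to QByteArray::toHex().
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : bytes) {
        out.push_back(digits[c >> 4]);
        out.push_back(digits[c & 0x0F]);
    }
    return out;
}
```

With the UTF-8 bytes of "Müller" as input, that is std::string("M\xC3\xBCller"), toHex(input) gives "4dc3bc6c6c6572", while toHex(latin1ToUtf8(input)) gives "4dc383c2bc6c6c6572", the doubly encoded string from the question.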

    There are several options to solve this issue:

    1) You can convert your source file to Latin-1 using your favorite text editor.

    2) You can properly escape the ü character as \xFC in the string literal, so the string won't depend on the file's encoding.

    3) You can keep the file and string as UTF-8 data and use QString str = QString::fromUtf8 ("Müller");
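All three options aim at the same thing: the decoder should see ü as the single code point U+00FC. The sketch below illustrates the difference between the Latin-1 view and the UTF-8 view of the same bytes. It is plain C++ under stated assumptions, not Qt code: decodeLatin1 and decodeUtf8Basic are hypothetical names, and the UTF-8 decoder is deliberately limited to 1- and 2-byte sequences.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Latin-1 view: every byte is its own code point (hypothetical helper).
std::vector<unsigned int> decodeLatin1(const std::string& bytes) {
    std::vector<unsigned int> out;
    for (unsigned char c : bytes) {
        out.push_back(c);
    }
    return out;
}

// UTF-8 view, limited to 1- and 2-byte sequences (U+0000..U+07FF).
// Anything else is passed through byte by byte; a real decoder would
// reject or replace malformed input instead.
std::vector<unsigned int> decodeUtf8Basic(const std::string& bytes) {
    std::vector<unsigned int> out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if ((c & 0xE0) == 0xC0 && i + 1 < bytes.size()) {
            const unsigned char next = static_cast<unsigned char>(bytes[i + 1]);
            if ((next & 0xC0) == 0x80) {
                // Combine 5 payload bits of the lead byte with 6 of the continuation.
                out.push_back(((c & 0x1Fu) << 6) | (next & 0x3Fu));
                ++i; // the continuation byte has been consumed
                continue;
            }
        }
        out.push_back(c);
    }
    return out;
}
```

For the UTF-8 bytes "M\xC3\xBCller", decodeUtf8Basic yields six code points with U+00FC in second position (what option 3 relies on), whereas decodeLatin1 yields seven, with U+00C3 and U+00BC in place of ü. For the escaped literal "M\xFCller" from option 2, decodeLatin1 already yields U+00FC.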

    Update: This issue is no longer relevant in Qt 5. http://doc.qt.io/qt-5/qstring.html#QString-8 states that the constructor now uses QString::fromUtf8() internally instead of QString::fromAscii(). So, as long as UTF-8 is used consistently for the source file and string literals, the conversion works by default.
