So I've finally gotten back to my main task - porting a rather large C++ project from Windows to the Mac.
Straight away I've been hit by the problem where wchar_t is 4 bytes wide under gcc on the Mac but only 2 bytes wide on Windows. What is the best practice for handling Unicode strings portably across the two platforms?
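For example, this little check prints 2 when built with MSVC on Windows and 4 with gcc on the Mac (sizeof(wchar_t) is implementation-defined, so neither value is guaranteed):

    #include <cstdio>

    int main() {
        // Implementation-defined: typically 2 bytes (UTF-16 code units)
        // with MSVC, 4 bytes (UTF-32) with gcc/clang on Mac and Linux.
        std::printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
        return 0;
    }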
Always use a protocol defined to the byte when a file or network connection is involved. Do not rely on how a C++ compiler stores anything in memory. For Unicode text, this means choosing both an encoding and a byte order (okay, UTF-8 doesn't care about byte order). Even if the platforms you currently want to support have similar architectures, another popular platform with different behavior or even a new OS for one of your existing platforms will likely come along, and you'll be glad you wrote portable code.
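As a minimal sketch of what "defined to the byte" means (the helper name appendString and the little-endian length prefix are just illustrative choices, not any particular protocol): serialize a length followed by UTF-8 bytes, never raw wchar_t memory.

    #include <cstdint>
    #include <string>
    #include <vector>

    // Append a 32-bit length in explicit little-endian byte order,
    // then the UTF-8 payload. The layout is defined byte by byte,
    // so it does not depend on the compiler's wchar_t size or on
    // the host's endianness.
    void appendString(std::vector<uint8_t>& out, const std::string& utf8) {
        uint32_t len = static_cast<uint32_t>(utf8.size());
        out.push_back(static_cast<uint8_t>(len));
        out.push_back(static_cast<uint8_t>(len >> 8));
        out.push_back(static_cast<uint8_t>(len >> 16));
        out.push_back(static_cast<uint8_t>(len >> 24));
        out.insert(out.end(), utf8.begin(), utf8.end());
    }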
As a rule of thumb: UTF-16 for processing, UTF-8 for communication & storage.
Sure, any rule can be broken and this one is not carved in stone. But you have to know when it is ok to break it.
For instance, it might be a good idea to use something else if the environment you are working in wants something else. But Mac OS X APIs use UTF-16, same as Windows, so on those platforms UTF-16 makes more sense. It is more straightforward to convert at the points where you put/get things on the net (because you probably do that in 2-3 routines, like the sketch below) than to do a conversion around every OS API call.
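One of those boundary routines might look like this (a hand-rolled sketch, assuming the internal representation is UTF-16 in a std::u16string; ICU or the platform APIs do the same job):

    #include <cstdint>
    #include <string>

    // Convert an internal UTF-16 string to UTF-8 just before it goes
    // out on the wire. Unpaired surrogates become U+FFFD.
    std::string utf16ToUtf8(const std::u16string& in) {
        std::string out;
        for (size_t i = 0; i < in.size(); ++i) {
            uint32_t cp = in[i];
            if (cp >= 0xD800 && cp <= 0xDBFF) {          // high surrogate
                if (i + 1 < in.size() &&
                    in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (in[++i] - 0xDC00);
                } else {
                    cp = 0xFFFD;                          // unpaired
                }
            } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
                cp = 0xFFFD;                              // stray low surrogate
            }
            // Standard UTF-8 encoding by code point range.
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }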
The type of application you are developing also matters. If it does very little text processing and makes very few calls into the system (something like an email server that mostly moves data around without changing it), then UTF-8 might be a good choice.
So, as much as you might hate this answer, "it depends".
ICU has a C++ string class, UnicodeString, which uses UTF-16 as its internal representation.
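A short sketch of using it at the boundaries (fromUTF8, toUTF8String, and toUpper are real ICU APIs; the sample string is just a placeholder):

    #include <unicode/unistr.h>
    #include <string>

    int main() {
        // Decode UTF-8 from the wire into ICU's UTF-16 representation.
        icu::UnicodeString s = icu::UnicodeString::fromUTF8("héllo wörld");

        // Process in UTF-16 (length() counts UTF-16 code units).
        s.toUpper();

        // Encode back to UTF-8 for storage or the network.
        std::string utf8;
        s.toUTF8String(utf8);
        return 0;
    }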
I tend to use UTF-8 as the internal representation. You only lose constant-time string length (in characters rather than bytes), which isn't really useful anyway. For Windows API calls, I use my own Win32 conversion functions, which I devised here. Mac and Linux are (for the most part) UTF-8-aware, so there is no need to convert anything there. A free bonus you get: plain std::string works as your string type throughout.
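The answer's own helpers aren't shown, but a sketch along the same lines would wrap the Win32 MultiByteToWideChar/WideCharToMultiByte calls (those are the real APIs; the function names and the minimal error handling here are illustrative):

    #include <windows.h>
    #include <string>

    // UTF-8 -> UTF-16, for calling W-suffixed Win32 APIs.
    // Sketch: errors are reduced to returning an empty string.
    std::wstring utf8ToWide(const std::string& utf8) {
        if (utf8.empty()) return std::wstring();
        int n = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                    static_cast<int>(utf8.size()), nullptr, 0);
        if (n <= 0) return std::wstring();
        std::wstring wide(n, L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                            static_cast<int>(utf8.size()), &wide[0], n);
        return wide;
    }

    // UTF-16 -> UTF-8, for strings coming back out of the API.
    std::string wideToUtf8(const std::wstring& wide) {
        if (wide.empty()) return std::string();
        int n = WideCharToMultiByte(CP_UTF8, 0, wide.data(),
                                    static_cast<int>(wide.size()),
                                    nullptr, 0, nullptr, nullptr);
        if (n <= 0) return std::string();
        std::string utf8(n, '\0');
        WideCharToMultiByte(CP_UTF8, 0, wide.data(),
                            static_cast<int>(wide.size()),
                            &utf8[0], n, nullptr, nullptr);
        return utf8;
    }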