I'm wondering what the Stack Overflow community thinks when it comes to creating a project (thinking primarily of C++ here) with a Unicode or a multi-byte character set.
Are there pros to going Unicode straight from the start,
A few years and a million lines of code later, you're going to wish you had answered "yes".
implying all your strings will be in wide format?
I wish Microsoft would quit conflating "Unicode" with UTF-16.
You don't have to store all your strings in wide format. You can use UTF-8 instead, and get a smaller memory footprint (for Latin alphabet languages), and backwards compatibility with 7-bit ASCII.
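To make the footprint point concrete, here is a rough sketch that counts how many bytes the same text takes in UTF-8 and in UTF-16. The names `EncodedSizes`, `compare_sizes` and `demo_sizes` are made up for this illustration, and the function assumes its input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Byte sizes of the same text in two encodings (hypothetical helper type).
struct EncodedSizes {
    std::size_t utf8_bytes;
    std::size_t utf16_bytes;
};

// Assumes `utf8` holds valid UTF-8. Every lead byte starts one code point;
// a 4-byte sequence needs a surrogate pair (two 16-bit units) in UTF-16.
EncodedSizes compare_sizes(const std::string& utf8) {
    std::size_t units16 = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) == 0x80) continue;   // continuation byte, not a new code point
        units16 += (byte >= 0xF0) ? 2 : 1;     // 4-byte lead -> surrogate pair
    }
    return {utf8.size(), units16 * 2};
}

// Small usage example: plain ASCII text versus a Japanese name.
void demo_sizes() {
    EncodedSizes english = compare_sizes("Hello");
    assert(english.utf8_bytes == 5 && english.utf16_bytes == 10);

    // "菅 直人" written as explicit UTF-8 bytes, with an ASCII space.
    EncodedSizes name = compare_sizes("\xE8\x8F\x85 \xE7\x9B\xB4\xE4\xBA\xBA");
    assert(name.utf8_bytes == 10 && name.utf16_bytes == 8);
}
```

ASCII text is half the size in UTF-8, while CJK text comes out somewhat larger, which is why the saving only applies to Latin alphabet languages.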
The one downside to using UTF-8 on Windows is that it's not supported as an ANSI code page, so you have to convert your strings to UTF-16 to make WinAPI calls. How much inconvenience this causes depends on whether you're writing a Windows program or a program that just happens to run on Windows.
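For what that conversion step involves: on Windows you would normally just call MultiByteToWideChar with CP_UTF8 and pass the result to the W function. The sketch below does the same decoding by hand in portable C++, only to show what happens. `utf8_to_utf16` and `demo_conversion` are hypothetical names, and the validation is deliberately minimal (it does not reject overlong encodings or encoded surrogates).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16 code units.
// Rejects bad lead bytes, truncated sequences and bad continuation bytes only.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;   // number of continuation bytes that follow the lead
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp >= 0x10000) {             // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Small usage example.
void demo_conversion() {
    assert(utf8_to_utf16("A") == u"A");
    assert(utf8_to_utf16("\xE8\x8F\x85") == u"\u83C5");   // 菅
}
```

If you keep UTF-8 internally, this kind of call ends up at every boundary where a string crosses into the Windows API, so it is worth wrapping in one place.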
The short answer (IMO, and I've been proven wrong before) is that it's better to plan for the worst (or best, depending on your point of view) and go Unicode right now.
Unless your application is very string-intensive, going directly to Unicode will not cost you much; in the case of games, it should not be a big factor compared to the rest of the engine.
Max.
Here's a simple consideration: should your program work if it's used by Mr. 菅 直人? His home directory might be hard to represent in ASCII.
The first answer to that question should... answer everything you need to know.
You are talking about the VC++ Project setting here, right?
The only thing it affects is which version of each Win32 API call ends up being executed. For instance, a call to MessageBox will end up as a call to MessageBoxA with the multi-byte setting, and MessageBoxW with the Unicode setting. Of course, that affects the types of the string parameters to those functions as well. Internally, MessageBoxA calls MessageBoxW after converting the string parameters from the current system locale to Unicode.
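The mechanism can be imitated in a few lines of portable C++. This is only a sketch of the pattern, not the real headers: `ShowMessageA`, `ShowMessageW`, `ShowMessage`, `SKETCH_TEXT` and `show_greeting` are invented names, and the narrow-to-wide step assumes a Latin-1 style code page where each byte maps straight to the same code point (real Windows uses the active code page).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for a Win32 A/W function pair; not real API names.
static std::wstring g_last_shown;   // what the wide function last received
static int g_wide_calls = 0;        // how many times the wide function ran

// The wide version does the actual work.
int ShowMessageW(const wchar_t* text) {
    g_last_shown = text;
    ++g_wide_calls;
    return 1;
}

// The narrow version only converts and forwards to the wide one.
// Assumption for this sketch: each byte is its own code point (Latin-1).
int ShowMessageA(const char* text) {
    std::wstring wide;
    for (const char* p = text; *p != '\0'; ++p)
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*p)));
    return ShowMessageW(wide.c_str());
}

// The project setting boils down to whether UNICODE is defined;
// the generic name is then just a macro for one of the two functions.
#ifdef UNICODE
#define ShowMessage ShowMessageW
#define SKETCH_TEXT(s) L##s
#else
#define ShowMessage ShowMessageA
#define SKETCH_TEXT(s) s
#endif

// Caller code is written once against the generic name.
int show_greeting() {
    return ShowMessage(SKETCH_TEXT("hello"));
}

// Small usage example: either build setting ends up in the wide function.
void demo_dispatch() {
    show_greeting();
    assert(g_last_shown == L"hello");
}
```

Whichever way UNICODE is set, the call lands in the wide function; the multi-byte build just pays for an extra conversion first.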
My advice is to use the Unicode settings and pass Unicode strings to Win32 API calls. That does not stop you from using strings in any other encoding internally.
Two issues I'd comment on.
First, you don't mention what platform you're targeting. Although recent Windows versions (Win2000, WinXP, Vista and Win7) support both multibyte and Unicode versions of system calls that take strings, the Unicode versions are faster (the multibyte versions are wrappers that convert to Unicode, call the Unicode version, then convert any returned strings back to multibyte). So if you're making a lot of these calls, the Unicode versions will be faster.
Second, even if you're not planning on explicitly supporting additional languages, you should still consider supporting Unicode if your application saves and displays text entered by users. Just because your application is unilingual, it doesn't follow that all its users will be unilingual too. They may be perfectly happy to use your English-language GUI, but might want to enter names, comments or other text in their own language and have them displayed properly.