TCHAR szExeFileName[MAX_PATH];
GetModuleFileName(NULL, szExeFileName, MAX_PATH);
CString tmp;
lstrcpy(szExeFileName, tmp);
CString out;
out.Format("\nInstall32 at %s\n", tmp);
My guess is you are compiling in Unicode mode.
Try wrapping your format string in the _T macro, which yields a correct constant string literal whether you're compiling in Unicode or ANSI mode:
out.Format(_T("\nInstall32 at %s\n"), tmp);
The accepted answer addresses the problem. But the question also asked for a better understanding of the differences among all the character types on Windows.
Encodings
A char on Windows (and virtually all other systems) is a single byte. A byte is typically interpreted as either an unsigned value [0..255] or a signed value [-128..127]. (Older C++ standards guaranteed a signed range of only [-127..127], but most implementations give [-128..127]; C++20, by mandating two's complement, guarantees the larger range.)
ASCII is a character mapping for values in the range [0..127] to particular characters, so you can store an ASCII character in either a signed byte or an unsigned byte, and thus it will always fit in a char.
But ASCII doesn't have all the characters necessary for most languages, so the character sets were often extended by using the rest of the values available in a byte to represent the additional characters needed for certain languages (or families of languages). So, while [0..127] almost always mean the same thing, values like 150 can only be interpreted in the context of a particular encoding. For single-byte alphabets, these encodings are called code pages.
Code pages helped, but they didn't solve all the problems. You always had to know which code page a particular document used in order to interpret it correctly. Furthermore, you typically couldn't write a single document that used different languages.
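To make the code-page problem concrete, here's a minimal sketch (the byte value 0xE0 is just an illustrative choice) that decodes the same byte under two different code pages using the Windows MultiByteToWideChar function:

#include <windows.h>
#include <stdio.h>

int main() {
    const char bytes[] = "\xE0";  // one byte; meaningless without a code page
    wchar_t out[4] = {0};

    // Decoded as Windows-1252 (Western European), 0xE0 is 'à' (U+00E0).
    MultiByteToWideChar(1252, 0, bytes, -1, out, 4);
    wprintf(L"CP1252: U+%04X\n", out[0]);

    // Decoded as Windows-1251 (Cyrillic), the very same byte is 'а' (U+0430).
    MultiByteToWideChar(1251, 0, bytes, -1, out, 4);
    wprintf(L"CP1251: U+%04X\n", out[0]);
    return 0;
}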
Also, some languages have more than 256 characters, so there was no way to map one char to one character. This led to the development of multi-byte character encodings, where [0..127] is still ASCII, but some of the other values are "escapes" that mean you have to look at some number of following chars to figure out which character you really had. (It's best to think of multi-byte as variable-byte, as some characters require only one byte while others require two or more.) Multi-byte works, but it's a pain to code for.
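A small sketch of what that pain looks like, assuming a double-byte code page such as 932 (Shift-JIS); the helper name CountDbcsChars is mine, but IsDBCSLeadByteEx is a real Windows API:

#include <windows.h>

// Count characters (not bytes) in a multi-byte string.
int CountDbcsChars(const char* s, UINT codePage) {
    int count = 0;
    while (*s) {
        if (IsDBCSLeadByteEx(codePage, (BYTE)*s) && s[1] != '\0')
            s += 2;   // lead byte: the character includes the next byte too
        else
            s += 1;   // single-byte character
        ++count;
    }
    return count;
}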
Meanwhile, memory was becoming more plentiful, so a bunch of organizations got together and created Unicode, with the goal of making a universal mapping of values to characters (for appropriately vague definitions of "characters"). Initially, it was believed that all characters (or at least all the ones anyone would ever use) would fit into 16-bit values, which was nice because you wouldn't have to deal with multi-byte encodings--you'd just use two bytes per character instead of one. About this time, Microsoft decided to adopt Unicode as the internal representation for text in Windows.
WCHAR
So Windows has a type called WCHAR, a two-byte value that represents a "Unicode" "character". I'm using quotation marks here because Unicode evolved past the original two-byte encoding, so what Windows calls "Unicode" isn't really Unicode today--it's actually a particular encoding of Unicode called UTF-16. And a "character" is not as simple a concept in Unicode as it was in ASCII, because, in some languages, characters combine or otherwise influence adjacent characters in interesting ways.
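A quick sketch of the consequence, assuming a Windows compiler where wchar_t is the same two-byte type as WCHAR: a code point beyond the original 16-bit range occupies two WCHARs (a surrogate pair), so counting WCHARs is not the same as counting characters.

#include <stdio.h>
#include <wchar.h>

int main() {
    const wchar_t s[] = L"\xD83D\xDE00";            // U+1F600, one emoji, two WCHARs
    wprintf(L"WCHARs: %u\n", (unsigned)wcslen(s));  // prints 2 for one "character"
    return 0;
}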
Newer versions of Windows used these 16-bit WCHAR values for text internally, but there was a lot of code out there still written for single-byte code pages, and even some for multi-byte encodings. Those programs still used chars rather than WCHARs. And many of these programs had to work both with people using older versions of Windows, which still used chars internally, and with newer ones that use WCHAR. So a technique using C macros and typedefs was devised so that you could mostly write your code one way and--at compile time--choose to have it use either char or WCHAR.
TCHAR
To accomplish this flexibility, you use a TCHAR for a "text character". In some header file (often <tchar.h>), TCHAR is typedef'ed to either char or WCHAR, depending on the compile-time environment. The Windows headers adopted conventions like these:
LPTSTR is a (long) pointer to a string of TCHARs.
LPWSTR is a (long) pointer to a string of WCHARs.
LPSTR is a (long) pointer to a string of chars.
(The L for "long" is a leftover from 16-bit days, when we had long, far, and near pointers. Those are all obsolete today, but the L prefix tends to remain.)
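The machinery behind this is simple. A simplified sketch of what the headers do (the real ones spread this across <tchar.h> and <winnt.h> and key off the UNICODE/_UNICODE macros described below):

#ifdef _UNICODE
typedef wchar_t TCHAR;   // "Unicode" build: text characters are wide
#else
typedef char    TCHAR;   // "ANSI" build: text characters are single bytes
#endif

typedef TCHAR   *LPTSTR; // pointer to a string of TCHARs
typedef wchar_t *LPWSTR; // pointer to a string of WCHARs
typedef char    *LPSTR;  // pointer to a string of chars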
Most of the Windows API functions that take and return strings were actually replaced with two versions: the A version (for "ANSI" characters) and the W version (for wide characters). (Again, the historical legacy shows in these names. The code-page scheme was often called ANSI code pages, though I've never been clear whether they were actually governed by ANSI standards.)
So when you call a Windows API like this:
SetWindowText(hwnd, lptszTitle);
what you're really doing is invoking a preprocessor macro that expands to either SetWindowTextA or SetWindowTextW, consistent with how TCHAR is defined. That is, if you want strings of chars, you'll get the A version, and if you want strings of WCHARs, you'll get the W version.
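The headers accomplish that with an ordinary preprocessor definition; simplified from <winuser.h>, it looks like this:

#ifdef UNICODE
#define SetWindowText SetWindowTextW  // wide build: takes WCHAR strings
#else
#define SetWindowText SetWindowTextA  // "ANSI" build: takes char strings
#endif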
But it's a little more complicated because of string literals. If you write this:
SetWindowText(hwnd, "Hello World"); // works only in "ANSI" mode
then that will only compile if you're targeting the char version, because "Hello World" is a string of chars, so it's only compatible with the SetWindowTextA version. If you wanted the WCHAR version, you'd have to write:
SetWindowText(hwnd, L"Hello World"); // only works in "Unicode" mode
The L here means you want wide characters. (The L actually stands for long, but it's a different sense of long than in the long pointers above.) When the compiler sees the L prefix on the string, it knows the string should be encoded as a series of wchar_ts rather than chars.
(Compilers targeting Windows use a two-byte value for wchar_t, which happens to be identical to what Windows defines as a WCHAR. Compilers targeting other systems often use a four-byte value for wchar_t, which is what it really takes to hold a single Unicode code point.)
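You can see the effect of the L prefix directly with sizeof; on a Windows compiler (two-byte wchar_t) this prints 6 and 12:

#include <stdio.h>

int main(void) {
    printf("%u\n", (unsigned)sizeof("Hello"));   // 6: five chars plus the '\0'
    printf("%u\n", (unsigned)sizeof(L"Hello"));  // 12: six two-byte wchar_ts
    return 0;
}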
So if you want code that can compile either way, you need another macro to wrap the string literals. There are two to choose from: _T() and TEXT(). They work exactly the same way; the first comes from the compiler's library and the second from the OS's libraries. So you write your code like this:
SetWindowText(hwnd, TEXT("Hello World")); // compiles in either mode
If you're targeting chars, the macro is a no-op that just returns the regular string literal. If you're targeting WCHARs, the macro prepends the L.
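The trick is token pasting; a simplified version of the definition in <tchar.h>:

#ifdef _UNICODE
#define _T(x) L##x  // paste the L prefix onto the literal at compile time
#else
#define _T(x) x     // no-op: the literal stays a string of chars
#endif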
So how do you tell the compiler that you want to target WCHAR? You define UNICODE and _UNICODE. The former is for the Windows APIs and the latter is for the compiler libraries. Make sure you never define one without the other.
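Putting it all together, here's a complete program that compiles in either mode (the file name is mine); with Microsoft's cl you'd define the macros on the command line, e.g. cl /DUNICODE /D_UNICODE demo.cpp user32.lib for the wide build, or omit the defines for the char build:

#include <windows.h>
#include <tchar.h>

int main() {
    // TCHAR and TEXT() resolve to wide or narrow, matching UNICODE/_UNICODE.
    TCHAR title[] = TEXT("Hello World");
    MessageBox(NULL, title, TEXT("Demo"), MB_OK);
    return 0;
}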