What are TCHAR strings, such as LPTSTR and LPCTSTR, and how can I work with them? When I create a new project in Visual Studio, it generates this code for me:
#include <tchar.h>
int _tmain(int argc, _TCHAR* argv[])
{
return 0;
}
How can I, for instance, concatenate all the command line arguments?
If I want to open a file with the name given by the first command line argument, how can I do this? The Windows API defines 'A' and 'W' versions of many of its functions, such as CreateFile, CreateFileA and CreateFileW; so how do these differ from one another and which one should I use?
Let me start off by saying that you should preferably not use TCHAR
for new Windows projects and instead directly use Unicode. On to the actual answer:
Character Sets
The first thing we need to understand is how character sets work in Visual Studio. The project property page has an option to select the character set used:
- Not Set
- Use Unicode Character Set
- Use Multi-Byte Character Set
Depending on which of the three options you choose, a lot of definitions change to accommodate the selected character set. There are three main classes of definitions: strings, string routines from tchar.h, and API functions:
- 'Not Set' corresponds to TCHAR = char using ANSI encoding, where you use the standard 8-bit code page of the system for strings. All tchar.h string routines use the basic char versions. All API functions that work with strings will use the 'A' version of the API function.
- 'Unicode' corresponds to TCHAR = wchar_t using UTF-16 encoding. All tchar.h string routines use the wchar_t versions. All API functions that work with strings will use the 'W' version of the API function.
- 'Multi-Byte' corresponds to TCHAR = char, using some multi-byte encoding scheme. All tchar.h string routines use the multi-byte character set versions. All API functions that work with strings will use the 'A' version of the API function.
Related reading: About the "Character set" option in visual studio 2010
TCHAR.h header
The tchar.h header provides generic names for the C string operations; each generic name switches to the correct function for the selected character set. For instance, _tcscat will switch to either strcat (not set), wcscat (Unicode), or _mbscat (MBCS). _tcslen will switch to either strlen (not set), wcslen (Unicode), or strlen (MBCS).
The switch happens by defining all _txxx symbols as macros that evaluate to the correct function, depending on the compiler switches.
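To make the mechanism concrete, here is a minimal, portable sketch of that kind of macro switch. The names DEMO_UNICODE, DEMO_TCHAR, DEMO_T and demo_tcslen are hypothetical stand-ins invented for this illustration; the real definitions live in tchar.h and are keyed off _UNICODE and _MBCS.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR, _T and _tcslen (not the real tchar.h names).
#if defined(DEMO_UNICODE)
typedef wchar_t DEMO_TCHAR;
#define DEMO_T(x) L##x            // pastes an L prefix onto the literal
#define demo_tcslen std::wcslen   // generic name resolves to the wide routine
#else
typedef char DEMO_TCHAR;
#define DEMO_T(x) x               // literal is left as a narrow literal
#define demo_tcslen std::strlen   // generic name resolves to the narrow routine
#endif

// Written once against the generic names; compiles in either mode.
inline std::size_t greeting_length()
{
    const DEMO_TCHAR *greeting = DEMO_T("hello");
    return demo_tcslen(greeting);
}
```

Compiling the same file with and without DEMO_UNICODE defined gives a wchar_t or a char build of greeting_length without touching its body, which is the effect the character set option has on TCHAR code.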
The idea behind it is that you can use the encoding-agnostic types TCHAR
(or _TCHAR
) and the encoding-agnostic functions that work on them, from tchar.h
, instead of the regular string functions from string.h
.
Similarly, _tmain
is defined to be either main
or wmain
. See also: What is the difference between _tmain() and main() in C++?
A helper macro _T(..)
is defined for getting string literals of the correct type, either "regular literals"
or L"wchar_t literals"
.
See the caveats mentioned here: Is TCHAR still relevant? -- dan04's answer
_tmain example
For the example of main in the question, the following code concatenates all the strings passed as command line arguments into one.
#include <tchar.h>

int _tmain(int argc, _TCHAR *argv[])
{
    TCHAR szCommandLine[1024];
    if (argc < 2) return 0;
    /* No bounds checking: the arguments must fit in 1024 TCHARs in total */
    _tcscpy(szCommandLine, argv[1]);
    for (int i = 2; i < argc; ++i)
    {
        _tcscat(szCommandLine, _T(" "));
        _tcscat(szCommandLine, argv[i]);
    }
    /* szCommandLine now contains the command line arguments */
    return 0;
}
(Error checking is omitted) This code works for all three character set options, because we consistently used TCHAR, the tchar.h string functions, and _T for string literals. Forgetting to surround your string literals with _T(..) is a common source of compiler errors when writing such TCHAR programs.
If we had not done all these things, then switching character sets would cause the code to either not compile, or worse, compile but misbehave during runtime.
Windows API functions
Windows API functions that work on strings, such as CreateFile and GetCurrentDirectory, are implemented in the Windows headers as macros that, like the tchar.h macros, switch to either the 'A' version or the 'W' version. For instance, CreateFile is a macro that is defined as CreateFileA for ANSI and MBCS, and as CreateFileW for Unicode.
Whenever you use the flat form (without 'A' or 'W') in your code, the actual function called will switch depending on the selected character set. You can force the use of a particular version by using the explicit 'A' or 'W' names.
The conclusion is that you should always use the unqualified name, unless you want to always refer to a specific version, independently of the character set option.
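The naming scheme can be imitated in a few lines of portable code. This is only a sketch: DemoPathLengthA, DemoPathLengthW and DemoPathLength are hypothetical functions made up for the illustration, not part of the Windows API.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Both versions always exist, whatever the character set option is.
inline std::size_t DemoPathLengthA(const char *path)    { return std::strlen(path); }
inline std::size_t DemoPathLengthW(const wchar_t *path) { return std::wcslen(path); }

// The unsuffixed name is only a macro, chosen at compile time.
#if defined(UNICODE)
#define DemoPathLength DemoPathLengthW
#else
#define DemoPathLength DemoPathLengthA
#endif
```

Calling DemoPathLengthW(L"a.txt") works in any build, while DemoPathLength(...) accepts a wide or a narrow string depending on whether UNICODE is defined. The real API macros behave the same way.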
For the example in the question, where we want to open the file given by the first argument:
#include <windows.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR *argv[])
{
    if (argc < 2) return 1;
    /* CreateFile is a macro for CreateFileA or CreateFileW */
    HANDLE hFile = CreateFile(argv[1], GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL);
    /* Read from file and do other stuff */
    ...
    CloseHandle(hFile);
    return 0;
}
(Error checking is omitted) Note that this example did not need any of the TCHAR-specific routines, because the macro definitions have already taken care of the switch for us.
Utilising C++ strings
We've seen how we can use the tchar.h routines for C-style string operations on TCHARs, but it would be nice if we could leverage C++ strings as well.
My advice would foremost be to not use TCHAR
and instead use Unicode directly, see the Conclusion section, but if you want to work with TCHAR
you can do the following.
To use TCHAR, what we want is an instance of std::basic_string that uses TCHAR. You can do this by typedef-ing your own tstring:
typedef std::basic_string<TCHAR> tstring;
For string literals, don't forget to use _T
.
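As a rough illustration, the command line concatenation from the _tmain example could be written with std::basic_string along these lines. The helper name join_args is hypothetical, and the template parameter CharT stands in for TCHAR so that the sketch compiles without tchar.h; with the typedef above, the return type would simply be tstring.

```cpp
#include <string>

// CharT stands in for TCHAR (char or wchar_t); join_args is a made-up helper.
template <typename CharT>
std::basic_string<CharT> join_args(int argc, const CharT *const argv[])
{
    std::basic_string<CharT> joined;
    for (int i = 1; i < argc; ++i)      // argv[0] is the program name, skip it
    {
        if (i > 1) joined += CharT(' '); // single space between arguments
        joined += argv[i];
    }
    return joined;                       // grows as needed, no fixed buffer
}
```

Inside _tmain this would presumably be called as join_args(argc, argv). Because the string manages its own storage, the fixed 1024-element buffer and its overflow risk disappear.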
You'll also need to use the correct versions of cin
and cout
. You can use references to implement a tcin
and tcout
:
#include <iostream>

#if defined(_UNICODE)
std::wistream &tcin = std::wcin;
std::wostream &tcout = std::wcout;
#else
std::istream &tcin = std::cin;
std::ostream &tcout = std::cout;
#endif
This should allow you to do almost anything. There might be the occasional exception, such as std::to_string
and std::to_wstring
, for which you can find a similar workaround.
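One possible shape for such a workaround is sketched below. The name to_tstring is hypothetical, and tstring is keyed off _UNICODE directly here (an assumption made so the sketch does not need tchar.h) rather than being built from TCHAR.

```cpp
#include <string>

// Assumption: select the string type from _UNICODE instead of TCHAR.
#if defined(_UNICODE)
typedef std::wstring tstring;
inline tstring to_tstring(int value) { return std::to_wstring(value); }
#else
typedef std::string tstring;
inline tstring to_tstring(int value) { return std::to_string(value); }
#endif
```

Overloads for other numeric types could be added in the same way, since std::to_string and std::to_wstring offer matching overload sets.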
Conclusion
This answer (hopefully) details what TCHAR
is and how it's used and intertwined with Visual Studio and the Windows headers. However, we should also wonder if we want to use it.
My advice is to use Unicode directly for all new Windows programs and not to use TCHAR at all!
Others giving the same advice: Is TCHAR still relevant?
To use Unicode after creating a new project, first ensure the character set is set to Unicode. Then, remove the #include <tchar.h> from your source file (or from stdafx.h). Change any TCHAR or _TCHAR to wchar_t and _tmain to wmain:
int wmain(int argc, wchar_t *argv[])
For non-console projects, the entry point for Windows applications is WinMain
and will appear in TCHAR
-jargon as
int APIENTRY _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPTSTR lpCmdLine, int nCmdShow)
and should become
int APIENTRY wWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPWSTR lpCmdLine, int nCmdShow)
After this, only use wchar_t
strings and/or std::wstring
s.
Further caveats
- Be careful when writing sizeof(szMyString) when using TCHAR arrays (strings), because for ANSI this is the size both in characters and in bytes, for Unicode this is only the size in bytes and the number of characters is at most half, and for MBCS this is the size in bytes and the number of characters may or may not be equal. Both Unicode and MBCS can use multiple TCHARs to encode a single character.
- Mixing TCHAR stuff and fixed char or wchar_t is very annoying; you have to convert the strings from one to the other, using the correct code page! A simple copy will not work in the general case.
- There is a slight difference between _UNICODE and UNICODE, relevant if you want to conditionally define your own functions. See Why both UNICODE and _UNICODE?
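For the sizeof caveat, a common way to stay out of trouble is to count array elements instead of bytes. The helper below is a small sketch with a hypothetical name, element_count; MSVC also ships a _countof macro that serves the same purpose.

```cpp
#include <cstddef>

// Number of elements in a fixed-size array, independent of the element type.
// Only accepts real arrays, so passing a pointer by mistake fails to compile.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N])
{
    return N;
}
```

For a wchar_t buffer of 16 elements, element_count gives 16 while sizeof gives 16 * sizeof(wchar_t). API parameters that are documented as a length in TCHARs want the former.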
A very good, complementary answer is: Difference between MBCS and UTF-8 on Windows
Source: https://stackoverflow.com/questions/33836706/what-are-tchar-strings-and-the-a-or-w-version-of-win32-api-functions