问题
I'm a bit confused with differences between unsigned char
(which is also BYTE
in WinAPI) and char
pointers.
Currently I'm working with some ATL-based legacy code and I see a lot of expressions like the following:
CAtlArray<BYTE> rawContent;
CALL_THE_FUNCTION_WHICH_FILLS_RAW_CONTENT(rawContent);
return ArrayToUnicodeString(rawContent);
// or return ArrayToAnsiString(rawContent);
Now, the implementations of ArrayToXXString
look the following way:
CStringA ArrayToAnsiString(const CAtlArray<BYTE>& array)
{
CAtlArray<BYTE> copiedArray;
copiedArray.Copy(array);
copiedArray.Add('\0');
// Casting from BYTE* -> LPCSTR (const char*).
return CStringA((LPCSTR)copiedArray.GetData());
}
CStringW ArrayToUnicodeString(const CAtlArray<BYTE>& array)
{
CAtlArray<BYTE> copiedArray;
copiedArray.Copy(array);
copiedArray.Add('\0');
copiedArray.Add('\0');
// Same here.
return CStringW((LPCWSTR)copiedArray.GetData());
}
So, the questions:
Is the C-style cast from
BYTE*
toLPCSTR
(const char*
) safe for all possible cases?Is it really necessary to add double null-termination when converting array data to wide-character string?
The conversion routine
CStringW((LPCWSTR)copiedArray.GetData())
seems invalid to me, is that true?Any way to make all this code easier to understand and to maintain?
回答1:
The C standard is kind of weird when it comes to the definition of a byte. You do have a couple of guarantees though.
- A byte will always be one char in size
- sizeof(char) always returns 1
- A byte will be at least 8 bits in size
This definition doesn't mesh well with older platforms where a byte was 6 or 7 bits long, but it does mean BYTE*,
and char *
are guaranteed to be equivalent.
Multiple nulls are needed at the end of a Unicode string because there are valid Unicode characters that start with a zero (null) byte.
As for making the code easier to read, that is completely a matter of style. This code appears to be written in a style used by a lot of old C Windows code, which has definitely fallen out of favor. There are probably a ton of ways to make it clearer for you, but how to make it clearer has no clear answer.
回答2:
Yes, it is always safe. Because they both point to an array of single-byte memory locations.
LPCSTR
: Long Pointer to Const (single-byte) StringLPCWSTR
: Long Pointer to Const Wide (multi-byte) StringLPCTSTR
: Long Pointer to Const context-dependent (single-byte or multi-byte) StringIn wide character strings, every single character occupies 2 bytes of memory, and the length of the memory location containing the string must be a multiple of 2. So if you want to add a wide '\0' to the end of a string, you should add two bytes.
Sorry for this part, I do not know ATL and I cannot help you on this part, but actually I see no complexity here, and I think it is easy to maintain. What code do you really want to make easier to understand and maintain?
回答3:
- If the BYTE* behaves like a proper string (i.e. the last BYTE is 0), you can cast a BYTE* to a LPCSTR, yes. Functions working with LPCSTR assume zero-terminated strings.
- I think the multiple zeroes are only necessary when dealing with some multibyte character sets. The most common 8-bit encodings (like ordinary Windows Western and also UTF-8) don't require them.
- The
CString
is Microsoft's best attempt at user-friendly strings. For instance, its constructor can handle bothchar
andwchar_t
type input, regardless of whether the CString itself is wide or not, so you don't have to worry about the conversion much.
Edit: wait, now I see that they are abusing a BYTE array for storing wide chars in. I couldn't recommend that.
回答4:
An LPCWSTR is a String with 2 Bytes per character, a "char" is one Byte per character. That means you cannot cast it in C-style, because you have to adjust the memory (add a "0" before each standard-ASCII), and not just read the Data in a different way from the memory (what a C-Cast would do). So the cast is not so safe i would say.
The Double-Nulltermination: You have always 2 Bytes as one Character, so your "End-of-string" sign must be 2 Bytes long.
To make that code easier to understand look after lexical_cast in Boost (http://www.boost.org/doc/libs/1_48_0/doc/html/boost_lexical_cast.html)
Another way would be using the std::strings (using like std::basic_string; ), and you can perform on String operations.
来源:https://stackoverflow.com/questions/9228698/difference-between-unsigned-char-and-char-pointers