When and how is conversion to char pointer allowed?

Submitted by 旧巷老猫 on 2019-11-27 12:56:43

Question


We can look at the representation of an object of type T by converting a T* that points at that object into a char*. At least in practice:

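#include <cstddef>
#include <iomanip>
#include <iostream>

int main() {
  int x = 511;
  unsigned char* cp = (unsigned char*)&x;  // view the bytes of x
  std::cout << std::hex << std::setfill('0');
  for (std::size_t i = 0; i < sizeof(int); i++) {
    std::cout << std::setw(2) << (int)cp[i] << ' ';
  }
}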

This outputs the representation of 511 on my system: ff 01 00 00.

There is (surely) some implementation defined behaviour occurring here. Which of the casts is allowing me to convert an int* to an unsigned char* and which conversions does that cast entail? Am I invoking undefined behaviour as soon as I cast? Can I cast any T* type like this? What can I rely on when doing this?


Answer 1:


Which of the casts is allowing me to convert an int* to an unsigned char*?

That C-style cast in this case is the same as reinterpret_cast<unsigned char*>.

Can I cast any T* type like this?

Yes and no. The yes part: You can safely cast any object pointer type to a char* or unsigned char* (with the appropriate const and/or volatile qualifiers). The result is implementation-defined, but it is legal.

The no part: The standard explicitly allows char* and unsigned char* as the target type. However, you cannot (for example) safely cast a double* to an int* and access the object through the result. Do this and you've crossed the boundary from implementation-defined behavior to undefined behavior, because it violates the strict aliasing rule.




Answer 2:


Your cast maps to:

unsigned char* cp = reinterpret_cast<unsigned char*>(&x);

The underlying representation of an int is implementation defined, and viewing it as characters allows you to examine that. In your case, it is 32-bit little endian.

There is nothing special here -- this method of examining the internal representation is valid for any data type.

C++03 5.2.10.7: A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that converting an rvalue of type "pointer to T1" to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.

This suggests that the cast results in unspecified behavior. But pragmatically speaking, casting from any pointer type to char* will always allow you to examine (and modify) the internal representation of the referenced object.




Answer 3:


The C-style cast in this case is equivalent to reinterpret_cast. The Standard describes the semantics in 5.2.10. Specifically, in paragraph 7:

"A pointer to an object can be explicitly converted to a pointer to a different object type. When a prvalue v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1. Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. The result of any other such pointer conversion is unspecified."

In your case, this means the alignment requirements are satisfied, and the result is unspecified.




Answer 4:


The implementation-defined behaviour in your example is the endianness of your system; in this case your CPU is little endian.
As for the cast, when you convert an int* to a char*, all you are doing is telling the compiler to interpret what cp points to as a char, so each access through cp reads a single byte and interprets it as a character.




Answer 5:


Casts between pointers are themselves always possible, since all pointers are nothing more than memory addresses and an object of any type, in memory, can always be thought of as a sequence of bytes.

But, of course, the way the sequence is formed depends on how the decomposed type is represented in memory, and that is outside the scope of the C++ specification.

That said, except in very pathological cases, you can expect that representation to be the same in all the code produced by the same compiler for all the machines of the same platform (or family), and you should not expect the same results on different platforms.

In general, one thing to avoid is treating the relation between type sizes as "predefined": in your sample you assume sizeof(int) == 4*sizeof(char), and that is not necessarily always true.

But it is always true that sizeof(T) == N*sizeof(char), hence any T can always be seen as an integer number of chars.




Answer 6:


Unless a user-defined cast operator is involved, a cast simply tells the compiler to "see" that memory area in a different way. Nothing really fancy, I would say.

Then, you are reading the memory area byte by byte; as long as you do not change it, it is just fine. Of course, what you see depends a lot on the platform: think about endianness, word size, padding, and so on.




Answer 7:


Just reverse the byte order and it becomes

00 00 01 ff

which is 256 (01) + 255 (ff) = 511.

This is because your platform is little endian.



Source: https://stackoverflow.com/questions/13995748/when-and-how-is-conversion-to-char-pointer-allowed
