I have used unions earlier comfortably; today I was alarmed when I read this post and came to know that this code
union ARGB
{
uint32_t colour;
str
Although this is strictly undefined behaviour, in practice it will work with pretty much any compiler. It is such a widely used paradigm that any self-respecting compiler will need to do "the right thing" in cases such as this. It's certainly to be preferred over type-punning, which may well generate broken code with some compilers.
For one more example of the actual use of unions, the CORBA framework serializes objects using the tagged union approach. All user-defined classes are members of one (huge) union, and an integer identifier tells the demarshaller how to interpret the union.
The behavior is undefined from the language point of view. Consider that different platforms can have different constraints in memory alignment and endianness. The code in a big endian versus a little endian machine will update the values in the struct differently. Fixing the behavior in the language would require all implementations to use the same endianness (and memory alignment constraints...) limiting use.
If you are using C++ (you are using two tags) and you really care about portability, then you can just use the struct and provide a setter that takes the uint32_t
and sets the fields appropriately through bitmask operations. The same can be done in C with a function.
Edit: I was expecting AProgrammer to write down an answer to vote and close this one. As some comments have pointed out, endianness is dealt in other parts of the standard by letting each implementation decide what to do, and alignment and padding can also be handled differently. Now, the strict aliasing rules that AProgrammer implicitly refers to are a important point here. The compiler is allowed to make assumptions on the modification (or lack of modification) of variables. In the case of the union, the compiler could reorder instructions and move the read of each color component over the write to the colour variable.
As others mentioned, unions combined with enumerations and wrapped into structs can be used to implement tagged unions. One practical use is to implement Rust's Result<T, E>
, which is originally implemented using a pure enum
(Rust can hold additional data in enumeration variants). Here is a C++ example:
template <typename T, typename E> struct Result {
public:
enum class Success : uint8_t { Ok, Err };
Result(T val) {
m_success = Success::Ok;
m_value.ok = val;
}
Result(E val) {
m_success = Success::Err;
m_value.err = val;
}
inline bool operator==(const Result& other) {
return other.m_success == this->m_success;
}
inline bool operator!=(const Result& other) {
return other.m_success != this->m_success;
}
inline T expect(const char* errorMsg) {
if (m_success == Success::Err) throw errorMsg;
else return m_value.ok;
}
inline bool is_ok() {
return m_success == Success::Ok;
}
inline bool is_err() {
return m_success == Success::Err;
}
inline const T* ok() {
if (is_ok()) return m_value.ok;
else return nullptr;
}
inline const T* err() {
if (is_err()) return m_value.err;
else return nullptr;
}
// Other methods from https://doc.rust-lang.org/std/result/enum.Result.html
private:
Success m_success;
union _val_t { T ok; E err; } m_value;
}
The most common use of union
I regularly come across is aliasing.
Consider the following:
union Vector3f
{
struct{ float x,y,z ; } ;
float elts[3];
}
What does this do? It allows clean, neat access of a Vector3f vec;
's members by either name:
vec.x=vec.y=vec.z=1.f ;
or by integer access into the array
for( int i = 0 ; i < 3 ; i++ )
vec.elts[i]=1.f;
In some cases, accessing by name is the clearest thing you can do. In other cases, especially when the axis is chosen programmatically, the easier thing to do is to access the axis by numerical index - 0 for x, 1 for y, and 2 for z.
You could use unions to create structs like the following, which contains a field that tells us which component of the union is actually used:
struct VAROBJECT
{
enum o_t { Int, Double, String } objectType;
union
{
int intValue;
double dblValue;
char *strValue;
} value;
} object;