There are many situations (especially in low-level programming) where the binary layout of the data is important: for example, hardware/driver manipulation, network protocols, and so on.
From the C++14 standard (N3797 draft), section 9.6 [class.bit], paragraph 1:
Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit. [ Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. — end note ]
Although notes are non-normative, every implementation I'm aware of uses one of two layouts: either big-endian or little-endian bit order.
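A quick way to see which layout your implementation uses is to probe the object representation; a minimal sketch (the Probe type is mine for illustration, and it assumes 8-bit bytes with both fields packed into one byte):
#include <cstdint>
#include <cstdio>
#include <cstring>

struct Probe { uint8_t lo : 4, hi : 4; };

int main() {
    Probe p{};
    p.lo = 0xF;               // set only the first-declared field
    uint8_t raw;
    std::memcpy(&raw, &p, 1); // inspect the raw byte
    // Prints 0x0F if fields are allocated LSB-first, 0xF0 if MSB-first.
    std::printf("0x%02X\n", raw);
    return 0;
}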
Note that code like this relies on fixed-width integer types (from <cstdint>). For examples, look in netinet/tcp.h and other nearby headers.
Edit by OP: for example, tcp.h defines
struct tcphdr
{
    u_int16_t th_sport;    /* source port */
    u_int16_t th_dport;    /* destination port */
    tcp_seq th_seq;        /* sequence number */
    tcp_seq th_ack;        /* acknowledgement number */
# if __BYTE_ORDER == __LITTLE_ENDIAN
    u_int8_t th_x2:4;      /* (unused) */
    u_int8_t th_off:4;     /* data offset */
# endif
# if __BYTE_ORDER == __BIG_ENDIAN
    u_int8_t th_off:4;     /* data offset */
    u_int8_t th_x2:4;      /* (unused) */
# endif
    // ...
};
And since this works with mainstream compilers, it means bit-field memory layout is reliable in practice.
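That said, the same field can always be extracted with an explicit shift, which does not depend on bit-field layout at all. A minimal sketch (the helper name is mine; it relies on the on-the-wire TCP header format, where the 4-bit data offset is the high nibble of byte 12):
#include <cstdint>

// Equivalent of th_off: the data offset is the high nibble of byte 12
// of the raw TCP header, independent of the host's bit-field layout.
inline uint8_t tcp_data_offset(const uint8_t *hdr) {
    return hdr[12] >> 4;
}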
Edit:
This is portable within one endianness:
struct Foo {
    uint16_t x: 10;
    uint16_t y: 6;
};
But this may not be portable, because y straddles a 16-bit allocation unit:
struct Foo {
    uint16_t x: 10;
    uint16_t y: 12;
    uint16_t z: 10;
};
And this may not be portable, because it has implicit padding whose position is implementation-defined:
struct Foo {
    uint16_t x: 10;
};
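To catch such layout surprises early, one can assert the expected object size at compile time; a small sketch (the expected size of 2 bytes is an assumption about a typical implementation):
#include <cstdint>

struct Packed {
    uint16_t x : 10;
    uint16_t y : 6;   // 10 + 6 = 16 bits: exactly one allocation unit
};

// Fails to compile if the implementation straddles or pads differently.
static_assert(sizeof(Packed) == 2, "unexpected bit-field layout");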
It's simple to implement bit fields with known positions in C++:
template<typename T, int POS, int SIZE>
struct BitField {
    T *data;
    BitField(T *data) : data(data) {}
    // Read: shift the field down to bit 0 and mask off the upper bits.
    operator int() const {
        return ((*data) >> POS) & ((1ULL << SIZE) - 1);
    }
    // Write: clear the field's bits, then merge in the new value.
    // Casting x to T before shifting avoids overflow when POS + SIZE
    // exceeds the width of int.
    BitField& operator=(int x) {
        T mask( ((1ULL << SIZE) - 1) << POS );
        *data = (*data & ~mask) | ((T(x) << POS) & mask);
        return *this;
    }
};
The above toy implementation allows one, for example, to define a 12-bit field in an unsigned long long variable with
unsigned long long var;
BitField<unsigned long long, 7, 12> muxno(&var);
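For instance, reading the field back through a hypothetical accessor (the name getMux matches the symbol in the disassembly below):
int getMux() {
    return muxno;  // invokes operator int(): shift right by 7, mask to 12 bits
}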
The generated code to access the field value is just
0000000000000020 <_Z6getMuxv>:
20: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax ; Get &var
27: 48 8b 00 mov (%rax),%rax ; Get content
2a: 48 c1 e8 07 shr $0x7,%rax ; >> 7
2e: 25 ff 0f 00 00 and $0xfff,%eax ; keep 12 bits
33: c3 retq
Basically what you'd have to write by hand.
We have this in production code, where we had to port MIPS code to x86-64:
https://codereview.stackexchange.com/questions/54342/template-for-endianness-free-code-data-always-packed-as-big-endian
Works well for us.
It's basically a template without any storage; the template arguments specify the position of the relevant bits.
If you need multiple fields, you put multiple specializations of the template together in a union, together with an array of bytes to provide storage.
The template has overloads for assignment and a conversion operator to unsigned
for reading the value.
In addition, if the fields are larger than a byte, they are stored in big-endian byte order, which is sometimes useful when implementing cross-platform protocols.
Here's a usage example:
// BitFieldMember is defined in the code linked above.
#include <cstdlib>   // rand
#include <cstring>   // memset

union header
{
    unsigned char arr[2];     // space allocation, 2 bytes (16 bits)
    BitFieldMember<0, 4> m1;  // first 4 bits
    BitFieldMember<4, 5> m2;  // the following 5 bits
    BitFieldMember<9, 6> m3;  // the following 6 bits, total 16 bits
};

int main()
{
    header a;
    memset(a.arr, 0, sizeof(a.arr));
    a.m1 = rand();
    a.m3 = a.m1;
    a.m2 = ~a.m1;
    return 0;
}
I have written an implementation of bit fields in C++ as a library header file. An example I give in the documentation is that, instead of writing this:
struct A
{
    union
    {
        struct
        {
            unsigned x : 5;
            unsigned a0 : 2;
            unsigned a1 : 2;
            unsigned a2 : 2;
        }
        u;
        struct
        {
            unsigned x : 5;
            unsigned all_a : 6;
        }
        v;
    };
};
// …
A x;
x.v.all_a = 0x3f;
x.u.a1 = 0;
you can write:
typedef Bitfield<Bitfield_traits_default<> > Bf;
struct A : private Bitfield_fmt
{
    F<5> x;
    F<2> a[3];
};
typedef Bitfield_w_fmt<Bf, A> Bwf;
// …
Bwf::Format::Define::T x;
BITF(Bwf, x, a) = 0x3f;
BITF(Bwf, x, a[1]) = 0;
There's an alternative interface, under which the last two lines of the above would change to:
#define BITF_U_X_BWF Bwf
#define BITF_U_X_BASE x
BITF(X, a) = 0x3f;
BITF(X, a[1]) = 0;
Using this implementation of bit fields, the traits template parameter gives the programmer a lot of flexibility. Memory is just processor memory by default, or it can be an abstraction, with the programmer providing functions to perform "memory" reads and writes. The abstracted memory is a sequence of elements of any unsigned integral type (chosen by the programmer). Fields can be laid out either from least-to-most or most-to-least significance. The layout of fields in memory can be the reverse of what they are in the format structure.
The implementation is located at: https://github.com/wkaras/C-plus-plus-library-bit-fields
(As you can see, I unfortunately was not able to fully avoid use of macros.)
C is designed for low-level bit manipulation. It's easy enough to declare a buffer of unsigned chars and set it to any bit pattern you want, especially if your bit strings are very short, so that they fit into one of the integral types.
One potential problem is byte endianness. C can't "see" this at all, but just as integers have an endianness, so too do bytes when serialised. Another is the very small number of machines that don't use octets for bytes. C guarantees a byte shall be at least an octet, but CHAR_BIT values of 9 and 32 exist in real-world implementations. In those circumstances, you have to decide whether to simply ignore the upper bits (in which case naive code should work) or treat them as part of the bit stream (in which case you have to be careful to fold CHAR_BIT into your calculations). It's also hard to test such code, as you are unlikely to get your hands on a CHAR_BIT 32 machine.
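As a sketch of the buffer approach (the helper names are mine; it assumes octet-oriented data, i.e. only the low eight bits of each char belong to the bit stream, MSB-first as most network protocols define it):
#include <cstddef>

// Set or clear bit i of a buffer, MSB-first within each octet.
inline void set_bit(unsigned char *buf, std::size_t i, bool v) {
    unsigned char mask = static_cast<unsigned char>(1u << (7 - i % 8));
    if (v) buf[i / 8] |= mask;
    else   buf[i / 8] &= static_cast<unsigned char>(~mask);
}

// Read bit i of a buffer.
inline bool get_bit(const unsigned char *buf, std::size_t i) {
    return (buf[i / 8] >> (7 - i % 8)) & 1u;
}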