Often I find myself having to represent a structure that consists of very small values. For example, Foo has 4 values, a, b, c, d, that range from 0 to 3. Usually I don't care, but sometimes, those structures are ...
There is another option: since the values 0 ... 3 likely indicate some sort of state, you could consider using "flags"
enum {
    A_1 = 1<<0,
    A_2 = 1<<1,
    A_3 = A_1|A_2,
    B_1 = 1<<2,
    B_2 = 1<<3,
    B_3 = B_1|B_2,
    C_1 = 1<<4,
    C_2 = 1<<5,
    C_3 = C_1|C_2,
    D_1 = 1<<6,
    D_2 = 1<<7,
    D_3 = D_1|D_2,
    //you could continue to ... D7_3 for 32/64 bits if it makes sense
};
This isn't much different than using bitfields for most situations, but can drastically reduce your conditional logic.
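if (a < 2 && b < 2 && c < 2 && d < 2) // .... (4 comparisons)
//vs.
// all values are below 2 exactly when no "2 bit" is set;
// the extra parentheses are required because == binds tighter than &
if ((abcd & (A_2|B_2|C_2|D_2)) == 0) // (bitop with constant and a 0-compare)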
Depending on what kinds of operations you will be doing on the data, it may make sense to use either 4 or 8 sets of abcd and pad out the end with 0s as needed. That could allow up to 32 comparisons to be replaced with a bitop and a 0-compare.
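As a minimal sketch of the single-byte case, assuming a sits in the two lowest bits and d in the two highest (the layout the enum implies). The helper names pack_abcd and all_below_2 are hypothetical and only serve the illustration:

```c
#include <assert.h>
#include <stdint.h>

/* The "2 bit" of each value, matching the enum layout. */
enum { A_2 = 1 << 1, B_2 = 1 << 3, C_2 = 1 << 5, D_2 = 1 << 7 };

/* Hypothetical helper: pack four values in 0..3 into one byte,
   a in bits 0-1, b in bits 2-3, c in bits 4-5, d in bits 6-7. */
uint8_t pack_abcd(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a < 4 && b < 4 && c < 4 && d < 4);
    return (uint8_t)(a | (b << 2) | (c << 4) | (d << 6));
}

/* Nonzero when a, b, c and d are all below 2, i.e. no "2 bit" is set.
   The inner parentheses matter: == binds tighter than &. */
int all_below_2(uint8_t abcd)
{
    return (abcd & (A_2 | B_2 | C_2 | D_2)) == 0;
}
```

For example, all_below_2(pack_abcd(1, 0, 1, 1)) is nonzero, while all_below_2(pack_abcd(1, 2, 0, 0)) is 0 because b has its "2 bit" set.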
For instance, if you wanted to set the "1 bit" on all 8 sets of 4 in a 64-bit variable, you can do
uint64_t abcd8 = 0x5555555555555555ULL;
then, to set all the "2 bits", you could do
abcd8 |= 0xAAAAAAAAAAAAAAAAULL;
making all values now 3.
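The 64-bit variant can be sketched the same way. This assumes 32 values of 2 bits each, with value number 0 in the lowest bits; the macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define ALL_1_BITS 0x5555555555555555ULL  /* low bit of every 2-bit value */
#define ALL_2_BITS 0xAAAAAAAAAAAAAAAAULL  /* high bit of every 2-bit value */

/* Hypothetical helper: read value number i (0..31) from the packed word. */
unsigned get_value(uint64_t abcd8, unsigned i)
{
    assert(i < 32);
    return (unsigned)((abcd8 >> (2 * i)) & 3u);
}

/* Nonzero when all 32 values are below 2: one AND and one compare. */
int all_32_below_2(uint64_t abcd8)
{
    return (abcd8 & ALL_2_BITS) == 0;
}
```

Starting from ALL_1_BITS every value reads back as 1 and the check passes; after OR-ing in ALL_2_BITS every value reads back as 3 and the check fails.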
Addendum: On further consideration, you could use a union as your type and either do a union with char and @dbush's bitfields (these flag operations would still work on the unsigned char) or use char types for each a,b,c,d and union them with unsigned int. This would allow both a compact representation and efficient operations depending on what union member you use.
union Foo {
    unsigned char abcd; //Note: you can use flags and bitops on this too
    struct {
        unsigned char a:2;
        unsigned char b:2;
        unsigned char c:2;
        unsigned char d:2;
    };
};
Or, extended even further:
union Foo {
    uint64_t abcd8; //Note: you can use flags and bitops on these too
    uint32_t abcd4[2];
    uint16_t abcd2[4];
    uint8_t abcd[8];
    struct {
        unsigned char a:2;
        unsigned char b:2;
        unsigned char c:2;
        unsigned char d:2;
    } _[8];
};
union Foo myfoo = {0xFFFFFFFFFFFFFFFFULL};
//assert(myfoo._[0].a == 3 && myfoo.abcd[0] == 0xFF);
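The position of bit-fields inside a byte is implementation-defined, so one cautious way to combine the two views is to derive the masks through the union itself. The sketch below assumes C (where reading a union member other than the one last written is permitted) and a compiler that packs the four bit-fields into a single byte, as GCC and Clang typically do. Foo8, mask_of_2_bits and all_below_2 are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

union Foo8 {                 /* hypothetical name */
    uint8_t abcd;
    struct { unsigned char a:2, b:2, c:2, d:2; } f;
};

/* Build the mask of all four "2 bits" by writing through the bit-fields,
   so the code does not hard-code where the compiler placed them. */
uint8_t mask_of_2_bits(void)
{
    union Foo8 m;
    memset(&m, 0, sizeof m);
    m.f.a = 2; m.f.b = 2; m.f.c = 2; m.f.d = 2;
    return m.abcd;
}

/* Store through the bit-fields, test through the byte view. */
int all_below_2(unsigned a, unsigned b, unsigned c, unsigned d)
{
    union Foo8 x;
    assert(a < 4 && b < 4 && c < 4 && d < 4);
    memset(&x, 0, sizeof x);
    x.f.a = a; x.f.b = b; x.f.c = c; x.f.d = d;
    return (x.abcd & mask_of_2_bits()) == 0;
}
```

Under the stated assumption the mask comes out as 0xAA whichever end of the byte the compiler starts allocating from, since the value 2 always sets the upper bit of each pair.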
This method does introduce some endianness differences, which would also be a problem if you use a union to cover any other combination of your other methods.
union Foo {
    uint32_t abcd;
    uint32_t dcba; //only here for endian purposes
    struct { //anonymous struct
        char a;
        char b;
        char c;
        char d;
    };
};
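To illustrate the endianness point, here is a small sketch that assumes the four-byte struct has no padding and uses unsigned char members. Foo32, make_foo, a_is_low_byte and all_below_2 are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

union Foo32 {                /* hypothetical name */
    uint32_t abcd;
    struct { unsigned char a, b, c, d; } v;
};

/* Hypothetical helper: fill the four byte-sized members. */
union Foo32 make_foo(unsigned char a, unsigned char b,
                     unsigned char c, unsigned char d)
{
    union Foo32 x;
    x.v.a = a; x.v.b = b; x.v.c = c; x.v.d = d;
    return x;
}

/* 1 if a is the least significant byte of abcd (little-endian host),
   0 otherwise. */
int a_is_low_byte(void)
{
    return (make_foo(1, 0, 0, 0).abcd & 0xFFu) == 1;
}

/* Every byte gets the same mask, so the result is the same on
   little- and big-endian hosts. */
int all_below_2(union Foo32 x)
{
    return (x.abcd & 0xFEFEFEFEu) == 0;
}
```

Masks that treat every member alike are unaffected by byte order; only code that singles out one member through the integer view needs to care which end a lives at.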
You could experiment and measure with different union types and algorithms to see which parts of the unions are worth keeping, then discard the ones that are not useful. You may find that operating on several char/short/int types simultaneously gets automatically optimized to some combination of AVX/SIMD instructions, whereas using bitfields does not unless you manually unroll them. There is no way to know until you test and measure.
I think the only real answer can be to write your code generically, and then profile the full program with all of them. I don't think this will take that much time, though it may look a little more awkward. Basically, I'd do something like this:
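template <bool is_packed> class Foo;
using interface_int = unsigned char;

// Unpacked: one byte per value.
template <>
class Foo<false> {
    unsigned char m_a = 0, m_b = 0, m_c = 0, m_d = 0;
public:
    void setA(interface_int a) { m_a = a; }
    interface_int getA() const { return m_a; }
    // ...
};

// Packed: four 2-bit values in one byte.
template <>
class Foo<true> {
    unsigned char m_data = 0;
public:
    // bit magic: here a is assumed to live in the two lowest bits of m_data
    void setA(interface_int a) {
        m_data = static_cast<unsigned char>((m_data & ~0x03u) | (a & 0x03u));
    }
    interface_int getA() const { return m_data & 0x03u; }
};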
If you just write your code like this instead of exposing the raw data, it will be easy to switch implementations and profile. The function calls should get inlined, so they should not impact performance. Note that I just wrote setA and getA instead of a function that returns a reference; the latter is more complicated to implement.
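The same idea can be sketched in plain C, with a build switch standing in for the template parameter. This is an illustration only: the FOO_PACKED macro and the foo_* function names are hypothetical, and value number 0 is assumed to occupy the lowest bits in the packed form:

```c
#include <assert.h>
#include <stdint.h>

#ifdef FOO_PACKED                      /* hypothetical build switch */
typedef struct { uint8_t data; } Foo;  /* four 2-bit values in one byte */

unsigned foo_get(Foo f, unsigned i)
{
    assert(i < 4);
    return (f.data >> (2 * i)) & 3u;
}

Foo foo_set(Foo f, unsigned i, unsigned v)
{
    assert(i < 4 && v < 4);
    f.data = (uint8_t)((f.data & ~(3u << (2 * i))) | (v << (2 * i)));
    return f;
}
#else
typedef struct { uint8_t v[4]; } Foo;  /* one byte per value */

unsigned foo_get(Foo f, unsigned i)
{
    assert(i < 4);
    return f.v[i];
}

Foo foo_set(Foo f, unsigned i, unsigned v)
{
    assert(i < 4 && v < 4);
    f.v[i] = (uint8_t)v;
    return f;
}
#endif

/* All values zero, for either representation. */
Foo foo_zero(void)
{
    Foo f = {0};
    return f;
}

/* Caller code is identical for both representations. */
unsigned sum_of_values(Foo f)
{
    return foo_get(f, 0) + foo_get(f, 1) + foo_get(f, 2) + foo_get(f, 3);
}
```

Building once with and once without -DFOO_PACKED gives two binaries to profile while the calling code stays unchanged.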
The most efficient option, in terms of execution speed, is to use the processor's word size. Don't make the processor perform the extra work of packing or unpacking.
Some processors have more than one efficient size. Many ARM processors can operate in 8/32 bit mode. This means that the processor is optimized for handling 8 bit quantities or 32-bit quantities. For a processor like this, I recommend using 8-bit data types.
Your algorithm has a lot to do with the efficiency. If you are moving or copying data, you may want to consider moving it 32 bits at a time (four 8-bit quantities). The idea here is to reduce the number of fetches by the processor.
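A minimal sketch of moving four 8-bit quantities per access. It uses memcpy on a uint32_t, which optimizing compilers commonly turn into a single 32-bit load or store, though that is not guaranteed; load4, store4 and copy_by_words are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fetch four 8-bit quantities as one 32-bit word. */
uint32_t load4(const uint8_t *src)
{
    uint32_t word;
    assert(src != NULL);
    memcpy(&word, src, sizeof word);
    return word;
}

/* Write one 32-bit word back as four 8-bit quantities. */
void store4(uint8_t *dst, uint32_t word)
{
    assert(dst != NULL);
    memcpy(dst, &word, sizeof word);
}

/* Copy n bytes, four at a time where possible, then the remainder
   one byte at a time. Returns dst, like memcpy. */
uint8_t *copy_by_words(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        store4(dst + i, load4(src + i));
    for (; i < n; ++i)
        dst[i] = src[i];
    return dst;
}
```

In practice the library memcpy already does this kind of widening, so a hand-written loop like this is mainly useful when the word is also processed between the load and the store.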
For performance, write your code to make use of registers, such as using more local variables. Fetching from memory into registers is more costly than using registers directly.
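One way this can play out, as a hedged sketch: accumulating through a pointer may force the compiler to touch memory on every iteration, whereas a local variable can stay in a register. Whether this matters depends on the compiler and the optimization level; both function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulates directly in *total. Because uint8_t is usually a character
   type, the compiler may have to assume that *total overlaps values[]
   and reload or store it on every iteration. */
unsigned sum_through_pointer(const uint8_t *values, size_t n, unsigned *total)
{
    assert(total != NULL);
    *total = 0;
    for (size_t i = 0; i < n; ++i)
        *total += values[i];
    return *total;
}

/* Accumulates in a local, which the compiler is free to keep in a
   register, and hands the result back once at the end. */
unsigned sum_in_local(const uint8_t *values, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; ++i)
        total += values[i];
    return total;
}
```

Both functions compute the same sum; comparing their generated assembly is a quick way to see whether the local-variable version avoids the extra memory traffic on a given compiler.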
Best of all, check out your compiler optimization settings. Set your compiler to its highest performance (speed) settings. Next, generate assembly language listings of your functions and review them to see what code the compiler generated. Then adjust your code to improve the compiler's optimization capabilities.