I am writing a piece of code designed to do some data compression on CLSID structures. I'm storing them as a compressed stream of 128-bit integers. However, the code in question serializes one byte at a time, and I suspect it is slower than it needs to be:
compressedBytes.push_back((BYTE)(invalidLength & 0xFF));
invalidLength >>= 8;
compressedBytes.push_back((BYTE)(invalidLength & 0xFF));
invalidLength >>= 8;
compressedBytes.push_back((BYTE)(invalidLength & 0xFF));
invalidLength >>= 8;
compressedBytes.push_back((BYTE)(invalidLength & 0xFF));
There is an even smarter and faster way! Let's see what this code is doing and how we can improve it.
This code serializes the integer one byte at a time. For each byte it calls push_back, which checks the free space in the vector's internal buffer. If there is no room for another byte, a memory reallocation happens (hint: slow!). Granted, the reallocation will not happen frequently, since vectors typically grow by doubling the existing buffer. Then the new byte is copied in and the internal size is incremented by one.
The standard requires that vector<>'s internal buffer be contiguous. vector<> also provides operator[](), so taking the address of an element (&v[0]) gives you a pointer straight into that buffer.
So, here is the best code you can come up with:
std::string invalidClsids("This is a test string");
std::vector<BYTE> compressedBytes;
DWORD invalidLength = (DWORD) invalidClsids.length();
compressedBytes.resize(sizeof(DWORD)); // You probably want to make this much larger, to avoid resizing later.
// compressedBytes is as large as the length we want to serialize.
BYTE* p = &compressedBytes[0]; // Valid: the standard guarantees contiguous storage, so p points to a buffer at least as large as a DWORD.
*((DWORD*)p) = invalidLength; // Copy all bytes in one go!
The cast could be folded into the &compressedBytes[0] expression in one go, but that wouldn't be any faster; keeping it on its own line is more readable.
NOTE! Serializing this way (or even with the union method) is endian-dependent. That is, on an Intel/AMD processor the least significant byte will come first, while on a big-endian machine (PowerPC, Motorola...) the most significant byte will come first. If you want to be neutral, you must use a math method (shifts).
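On the cast itself: a std::memcpy does the same one-go copy but stays inside the aliasing rules, and compilers turn it into the same single 4-byte store. A minimal sketch (the helper name appendLength is mine, not from the answer above; I use std::uint32_t/unsigned char in place of DWORD/BYTE so it is self-contained):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch: append a 32-bit length to the buffer via memcpy. This is
// well-defined regardless of strict-aliasing rules, yet typically
// compiles to the same single store as the pointer cast.
inline void appendLength(std::vector<unsigned char>& out, std::uint32_t len) {
    std::size_t old = out.size();
    out.resize(old + sizeof len);              // make room for 4 more bytes
    std::memcpy(&out[old], &len, sizeof len);  // copy all bytes in one go
}
```

As with the cast, the resulting byte order follows the host CPU.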
Perhaps it's possible to take a pointer to the 32-bit variable, convert it into a char pointer and read a char, then add +1 to the pointer and read the next char... just a theory :) I don't know if it works.
This is probably as optimized as you'll get. Bit-twiddling operations are some of the fastest available on the processor.
It may be faster to use >> 16, >> 24 instead of >>= 8, >>= 8 - you cut out an assignment each time.
Also, I don't think you need the & - since you're casting to a BYTE (which should be an 8-bit char) it'll get truncated down appropriately anyway. (Is it? Correct me if I'm wrong.)
All in all, though, these are really minor changes. Profile it to see if it actually makes a difference :P
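For reference, the fixed-shift, no-mask variant those comments describe might look like this (a sketch; the helper name is mine, and I use std::uint32_t/unsigned char for DWORD/BYTE to keep it self-contained):

```cpp
#include <cstdint>
#include <vector>

// Sketch of the fixed-shift variant: shift from the original value each
// time instead of mutating it, and drop the & masks - converting to an
// 8-bit unsigned type already keeps only the low byte. Output is least
// significant byte first on every host, so this stays endian-neutral.
inline void appendLengthShifted(std::vector<unsigned char>& out, std::uint32_t len) {
    out.push_back(static_cast<unsigned char>(len));
    out.push_back(static_cast<unsigned char>(len >> 8));
    out.push_back(static_cast<unsigned char>(len >> 16));
    out.push_back(static_cast<unsigned char>(len >> 24));
}
```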
A real quick way is to just treat the DWORD* (a single-element array) as a BYTE* (a four-element array). The code is also a lot more readable.
Warning: I haven't compiled this
Warning: This makes your code dependent on byte ordering
std::vector<BYTE> compressedBytes;
DWORD invalidLength = (DWORD) invalidClsids.length();
BYTE* lengthParts = (BYTE*) &invalidLength; // cast needed: &invalidLength is a DWORD*
static const int kLengthPartsLength = sizeof(DWORD) / sizeof(BYTE);
for(int i = 0; i < kLengthPartsLength; ++i)
compressedBytes.push_back(lengthParts[i]);
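The same aliasing idea can skip the loop entirely by handing the byte range to vector::insert, which appends everything in one call. A sketch (helper name mine; std types in place of DWORD/BYTE for self-containment):

```cpp
#include <cstdint>
#include <vector>

// Sketch: append all four bytes of a 32-bit value with a single insert()
// call instead of a push_back loop - one capacity check instead of four.
// Like the loop version, the output byte order depends on the host CPU.
inline void appendLengthRange(std::vector<unsigned char>& out, std::uint32_t len) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&len);
    out.insert(out.end(), p, p + sizeof len);
}
```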
Just use a union:
assert(sizeof (DWORD) == sizeof (BYTE[4])); // Sanity check
union either {
DWORD dw;
struct {
BYTE b[4];
} bytes;
};
either invalidLength;
invalidLength.dw = (DWORD) invalidClsids.length();
compressedBytes.push_back(invalidLength.bytes.b[0]);
compressedBytes.push_back(invalidLength.bytes.b[1]);
compressedBytes.push_back(invalidLength.bytes.b[2]);
compressedBytes.push_back(invalidLength.bytes.b[3]);
NOTE: Unlike the bit-shifting approach in the original question, this code produces endian-dependent output. This matters only if output from a program running on one computer will be read on a computer with different endianness -- but as there seems to be no measurable speed increase from using this method, you might as well use the more portable bit-shifting approach, just in case.
You should measure rather than guess at any potential improvement, but my first thought is that it may be faster to use a union as follows:
typedef union {
DWORD d;
struct {
BYTE b0;
BYTE b1;
BYTE b2;
BYTE b3;
} b;
} DWB;
std::vector<BYTE> compBytes;
DWB invLen;
invLen.d = (DWORD) invalidClsids.length();
compBytes.push_back(invLen.b.b3);
compBytes.push_back(invLen.b.b2);
compBytes.push_back(invLen.b.b1);
compBytes.push_back(invLen.b.b0);
That may be the right order for the push_backs, but check just in case - it depends on the endianness of the CPU.
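One way to check is a tiny runtime probe (a sketch; hostIsLittleEndian is a hypothetical helper, not part of any answer above):

```cpp
#include <cstdint>
#include <cstring>

// Sketch: detect host byte order at runtime. On a little-endian host the
// lowest-addressed byte of the value 1 is 1; there, b0 holds the least
// significant byte, so pushing b3 first emits most-significant-byte-first
// output. On a big-endian host the order is reversed.
inline bool hostIsLittleEndian() {
    std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);  // inspect the lowest-addressed byte
    return first == 1;
}
```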