I have a block of memory with elements of fixed size, say 100 bytes, placed into it one after another, all with the same fixed length, so the memory is just one element after the next. What is an efficient, portable way to check whether a given element consists entirely of zero bytes?
The obvious portable, high-efficiency method is:

char testblock [fixedElementSize];
memset (testblock, 0, sizeof testblock);

if (!memcmp (testblock, memoryBlock + elementNr*fixedElementSize, fixedElementSize))
    // block is all zero
else
    // a byte is non-zero
The library function memcmp() in most implementations will use the largest, most efficient unit size it can for the majority of comparisons.

For more efficiency, don't set testblock at runtime:
static const char testblock [100];
By definition, variables with static storage duration are automatically zero-initialized unless they have an explicit initializer.
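A minimal, self-contained sketch of this approach, assuming the 100-byte element size from the question (the wrapper function elementIsZero and the macro FIXED_ELEMENT_SIZE are illustrative names, not from the answer):

#include <string.h>
#include <stdbool.h>

#define FIXED_ELEMENT_SIZE 100   // assumed element size from the question

// Zero-filled reference block: static storage duration, so it is zero-initialized.
static const char zeroBlock[FIXED_ELEMENT_SIZE];

// Returns true if element elementNr inside memoryBlock is all zero bytes.
static bool elementIsZero(const char *memoryBlock, size_t elementNr)
{
    return memcmp(zeroBlock,
                  memoryBlock + elementNr * FIXED_ELEMENT_SIZE,
                  FIXED_ELEMENT_SIZE) == 0;
}

The static zeroBlock relies on exactly the zero-initialization rule mentioned above, so no memset is needed.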
AFAIK there is no library function that checks memory for this automatically.
You could use | to speed up the for-loop; there's no need for ==:
char *elementStart = memoryBlock + elementNr*fixedElementSize;
char special = 0;

for ( size_t curByteNr=0; curByteNr<fixedElementSize; ++curByteNr )
{
    // OR every byte into the accumulator; it stays 0 only if all bytes are 0
    special |= *(elementStart + curByteNr);
}
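After the loop, a single test of the accumulator tells you whether the element was all zero:

if (special == 0) {
    // every byte of the element was zero
} else {
    // at least one byte was non-zero
}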
You can also use long for even more speed:
char *elementStart = memoryBlock + elementNr*fixedElementSize;
long special = 0;

for ( size_t curByteNr=0; curByteNr<fixedElementSize; curByteNr += sizeof(long) )
{
    // reads sizeof(long) bytes at a time; assumes the element is aligned for long
    // and that fixedElementSize is a multiple of sizeof(long)
    special |= *(long*)(elementStart+curByteNr);
}
WARNING: the above code is not tested. Please test it first: the cast requires the element to be properly aligned for long, fixedElementSize should be a multiple of sizeof(long), and reading a char buffer through a long* can violate strict-aliasing rules on some compilers.
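If you want the word-at-a-time speedup without the alignment and strict-aliasing concerns, a hedged alternative is to copy sizeof(long) bytes at a time with memcpy (compilers typically turn this into a single load) and finish any leftover bytes one by one. The function isAllZero below is my own sketch, not part of the answer above:

#include <string.h>
#include <stdbool.h>

// Returns true if the size bytes starting at p are all zero.
// Assumes nothing about alignment; memcpy keeps the word-sized loads legal.
static bool isAllZero(const char *p, size_t size)
{
    unsigned long acc = 0;
    size_t i = 0;

    // word-sized chunks
    for (; i + sizeof acc <= size; i += sizeof acc) {
        unsigned long word;
        memcpy(&word, p + i, sizeof word);
        acc |= word;
    }
    // remaining tail bytes, if size is not a multiple of sizeof(long)
    for (; i < size; ++i)
        acc |= (unsigned char)p[i];

    return acc == 0;
}

Because the tail loop handles the remainder, the size no longer has to be a multiple of sizeof(long).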
You could actually use memcmp without having to allocate a zero-valued array, like this:
static int memvcmp(void *memory, unsigned char val, unsigned int size)
{
    unsigned char *mm = (unsigned char*)memory;
    // the first byte must equal val, and every byte must equal the byte after it;
    // together that forces all size bytes to equal val
    return (*mm == val) && memcmp(mm, mm + 1, size - 1) == 0;
}
The standard does not explicitly address overlapping memory regions for memcmp; since memcmp only reads its arguments, the overlap here is generally considered fine, but strictly speaking it is not spelled out.
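For the all-zero test from the question, the call would simply be as follows (note that size must be at least 1, since size - 1 on an unsigned zero would wrap around):

if (memvcmp(memoryBlock + elementNr * fixedElementSize, 0, fixedElementSize)) {
    // element is all zero
}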
I can't recall a standard library function that could do this for you. If you are not sure this causes any performance issues, I'd just use the loop, maybe replacing char* with int* as already suggested.
If you do have to optimize, you could unroll the loop:
bool allZeroes(char* buffer)
{
    int* p = (int*)buffer; // you'd better make sure your block starts on an int boundary
    int acc = *p;
    acc |= *++p;
    acc |= *++p;
    // ... repeat as many times as needed ...
    acc |= *++p;
    return acc == 0;
}
You may need to add special handling for the end of the buffer if its size is not a multiple of sizeof(int), but it could be more efficient to allocate a slightly larger block with some padding bytes set to 0.
If your blocks are large, you could treat them as a sequence of smaller blocks and loop over them, using the code above for each small block; see the sketch below.
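A sketch of that idea, assuming the buffer is suitably aligned for int: the chunked loop, the 4-way unroll, and the byte-wise tail handling are my additions to the outline above, and the same aliasing caveat as the original applies.

#include <stdbool.h>
#include <stddef.h>

// Returns true if the size bytes at buffer are all zero.
// Assumes buffer is aligned for int (e.g. it came from malloc).
static bool allZeroesN(const char *buffer, size_t size)
{
    const int *p = (const int *)buffer;
    size_t nInts = size / sizeof(int);
    int acc = 0;
    size_t i = 0;

    // 4-way unrolled accumulation over whole ints
    for (; i + 4 <= nInts; i += 4)
        acc |= p[i] | p[i + 1] | p[i + 2] | p[i + 3];
    // leftover ints
    for (; i < nInts; ++i)
        acc |= p[i];
    if (acc != 0)
        return false;

    // tail bytes when size is not a multiple of sizeof(int)
    for (size_t b = nInts * sizeof(int); b < size; ++b)
        if (buffer[b] != 0)
            return false;

    return true;
}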
I would be curious to know how this solution compares with std::upper_bound(begin, end, 0) and memcmp.
EDIT
Did a quick check of how a home-grown implementation compares with memcmp, using VS2010 for that.
In short:
1) in debug mode the home-grown version can be twice as fast as memcmp
2) in release with full optimization memcmp has an edge on blocks which start with non-zero bytes. As the length of the zero-filled preamble increases it starts losing, then somehow magically gets almost as fast as the home-grown version, only about 10% slower.
So depending on your data patterns and your need/desire to optimize, you could get some extra performance from rolling your own method, but memcmp is a rather reasonable solution.
I'll put the code and results on GitHub in case you can use them.