I am searching for a faster method of accomplishing this:
int is_empty(char * buf, int size)
{
int i;
for(i = 0; i < size; i++) {
if(buf[
I think I have a good solution for this. Create a dummy zeroed array and use memcmp(). That's what I do.
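A minimal sketch of that idea, assuming the buffer may be larger than the dummy array: compare it chunk by chunk against one static zeroed block. The name is_empty_chunked and the ZERO_CHUNK size are made up for illustration and are not tuned values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk size; pick whatever suits your data. */
#define ZERO_CHUNK 4096

/* Returns 1 if every byte of buf is zero, 0 otherwise. */
int is_empty_chunked(const char *buf, size_t size)
{
    static const char zero[ZERO_CHUNK]; /* static storage is zero-initialized */

    while (size > 0) {
        size_t n = size < ZERO_CHUNK ? size : ZERO_CHUNK;
        if (memcmp(buf, zero, n) != 0)
            return 0; /* this chunk contains a non-zero byte */
        buf += n;
        size -= n;
    }
    return 1; /* all chunks matched (also the result for size == 0) */
}
```

Because the zero block is static, nothing is allocated per call, and the loop stops at the first chunk that differs.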
You stated in your question that you are looking for a most likely unnecessary micro-optimization. In 'normal' cases the ASM approach by Thomas and others should give you the fastest results.
Still, this ignores the big picture. If your buffer is really large, then starting from the beginning and essentially doing a linear search is definitely not the fastest way to do this. Assume your cp replacement is quite good at finding large consecutive empty regions but has a few non-empty bytes at the end of the array. All linear searches would require reading the whole array. On the other hand, a quicksort-inspired algorithm could search for any non-zero elements and abort much faster for a large enough dataset.
So before doing any kind of micro-optimization I would look closely at the data in your buffer and see if that gives you any patterns. For a single '1' randomly distributed in the buffer, a linear search (disregarding threading/parallelization) will be the fastest approach; in other cases, not necessarily so.
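One way to act on such a pattern, as a rough sketch: probe a few positions where non-zero bytes are suspected before falling back to the full linear scan. The probe positions here (last, middle, first byte) are an assumption for illustration, and is_empty_probe_first is a hypothetical name; choose probes based on what your data actually looks like.

```c
#include <stddef.h>

/* Returns 1 if every byte of buf is zero, 0 otherwise. */
int is_empty_probe_first(const char *buf, size_t size)
{
    size_t i;

    if (size == 0)
        return 1;

    /* Cheap probes first: a hit here avoids reading the whole buffer. */
    if (buf[size - 1] != 0 || buf[size / 2] != 0 || buf[0] != 0)
        return 0;

    /* No probe hit, so a full linear scan is still required. */
    for (i = 0; i < size; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

The probes only help when the buffer is non-empty at a probed position; a truly empty buffer still costs one full pass.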
What about looping from size to zero (cheaper checks):
int is_empty(char * buf, int size)
{
while(size --> 0) {
if(buf[size] != 0) return 0;
}
return 1;
}
It must be noted that we probably cannot outperform the compiler, so enable the most aggressive speed optimization in your compiler and assume that you're likely to not go any faster.
Or handle everything using pointers (not tested, but likely to perform quite well):
int is_empty(char* buf, int size)
{
char* org = buf;
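if (size <= 0)
return 1;
if (buf[size-1] != 0)
return 0;
buf[size-1] = 1; // sentinel so the scan below always stops
while(! *buf++);
buf--;
org[size-1] = 0; // restore the original last byte
return buf == &org[size-1];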
}
Inline assembly version of the initial C code. There is no error checking: a NULL or unallocated array will generate an exception, and uiSize == 0 is not handled explicitly. Perhaps use try {} catch(), as this might be faster than adding a lot of checks to the code. Or do as I do and try not to call functions with invalid values (usually does not work). At least add a NULL pointer check and a size != 0 check; that is very easy.
unsigned int IsEmpty(char* pchBuffer, unsigned int uiSize)
{
asm {
push edi
push ecx
mov edi, [pchBuffer] // scasb reads through EDI, not ESI
mov ecx, [uiSize]
// add NULL ptr and size check here
xor eax, eax // al = 0 is the byte to compare against; also sets ZF
repe scasb // compare al with BYTE PTR es:[EDI] and increment EDI, while the bytes match and ECX != 0
jne char_not_zero // ZF clear: the scan stopped on a non-zero byte
mov eax, 1 // all bytes were zero; set return value (works in MASM)
jmp done
char_not_zero:
mov eax, 0 // Still not sure if this works in inline asm
done:
pop ecx
pop edi
}
}
That was written on the fly, but it looks correct enough; corrections are welcome. And if someone knows how to set the return value from inline asm, please do tell.
Four functions for testing zeroness of a buffer with simple benchmarking:
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <inttypes.h>
#define SIZE (8*1024)
char zero[SIZE] __attribute__(( aligned(8) ));
#define RDTSC(var) { uint32_t lo_, hi_; __asm__ __volatile__ ( "rdtsc" : "=a" (lo_), "=d" (hi_) ); (var) = ((uint64_t)hi_ << 32) | lo_; }
#define MEASURE( func ) { \
uint64_t start, stop; \
RDTSC( start ); \
int ret = func( zero, SIZE ); \
RDTSC( stop ); \
printf( #func ": %s %12"PRIu64"\n", ret?"non zero": "zero", stop-start ); \
}
int func1( char *buff, size_t size ){
while(size--) if(*buff++) return 1;
return 0;
}
int func2( char *buff, size_t size ){
return *buff || memcmp(buff, buff+1, size-1);
}
int func3( char *buff, size_t size ){
return *(uint64_t*)buff || memcmp(buff, buff+sizeof(uint64_t), size-sizeof(uint64_t));
}
int func4( char *buff, size_t size ){
return *(wchar_t*)buff || wmemcmp((wchar_t*)buff, (wchar_t*)buff+1, size/sizeof(wchar_t)-1);
}
int main(){
MEASURE( func1 );
MEASURE( func2 );
MEASURE( func3 );
MEASURE( func4 );
}
Result on my old PC:
func1: zero 108668
func2: zero 38680
func3: zero 8504
func4: zero 24768
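func3 reads its first eight bytes through a uint64_t* cast, which relies on the buffer being suitably aligned (the zero array above is declared aligned(8)). As a sketch of a further candidate, here is a scan that loads eight bytes at a time through memcpy(), so it does not depend on alignment. The name func5 is hypothetical and it is not part of the measurements above; it follows the same convention as func1 to func4 (non-zero result means a non-zero byte was found).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if any byte is non-zero, 0 if the buffer is all zero. */
int func5(const char *buff, size_t size)
{
    uint64_t word;

    while (size >= sizeof word) {
        memcpy(&word, buff, sizeof word); /* safe for any alignment */
        if (word != 0)
            return 1;
        buff += sizeof word;
        size -= sizeof word;
    }
    while (size--) /* remaining tail of fewer than 8 bytes */
        if (*buff++)
            return 1;
    return 0;
}
```

It can be dropped into the harness with MEASURE( func5 ); whether it beats the memcmp() variants depends on the compiler and library, so measure before relying on it.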
One potential way, inspired by Kieveli's dismissed idea:
int is_empty(char *buf, size_t size)
{
static const char zero[999] = { 0 };
return !memcmp(zero, buf, size > 999 ? 999 : size);
}
Note that you can't make this solution work for arbitrary sizes. You could do this:
int is_empty(char *buf, size_t size)
{
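char *zero = calloc(size, 1); /* calloc takes an element count and an element size */
int i;
if (zero == NULL)
return 0; /* allocation failed; cannot confirm the buffer is empty */
i = memcmp(zero, buf, size);
free(zero);
return !i; /* memcmp returns 0 when the buffers match */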
}
But any dynamic memory allocation is going to be slower than what you have. The only reason the first solution is faster is that it can use memcmp(), which is typically hand-optimized in assembly language by the library writers and will likely be much faster than anything you could code in C.
EDIT: An optimization no one else has mentioned, based on earlier observations about the likelihood of the buffer being in state X: if a buffer isn't empty, is it more likely to be non-empty at the beginning or at the end? If it's more likely to have cruft at the end, you could start your check at the end and probably see a nice little performance boost.
EDIT 2: Thanks to Accipitridae in the comments:
int is_empty(char *buf, size_t size)
{
return buf[0] == 0 && !memcmp(buf, buf + 1, size - 1);
}
This basically compares the buffer to itself, with an initial check to see if the first element is zero. That way, any non-zero element will cause memcmp() to fail. I don't know how this would compare to using another version, but I do know that it will fail quickly (before we even loop) if the first element is nonzero. If you're more likely to have cruft at the end, change buf[0] to buf[size - 1] to get the same effect.
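A sketch of that tail-first variant, with a guard for size == 0 added as an assumption (without it, size - 1 wraps around for an unsigned size_t). The name is_empty_tail_first is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if every byte of buf is zero, 0 otherwise. */
int is_empty_tail_first(const char *buf, size_t size)
{
    if (size == 0)
        return 1; /* nothing to check; avoids size - 1 wrapping around */

    /* If the last byte is zero and every byte equals its successor,
       then every byte in the buffer is zero. */
    return buf[size - 1] == 0 && memcmp(buf, buf + 1, size - 1) == 0;
}
```

The overlap between the two memcmp() arguments is harmless here because memcmp() only reads its inputs.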