How does memchr() work under the hood?

折月煮酒 提交于 2019-11-30 08:55:32
Chris Young

I would suggest taking a look at GNU libc's source. As for most functions, it will contain both a generic optimized C version of the function, and optimized assembly language versions for as many supported architectures as possible, taking advantage of machine specific tricks.

The x86-64 SSE2 version combines the results from pcmpeqb on a whole cache-line of data at once (four 16B vectors), to amortize the overhead of the early-exit pmovmskb/test/jcc.

gcc and clang are currently incapable of auto-vectorizing loops with if() break early-exit conditions, so they make naive byte-at-a-time asm from the obvious C implementation.

ChrisW

This implementation of memchr from newlib is one example of someone's optimizing memchr: it's reading and testing 4 bytes at a time (apart from memchr, other functions in the newlib library are here).

Incidentally, most of the the source code for the MSVC run-time library is available, as an optional part of the MSVC installation (so, you could look at that).

Here is FreeBSD's (BSD-licensed) memchr() from memchr.c. FreeBSD's online source code browser is a good reference for time-tested, BSD-licensed code examples.

void *
memchr(s, c, n)
    const void *s;
    unsigned char c;
    size_t n;
{
    if (n != 0) {
        const unsigned char *p = s;

        do {
            if (*p++ == c)
                return ((void *)(p - 1));
        } while (--n != 0);
    }
    return (NULL);
}

memchr like memset and memcpy generally reduce to fairly small amount of machine code. You are unlikely to be able to reproduce that kind of speed without inlining similar assembly code. One major issue to consider in an implementation is data alignment.

One generic technique you may be able to use is to insert a sentinel at the end of the string being searched, which guarantees that you will find it. It allows you to move the test for end of string from inside the loop, to after the loop.

GNU libc definitely uses the assembly version of memchr() (on any common linux distro). This is why it is so unbelievable fast.

For example, if we count lines in 11Gb file (like "wc -l" does) it takes around 2.5 seconds with assembly version of memchr() from GNU libc. But if we replace memchr() assembly call with for example memchr() C implementation from FreeBSD - the speed will decrease to like 30 seconds.

This is equal to replacing memchr() with just a while loop which compares one char after another.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!