Understanding code in strlen implementation

情歌与酒 2021-02-13 16:14

I have two questions regarding the implementation of strlen in string.h in glibc.

  1. The implementation uses a magic number with 'holes'

1 answer
  •  深忆病人
    2021-02-13 16:54

    The magic number is used to examine 4 bytes (32 bits) or even 8 bytes (64 bits) in one go and check whether any of them is zero (the end of the string), instead of checking each byte individually.

    Here is one example to check for a null byte:

    unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0
    bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
    

    For some more see Bit Twiddling Hacks.

    The one used here (32-bit example):

    There is yet a faster method: use hasless(v, 1), which is defined elsewhere in Bit Twiddling Hacks; it works in 4 operations and requires no subsequent verification. It simplifies to

    #define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)

    The subexpression (v - 0x01010101UL) evaluates to a high bit set in any byte whenever the corresponding byte in v is zero or greater than 0x80. The subexpression ~v & 0x80808080UL evaluates to high bits set in bytes where the byte of v doesn't have its high bit set (so the byte was less than 0x80). Finally, by ANDing these two subexpressions, the result is the high bits set where the bytes in v were zero, since the high bits set due to a value greater than 0x80 in the first subexpression are masked off by the second. One caveat: a borrow out of a zero byte can also flag 0x01 bytes above it, so only the lowest flagged byte is guaranteed to be a real zero; the result as a whole is still nonzero exactly when v contains a zero byte.
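
    To make the two subexpressions concrete, here is a minimal sketch that evaluates them on a few sample words. The names haszero32 and haszero32_examples are made up for illustration, and uint32_t is used in place of the UL constants so the width is fixed at 32 bits; the expression itself is the haszero macro quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as the haszero macro, split into its two halves.
 * haszero32 is a hypothetical name used only for this illustration. */
static uint32_t haszero32(uint32_t v)
{
    /* High bit set where the byte was 0x00 or greater than 0x80
     * (or where a borrow from a lower zero byte passed through). */
    uint32_t minus_ones = v - UINT32_C(0x01010101);

    /* High bit set where the byte of v was less than 0x80. */
    uint32_t below_0x80 = ~v & UINT32_C(0x80808080);

    return minus_ones & below_0x80;
}

/* Worked values; each assert holds for the expression above. */
static void haszero32_examples(void)
{
    /* "ABCD": no zero byte, so nothing is flagged. */
    assert(haszero32(UINT32_C(0x41424344)) == 0);

    /* Zero in the second-lowest byte: exactly that byte is flagged. */
    assert(haszero32(UINT32_C(0x41420044)) == UINT32_C(0x00008000));

    /* Every byte >= 0x80: the second half masks all of them off. */
    assert(haszero32(UINT32_C(0x81FF80C3)) == 0);

    /* Borrow case: the 0x01 byte above the lowest zero byte is flagged too. */
    assert(haszero32(UINT32_C(0x00000100)) == UINT32_C(0x80808080));
}
```

    Calling haszero32_examples() from main() should run silently. The last case shows a borrow: the 0x01 byte sitting above a zero byte is flagged as well, which is why code that needs the exact position looks for the lowest flagged byte or rechecks the bytes one at a time.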

    Looking at one byte at a time costs at least as many CPU cycles as looking at a full integer value (one register wide). In this algorithm, full integers are checked to see whether they contain a zero byte. If not, only a few instructions have been spent, and the loop can jump to the next full integer. If there is a zero byte inside, a further check determines its exact position.
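
    The loop described above can be sketched roughly as follows. This is not glibc's code: strlen_wordwise is a hypothetical name, the word size is fixed at 32 bits, and the aligned word loads may read a few bytes past the terminator, which ISO C does not permit in general. Library implementations get away with it because an aligned word never crosses a page boundary, but sanitizers may flag it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a sketch of the word-at-a-time idea only. */
static size_t strlen_wordwise(const char *s)
{
    assert(s != NULL);
    const char *p = s;

    /* Step 1: advance byte by byte until p sits on a 4-byte boundary. */
    while (((uintptr_t)p & (sizeof(uint32_t) - 1)) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Step 2: test four bytes per iteration with the haszero expression. */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);  /* load one aligned word without aliasing issues */
        if (((w - UINT32_C(0x01010101)) & ~w & UINT32_C(0x80808080)) != 0)
            break;                /* some byte in this word is zero */
        p += sizeof w;
    }

    /* Step 3: locate the exact position of the zero byte inside the word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

    For example, strlen_wordwise("hello, world") is expected to return 12, the same as strlen.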
