Understanding code in strlen implementation


情歌与酒

I have two questions regarding the implementation of strlen in string.h in glibc.

The implementation uses a magic number with 'holes'


1 Answer

深忆病人

2021-02-13 16:54
The magic number is used to examine 4 bytes (32 bits), or even 8 bytes (64 bits), in one go and check whether any of them is zero (the end of the string), instead of checking each byte individually.

Here is one example to check for a null byte:
```
unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
```
For more tricks of this kind, see Bit Twiddling Hacks.
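
To make the snippet above easier to try out, here is a minimal sketch that wraps the same expression in a function. The names `has_zero_byte` and `demo_has_zero_byte` are mine (hypothetical, not from glibc or Bit Twiddling Hacks), and I use `uint32_t` so the word is exactly 32 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper wrapping the expression above for a 32-bit word. */
static bool has_zero_byte(uint32_t v)
{
    /* Per byte: (b & 0x7F) + 0x7F sets bit 7 iff the low 7 bits are nonzero
       (no carry leaves the byte, since 0x7F + 0x7F = 0xFE). OR-ing v adds the
       bytes whose bit 7 was already set. After forcing the low 7 bits to 1
       and inverting, only bytes that were 0x00 keep 0x80. */
    uint32_t mask = ~((((v & 0x7F7F7F7Fu) + 0x7F7F7F7Fu) | v) | 0x7F7F7F7Fu);
    return mask != 0;
}

/* A few hand-checked words. */
static void demo_has_zero_byte(void)
{
    assert(!has_zero_byte(0x41424344u)); /* "ABCD": no zero byte */
    assert(has_zero_byte(0x41420044u));  /* one byte is 0x00 */
    assert(!has_zero_byte(0x80808080u)); /* 0x80 bytes are not zero */
    assert(has_zero_byte(0x00000000u));  /* every byte is zero */
}
```

Calling `demo_has_zero_byte()` from `main` should pass silently if the reasoning in the comments holds.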

The one used here (32-bit example):

There is yet a faster method: use hasless(v, 1), which is defined below; it works in 4 operations and requires no subsequent verification. It simplifies to

#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)

The sub-expression (v - 0x01010101UL) evaluates to a high bit set in any byte whenever the corresponding byte in v is zero or greater than 0x80. The sub-expression ~v & 0x80808080UL evaluates to high bits set in bytes where the byte of v doesn't have its high bit set (so the byte was less than 0x80). Finally, by ANDing these two sub-expressions, the result is the high bits set where the bytes in v were zero, since the high bits set due to a value greater than 0x80 in the first sub-expression are masked off by the second.
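
The explanation above can be traced by hand on one sample word. This is only a sketch: `haszero32` is a hypothetical function with the same expression as the `haszero` macro (using 32-bit constants), and the sample value `0x41009044` is chosen by me because it has one zero byte and one byte above 0x80.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical function form of the haszero macro for a 32-bit word. */
static uint32_t haszero32(uint32_t v)
{
    return (v - 0x01010101u) & ~v & 0x80808080u;
}

/* Trace both sub-expressions for v = 0x41009044 (bytes 0x41, 0x00, 0x90, 0x44). */
static void demo_haszero_trace(void)
{
    uint32_t v = 0x41009044u;

    /* High bit ends up set in the 0x00 byte (now 0xFF) and in the 0x90 byte (now 0x8F). */
    uint32_t sub = v - 0x01010101u;
    assert(sub == 0x3FFF8F43u);

    /* High bit set only where the original byte was below 0x80: 0x41, 0x00, 0x44. */
    uint32_t low = ~v & 0x80808080u;
    assert(low == 0x80800080u);

    /* The 0x90 byte is masked off; only the zero byte is still flagged. */
    assert((sub & low) == 0x00800000u);

    /* Same word with the zero byte replaced by 0x42: nothing is flagged. */
    assert(haszero32(0x41429044u) == 0);
}
```

One detail worth knowing: because the subtraction borrows across bytes, a 0x01 byte sitting directly above a zero byte can be flagged as well (for example `haszero32(0x41410100u)` gives `0x00008080`). The result is never nonzero without a real zero byte, so it is safest to read it as a yes/no answer and then locate the byte separately.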

Looking at one byte at a time costs at least as many CPU cycles as looking at a full integer value (register wide). In this algorithm, full integers are checked to see if they contain a zero. If not, only a few instructions are used, and a jump can be made to the next full integer. If there is a zero byte inside, a further check is done to find its exact position.
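
The loop described above might look roughly like the sketch below. It is not the glibc code: to keep every read inside the buffer I wrote it as a `strnlen`-style function that takes a maximum length, and it loads words with `memcpy` so alignment does not matter. The names `haszero32`, `sketch_strnlen` and `demo_sketch_strnlen` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical function form of the haszero macro for a 32-bit word. */
static uint32_t haszero32(uint32_t v)
{
    return (v - 0x01010101u) & ~v & 0x80808080u;
}

/* Length of the string in buf, examining at most n bytes (like strnlen). */
static size_t sketch_strnlen(const char *buf, size_t n)
{
    size_t i = 0;

    /* Fast path: test 4 bytes per iteration while a whole word is in bounds. */
    while (n - i >= 4) {
        uint32_t w;
        memcpy(&w, buf + i, sizeof w); /* alignment-safe load */
        if (haszero32(w))
            break;                     /* a zero byte is somewhere in this word */
        i += 4;
    }

    /* Slow path: find the exact position byte by byte (also covers the tail). */
    while (i < n && buf[i] != '\0')
        i++;
    return i;
}

static void demo_sketch_strnlen(void)
{
    const char padded[8] = "ab"; /* remaining bytes are zero-filled */

    assert(sketch_strnlen("hello", 6) == 5);
    assert(sketch_strnlen("abcdefgh", 9) == 8);
    assert(sketch_strnlen("abcd", 4) == 4);             /* no terminator within n */
    assert(sketch_strnlen(padded, sizeof padded) == 2); /* zero inside the first word */
    assert(sketch_strnlen("\x80\x81\xff\x01", 5) == 4); /* high bytes are not zeros */
}
```

A real `strlen` has no length bound, so it first advances byte by byte to an aligned address and then reads whole aligned words; the sketch sidesteps that by requiring `n`.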