What are the shift rules for the Boyer–Moore string search algorithm?

小鲜肉 2021-02-06 05:29

I have been trying to understand the shift rules in the Boyer–Moore string search algorithm, but I haven't understood them. I read about them on Wikipedia, but that explanation is too complex!

3 answers

野性不改 (original poster)

2021-02-06 06:18
There are two heuristics: the bad character heuristic and the good suffix heuristic.

First, the needle is compared against the haystack starting from the needle's end. If a character does not match, the needle is shifted at least far enough that the haystack character just compared lines up with a matching character in the needle. E.g. the needle is "ABRACADABRA" and the current character in the haystack is "B". It does not match the last "A", and it does not match the preceding "R" either, so a shift by one is pointless: there would be no match. But "B" does match the character two positions before the end of the needle, so we shift the needle by at least 2 positions. If the current character in the haystack does not occur anywhere in the needle, the needle has to be shifted entirely past that character. In other words, we shift the pattern until the current character in the haystack matches a character in the needle, or the whole needle has moved beyond it.

The shift amounts are precomputed and stored in an array indexed by character, so for "ABRACADABRA" it would be: ['R'] = 1, ['B'] = 2, ['D'] = 4, etc.
```
haystack: XYABRACADABRA...
                    |
needle:   ABRACADABRA
           ABRACADABRA <-- pointless shift: R will not match B
            ABRACADABRA
```
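To make the table concrete, here is a minimal sketch of how such a per-character shift table could be built. The function name `bad_character_table` is made up for illustration, and the sketch assumes the simplified (Horspool-style) form of the rule, where the shift depends only on the haystack character aligned with the needle's last position. The full Boyer–Moore rule also takes the mismatch position into account.

```python
def bad_character_table(needle):
    """Map each character to the distance from its last occurrence to the
    end of the needle. The final character is skipped so no shift is 0."""
    last = len(needle) - 1
    table = {}
    for i, ch in enumerate(needle[:-1]):
        table[ch] = last - i  # later occurrences overwrite earlier ones
    return table


table = bad_character_table("ABRACADABRA")
print(table["R"], table["B"], table["D"])  # the three entries quoted above
# Characters missing from the table (e.g. "X") shift by the full needle length.
print(table.get("X", len("ABRACADABRA")))
```

For "ABRACADABRA" this gives 1 for 'R', 2 for 'B' and 4 for 'D', and also 3 for 'A' and 6 for 'C'.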
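Second, if a partial match was found, say at least "ABRA" matched in the haystack (but not the full needle), the needle can be shifted so that its next occurrence of "ABRA" lines up with the matched text.

The shift amount for each matched part is also precalculated: e.g. ['A'] = 3, ['RA'] = 10, ['BRA'] = 10, ['ABRA'] = 7, ['DABRA'] = 7... (For "RA" and "BRA" the only other occurrence in the needle is preceded by the same character that just failed to match, so only the leading "A" of the needle can be reused, which gives a shift of 10.)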
```
haystack: XYZYXADABRACADABRA...
needle:   ABRACADABRA           (shift to ABRA from matched ADABRA)
          ~~~~   ~~~~
                 ABRACADABRA
```
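The shifts for a matched suffix can be checked with a brute-force sketch like the one below. It is not how real implementations build the table (they do it in linear time). It simply tries every shift and keeps the smallest one that is still consistent with what was matched. The name `good_suffix_shift` is hypothetical, and the sketch assumes the "strong" variant of the rule, where the character landing on the mismatch position must differ from the one that just failed.

```python
def good_suffix_shift(needle, matched):
    """Smallest safe shift after the last `matched` characters of the needle
    matched and the character just before them did not."""
    m = len(needle)
    mismatch = m - matched - 1  # index of the needle character that failed
    for shift in range(1, m):
        # Every matched position must see the same character again,
        # unless the shifted needle no longer covers that position.
        suffix_ok = all(
            i - shift < 0 or needle[i - shift] == needle[i]
            for i in range(m - matched, m)
        )
        # Strong rule: the character landing on the mismatch must differ.
        differs = (mismatch - shift < 0
                   or needle[mismatch - shift] != needle[mismatch])
        if suffix_ok and differs:
            return shift
    return m  # nothing can be reused: move the whole needle past


needle = "ABRACADABRA"
for matched in range(1, 6):
    print(repr(needle[-matched:]), good_suffix_shift(needle, matched))
```

Under these assumptions the sketch yields 3 for 'A', 7 for 'ABRA' and 7 for 'DABRA'. For 'RA' and 'BRA' it gives 10 rather than the full needle length of 11, because the leading 'A' of the needle can still line up with the final matched 'A'.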
This is not a full explanation of all the corner cases, but it is the main idea of the algorithm.
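As a rough illustration of how the right-to-left comparison and the shift fit together, here is a small search loop. It is a sketch only: it uses just the character-based rule in its simplified Horspool form and leaves out the matched-suffix rule, so it is not the complete Boyer–Moore algorithm. The name `horspool_search` is chosen for this example.

```python
def horspool_search(haystack, needle):
    """Return the index of the first occurrence of needle, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # Distance from each character's last occurrence (final position
    # excluded) to the end of the needle.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= n:
        j = m - 1
        # Compare from the needle's end towards its start.
        while j >= 0 and haystack[pos + j] == needle[j]:
            j -= 1
        if j < 0:
            return pos  # every character matched
        # Shift by the haystack character under the needle's last position;
        # characters not in the needle move it by its full length.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1


print(horspool_search("XYABRACADABRA...", "ABRACADABRA"))
print(horspool_search("XYZYXADABRACADABRA...", "ABRACADABRA"))
```

On the two haystacks from the diagrams above, the first call finds the needle at index 2 after a single shift of 2, and the second finds it at index 7.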