What are the shift rules for the Boyer–Moore string search algorithm?

小鲜肉 2021-02-06 05:29

I have been trying to understand the shift rules in the Boyer–Moore string search algorithm, but I haven't managed to. I read the Wikipedia article, but it is too complex!

3 Answers
  •  野性不改
    2021-02-06 06:18

    There are two heuristics: the bad character heuristic and the good suffix heuristic.

    First, comparison of the needle starts from its end. If the characters do not match, the needle is shifted so that the haystack character just compared lines up with a matching character in the needle. E.g. the needle is "ABRACADABRA" and the current character in the haystack is "B": it does not match the last "A", and it does not match the preceding "R" either, so a shift by one is pointless because there can be no match. But "B" does match the character two positions before the end of the needle, so we shift the needle by at least 2 positions. If the current haystack character does not occur anywhere in the needle, the needle has to be shifted entirely past it. In other words, we shift the pattern until the current haystack character matches a character in the needle, or until the whole needle has moved past it.

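    The shift amounts are precomputed and stored in an array indexed by character. Each entry is the distance from the end of the needle to the rightmost occurrence of that character, so for "ABRACADABRA" it would be: ['R'] = 1, ['B'] = 2, ['D'] = 4, etc.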

    haystack: XYABRACADABRA...
                        |
    needle:   ABRACADABRA
               ABRACADABRA <-- pointless shift: R will not match B
                ABRACADABRA
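
The bad character table described above could be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function names and the parameter `j` (the needle index where the mismatch happened) are made up for this example.

```python
def bad_character_table(needle):
    """Map each character to the distance between its rightmost
    occurrence and the end of the needle."""
    last = len(needle) - 1
    # Later occurrences overwrite earlier ones, so the rightmost one wins.
    return {ch: last - i for i, ch in enumerate(needle)}


def bad_character_shift(needle, table, ch, j):
    """Shift for haystack character `ch` mismatching at needle index `j`."""
    matched = len(needle) - 1 - j  # characters already matched to the right of j
    # A character absent from the needle pushes the needle entirely past it.
    # The shift is never less than 1, so the search always makes progress.
    return max(1, table.get(ch, len(needle)) - matched)


needle = "ABRACADABRA"
table = bad_character_table(needle)
print(table["R"], table["B"], table["D"])            # 1 2 4
print(bad_character_shift(needle, table, "B", 10))   # 2
```

When the mismatch is not at the last needle character, the already matched characters are subtracted from the table value, which is why the helper takes `j`.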
    

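    Second, if a suffix of the needle, say "ABRA", has been matched in the haystack (but not the full needle), the needle can be shifted so that the next occurrence of "ABRA" inside it lines up with the text that was already matched.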

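    The shift for each matched suffix is also precomputed: e.g. ['A'] = 3, ['RA'] = 10, ['BRA'] = 10, ['ABRA'] = 7, ['DABRA'] = 7... For "RA" and "BRA" the only earlier occurrence is preceded by the same needle character that just mismatched, so it is skipped, and the shift of 10 lines the trailing "A" up with the needle's first "A".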

    haystack: XYZYXADABRACADABRA...
    needle:   ABRACADABRA           (shift to ABRA from matched ADABRA)
              ~~~~   ~~~~
                     ABRACADABRA
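
The good suffix shifts can be computed by brute force, which is slow but easy to follow. The sketch below assumes the usual "strong" form of the rule, where the character that lands on the mismatch position must differ from the one that just failed. The function name is hypothetical, and real implementations build this table in linear time instead.

```python
def good_suffix_shift(needle, matched_len):
    """Smallest safe shift after the last `matched_len` characters of the
    needle matched and the character before them did not."""
    m = len(needle)
    j = m - 1 - matched_len  # needle index where the mismatch occurred
    for shift in range(1, m + 1):
        # Every matched character must agree with whatever needle
        # character lands under it (or nothing lands under it at all).
        suffix_ok = all(i < shift or needle[i - shift] == needle[i]
                        for i in range(j + 1, m))
        # Strong rule: the character landing on position j must differ,
        # otherwise the same mismatch would simply repeat.
        differs = j < shift or needle[j - shift] != needle[j]
        if suffix_ok and differs:
            return shift
    return m


needle = "ABRACADABRA"
shifts = {needle[-k:]: good_suffix_shift(needle, k) for k in range(1, 6)}
print(shifts)  # {'A': 3, 'RA': 10, 'BRA': 10, 'ABRA': 7, 'DABRA': 7}
```

Under this check "RA" and "BRA" get 10, because the trailing "A" can still line up with the needle's leading "A".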
    

    This is not a full explanation of all the corner cases, but it is the main idea of the algorithm.
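
To show how the two heuristics fit together, here is a small self-contained Python sketch of the whole search. At each mismatch it takes the larger of the two shifts. It is an illustration under simplifying assumptions: the good suffix table is built by brute force, only the first match is returned, and all names are made up for this example.

```python
def boyer_moore_search(haystack, needle):
    """Return the index of the first occurrence of needle, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0

    # Bad character table: rightmost index of each needle character.
    rightmost = {ch: i for i, ch in enumerate(needle)}

    def good_suffix_shift(j):
        # Brute-force strong good suffix rule for a mismatch at index j.
        for shift in range(1, m + 1):
            suffix_ok = all(i < shift or needle[i - shift] == needle[i]
                            for i in range(j + 1, m))
            differs = j < shift or needle[j - shift] != needle[j]
            if suffix_ok and differs:
                return shift
        return m

    good = [good_suffix_shift(j) for j in range(m)]  # indexed by mismatch position

    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare from the end of the needle towards its start.
        while j >= 0 and haystack[pos + j] == needle[j]:
            j -= 1
        if j < 0:
            return pos  # every character matched
        bad = j - rightmost.get(haystack[pos + j], -1)
        pos += max(1, bad, good[j])
    return -1


print(boyer_moore_search("XYABRACADABRA...", "ABRACADABRA"))       # 2
print(boyer_moore_search("XYZYXADABRACADABRA...", "ABRACADABRA"))  # 7
```

The two calls correspond to the two diagrams above: the first is resolved by the bad character shift of 2, the second by the good suffix shift of 7.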
