I have been trying to understand the shift rules in the Boyer–Moore string search algorithm, but I haven't managed to. I read about them on Wikipedia, but the explanation there is too complex!
There are two heuristics: the bad character heuristic and the good suffix heuristic.
First, the needle is compared starting from its end. If the characters do not match, the needle is shifted so that at least the compared haystack character lines up with a matching character in the needle. For example, suppose the needle is "ABRACADABRA" and the current haystack character is "B". It does not match the last "A", and it does not match the preceding "R" either, so a shift by one is pointless: there cannot be a match. But "B" does match the third character from the end of the needle, so we shift the needle by at least 2 positions. If the current haystack character does not occur anywhere in the needle, the needle has to be shifted entirely past it. In other words, we shift the pattern until the current haystack character matches a character in the needle, or until the whole needle has moved beyond it.
The shift amounts are precomputed and stored in an array, so for "ABRACADABRA" it would be: ['R'] = 1, ['B'] = 2, ['D'] = 4, etc.
    haystack: XYABRACADABRA...
                        |
    needle:   ABRACADABRA
               ABRACADABRA  <-- pointless shift: R will not match B
                ABRACADABRA
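As a rough illustration of the table described above, here is a minimal Python sketch. The names `bad_char_shifts` and `shift_for` are made up for this example, and it follows the simplified convention used in this answer: the shift is looked up for the haystack character that sits under the last position of the needle.

```python
def bad_char_shifts(needle):
    """Distance from each character's last occurrence to the end of the needle."""
    last = len(needle) - 1
    # Later occurrences overwrite earlier ones, so the rightmost one wins.
    return {ch: last - i for i, ch in enumerate(needle)}


def shift_for(table, needle, ch):
    """Shift when haystack character `ch` sits under the needle's last position."""
    # A character that is absent from the needle moves the whole needle past it.
    return table.get(ch, len(needle))


needle = "ABRACADABRA"
table = bad_char_shifts(needle)
print(shift_for(table, needle, "R"))  # 1
print(shift_for(table, needle, "B"))  # 2
print(shift_for(table, needle, "D"))  # 4
print(shift_for(table, needle, "X"))  # 11: "X" does not occur in the needle
```

In this sketch "A" gets a shift of 0, because it already matches the last character of the needle; in that case the comparison simply continues to the left.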
Second, if a suffix of the needle, say "ABRA", has already matched the haystack but the full needle has not, the needle can be shifted so that its next occurrence of "ABRA" lines up with the matched text.
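The shift for each matched part is also precomputed, e.g. ['A'] = 3, ['RA'] = 10, ['BRA'] = 10, ['ABRA'] = 7, ['DABRA'] = 7... (For "RA" and "BRA" the shift is 10 rather than the full length 11, because the final matched "A" can still line up with the first "A" of the needle.)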
    haystack: XYZYXADABRACADABRA...
    needle:   ABRACADABRA            (shift to ABRA from matched ADABRA)
              ~~~~   ~~~~
                     ABRACADABRA
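The matched-part table can be sketched by brute force in Python. This is only an illustration: the function name `good_suffix_shifts` is made up, real implementations build the table in linear time, and the sketch assumes the "strong" form of the rule, where the character that lands under the mismatch position must differ from the one that just failed. Under these assumptions it gives 3 for "A", 7 for "ABRA" and "DABRA", and 10 for "RA" and "BRA" (one less than the needle length, since the trailing "A" can still line up with the needle's first "A").

```python
def good_suffix_shifts(needle):
    """Map each proper matched suffix to the smallest shift that is still consistent."""
    m = len(needle)
    shifts = {}
    for k in range(1, m):          # k = length of the suffix that matched
        mismatch = m - k - 1       # needle index where the comparison failed
        for s in range(1, m + 1):  # try shifts from smallest to largest
            # After shifting by s, needle[i - s] sits where needle[i] used to be.
            # Every matched position that is still covered must agree.
            agrees = all(needle[i - s] == needle[i]
                         for i in range(max(m - k, s), m))
            # Strong rule: do not put the same failing character under the mismatch.
            differs = mismatch - s < 0 or needle[mismatch - s] != needle[mismatch]
            if agrees and differs:
                shifts[needle[m - k:]] = s
                break              # s == m always qualifies, so this is always reached
    return shifts


table = good_suffix_shifts("ABRACADABRA")
print(table["A"])       # 3
print(table["ABRA"])    # 7
print(table["ADABRA"])  # 7, the shift shown in the diagram above
```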
This is not a full explanation of all the corner cases, just the main idea of the algorithm.