What are the shift rules for Boyer–Moore string search algorithm?

小鲜肉 2021-02-06 05:29

I have been trying to understand the shift rules in the Boyer–Moore string search algorithm, but I haven't managed to understand them. I read the explanation here on Wikipedia, but it is too complex!

3 Answers

野的像风 (OP)

2021-02-06 06:21
There's a good visualization here.

(EDIT: There's also a very good explanation, with worked examples and an example implementation of the preprocessing steps, here.)

General rules:
- We're looking for how to align the pattern with the text so that the aligned parts match. If no such alignment exists, the pattern isn't found in the text.
- Check each alignment from right to left - that is, start by checking whether the last character of the pattern matches the text character it is currently aligned with.
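- When you hit a character that doesn't match, increase the offset (shift the pattern) so that the last occurrence of that text-side letter in the pattern is aligned with the occurrence of the letter in the text that we're currently looking at. If that last occurrence lies to the right of the mismatch position, lining it up would move the pattern backwards, so shift by one instead. This requires pre-building (or searching each time, but that's less efficient) an index of where each letter occurs in the pattern.
- If the character being considered in the text doesn't appear in the pattern at all, shift the pattern so that it starts just past that character. That is a full pattern length only when the mismatch happened at the last character of the pattern.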
- If the end of the pattern sticks out past the end of the text (offset + length(pattern) > length(text)), the pattern doesn't appear in the text.
What I've just described is the "bad character" rule. The "good suffix" rule gives another option for shifting; whichever shifts farther is the one you should take. It's entirely possible to implement the algorithm without the good suffix rule, but the search itself (after the indices are built) will generally be less efficient, because it will often make smaller shifts.
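
As a rough illustration, here is a minimal Python sketch of a search that uses only the bad-character rule from the list above. The function names `build_last_occurrence` and `bad_character_search` are hypothetical, and moving on by one position after a full match is an assumption, since the rules above don't say what to do after a match.

```python
def build_last_occurrence(pattern: str) -> dict[str, int]:
    """Map each character to the index of its last (rightmost) occurrence in pattern."""
    last: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        last[ch] = i  # later occurrences overwrite earlier ones
    return last


def bad_character_search(text: str, pattern: str) -> list[int]:
    """Return every offset where pattern occurs in text, using only the bad-character rule."""
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))  # the empty pattern matches at every offset
    last = build_last_occurrence(pattern)
    matches = []
    offset = 0
    # Stop once the end of the pattern would stick out past the end of the text.
    while offset + m <= n:
        j = m - 1
        # Check this alignment from right to left.
        while j >= 0 and pattern[j] == text[offset + j]:
            j -= 1
        if j < 0:
            matches.append(offset)
            offset += 1  # assumption: after a full match, just try the next offset
        else:
            bad = text[offset + j]
            # Line up the last occurrence of `bad` in the pattern with this text position.
            # If `bad` is absent (-1), the pattern start moves just past `bad`.
            # max(1, ...) covers a last occurrence to the right of j (no backward shifts).
            offset += max(1, j - last.get(bad, -1))
    return matches


print(bad_character_search("here is a simple example", "example"))  # [17]
```

The `last` dictionary is the "index of where each letter occurs in the pattern"; building it once up front is what makes each shift a constant-time lookup.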

The good-suffix rule requires that you also know where each suffix of the pattern occurs elsewhere in the pattern. When you hit a mismatch (checking, as always, from right to left), the good-suffix shift moves the pattern to a point where the letters that already matched will match again. Alternatively, if the part that matched occurs nowhere else in the pattern, you know you can skip all the way past it, because no other part of the pattern can line up with the text that just matched. (The one exception: if a prefix of the pattern matches the tail end of the part that matched, you can only shift far enough to line that prefix up with it.)
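
One common way to precompute good-suffix shifts is the border-based "strong good suffix" preprocessing found in many textbook and online treatments. The sketch below is a minimal Python version of that technique, combined with the bad-character table and taking whichever shift is larger. It is one possible implementation, not the only one, and the names `good_suffix_table` and `boyer_moore_search` are hypothetical. After a mismatch at `pattern[j]` it shifts by `table[j + 1]`; after a full match it shifts by `table[0]`.

```python
def good_suffix_table(pattern: str) -> list[int]:
    """Strong good-suffix shifts: after a mismatch at pattern[j], shift by table[j + 1];
    after a full match, shift by table[0]."""
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i]: where the widest border of pattern[i:] starts

    # Case 1: the matched suffix occurs again, preceded by a different character.
    i, j = m, m + 1
    border[i] = j
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j

    # Case 2: only a prefix of the pattern matches the end of the matched suffix.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def boyer_moore_search(text: str, pattern: str) -> list[int]:
    """Return all offsets where pattern occurs in text."""
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))
    last = {ch: i for i, ch in enumerate(pattern)}  # rightmost index of each char
    good = good_suffix_table(pattern)
    matches, offset = [], 0
    while offset + m <= n:
        j = m - 1
        while j >= 0 and pattern[j] == text[offset + j]:  # compare right to left
            j -= 1
        if j < 0:
            matches.append(offset)
            offset += good[0]  # shift to the next possible (possibly overlapping) match
        else:
            bad_shift = j - last.get(text[offset + j], -1)  # may be <= 0
            offset += max(bad_shift, good[j + 1])  # good[j + 1] is always >= 1
    return matches


print(boyer_moore_search("abracadabra", "abra"))  # [0, 7]
```

For "abra" the table works out to `[3, 3, 3, 3, 1]`: after a full match the pattern can slide 3 places, because its border "a" may start the next match.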

For example, let's consider the following situation:
- My pattern ends in "a dog".
- I currently have it aligned with a part of the text that ends in "s dog".
- Therefore, the bad letter is 's' (where they stop matching), and the good suffix is " dog" (the part that did match).
I have two options here:
1. Shift so that the first 's' (from right to left) in the pattern is aligned with the 's' in the text. If there is no 's' in the pattern, shift the beginning of the pattern to just past the 's'.
2. Shift so that the next " dog" is aligned with the " dog" in the text. If there isn't another " dog" in the pattern, shift the beginning of the pattern to just past the end of " dog".
and I should take whichever one lets me shift farther.
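
To make the example concrete, here is a small Python sketch that computes both options for a hypothetical pattern, "his dog is a dog", whose final "a dog" is aligned under text ending in "s dog". The helper names are made up. The bad-character helper only looks for an 's' to the left of the mismatch, and the good-suffix helper is the simple version of option 2, plus the prefix fallback for when the matched part has no other full copy in the pattern.

```python
def bad_character_shift(pattern: str, j: int, text_char: str) -> int:
    """Option 1: align the rightmost text_char left of position j with the text."""
    k = pattern.rfind(text_char, 0, j)  # -1 if text_char is absent from pattern[:j]
    return j - k  # k == -1 moves the pattern start just past the bad character


def good_suffix_shift(pattern: str, j: int) -> int:
    """Option 2 (simple version): align another copy of the matched part pattern[j+1:]."""
    m = len(pattern)
    suffix = pattern[j + 1:]  # the part that already matched (the "good suffix")
    # Rightmost copy of the suffix that starts at or before index j.
    k = pattern.rfind(suffix, 0, j + len(suffix))
    if k != -1:
        return (j + 1) - k
    # No other copy: slide until a prefix of the pattern lines up with the
    # tail of the matched part, or all the way past it if no prefix does.
    for length in range(len(suffix) - 1, 0, -1):
        if pattern.startswith(suffix[-length:]):
            return m - length
    return m


pattern = "his dog is a dog"  # hypothetical pattern ending in "a dog"
j = len(pattern) - 5          # index of 'a'; the text has 's' here instead
bc = bad_character_shift(pattern, j, "s")
gs = good_suffix_shift(pattern, j)
print(bc, gs, max(bc, gs))    # 2 9 9
```

Here option 1 only moves the pattern 2 places (lining up the 's' in "is"), while option 2 moves it 9 places so that the " dog" in "his dog" sits under the " dog" in the text, so the good-suffix shift wins.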

If you're still confused, try asking a more specific question; it's hard to be clear when we don't know where you're stuck.