What are the main differences between the Knuth-Morris-Pratt and Boyer-Moore search algorithms?

Posted by 纵饮孤独 on 2019-12-04 07:27:18

Question


What are the main differences between the Knuth-Morris-Pratt search algorithm and the Boyer-Moore search algorithm?

I know KMP searches for Y in X, trying to define a pattern in Y, and saves the pattern in a vector. I also know that BM works better for small words, like DNA (ACTG).

What are the main differences in how they work? Which one is faster? Which one uses fewer resources? In which cases?


Answer 1:


Moore's UTexas webpage walks through both algorithms in a step-by-step fashion (he also provides various technical sources):

  • Knuth-Morris-Pratt
  • Boyer-Moore

According to the man himself,

The classic Boyer-Moore algorithm suffers from the phenomenon that it tends not to work so efficiently on small alphabets like DNA. The skip distance tends to stop growing with the pattern length because substrings re-occur frequently. By remembering more of what has already been matched, one can get larger skips through the text. One can even arrange "perfect memory" and thus look at each character at most once, whereas the Boyer-Moore algorithm, while linear, may inspect a character from the text multiple times. This idea of remembering more has been explored in the literature by others. It suffers from the need for very large tables or state machines.

However, there have been some modifications of BM that have made small-alphabet searching viable.
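
To make the quoted point about skip distances concrete, here is a rough sketch of the shift table used by the Horspool simplification of Boyer-Moore (bad-character rule only, not the full algorithm with the good-suffix rule). The function name `horspool_shift_table` and both sample patterns are made up for illustration.

```python
def horspool_shift_table(pattern):
    """Map each character to how far the search window may slide when that
    character is the text character aligned with the pattern's last position.

    Characters that do not appear in the table allow a full shift of
    len(pattern). This is the Horspool variant, an assumption made here
    to keep the example short.
    """
    m = len(pattern)
    # Later occurrences overwrite earlier ones, so each character ends up
    # with the distance from its rightmost occurrence (the final position
    # is excluded) to the end of the pattern.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


dna_pattern = "ACGTACGTACGT"      # length 12, alphabet of 4 letters
english_pattern = "differences"   # length 11, larger alphabet

print(horspool_shift_table(dna_pattern))
# {'A': 3, 'C': 2, 'G': 1, 'T': 4}
print(horspool_shift_table(english_pattern))
# {'d': 10, 'i': 9, 'f': 7, 'e': 1, 'r': 5, 'n': 3, 'c': 2}
```

With the DNA pattern, every character that can occur in the text is already in the table, so the shift never exceeds 4 even though the pattern has 12 characters. With the English pattern, any text character outside the seven listed letters lets the window jump by the full 11 positions.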




Answer 2:


As a rough explanation:

Boyer-Moore's approach is to try to match the last character of the pattern instead of the first one, on the assumption that if there is no match at the end, there is no need to try to match at the beginning. This allows for "big jumps", so BM works better when the pattern and the text you are searching resemble "natural text" (i.e. English).
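
The "compare from the end, then jump" idea can be sketched roughly as follows. This is the Horspool simplification (bad-character shifts only), not the complete Boyer-Moore algorithm, and `horspool_search` is a name chosen just for this example.

```python
def horspool_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.

    Simplified Boyer-Moore (Horspool variant): the window is compared
    from right to left, and on a mismatch it slides by a distance that
    depends only on the text character under the window's last position.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Distance from the rightmost occurrence of each character
    # (ignoring the final position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare right to left, starting at the pattern's last character.
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos  # every character matched
        # Characters that never occur in the pattern allow a full jump of m.
        pos += shift.get(text[pos + m - 1], m)
    return -1


print(horspool_search("here is a simple example", "example"))  # 17
```

In English-like text most window positions end on a character that is rare or absent in the pattern, which is where the big jumps come from.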

Knuth-Morris-Pratt searches for occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters. (Source: Wiki)

This means KMP is better suited for small alphabets like DNA (ACTG).
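
For comparison, a minimal sketch of KMP: the table built from the word alone is the "information the word itself embodies", and the search never moves backwards in the text. The names `build_failure_table` and `kmp_search` are picked for this example only.

```python
def build_failure_table(word):
    """fail[i] is the length of the longest proper prefix of word[:i + 1]
    that is also a suffix of word[:i + 1]."""
    fail = [0] * len(word)
    k = 0  # length of the prefix currently matched
    for i in range(1, len(word)):
        while k > 0 and word[i] != word[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if word[i] == word[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, word):
    """Return the index of the first occurrence of word in text, or -1."""
    if not word:
        return 0
    fail = build_failure_table(word)
    k = 0  # number of characters of word matched so far
    for i, ch in enumerate(text):  # the text index i only moves forward
        while k > 0 and ch != word[k]:
            k = fail[k - 1]  # reuse what is already known about the match
        if ch == word[k]:
            k += 1
        if k == len(word):
            return i - k + 1
    return -1


print(build_failure_table("ABABAC"))  # [0, 0, 1, 2, 3, 0]
print(kmp_search("ACGTACGACGTTACG", "ACGACG"))  # 4
```

Each text character is read once in the outer loop, so the running time does not degrade when the alphabet is small and partial matches are frequent.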




Answer 3:


The Boyer-Moore technique matches characters from right to left and works well on long patterns. Knuth-Morris-Pratt matches characters from left to right and works fast on short patterns.



Source: https://stackoverflow.com/questions/12656160/what-are-the-main-differences-between-the-knuth-morris-pratt-and-boyer-moore-sea
