Please help me to understand the construction of state transition table for more than one patterns in the Aho-Corasick algorithm.
Please give a simple and detailed explanation so that I could understand.
I am following this paper and here is the animation for that.
Thanks.
Phase 1
Creating the keyword tree :
Starting at the root, follow the path labeled by chars of Pi If the path ends before Pi, continue it by adding new edges and ... nodes for the remaining characters of P Store identifier i of Pi at the terminal node of the path
I demonstrate it by an example :
Suppose we have a finite set of patterns P = {he, she, his, hers}.
Phase2
Next we extend the keyword tree into an automaton,to support linear-time matching as follows.
Obviously the automaton consists of two parts :
- states
- transitions
States : Exactly the nodes achieved by the keyword tree; and initial state = root of the tree.
Transitions : using goto(q,a) function as follows.
/*This function gives the state entered from current state q by matching target char a*/ - If edge (q,v) in the tree is labeled by a, then g(q,a)=v; - g(0,a)=0 for each a that does not label an edge out of the root //the automton stays at the initial state while scanning non-matching characters - O.W. g(q,a)={}
Using the automaton :
SearchofTarget T[1..m] //considering the target string T as an
array of chars indexed from 1 to m
q:= 0; // initial state(root)
for i:= 1 to m do
while g(q,T[i]) == {} do
q:= fail(q); // follow a fail
q:= goto(q,T[i]); // follow a goto
if output(q) != {} then print i, out(q);
endfor;
As you see in the above pseudo-code it calls two functions : fail(q) and output(q)
fail(q) : // for q !=0 gives the state entered at a mismatch
failure(q) is the node labeled by the longest proper suffix w of L(q) ...
s.t. w is a prefix of some pattern
//L(q) is the label of node q as the concatenation of edge labels ...
on the path from the root to q
output(q) :
gives the set of patterns recognized when entering state q
After computation of these two functions, the automaton looks like this :
Now you may wonder when to compute these methods, so please look at these for more official form :
Hope this help and if still something is ambiguous please don't hesitate to ask.
来源:https://stackoverflow.com/questions/22398190/state-transition-table-for-aho-corasick-algorithm