Find simplest regular expression matching all given strings

不思量自难忘° 2021-01-11 20:40

Is there an algorithm that can produce a regular expression (maybe limited to a simplified grammar) from a set of strings such that the evaluation of all possible strings th

2 answers
  •  -上瘾入骨i
    2021-01-11 21:18

    You can try using the Aho-Corasick algorithm to build a finite state machine from the input strings, after which it should be fairly easy to generate the simplified regex. Taking your input strings as an example:

    h_q1_a
    h_q1_b
    h_q1_c
    h_p2_a
    h_p2_b
    h_p2_c
    

    will generate a finite state machine that most probably looks like this:

          [h_]         <-level 0
         /   \
      [q1]  [p2]       <-level 1
         \   /
          [_]          <-level 2
          /\  \
         /  \  \
        a    b  c      <-level 3
    

    Now, at every level (depth) of the trie, the strings at that level (if there is more than one) go inside OR brackets, so

    h_(q1|p2)_(a|b|c)
    L0   L1  L2  L3
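
    The idea above can be sketched in Python roughly as follows. This is not a full Aho-Corasick implementation (no failure links are needed for this task); it is a minimal sketch that assumes a path-compressed trie is enough, and it merges sibling branches only when their tails produce the same sub-regex. The helper names `common_suffix`, `alternation`, and `regex_for` are made up for illustration.

```python
import os
import re
from collections import defaultdict


def common_suffix(words):
    """Longest string that every word ends with."""
    return os.path.commonprefix([w[::-1] for w in words])[::-1]


def alternation(parts):
    """Join alternatives with '|'; wrap in a non-capturing group if several."""
    parts = sorted(parts)
    return parts[0] if len(parts) == 1 else "(?:" + "|".join(parts) + ")"


def regex_for(strings):
    """Return a regex that matches exactly the given strings (hypothetical helper)."""
    words = set(strings)
    if not words:
        raise ValueError("need at least one string")
    if "" in words:
        # The empty string is allowed, so everything else becomes optional.
        words.discard("")
        return "(?:" + regex_for(words) + ")?" if words else ""

    # A prefix shared by all words is one trie node (like [h_] at level 0).
    prefix = os.path.commonprefix(list(words))
    if prefix:
        return re.escape(prefix) + regex_for({w[len(prefix):] for w in words})

    # Otherwise the trie branches here: one compressed edge per first character.
    by_first = defaultdict(list)
    for w in words:
        by_first[w[0]].append(w)

    # Group the edges by the regex of what follows them, so that branches
    # with identical tails (like q1_ and p2_) share one tail.
    by_tail = defaultdict(list)
    for group in by_first.values():
        edge = os.path.commonprefix(group)
        tail = regex_for({w[len(edge):] for w in group})
        by_tail[tail].append(edge)

    branches = []
    for tail, edges in by_tail.items():
        # Pull a shared ending (like the second "_") out of the OR bracket.
        suffix = common_suffix(edges)
        heads = [re.escape(e[:len(e) - len(suffix)]) for e in edges]
        branches.append(alternation(heads) + re.escape(suffix) + tail)
    return alternation(branches)


words = ["h_q1_a", "h_q1_b", "h_q1_c", "h_p2_a", "h_p2_b", "h_p2_c"]
pattern = regex_for(words)
print(pattern)  # h_(?:p2|q1)_(?:a|b|c)
```

    Alternatives are sorted, which is why `p2` comes before `q1`, and the groups are non-capturing. Because branches are merged only when their tails are identical, the result should not match strings outside the input set: for just `h_q1_a` and `h_p2_b` the sketch gives `h_(?:p2_b|q1_a)` rather than an over-general `h_(q1|p2)_(a|b)`.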
    
