Algorithm to generate context free grammar from any regex

核能气质少年 提交于 2019-12-04 12:34:07

问题


Can anyone outline for me an algorithm that can convert any given regex into an equivalent set of CFG rules?

I know how to tackle the elementary stuff such as (a|b)*:

S -> a A
S -> a B
S -> b A
S -> b B
A -> a A
A -> a B
A -> epsilon
B -> b A
B -> b B
B -> epsilon
S -> epsilon (end of string)

However, I'm having some problem formalizing it into a proper algorithm especially with more complex expressions that can have many nested operations.


回答1:


If you are just talking about regular expressions from a theoretical point of view, there are these three constructs:

ab       # concatenation
a|b      # alternation
a*       # repetition or Kleene closure

What you could then just do:

  • create a rule S -> (fullRegex)
  • for every repeated term (x)* in fullRegex create a rule X -> x X and X -> ε, then replace (x)* with X.
  • for every alternation (a|b|c) create rules Y -> a, Y -> b and Y -> c, then replace (a|b|c) with Y

Simply repeat this recursively (note that all x, a, b and c can still be complex regular expressions). Note that of course you have to use unique identifiers for every step.

This should be enough. This will certainly not give the most elegant or efficient grammar, but that is what normalization is for (and it should be done in a separate step and there are well-defined steps to do this).

One example: a(b|cd*(e|f)*)*

S -> a(b|cd*(e|f)*)*

S -> a X1; X1 -> (b|cd*(e|f)*) X1; X1 -> ε

S -> a X1; X1 -> Y1 X1; X1 -> ε; Y1 -> b; Y1 -> cd*(e|f)*

S -> a X1; X1 -> Y1 X1; X1 -> ε; Y1 -> b; Y1 -> c X2 (e|f)*; X2 -> d X2; X2 -> ε

... and a few more of those steps, until you end up with:

S  -> a X1
X1 -> Y1 X1
X1 -> ε
Y1 -> b
Y1 -> c X2 X3
X2 -> d X2
X2 -> ε
X3 -> Y2 X3
X3 -> ε
Y2 -> e
Y2 -> f


来源:https://stackoverflow.com/questions/13139821/algorithm-to-generate-context-free-grammar-from-any-regex

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!