Steps to creating an NFA from a regular expression

攒了一身酷

I'm having issues 'describing each step' when creating an NFA from a regular expression. The question is as follows:

Convert the following regular expression to a

3 Answers

感动是毒

2020-12-22 22:38

https://github.com/White-White/RegSwift

In short: check out this repo. It translates your regular expression to an NFA and visually shows you the NFA's state transitions.

慢半拍i

2020-12-22 22:42
Short version for general approach.
There's an algo out there called the Thompson-McNaughton-Yamada Construction Algorithm or sometimes just "Thompson Construction." One builds intermediate NFAs, filling in the pieces along the way, while respecting operator precedence: first parentheses, then Kleene Star (e.g., a*), then concatenation (e.g., ab), followed by alternation (e.g., a|b).
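The precedence order above can be made concrete with a small recursive-descent parser. This is only a sketch under assumptions: the function name `parse` and the nested-tuple tree shape are made up for illustration, and it supports only single-character literals, parentheses, `*`, and `|`.

```python
def parse(regex):
    """Parse a regex (literals, parentheses, *, |) into a nested-tuple tree."""
    pos = 0

    def peek():
        return regex[pos] if pos < len(regex) else None

    def alternation():          # lowest precedence: a|b
        nonlocal pos
        node = concatenation()
        while peek() == "|":
            pos += 1
            node = ("alt", node, concatenation())
        return node

    def concatenation():        # binds tighter than alternation: ab
        node = star()
        while peek() is not None and peek() not in "|)":
            node = ("cat", node, star())
        return node

    def star():                 # binds tighter than concatenation: a*
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1
            node = ("star", node)
        return node

    def atom():                 # highest precedence: parentheses or a literal
        nonlocal pos
        ch = peek()
        if ch is None or ch in "|)*":
            raise ValueError(f"unexpected {ch!r} at position {pos}")
        pos += 1
        if ch == "(":
            node = alternation()    # recursive call on the parenthesized contents
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        return ch

    tree = alternation()
    if pos != len(regex):
        raise ValueError(f"unexpected {regex[pos]!r} at position {pos}")
    return tree


print(parse("(b|a)*b(a|b)"))
```

Each grammar level calls the next tighter-binding one, so the tree that comes out already groups the star before the concatenations, and alternation only appears inside the parentheses.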

Here's an in-depth walkthrough for building the NFA for (b|a)*b(a|b).

Building the top level
1. Handle parentheses. Note: In an actual implementation, it can make sense to handle parentheses via a recursive call on their contents. For the sake of clarity, I'll defer evaluation of anything inside of parens.
2. Kleene Stars: only one * there, so we build a Kleene Star machine around a placeholder edge called P (which will later contain b|a). Intermediate result: a P* machine.
3. Concatenation: Attach P* to b, and attach b to a placeholder machine called Q (which will contain a|b). Intermediate result: a P*bQ machine.
4. There's no alternation outside of parentheses, so we skip it.
Now we're sitting on a P*bQ machine. (Note that our placeholders P and Q are just concatenation machines.) We replace the P edge with the NFA for b|a, and replace the Q edge with the NFA for a|b via recursive application of the above steps.

Building P
1. Skip. No parens.
2. Skip. No Kleene stars.
3. Skip. No concatenation.
4. Build the alternation machine for b|a. Intermediate result: a machine that branches from one start state into either b or a and rejoins at one accept state.
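As a rough illustration of step 4, here is one common way to wire an alternation fragment with ε-transitions. The representation (a `(start, accept, edges)` triple, `None` standing for ε) and the helper names `literal` and `alternate` are assumptions of this sketch, and the state numbering will not necessarily match any particular diagram.

```python
from itertools import count

EPS = None  # label used for ε-transitions in this sketch


def literal(symbol, ids):
    """Two fresh states joined by a single edge labeled with the symbol."""
    start, accept = next(ids), next(ids)
    return start, accept, [(start, symbol, accept)]


def alternate(left, right, ids):
    """A new start state ε-branches into both operands; both ε-join a new accept state."""
    l_start, l_accept, l_edges = left
    r_start, r_accept, r_edges = right
    start, accept = next(ids), next(ids)
    edges = l_edges + r_edges + [
        (start, EPS, l_start), (start, EPS, r_start),
        (l_accept, EPS, accept), (r_accept, EPS, accept),
    ]
    return start, accept, edges


ids = count()                       # hands out state numbers 0, 1, 2, ...
b = literal("b", ids)
a = literal("a", ids)
p_start, p_accept, p_edges = alternate(b, a, ids)

for edge in p_edges:
    print(edge)
```

The fragment has exactly one start and one accept state, which is what makes it easy to drop into a larger machine later.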
Integrating P

Next, we go back to that P*bQ machine and tear out the P edge. The source of the P edge serves as the starting state for the P machine, and the destination of the P edge serves as the destination state for the P machine. We also make that destination state reject (take away its property of being an accept state), since it now sits in the middle of the larger machine. The result is the P*bQ machine with the b|a machine in place of the P edge.
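The edge replacement can be sketched as a small function over edge lists. Everything here is hypothetical scaffolding: the `splice` helper, the state names `s0`..`s5` and `p0`..`p5`, and the particular layout of the outer P*bQ machine are assumptions, and the sketch assumes the two machines use disjoint state names.

```python
EPS = None  # label used for ε-transitions in this sketch


def splice(edges, placeholder, sub_start, sub_accept, sub_edges):
    """Replace the single edge labeled `placeholder` with a sub-machine.

    The sub-machine's start state is merged into the edge's source and its
    accept state into the edge's destination, so the old accept state is no
    longer accepting on its own.
    """
    matches = [e for e in edges if e[1] == placeholder]
    if len(matches) != 1:
        raise ValueError("expected exactly one placeholder edge")
    src, _, dst = matches[0]
    rename = {sub_start: src, sub_accept: dst}
    kept = [e for e in edges if e[1] != placeholder]
    moved = [(rename.get(s, s), label, rename.get(d, d)) for s, label, d in sub_edges]
    return kept + moved


# Assumed layout of the P*bQ machine: a star loop around the P edge, then b, then Q.
outer = [
    ("s0", EPS, "s1"), ("s1", "P", "s2"), ("s2", EPS, "s1"),
    ("s2", EPS, "s3"), ("s0", EPS, "s3"),
    ("s3", "b", "s4"), ("s4", "Q", "s5"),
]
# Assumed alternation machine for b|a, from p0 (start) to p5 (accept).
p_edges = [
    ("p0", EPS, "p1"), ("p1", "b", "p2"), ("p2", EPS, "p5"),
    ("p0", EPS, "p3"), ("p3", "a", "p4"), ("p4", EPS, "p5"),
]

result = splice(outer, "P", "p0", "p5", p_edges)
for edge in result:
    print(edge)
```

Because the sub-machine's start has no incoming edges and its accept has no outgoing edges, merging them into the endpoints of the placeholder edge does not create any unintended paths.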

Building Q
1. Skip. No parens.
2. Skip. No Kleene stars.
3. Skip. No concatenation.
4. Build the alternation machine for a|b. Incidentally, alternation is commutative, so a|b is logically equivalent to b|a. (Read: skipping this minor footnote diagram out of laziness.)
Integrating Q

We do what we did with P above, except that we replace the Q edge with the intermediate b|a machine we already constructed (equivalent to a|b). The result is the complete NFA for (b|a)*b(a|b).

Tada! Er, I mean, QED.
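To sanity-check the finished machine, one can build it from small combinators and simulate it on a few strings. This is a minimal sketch rather than the construction any specific tool uses: the helper names (`lit`, `alt`, `cat`, `star`, `closure`, `accepts`) are invented here, `None` stands for ε, and concatenation joins fragments with an ε-edge instead of merging states.

```python
from itertools import count

EPS = None          # label used for ε-transitions in this sketch
_ids = count()      # source of fresh state numbers


def lit(sym):
    s, t = next(_ids), next(_ids)
    return s, t, [(s, sym, t)]


def alt(x, y):
    s, t = next(_ids), next(_ids)
    return s, t, x[2] + y[2] + [
        (s, EPS, x[0]), (s, EPS, y[0]), (x[1], EPS, t), (y[1], EPS, t)]


def cat(x, y):
    # Join x's accept state to y's start state with an ε-edge.
    return x[0], y[1], x[2] + y[2] + [(x[1], EPS, y[0])]


def star(x):
    s, t = next(_ids), next(_ids)
    return s, t, x[2] + [
        (s, EPS, x[0]), (x[1], EPS, x[0]), (x[1], EPS, t), (s, EPS, t)]


def closure(states, edges):
    """All states reachable from `states` using only ε-edges."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, text):
    """Track the set of states the NFA could be in after each character."""
    start, accept, edges = nfa
    current = closure({start}, edges)
    for ch in text:
        moved = {dst for src, label, dst in edges if src in current and label == ch}
        current = closure(moved, edges)
    return accept in current


# (b|a)*b(a|b)
nfa = cat(cat(star(alt(lit("b"), lit("a"))), lit("b")), alt(lit("a"), lit("b")))

for word in ["ba", "abba", "abab", "b"]:
    print(word, accepts(nfa, word))
```

The regex describes strings over {a, b} of length at least two whose second-to-last character is b, so "ba" and "abba" should be accepted while "abab" and "b" should not.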

Want to know more?

All the images above were generated using an online tool for automatically converting regular expressions to non-deterministic finite automata. You can find its source code for the Thompson-McNaughton-Yamada Construction algorithm online.

The algorithm is also addressed in Aho's Compilers: Principles, Techniques, and Tools, though its explanation is sparse on implementation details. You can also learn from an implementation of the Thompson Construction in C by the excellent Russ Cox, who described it in some detail in a popular article about regular expression matching.
谎友^

2020-12-22 22:57

In the GitHub repository below, you can find a Java implementation of Thompson's construction. It first creates an NFA from the regex and then matches an input string against that NFA:

https://github.com/meghdadFar/regex
