Steps to creating an NFA from a regular expression

攒了一身酷 2020-12-22 22:11

I'm having issues "describing each step" when creating an NFA from a regular expression. The question is as follows:

Convert the following regular expression to a

3 Answers
  • 2020-12-22 22:38

    https://github.com/White-White/RegSwift

    In short: check out this repo. It translates your regular expression to an NFA and visually shows you the NFA's state transitions.

  • 2020-12-22 22:42

    Short version for general approach.
    There's an algorithm called the Thompson-McNaughton-Yamada Construction Algorithm, or sometimes just "Thompson Construction." You build intermediate NFAs and fill in the pieces along the way, while respecting operator precedence: first parentheses, then Kleene star (e.g., a*), then concatenation (e.g., ab), and finally alternation (e.g., a|b).
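    The precedence order above can be made concrete by rewriting the regex in postfix form, which is one common preprocessing step before building the NFA. The sketch below is only an illustration, not part of the original answer: the names add_concat and to_postfix are made up, "." is used as an explicit concatenation operator, and the alphabet is assumed to be plain letters with no escapes.

```python
# Hypothetical helpers: "." marks explicit concatenation (assumes "." is not a literal).
PRECEDENCE = {"*": 3, ".": 2, "|": 1}  # star binds tightest, alternation loosest

def add_concat(regex):
    """Insert '.' wherever two adjacent pieces are concatenated."""
    out = []
    for i, c in enumerate(regex):
        out.append(c)
        if i + 1 < len(regex):
            nxt = regex[i + 1]
            if c not in "(|" and nxt not in ")|*":
                out.append(".")
    return "".join(out)

def to_postfix(regex):
    """Shunting-yard conversion that honours the precedence table above."""
    output, stack = [], []
    for c in add_concat(regex):
        if c == "(":
            stack.append(c)
        elif c == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        elif c in PRECEDENCE:
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[c]:
                output.append(stack.pop())
            stack.append(c)
        else:
            output.append(c)  # a literal symbol
    while stack:
        output.append(stack.pop())
    return "".join(output)

print(to_postfix("(b|a)*b(a|b)"))  # prints ba|*b.ab|.
```

    Reading the postfix string left to right gives the same build order as the walkthrough: alternation inside the parentheses first, then the star, then the two concatenations.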

    Here's an in-depth walkthrough for building the NFA for (b|a)*b(a|b).

    Building the top level

    1. Handle parentheses. Note: In an actual implementation, it can make sense to handle parentheses via a recursive call on their contents. For the sake of clarity, I'll defer evaluation of anything inside parentheses.

    2. Kleene Stars: only one * there, so we build a placeholder Kleene Star machine called P (which will later contain b|a). Intermediate result:
      [Image: NFA for P*]

    3. Concatenation: Attach P* to b, and attach b to a placeholder machine called Q (which will contain a|b). Intermediate result:
      [Image: NFA for P*bQ]

    4. There's no alternation outside of parentheses, so we skip it.

    Now we're sitting on a P*bQ machine. (Note that our placeholders P and Q are just concatenation machines.) We replace the P edge with the NFA for b|a, and replace the Q edge with the NFA for a|b via recursive application of the above steps.
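    As a rough sketch of the top-level machine, the snippet below builds P*bQ with P and Q left as placeholder edge labels. The dict layout and the helper names edge, star, and concat are assumptions made for illustration, and it uses one common textbook set of ε-edges (label None), so the state and edge counts may differ from the diagrams.

```python
from itertools import count

fresh = count()  # hypothetical supply of fresh state numbers

def edge(label):
    """Two-state machine with a single labelled edge: start --label--> accept."""
    s, t = next(fresh), next(fresh)
    return {"start": s, "accept": t, "edges": [(s, label, t)]}

def star(m):
    """Kleene star: new start and accept states, with ε-edges to skip or repeat m."""
    s, t = next(fresh), next(fresh)
    eps = [(s, None, m["start"]), (s, None, t),
           (m["accept"], None, m["start"]), (m["accept"], None, t)]
    return {"start": s, "accept": t, "edges": m["edges"] + eps}

def concat(m1, m2):
    """Concatenation: an ε-edge from m1's accept state to m2's start state."""
    link = [(m1["accept"], None, m2["start"])]
    return {"start": m1["start"], "accept": m2["accept"],
            "edges": m1["edges"] + link + m2["edges"]}

# P and Q are placeholder labels that get replaced by sub-machines later.
top = concat(concat(star(edge("P")), edge("b")), edge("Q"))
labels = [lab for (_, lab, _) in top["edges"] if lab is not None]
print(labels)  # prints ['P', 'b', 'Q']
```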


    Building P

    1. Skip. No parens.

    2. Skip. No Kleene stars.

    3. Skip. No concatenation.

    4. Build the alternation machine for b|a. Intermediate result:
      [Image: NFA for b|a]
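    The alternation machine for b|a could look something like this. Again, this is a minimal sketch under assumptions: the helper names edge and alternate and the dict layout are hypothetical, and the fork/join ε-edges (label None) follow the usual textbook shape rather than any particular tool's output.

```python
from itertools import count

fresh = count()  # hypothetical supply of fresh state numbers

def edge(label):
    """Two-state machine with a single labelled edge."""
    s, t = next(fresh), next(fresh)
    return {"start": s, "accept": t, "edges": [(s, label, t)]}

def alternate(m1, m2):
    """Alternation: a new start forks by ε into both branches, which join at a new accept."""
    s, t = next(fresh), next(fresh)
    eps = [(s, None, m1["start"]), (s, None, m2["start"]),
           (m1["accept"], None, t), (m2["accept"], None, t)]
    return {"start": s, "accept": t, "edges": m1["edges"] + m2["edges"] + eps}

p = alternate(edge("b"), edge("a"))
for e in p["edges"]:
    print(e)  # four ε-edges plus one edge each for b and a
```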


    Integrating P

    Next, we go back to that P*bQ machine and tear out the P edge. The source of the P edge serves as the start state of the P machine, and the destination of the P edge serves as the final state of the P machine. We also make that final state a non-accepting state (it was an accept state only while P stood alone). The result looks like this:
    [Image: NFA for (b|a)*bQ]
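    One way to express this "tear out the edge" step in code is to rename the sub-machine's start and accept states to the endpoints of the placeholder edge. The function name splice, the state names, and the tiny Pb outer machine below are all made up for illustration. Merging states like this relies on the sub-machine's start state having no incoming edges and its accept state having no outgoing edges, which holds for machines built by Thompson's construction.

```python
def splice(outer, placeholder, inner):
    """Replace the edge labelled `placeholder` in `outer` with the machine `inner`."""
    src, dst = next((s, t) for (s, lab, t) in outer["edges"] if lab == placeholder)
    # inner's start becomes the edge's source; inner's accept becomes its destination.
    rename = {inner["start"]: src, inner["accept"]: dst}
    kept = [e for e in outer["edges"] if e[1] != placeholder]
    moved = [(rename.get(s, s), lab, rename.get(t, t)) for (s, lab, t) in inner["edges"]]
    # Only outer's accept state remains accepting; inner's old accept is just `dst` now.
    return {"start": outer["start"], "accept": outer["accept"], "edges": kept + moved}

# Hypothetical outer machine for "Pb" and inner machine for b|a (None marks ε).
outer = {"start": "s0", "accept": "s2",
         "edges": [("s0", "P", "s1"), ("s1", "b", "s2")]}
inner = {"start": "p0", "accept": "p5",
         "edges": [("p0", None, "p1"), ("p0", None, "p3"),
                   ("p1", "b", "p2"), ("p3", "a", "p4"),
                   ("p2", None, "p5"), ("p4", None, "p5")]}

result = splice(outer, "P", inner)
for e in result["edges"]:
    print(e)
```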


    Building Q

    1. Skip. No parens.

    2. Skip. No Kleene stars.

    3. Skip. No concatenation.

    4. Build the alternation machine for a|b. Incidentally, alternation is commutative, so a|b is logically equivalent to b|a. (Read: skipping this minor footnote diagram out of laziness.)


    Integrating Q

    We do what we did with P above, except this time we replace the Q edge with the intermediate a|b machine (equivalent to the b|a machine we constructed). This is the result:
    [Image: NFA for (b|a)*b(a|b)]

    Tada! Er, I mean, QED.
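    To sanity-check the finished machine, here is a compact sketch that builds an NFA by recursive descent (one method per precedence level, with a recursive call for parentheses) and then simulates it with ε-closures. The class Builder and the functions closure and matches are hypothetical names, and the code assumes a well-formed, non-empty regex over plain letters.

```python
from itertools import count

class Builder:
    def __init__(self, regex):
        self.regex, self.pos = regex, 0
        self.ids = count()
        self.edges = []  # (source, label, target); label None means ε

    def peek(self):
        return self.regex[self.pos] if self.pos < len(self.regex) else None

    def alternation(self):  # lowest precedence: a|b
        start, accept = self.concatenation()
        while self.peek() == "|":
            self.pos += 1
            s2, a2 = self.concatenation()
            s, t = next(self.ids), next(self.ids)
            self.edges += [(s, None, start), (s, None, s2),
                           (accept, None, t), (a2, None, t)]
            start, accept = s, t
        return start, accept

    def concatenation(self):  # ab
        start, accept = self.starred()
        while self.peek() not in (None, "|", ")"):
            s2, a2 = self.starred()
            self.edges.append((accept, None, s2))
            accept = a2
        return start, accept

    def starred(self):  # a*
        start, accept = self.atom()
        while self.peek() == "*":
            self.pos += 1
            s, t = next(self.ids), next(self.ids)
            self.edges += [(s, None, start), (s, None, t),
                           (accept, None, start), (accept, None, t)]
            start, accept = s, t
        return start, accept

    def atom(self):  # a literal or a parenthesised group
        c = self.peek()
        self.pos += 1
        if c == "(":
            frag = self.alternation()  # recursive call on the group's contents
            self.pos += 1              # skip the closing ")"
            return frag
        s, t = next(self.ids), next(self.ids)
        self.edges.append((s, c, t))
        return s, t

def closure(states, edges):
    """All states reachable from `states` using only ε-edges."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for (a, lab, b) in edges:
            if a == s and lab is None and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def matches(regex, text):
    b = Builder(regex)
    start, accept = b.alternation()
    current = closure({start}, b.edges)
    for ch in text:
        moved = {t for (s, lab, t) in b.edges if s in current and lab == ch}
        current = closure(moved, b.edges)
    return accept in current

print(matches("(b|a)*b(a|b)", "abba"))  # prints True
print(matches("(b|a)*b(a|b)", "aa"))    # prints False
```

    The regex accepts exactly the strings over {a, b} whose second-to-last character is b, which is what the two calls above check.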


    Want to know more?

    All the images above were generated using an online tool for automatically converting regular expressions to non-deterministic finite automata. You can find its source code for the Thompson-McNaughton-Yamada Construction algorithm online.

    The algorithm is also addressed in Aho's Compilers: Principles, Techniques, and Tools, though its explanation is sparse on implementation details. You can also learn from an implementation of the Thompson Construction in C by the excellent Russ Cox, who described it in some detail in a popular article about regular expression matching.

  • 2020-12-22 22:57

    In the GitHub repository below, you can find a Java implementation of Thompson's construction. It first builds an NFA from the regex and then matches an input string against that NFA:

    https://github.com/meghdadFar/regex
