Capture groups using DFA-based (linear-time) regular expressions: possible?

挽巷 2021-02-14 23:48

Is it possible to implement capture groups with DFA-based regular expressions while maintaining a linear time complexity with respect to the input length?

Intuitively I

2 answers
  •  暗喜 (OP)
    2021-02-14 23:57

    My project at http://github.com/hoehrmann/demo-parselov does this. I do not currently explain the construction on the project page, but suppose you have a grammar like

    X = "a" B "c"
    B = "b"
    

    You can turn this regular grammar into a graph with labeled vertices:

    1. start X
    2. "a"
    3. start B
    4. "b"
    5. final B
    6. "c"
    7. final X

    DFA states correspond to sets of these vertices. The first state consists of vertices 1 and 2, the second of vertices 3 and 4, the third of vertices 5 and 6, and the last of vertex 7. Parsing the string "abc" gives:

    1. { offset: 0, vertices: [1, 2] }
    2. { offset: 1, vertices: [3, 4] }
    3. { offset: 2, vertices: [5, 6] }
    4. { offset: EOF, vertices: [7] }
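    One way to see where these sets come from is a lazy subset construction over the vertex graph. This sketch hardcodes the seven vertices; the `TERMINAL` and `SUCC` tables and the `closure`/`trace` helpers are assumptions for illustration. Edges out of start/final vertices are followed without consuming input, and edges out of a terminal vertex are taken only when the next character matches. Memoizing `(set, character) -> set` would turn this into an explicit DFA transition table; for a fixed grammar each step does a bounded amount of work, so the trace is linear in the input length.

```python
# Vertices 1..7 from the list above. Only terminal vertices carry a character.
TERMINAL = {2: "a", 4: "b", 6: "c"}
# Successor edges in the vertex graph (1 -> 2 -> ... -> 7).
SUCC = {1: [2], 2: [3], 3: [4], 4: [5], 5: [6], 6: [7], 7: []}


def closure(vertices):
    """Follow edges out of start/final vertices without consuming input."""
    stack, seen = list(vertices), set(vertices)
    while stack:
        v = stack.pop()
        if v in TERMINAL:
            continue  # a terminal vertex waits for its character
        for w in SUCC[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return frozenset(seen)


def trace(text):
    """Return (offset, vertex set) pairs; the last offset, len(text), acts as EOF."""
    current = closure({1})
    steps = [(0, current)]
    for offset, ch in enumerate(text):
        # Consume `ch` from every terminal vertex that expects it.
        moved = {w for v in current if TERMINAL.get(v) == ch for w in SUCC[v]}
        current = closure(moved)
        steps.append((offset + 1, current))
    return steps
```

    For example, `trace("abc")` produces the sets {1, 2}, {3, 4}, {5, 6} and {7} at offsets 0 to 3, matching the list above, while an input such as "abd" ends in the empty set.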

    That is also a graph. You can write out the edges using (offset, vertex) pairs as vertices:

    1. (o0, v1) -> (o0, v2)
    2. (o0, v2) -> (o1, v3)
    3. (o1, v3) -> (o1, v4)
    4. ...
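    A sketch of how the full edge list could be written out from the per-offset sets, again with hypothetical hardcoded tables for this one grammar (`TERMINAL`, `SUCC`, `STATES` and `offset_graph` are illustrative names). An edge out of a start/final vertex stays at the same offset; an edge out of a terminal vertex advances the offset by one and is kept only if the input character matches and the target appears in the next set.

```python
# Terminal vertices and successor edges of the seven-vertex graph.
TERMINAL = {2: "a", 4: "b", 6: "c"}
SUCC = {1: [2], 2: [3], 3: [4], 4: [5], 5: [6], 6: [7], 7: []}
# Vertex sets per offset for "abc", copied from the list above (offset 3 is EOF).
STATES = [{1, 2}, {3, 4}, {5, 6}, {7}]


def offset_graph(text, states):
    """Return edges between (offset, vertex) pairs; len(states) == len(text) + 1."""
    edges = []
    for o, vs in enumerate(states):
        for v in sorted(vs):
            if v in TERMINAL:
                # Consuming text[o] moves to the next offset.
                if o < len(text) and text[o] == TERMINAL[v]:
                    edges += [((o, v), (o + 1, w)) for w in SUCC[v] if w in states[o + 1]]
            else:
                # Start/final vertices do not consume input: same offset.
                edges += [((o, v), (o, w)) for w in SUCC[v] if w in vs]
    return edges
```

    For "abc" this yields six edges, the first three being (o0, v1) -> (o0, v2), (o0, v2) -> (o1, v3) and (o1, v3) -> (o1, v4), and the last one (o2, v6) -> (oEOF, v7).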

    Such a graph may contain vertices that do not ultimately reach the final vertex (EOF, v7), but those can be eliminated in O(n) time. A match is then a path through the resulting graph; if the grammar is ambiguous, there may be many possible paths.
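    The pruning and the extraction of capture groups might look like the following sketch. The helpers `prune` and `one_match` and the `OPENS`/`CLOSES` tables are hypothetical; the edge list is the one for "abc". Pruning is a single backward search from the final vertex, so it is linear in the size of the graph, which is O(n) for a fixed grammar. After pruning, every remaining vertex reaches the final one, so in an acyclic graph like this one, following any outgoing edge from the start yields a complete match in linear time; the offsets at which `start B` and `final B` are passed give the capture span. Enumerating every path of an ambiguous grammar is a different matter, since their number can grow exponentially.

```python
from collections import defaultdict

# (offset, vertex) edges for "abc"; offset 3 plays the role of EOF.
EDGES = [((0, 1), (0, 2)), ((0, 2), (1, 3)), ((1, 3), (1, 4)),
         ((1, 4), (2, 5)), ((2, 5), (2, 6)), ((2, 6), (3, 7))]
START, FINAL = (0, 1), (3, 7)
# Vertices that open or close a nonterminal (from the numbered vertex list).
OPENS = {1: "X", 3: "B"}
CLOSES = {5: "B", 7: "X"}


def prune(edges, final):
    """Keep only edges between vertices that can still reach `final`."""
    preds = defaultdict(list)
    for a, b in edges:
        preds[b].append(a)
    alive, stack = {final}, [final]
    while stack:  # one backward search: linear in the number of edges
        for a in preds[stack.pop()]:
            if a not in alive:
                alive.add(a)
                stack.append(a)
    return [(a, b) for a, b in edges if a in alive and b in alive]


def one_match(edges, start, final):
    """Follow any surviving path from start to final and record capture spans.

    Assumes a pruned, acyclic graph in which each nonterminal is entered at
    most once along the path.
    """
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
    spans, open_at, node = {}, {}, start
    while True:
        offset, v = node
        if v in OPENS:
            open_at[OPENS[v]] = offset
        if v in CLOSES:
            name = CLOSES[v]
            spans[name] = (open_at[name], offset)  # half-open [begin, end)
        if node == final:
            return spans
        node = succ[node][0]  # after pruning, any successor still reaches final
```

    With a hypothetical dead-end edge such as ((0, 2), (1, 9)) appended, `prune` removes it again, and `one_match(prune(EDGES, FINAL), START, FINAL)` returns `{"B": (1, 2), "X": (0, 3)}`, so the capture for B is `"abc"[1:2]`, i.e. "b".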
