Capture groups using DFA-based (linear-time) regular expressions: possible?

挽巷 2021-02-14 23:48

Is it possible to implement capture groups with DFA-based regular expressions while maintaining a linear time complexity with respect to the input length?

Intuitively I

2 answers

暗喜 (original poster)

2021-02-14 23:57
My project http://github.com/hoehrmann/demo-parselov does this. The web page does not currently explain the construction, but suppose you have a grammar like:
```
X = "a" B "c"
B = "b"
```
You can turn this regular grammar into a graph with labeled vertices:
1. start X
2. "a"
3. start B
4. "b"
5. final B
6. "c"
7. final X
DFA states correspond to sets of these vertices. The first state consists of vertices 1 and 2, the second of vertices 3 and 4, the third of 5 and 6, and the last of 7. If you parse the string "abc", you get:
1. { offset: 0, vertices: [1, 2] }
2. { offset: 1, vertices: [3, 4] }
3. { offset: 2, vertices: [5, 6] }
4. { offset: EOF, vertices: [7] }
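
As a rough illustration, the per-offset vertex sets can be computed with one left-to-right pass. This is only a sketch, not the code from demo-parselov: the `VERTICES`/`EDGES` encoding and the `closure` and `run` helpers are assumptions made for this example, and the sets are computed on the fly rather than looked up in a precompiled DFA table. The work per input character is bounded by the grammar size, so the pass is linear in the input length.

```python
# Hypothetical encoding of the seven labeled vertices of the example grammar.
VERTICES = {1: ("start", "X"), 2: ("char", "a"), 3: ("start", "B"),
            4: ("char", "b"), 5: ("final", "B"), 6: ("char", "c"),
            7: ("final", "X")}
# Assumed edges: for this grammar the graph is a simple chain 1 -> 2 -> ... -> 7.
EDGES = {1: [2], 2: [3], 3: [4], 4: [5], 5: [6], 6: [7], 7: []}


def closure(seeds):
    """Follow edges out of non-character vertices; stop at character vertices."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        v = stack.pop()
        if VERTICES[v][0] != "char":
            for w in EDGES[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return seen


def run(text, start=1):
    """Return one vertex set per offset (the last entry corresponds to EOF)."""
    states = [closure({start})]
    for ch in text:
        # Only character vertices labeled with the current character may advance.
        moved = {w for v in states[-1] if VERTICES[v] == ("char", ch)
                 for w in EDGES[v]}
        states.append(closure(moved))
    return states


for offset, verts in enumerate(run("abc")):
    print(offset, sorted(verts))
```

For "abc" this yields the same four sets as the list above, with EOF represented as offset 3. An empty set at some offset means the input cannot match from that point on.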
These (offset, vertices) records also form a graph. You can write out its edges using (offset, vertex) pairs as vertices:
1. (o0, v1) -> (o0, v2)
2. (o0, v2) -> (o1, v3)
3. (o1, v3) -> (o1, v4)
4. ...
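Such a graph might contain vertices that cannot reach the final vertex (EOF, v7), but these can be eliminated in O(n) time, for example with a backward traversal from the final vertex. A match is a path through the resulting graph, and the offsets at which the path passes the start and final vertices of a rule give that rule's capture boundaries. If the grammar is ambiguous, there may be many possible paths.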
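
The graph construction, pruning and capture extraction described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: `build_graph`, `prune` and `captures` are hypothetical helper names, EOF is represented as offset 3, the per-offset sets are copied from the "abc" example, and the path walk assumes the pruned graph has no cycles (true for this grammar).

```python
# Labeled vertices and chain edges of the example grammar (assumed encoding).
VERTICES = {1: ("start", "X"), 2: ("char", "a"), 3: ("start", "B"),
            4: ("char", "b"), 5: ("final", "B"), 6: ("char", "c"),
            7: ("final", "X")}
EDGES = {1: [2], 2: [3], 3: [4], 4: [5], 5: [6], 6: [7], 7: []}

# Per-offset vertex sets for "abc", copied from the example (EOF is offset 3).
STATES = [{1, 2}, {3, 4}, {5, 6}, {7}]


def build_graph(states):
    """Edges between (offset, vertex) pairs; only character vertices advance the offset."""
    graph = {}
    for off, verts in enumerate(states):
        for v in verts:
            nxt = off + 1 if VERTICES[v][0] == "char" else off
            targets = states[nxt] if nxt < len(states) else set()
            graph[(off, v)] = [(nxt, w) for w in EDGES[v] if w in targets]
    return graph


def prune(graph, goal):
    """Keep only nodes that can reach goal, using one backward traversal."""
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    alive, stack = {goal}, [goal]
    while stack:
        for src in reverse.get(stack.pop(), []):
            if src not in alive:
                alive.add(src)
                stack.append(src)
    return {n: [d for d in ds if d in alive]
            for n, ds in graph.items() if n in alive}


def captures(graph, start, goal):
    """Follow one path from start to goal and record (rule, begin, end) spans."""
    if start not in graph:
        return None  # start was pruned away, so there is no match
    spans, open_at, node = [], [], start
    while True:
        off, v = node
        kind, label = VERTICES[v]
        if kind == "start":
            open_at.append(off)
        elif kind == "final":
            spans.append((label, open_at.pop(), off))
        if node == goal:
            return spans
        node = graph[node][0]  # arbitrary choice if the grammar is ambiguous


graph = prune(build_graph(STATES), goal=(3, 7))
print(captures(graph, start=(0, 1), goal=(3, 7)))
# prints [('B', 1, 2), ('X', 0, 3)]
```

Here `B` spans offsets 1 to 2, which is the "b" in "abc", and `X` spans the whole input. Taking the first outgoing edge at every step picks one path arbitrarily; an ambiguous grammar would need a policy for choosing among the remaining paths.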