Is it possible to implement capture groups with DFA-based regular expressions while maintaining a linear time complexity with respect to the input length?
Intuitively I
My project http://github.com/hoehrmann/demo-parselov does this. I do not currently explain the construction on the web page, but suppose you have a grammar like this:
X = "a" B "c"
B = "b"
You can turn this regular grammar into a graph with labeled vertices.
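To make the construction concrete, here is a minimal Python sketch that inlines a non-recursive grammar into such a graph. The graph itself is not reproduced here, so the vertex numbering is an assumption: one start and one final vertex per nonterminal, plus one vertex per single-character terminal. The names `GRAMMAR` and `build_graph` are hypothetical. With this numbering, the grouping of vertices into DFA states matches the one given in the next paragraph.

```python
# Hypothetical encoding of the example grammar: quoted items are
# single-character terminals, bare names are nonterminals.
GRAMMAR = {"X": ['"a"', "B", '"c"'], "B": ['"b"']}


def build_graph(grammar, root):
    """Inline a non-recursive grammar into a graph with labeled vertices.

    Returns (labels, edges). Vertices are numbered from 1 in creation order;
    labels[v] is ("start", name), ("final", name) or ("char", c). Each edge is
    (src, char, dst), where char is the character consumed when leaving the
    terminal vertex src, or None if the edge consumes no input.
    """
    labels = {}
    edges = []

    def new_vertex(label):
        v = len(labels) + 1
        labels[v] = label
        return v

    def expand(name):
        start = new_vertex(("start", name))
        prev, consumed = start, None  # what leaving `prev` consumes
        for item in grammar[name]:
            if item.startswith('"'):
                c = item.strip('"')
                v = new_vertex(("char", c))
                edges.append((prev, consumed, v))
                prev, consumed = v, c
            else:
                # Assumes no recursion: a recursive rule would loop forever.
                sub_start, sub_final = expand(item)
                edges.append((prev, consumed, sub_start))
                prev, consumed = sub_final, None
        final = new_vertex(("final", name))
        edges.append((prev, consumed, final))
        return start, final

    expand(root)
    return labels, edges
```

With this sketch, `build_graph(GRAMMAR, "X")` numbers the vertices 1 = start X, 2 = "a", 3 = start B, 4 = "b", 5 = final B, 6 = "c", 7 = final X. Edges leaving a terminal vertex consume its character; all other edges consume nothing.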
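DFA states correspond to sets of these vertices. The first state would consist of vertices 1 and 2, the second of vertices 3 and 4, the third of vertices 5 and 6, and the last of vertex 7 alone. If you parse the string "abc", the DFA passes through these four states in order, one per input offset (0 through 3, where offset 3 is EOF).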
That, too, is a graph. You can write out its edges using (offset, vertex) pairs as vertices.
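The per-offset DFA states and the (offset, vertex) edges can be sketched as follows. This is a hypothetical Python illustration, not code from demo-parselov: `EDGES` hard-codes an assumed numbering of the example graph, each DFA state is taken to be the set of vertices reachable without consuming input (an epsilon closure), and the helper names are made up. Transitions are cached, so once a (state, character) pair has been seen, each input character costs one lookup.

```python
from functools import lru_cache

# Hypothetical graph for X = "a" B "c", B = "b" (vertex numbering assumed):
# (src, char, dst), where char=None means no input is consumed.
EDGES = (
    (1, None, 2), (2, "a", 3), (3, None, 4),
    (4, "b", 5), (5, None, 6), (6, "c", 7),
)
START = 1


def closure(vertices):
    """Vertices reachable from `vertices` without consuming input."""
    seen, stack = set(vertices), list(vertices)
    while stack:
        v = stack.pop()
        for src, char, dst in EDGES:
            if src == v and char is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)


@lru_cache(maxsize=None)
def step(state, ch):
    """DFA transition, computed once per (state, ch) and then cached."""
    return closure({dst for src, char, dst in EDGES if src in state and char == ch})


def run(text):
    """DFA state (a set of graph vertices) at every offset 0..len(text)."""
    states = [closure({START})]
    for ch in text:
        states.append(step(states[-1], ch))
    return states


def parse_graph(text):
    """Edges between (offset, vertex) pairs, using only vertices that are in
    the DFA state at their offset. Offset len(text) plays the role of EOF."""
    states = run(text)
    out = []
    for i, state in enumerate(states):
        for src, char, dst in EDGES:
            if src not in state:
                continue
            if char is None:
                # dst is in the same closure, hence in the same state.
                out.append(((i, src), (i, dst)))
            elif i < len(text) and char == text[i]:
                # dst is in step(state, text[i]), i.e. states[i + 1].
                out.append(((i, src), (i + 1, dst)))
    return out
```

For "abc", `run` yields the states {1, 2}, {3, 4}, {5, 6}, {7}, and `parse_graph` produces six edges, from ((0, 1), (0, 2)) to ((2, 6), (3, 7)). For a fixed grammar the work per offset is bounded by the number of graph edges, so the whole pass is linear in the input length.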
Such a graph might contain vertices that do not ultimately reach the final vertex (EOF, v7), but such vertices can be eliminated in O(n) time. A match is then a path through the resulting graph from the start vertex to the final vertex; if the grammar is ambiguous, there may be many such paths.
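The pruning step and the path enumeration might look like this. It is a sketch under assumptions: the small parse graph in `EDGES` is made up to show one dead end and two alternative paths, as an ambiguous grammar would produce, and `prune` and `paths` are hypothetical helper names. Pruning is a single backward search from the final vertex, which touches each edge once.

```python
from collections import defaultdict

# Hypothetical parse graph over (offset, vertex) pairs for a 1-character
# input: two alternatives (via vertex 2 or 3) and a dead end at (1, 5).
EDGES = [
    ((0, 1), (0, 2)), ((0, 1), (0, 3)),
    ((0, 2), (1, 4)), ((0, 3), (1, 4)),
    ((0, 1), (1, 5)),
    ((1, 4), (1, 6)),
]
START, FINAL = (0, 1), (1, 6)  # FINAL is the final vertex at EOF


def prune(edges, final):
    """Keep only edges whose endpoints can still reach `final`."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    alive, stack = {final}, [final]
    while stack:  # backward search: each edge is examined once
        v = stack.pop()
        for u in preds[v]:
            if u not in alive:
                alive.add(u)
                stack.append(u)
    return [(s, d) for s, d in edges if s in alive and d in alive]


def paths(edges, start, final):
    """Yield every start-to-final path. Assumes the graph is acyclic;
    recursive, so only suitable for small examples."""
    succs = defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)

    def walk(v, path):
        if v == final:
            yield path
            return
        for w in succs[v]:
            yield from walk(w, path + [w])

    yield from walk(start, [start])
```

Here `prune(EDGES, FINAL)` drops the edge into (1, 5), which cannot reach (1, 6), and `paths` then yields two matches. Listing every path can take exponential time in general, because the number of paths can grow that fast; the linear-time part is building and pruning the graph.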