I'm currently working on a scanner generator. The generator already works fine, but when character classes are used the algorithm gets very slow.
The scanner generator pr
I had the same problem with my scanner generator, so I came up with the idea of replacing intervals with ids, which are determined using an interval tree. For instance, an a..z range in a DFA can be represented as the individual code points 97, 98, 99, ..., 122. Instead, I represent each range as a single interval such as [97, 122], build an interval tree out of those intervals, and in the end refer to each interval by its id in the tree. Given the RE a..z+, we start with this DFA:
0 -> a -> 1
0 -> b -> 1
0 -> c -> 1
0 -> ... -> 1
0 -> z -> 1
1 -> a -> 1
1 -> b -> 1
1 -> c -> 1
1 -> ... -> 1
1 -> z -> 1
1 -> E -> ACCEPT
Now compress the runs of consecutive characters into intervals:
0 -> a..z -> 1
1 -> a..z -> 1
1 -> E -> ACCEPT
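The compression step above could be sketched roughly like this in Python. The function name `compress_transitions` and the `(source, char, target)` tuple layout are my own assumptions for illustration, not something your generator has to use, and the E -> ACCEPT edge is left out because it carries no character:

```python
from collections import defaultdict

def compress_transitions(transitions):
    """Merge runs of consecutive code points that share source and target state.

    `transitions` is a list of (source, char, target) tuples (assumed layout).
    Returns a list of (source, (lo, hi), target) tuples with inclusive bounds.
    """
    grouped = defaultdict(list)
    for src, ch, dst in transitions:
        grouped[(src, dst)].append(ord(ch))

    compressed = []
    for (src, dst), codes in sorted(grouped.items()):
        codes = sorted(set(codes))
        lo = hi = codes[0]
        for code in codes[1:]:
            if code == hi + 1:
                hi = code            # still inside the current run
            else:
                compressed.append((src, (lo, hi), dst))
                lo = hi = code       # start a new run
        compressed.append((src, (lo, hi), dst))
    return compressed

# The a..z+ DFA from the listing, one transition per character.
dfa = [(0, chr(c), 1) for c in range(ord("a"), ord("z") + 1)]
dfa += [(1, chr(c), 1) for c in range(ord("a"), ord("z") + 1)]

print(compress_transitions(dfa))
# [(0, (97, 122), 1), (1, (97, 122), 1)]
```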
Extract all intervals from your DFA and build an interval tree out of them:
{
  "left": null,
  "middle": {
    "id": 0,
    "interval": [a, z]
  },
  "right": null
}
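A minimal sketch of building such a tree, assuming the intervals are already disjoint (overlapping ranges would have to be split first). It is a plain balanced binary search tree over intervals and not a full interval tree that handles overlaps. The names `build_tree` and `find_id` are made up for this example; the left/middle/right shape follows the structure shown above:

```python
def build_tree(intervals):
    """Build a balanced tree over non-overlapping (lo, hi) intervals.

    Duplicates are removed, and ids are assigned in sorted order.
    """
    def build(items):
        if not items:
            return None
        mid = len(items) // 2
        interval_id, interval = items[mid]
        return {
            "left": build(items[:mid]),
            "middle": {"id": interval_id, "interval": interval},
            "right": build(items[mid + 1:]),
        }

    unique = sorted(set(intervals))
    return build(list(enumerate(unique)))

def find_id(tree, code):
    """Return the id of the interval containing `code`, or None if there is none."""
    while tree is not None:
        lo, hi = tree["middle"]["interval"]
        if code < lo:
            tree = tree["left"]
        elif code > hi:
            tree = tree["right"]
        else:
            return tree["middle"]["id"]
    return None

# Both transitions of the a..z+ DFA use the same interval, so one node remains.
tree = build_tree([(97, 122), (97, 122)])
print(find_id(tree, ord("m")))  # 0
print(find_id(tree, ord("A")))  # None
```

With more character classes the tree gets deeper, but a lookup stays logarithmic in the number of distinct intervals.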
Replace the actual intervals with their ids:
0 -> 0 -> 1
1 -> 0 -> 1
1 -> E -> ACCEPT
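To show how the id-based table might be used at scan time, here is a small sketch. Everything in it is an assumption on my side: the table layout, the names `interval_id` and `matches`, and reading the E edge as "state 1 is accepting". For brevity the lookup uses a binary search over a sorted list of disjoint intervals (Python's `bisect`) instead of an actual tree; the idea of mapping a character to an interval id first is the same:

```python
import bisect

# Sorted, disjoint intervals; the list index doubles as the interval id.
INTERVALS = [(97, 122)]                # id 0 -> a..z
STARTS = [lo for lo, _ in INTERVALS]

# Transition table keyed by (state, interval id), as in the listing above.
TRANSITIONS = {(0, 0): 1, (1, 0): 1}
ACCEPTING = {1}                        # assumption: E -> ACCEPT marks state 1

def interval_id(code):
    """Map a code point to its interval id, or None if no interval contains it."""
    i = bisect.bisect_right(STARTS, code) - 1
    if i >= 0 and code <= INTERVALS[i][1]:
        return i
    return None

def matches(text):
    """Run the DFA over `text` and report whether it ends in an accepting state."""
    state = 0
    for ch in text:
        key = (state, interval_id(ord(ch)))
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return state in ACCEPTING

print(matches("hello"))  # True
print(matches("Hello"))  # False
print(matches(""))       # False
```

The table now has one entry per (state, interval) pair instead of one per (state, character) pair, which is where the speedup for large character classes comes from.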