I\'m currently working on a scanner generator. The generator already works fine. But when using character classes the algorithm gets very slow.
The scanner generator pr
There are a number of ways to handle it. They all boil down to treating sets of characters at a time in the data structures, instead of enumerating the entire alphabet ever at all. It's also how you make scanners for Unicode in a reasonable amount of memory.
You've many choices about how to represent and process sets of characters. I'm presently working with a solution that keeps an ordered list of boundary conditions and corresponding target states. You can process operations on these lists much faster than you could if you had to scan the entire alphabet at each juncture. In fact, it's fast enough that it runs in Python with acceptable speed.