Question
(I'm just learning how to write a compiler, so please correct me if I make any incorrect claims)
Why would anyone still implement DFAs in code (goto statements, table-driven implementations) when they can simply use regular expressions? As far as I understand, lexical analyzers take in a string of characters and churn out a list of tokens which, in the language's grammar definition, are terminals, making it possible for them to be described by a regular expression. Wouldn't it be easier to just loop over a bunch of regexes, breaking out of the loop when one of them matches?
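Something like this sketch, say in Python with its re module (the token names and patterns are made-up examples, not any particular language's lexicon):

```python
import re

# Hypothetical token definitions: (name, pattern), tried in order at each position.
TOKEN_SPECS = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("PLUS",   r"\+"),
    ("SKIP",   r"\s+"),   # whitespace, discarded below
]

def tokenize(source):
    tokens = []
    pos = 0
    while pos < len(source):
        for name, pattern in TOKEN_SPECS:
            match = re.match(pattern, source[pos:])  # anchored at the current position
            if match:
                if name != "SKIP":
                    tokens.append((name, match.group()))
                pos += match.end()
                break              # restart the regex loop at the new position
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
    return tokens

print(tokenize("x + 42"))
# [('IDENT', 'x'), ('PLUS', '+'), ('NUMBER', '42')]
```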
Answer 1:
You're absolutely right that it's easier to write regular expressions than DFAs. However, a good question to think about is:
How do these regex matchers work?
Most very fast implementations of regex matchers work by compiling down to some type of automaton (either an NFA or a minimum-state DFA) internally. If you wanted to build a scanner that worked by using regexes to describe which tokens to match and then looping through all of them, you could absolutely do so, but internally they'd probably compile to DFAs.
It's extremely rare to see anyone actually code up a DFA by hand for scanning or parsing, because it's just so complicated. This is why there are tools like lex or flex, which let you specify the regexes to match and then automatically compile them down to DFAs behind the scenes. That way, you get the best of both worlds: you describe what to match using the nicer regex framework, but you get the speed and efficiency of DFAs behind the scenes.
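To give a rough sense of what such a generated scanner boils down to, here is a toy hand-written, table-driven DFA in Python that recognizes just integers and identifiers; real lex/flex output is generated C code and far more elaborate, so treat this as an illustrative sketch only:

```python
# A toy table-driven DFA recognizing NUMBER (\d+) and IDENT ([A-Za-z_]\w*).
# States: 0 = start, 1 = inside a number, 2 = inside an identifier.
ACCEPTING = {1: "NUMBER", 2: "IDENT"}

def char_class(c):
    """Map a character to the small alphabet the transition table uses."""
    if c.isdigit():
        return "digit"
    if c.isalpha() or c == "_":
        return "letter"
    return "other"

# TRANSITION[state][char_class] -> next state, or None if the DFA gets stuck.
TRANSITION = {
    0: {"digit": 1, "letter": 2, "other": None},
    1: {"digit": 1, "letter": None, "other": None},
    2: {"digit": 2, "letter": 2, "other": None},
}

def longest_match(source, start):
    """Run the DFA from `start`; return (token_name, end) of the longest match, or None."""
    state, last_accept, pos = 0, None, start
    while pos < len(source):
        nxt = TRANSITION[state][char_class(source[pos])]
        if nxt is None:
            break
        state = nxt
        pos += 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], pos)
    return last_accept

print(longest_match("count42 + 7", 0))   # ('IDENT', 7)
```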
One more important detail about this approach is that it is possible to build a single DFA that matches multiple different regular expressions in parallel. This increases efficiency, since the resulting DFA can be run over the string in a way that concurrently searches for all possible regex matches.
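Python's re engine is a backtracking matcher rather than a true DFA, but the same "look for every token kind at once" idea can be sketched by combining the token regexes into a single alternation with named groups (again with made-up token names):

```python
import re

# Hypothetical token patterns joined into one alternation with named groups,
# so a single scan considers every token kind at the current position at once.
TOKEN_REGEX = re.compile(r"""
      (?P<NUMBER>\d+)
    | (?P<IDENT>[A-Za-z_]\w*)
    | (?P<PLUS>\+)
    | (?P<SKIP>\s+)
""", re.VERBOSE)

def tokenize(source):
    # Note: characters that no pattern matches are silently skipped in this sketch.
    for match in TOKEN_REGEX.finditer(source):
        kind = match.lastgroup       # name of the alternative that matched
        if kind != "SKIP":
            yield kind, match.group()

print(list(tokenize("x + 42")))
# [('IDENT', 'x'), ('PLUS', '+'), ('NUMBER', '42')]
```

Lex and flex do the analogous combination, but they actually compile the merged pattern into a single DFA, so every token kind is checked in one pass over the input without backtracking.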
Hope this helps!
Source: https://stackoverflow.com/questions/14419614/dfas-vs-regexes-when-implementing-a-lexical-analyzer