How do I implement a lexer given that I have already implemented a basic regular expression matcher?

太阳男子 2021-02-10 05:00

I'm trying to implement a lexer for fun. I have already implemented a basic regular expression matcher (by first converting a pattern to an NFA and then to a DFA). Now I'm cluel

2 Answers
  •  滥情空心
    2021-02-10 05:55

    I've done pretty much the same thing. I combined all the expressions into one large NFA and converted that into a single DFA. While doing that, keep track of which states were accepting states in each original NFA, along with the precedence of the corresponding token.
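
    The combination step can be sketched in Python as below. This is not the answerer's code, only a minimal illustration under assumptions: the NFA is a hand-built dictionary rather than the output of a regex compiler, the names `EPS`, `eps_closure`, `build_dfa` and `classify` are hypothetical, and a lower precedence number is taken to mean higher priority. Two token NFAs (`IF` and `IDENT`) are joined by a fresh start state with epsilon edges, and the subset construction records the winning token for every DFA state.

```python
import string
from collections import deque

EPS = None  # label used for epsilon edges (an assumption of this sketch)

def eps_closure(nfa, states):
    """Return every NFA state reachable from `states` through epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, target in nfa.get(s, []):
            if label is EPS and target not in seen:
                seen.add(target)
                stack.append(target)
    return frozenset(seen)

def build_dfa(nfa, start, accepting):
    """Subset construction that also records the winning token per DFA state.

    nfa:       {state: [(label, target), ...]}
    accepting: {nfa_state: (precedence, token_name)}, lower number wins
    Returns (trans, tokens); the DFA start state always has id 0.
    """
    start_set = eps_closure(nfa, {start})
    ids = {start_set: 0}
    trans, tokens = {}, {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        cid = ids[current]
        # Several original NFAs may accept here; keep the best precedence.
        hits = [accepting[s] for s in current if s in accepting]
        if hits:
            tokens[cid] = min(hits)[1]
        # Group the non-epsilon edges leaving this subset by character.
        moves = {}
        for s in current:
            for label, target in nfa.get(s, []):
                if label is not EPS:
                    moves.setdefault(label, set()).add(target)
        for label, targets in moves.items():
            nxt = eps_closure(nfa, targets)
            if nxt not in ids:
                ids[nxt] = len(ids)
                queue.append(nxt)
            trans[(cid, label)] = ids[nxt]
    return trans, tokens

# Combined NFA: state 0 is the new start, 1..3 spell "if", 4..5 match [a-z]+.
nfa = {
    0: [(EPS, 1), (EPS, 4)],
    1: [("i", 2)],
    2: [("f", 3)],
    4: [(c, 5) for c in string.ascii_lowercase],
    5: [(c, 5) for c in string.ascii_lowercase],
}
accepting = {3: (0, "IF"), 5: (1, "IDENT")}
trans, tokens = build_dfa(nfa, 0, accepting)

def classify(word):
    """Run the DFA over the whole word and report the token it ends in."""
    state = 0
    for ch in word:
        state = trans.get((state, ch))
        if state is None:
            return None
    return tokens.get(state)
```

    The DFA state reached after reading "if" contains accepting states of both original NFAs, and the precedence decides that it reports `IF` rather than `IDENT`.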

    The generated DFA will have many accepting states. Run it until it receives a character for which it has no transition. If the DFA is in an accepting state at that point, look at which of your original NFAs contributed that accepting state. The one with the highest precedence determines the token you return.
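
    A minimal Python sketch of that scanning loop follows. The transition table `TRANS`, the state-to-token map `TOKENS` and the function `tokenize` are hypothetical and hand-written for illustration; in a real lexer they would come out of the DFA construction, with precedence already resolved per state. One detail goes beyond the description above: the loop remembers the last accepting state it passed, so it can fall back to it if the DFA gets stuck in a non-accepting state.

```python
import string

LETTERS = string.ascii_lowercase
TRANS = {}  # (state, character) -> next state

def add(src, chars, dst):
    """Add one transition per character; later calls override earlier ones."""
    for c in chars:
        TRANS[(src, c)] = dst

# Hand-written DFA for: "if" keyword, [a-z]+ identifiers, [0-9]+ numbers, spaces.
add(0, LETTERS, 3); add(0, "i", 1); add(0, string.digits, 4); add(0, " ", 5)
add(1, LETTERS, 3); add(1, "f", 2)
add(2, LETTERS, 3)
add(3, LETTERS, 3)
add(4, string.digits, 4)
add(5, " ", 5)

# Token reported by each accepting DFA state (precedence already resolved).
TOKENS = {1: "IDENT", 2: "IF", 3: "IDENT", 4: "NUMBER", 5: "WS"}

def tokenize(text):
    """Repeatedly run the DFA from the start state, taking the longest match."""
    pos, out = 0, []
    while pos < len(text):
        state, i = 0, pos
        last = None  # (end position, token name) of the latest accepting state
        while i < len(text) and (state, text[i]) in TRANS:
            state = TRANS[(state, text[i])]
            i += 1
            if state in TOKENS:
                last = (i, TOKENS[state])
        if last is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        end, name = last
        out.append((name, text[pos:end]))
        pos = end  # restart the DFA right after the token just emitted
    return out
```

    For example, `tokenize("if iffy 42")` yields an `IF` token for "if" but a single `IDENT` token for "iffy", because the DFA keeps running as long as a transition exists.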

    This does not handle regular expression lookaheads. Those are rarely needed for lexing anyway; handling that kind of context is the job of the parser.

    Such a lexer runs at much the same speed as a single regular expression, since there is basically only one DFA to run. You can skip the NFA-to-DFA conversion altogether and simulate the NFA directly, which is faster to construct but slower to run. The algorithm is basically the same.
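
    To illustrate the variant without a DFA, here is a hedged Python sketch that simulates the combined NFA directly by tracking the set of active states. The NFA layout and the names `EPS`, `closure` and `longest_match` are assumptions made for this example, and a lower precedence number is again taken to win.

```python
import string

EPS = None  # label used for epsilon edges (an assumption of this sketch)

def closure(nfa, states):
    """Return every NFA state reachable from `states` through epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, target in nfa.get(s, []):
            if label is EPS and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

def longest_match(nfa, start, accepting, text, pos=0):
    """Simulate the NFA from `pos`; return (end, token) of the longest match.

    accepting: {nfa_state: (precedence, token_name)}, lower number wins.
    Returns None when no token matches at `pos`.
    """
    current = closure(nfa, {start})
    best, i = None, pos
    while i < len(text):
        # Follow every edge labelled with the next character.
        moved = {t for s in current
                 for label, t in nfa.get(s, []) if label == text[i]}
        if not moved:
            break  # no active state has a transition: stop scanning
        current = closure(nfa, moved)
        i += 1
        hits = [accepting[s] for s in current if s in accepting]
        if hits:
            best = (i, min(hits)[1])
    return best

# Combined NFA: state 0 is the new start, 1..3 spell "if", 4..5 match [a-z]+.
nfa = {
    0: [(EPS, 1), (EPS, 4)],
    1: [("i", 2)],
    2: [("f", 3)],
    4: [(c, 5) for c in string.ascii_lowercase],
    5: [(c, 5) for c in string.ascii_lowercase],
}
accepting = {3: (0, "IF"), 5: (1, "IDENT")}
```

    The per-character work now grows with the number of active NFA states, which is the run-time cost traded for skipping the subset construction.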

    The source code for the lexer I wrote is freely available on GitHub, should you want to see how I did it.
