Combining lexer and parser in a parser combinator

放肆的年华 submitted on 2019-12-01 06:30:28

Table-based parsers require separation of lexical analysis and parsing because of their limited lookahead capability. Looking ahead far enough to combine lexical analysis into the parser would explode the state space.

Combinator-based approaches do not usually suffer this problem, as they are typically doing recursive-descent parsing. Unless otherwise noted by the library author, there is no harm in combining the phases and not much to gain by separating them.
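To illustrate combining the phases, here is a minimal sketch that uses ReadP from Haskell's base library rather than uu-parsinglib. The "lexer" is just a few small combinators (lexeme, symbol, number) that consume trailing whitespace, and the grammar rules call them directly, so no token list is ever built. The names evalString, lexeme, symbol and number are illustrative only, not part of any library.

```haskell
import Data.Char (isDigit, isSpace)
import Text.ParserCombinators.ReadP

-- "Lexer" layer: a token parser also eats any whitespace after it.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* munch isSpace

symbol :: String -> ReadP String
symbol = lexeme . string

number :: ReadP Integer
number = lexeme (read <$> munch1 isDigit)

-- "Parser" layer: grammar rules built directly on the lexeme parsers.
expr, term, factor :: ReadP Integer
expr   = chainl1 term (((+) <$ symbol "+") +++ ((-) <$ symbol "-"))
term   = chainl1 factor ((*) <$ symbol "*")
factor = number +++ between (symbol "(") (symbol ")") expr

-- Run on raw characters; keep only parses that consume the whole input.
evalString :: String -> Maybe Integer
evalString s =
  case [v | (v, "") <- readP_to_S (munch isSpace *> expr <* eof) s] of
    [v] -> Just v
    _   -> Nothing
```

With this sketch, evalString "1 + 2 * 3" yields Just 7 and evalString "1 +" yields Nothing. Because munch is greedy, each token has only one way to match, so the only ambiguity left for eof to resolve is where the grammar stops.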

Although uu-parsinglib provides the Str class to abstract over different string-like inputs, its definition shows that it still assumes you are ultimately reading a sequence of Char, whether from a String, ByteString, Text, etc. So getting it to parse a stream of MyToken values could be difficult. Parsec might be a better choice if you feel you need to do that.
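If you do want a separate token stream, Parsec can consume any list of tokens through its tokenPrim primitive. The sketch below assumes the parsec package is available. MyToken here is a hypothetical stand-in for the question's token type, and because these tokens carry no real source positions, the position function simply counts tokens as columns.

```haskell
import Text.Parsec
import Text.Parsec.Pos (SourcePos, incSourceColumn)

-- Hypothetical token type standing in for the question's MyToken.
data MyToken = TNum Integer | TPlus | TLParen | TRParen
  deriving (Show, Eq)

-- A parser whose input stream is a list of tokens rather than Chars.
type TokParser a = Parsec [MyToken] () a

-- Accept one token if the selector returns Just.
tok :: (MyToken -> Maybe a) -> TokParser a
tok = tokenPrim show nextPos
  where
    -- No real source positions here: just count tokens as "columns".
    nextPos :: SourcePos -> MyToken -> [MyToken] -> SourcePos
    nextPos pos _ _ = incSourceColumn pos 1

num :: TokParser Integer
num = tok (\t -> case t of TNum n -> Just n; _ -> Nothing)

sym :: MyToken -> TokParser ()
sym expected = tok (\t -> if t == expected then Just () else Nothing)

-- sumExpr := atom ("+" atom)*
sumExpr :: TokParser Integer
sumExpr = chainl1 atom ((+) <$ sym TPlus)

atom :: TokParser Integer
atom = num <|> between (sym TLParen) (sym TRParen) sumExpr

evalTokens :: [MyToken] -> Either ParseError Integer
evalTokens = parse (sumExpr <* eof) "<tokens>"
```

Here evalTokens [TNum 1, TPlus, TLParen, TNum 2, TPlus, TNum 3, TRParen] evaluates to Right 6. In a real lexer you would store each token's SourcePos alongside it and return that from nextPos, so parse errors point into the original source.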

As to your question about your string implementation, combinators take a string-like input containing syntactic structure and return the corresponding semantic value, if they match. Inside the combinator, you get to build that semantic value from what you parse directly by taking from the input stream and by combining the semantic values from sub-combinators you call.

So your 'String matching' combinator in your example will have a list of tokens in its scope, thanks to the parsing it did. You can use the full power of Haskell to combine those tokens into a single MyString value in whatever way makes sense for your language, for example a 'SplicedString' type that represents which values are to be spliced into it.
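As a minimal sketch of that idea, the ReadP-based combinator below parses a string literal containing ${name} splices into a list of pieces, then uses an ordinary Haskell function (mergeLits) to shape them into one MyString value. Piece, Lit, Splice, the ${...} and $$ syntax, and all function names are hypothetical choices for illustration, not taken from the question.

```haskell
import Data.Char (isAlpha, isAlphaNum)
import Text.ParserCombinators.ReadP

-- Hypothetical semantic types: literal text and spliced-in variables.
data Piece = Lit String | Splice String
  deriving (Show, Eq)

data MyString = SplicedString [Piece]
  deriving (Show, Eq)

-- Literal text: a run of characters that are neither '"' nor '$'.
litPiece :: ReadP Piece
litPiece = Lit <$> munch1 (`notElem` "\"$")

-- "$$" stands for a literal dollar sign.
dollarPiece :: ReadP Piece
dollarPiece = Lit "$" <$ string "$$"

-- A splice: "${" identifier "}".
splicePiece :: ReadP Piece
splicePiece = Splice <$> between (string "${") (char '}') ident
  where ident = (:) <$> satisfy isAlpha <*> munch isAlphaNum

piece :: ReadP Piece
piece = splicePiece +++ dollarPiece +++ litPiece

-- Plain Haskell post-processing: merge adjacent literal pieces.
mergeLits :: [Piece] -> [Piece]
mergeLits (Lit a : Lit b : rest) = mergeLits (Lit (a ++ b) : rest)
mergeLits (p : rest)             = p : mergeLits rest
mergeLits []                     = []

-- The string combinator: parse the pieces, then build one semantic value.
stringLit :: ReadP MyString
stringLit = SplicedString . mergeLits <$> between (char '"') (char '"') (many piece)

parseMyString :: String -> Maybe MyString
parseMyString s = case [v | (v, "") <- readP_to_S (stringLit <* eof) s] of
  [v] -> Just v
  _   -> Nothing
```

For example, parseMyString "\"Hello, ${name}!\"" gives Just (SplicedString [Lit "Hello, ", Splice "name", Lit "!"]), and "\"cost: $$5\"" collapses to a single Lit "cost: $5" thanks to mergeLits.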

The string combinator was probably called by an 'expression' combinator, which will be able to combine the MyString value with other parsed values into a MyExpression value. It's combinators returning semantic values all the way back up!
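To sketch how semantic values flow upward, the hypothetical expression combinator below (again using base's ReadP, with a simplified string literal that has no splices) calls a string combinator and wraps its String result, together with the results of other sub-combinators, into a MyExpression. The constructors StrLit, Var and Concat and the ++ operator are assumptions for illustration.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace)
import Text.ParserCombinators.ReadP

-- Hypothetical AST standing in for the question's MyExpression.
data MyExpression
  = StrLit String                     -- semantic value from the string combinator
  | Var String                        -- a variable reference
  | Concat MyExpression MyExpression  -- e1 ++ e2
  deriving (Show, Eq)

lexeme :: ReadP a -> ReadP a
lexeme p = p <* munch isSpace

-- The string combinator returns a plain Haskell String.
stringLit :: ReadP String
stringLit = lexeme (between (char '"') (char '"') (munch (/= '"')))

identifier :: ReadP String
identifier = lexeme ((:) <$> satisfy isAlpha <*> munch isAlphaNum)

-- The expression combinators call stringLit and identifier and wrap
-- their results into MyExpression values, all the way back up.
atom :: ReadP MyExpression
atom = (StrLit <$> stringLit) +++ (Var <$> identifier)

expr :: ReadP MyExpression
expr = chainl1 atom (Concat <$ lexeme (string "++"))

parseExpr :: String -> Maybe MyExpression
parseExpr s = case [e | (e, "") <- readP_to_S (munch isSpace *> expr <* eof) s] of
  [e] -> Just e
  _   -> Nothing
```

Since chainl1 is left-associative, parseExpr "\"Hello, \" ++ name ++ \"!\"" produces Just (Concat (Concat (StrLit "Hello, ") (Var "name")) (StrLit "!")).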

I think there is nothing in uu-parsinglib that prevents you from using an input other than Text. It is only that for Text (and friends) we have provided many of the functions you are likely to need. If you look at the older uulib parser combinators, you will find a scanner-based approach, which can be used just as well with the newer uu-parsinglib.

If you want to process a lot of data, it may be better to have a separate scanning phase. Error messages also tend to be more informative. In uulib you will find some support for writing your scanner, which helps because most languages place special restrictions or requirements on their lexical structure (e.g. the offside rule) that many scanner-generating tools either cannot handle or must be adapted to handle.
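For comparison, a separate scanning phase might look like the following hand-written scanner. This is a base-only sketch, not the uulib scanner API: it turns characters into tokens tagged with line and column, so a lexical error can point at the exact offending character, and a parser over the resulting token list can report positions the same way. The Token type and the accepted symbols are hypothetical, and layout-sensitive rules such as the offside rule are not handled.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)

-- Hypothetical token type for a small language.
data Token = TIdent String | TInt Integer | TSym Char
  deriving (Show, Eq)

-- Source position: (line, column), both 1-based.
type Pos = (Int, Int)

-- Separate scanning phase: raw characters -> positioned tokens.
-- The offside rule or other layout handling is NOT implemented here.
scan :: String -> Either String [(Pos, Token)]
scan = go (1, 1)
  where
    go _ [] = Right []
    go (l, c) s@(ch : rest)
      | ch == '\n' = go (l + 1, 1) rest
      | isSpace ch = go (l, c + 1) rest
      | isAlpha ch =
          let (word, rest') = span isAlphaNum s
          in emit (l, c) (TIdent word) (length word) rest'
      | isDigit ch =
          let (digits, rest') = span isDigit s
          in emit (l, c) (TInt (read digits)) (length digits) rest'
      | ch `elem` "+-*/()=" = emit (l, c) (TSym ch) 1 rest
      | otherwise =
          Left ("lexical error at " ++ show l ++ ":" ++ show c
                ++ ": unexpected " ++ show ch)
      where
        -- Record the token at its start position, then continue after it.
        emit pos t len rest' = ((pos, t) :) <$> go (l, c + len) rest'
```

For instance, scan "a ? b" returns Left "lexical error at 1:3: unexpected '?'", which is the kind of precise message that is harder to produce when lexing is interleaved with a backtracking parser.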
