Does Pyparsing Support Context-Sensitive Grammars?

Posted by 邮差的信 on 2019-12-03 15:44:17

It is in fact a grammar for a context-sensitive language, classically abstracted as wcw, where w is in (a|b)*. (Note that wcw', where ' indicates reversal, is context-free.)

Parsing Expression Grammars are capable of parsing wcw-type languages by using semantic predicates. pyparsing provides the matchPreviousExpr() and matchPreviousLiteral() helper functions for this very purpose, e.g.

from pyparsing import Word, matchPreviousExpr
w = Word("ab")
s = w + "c" + matchPreviousExpr(w)  # the second w must repeat the first

So in your case you'd probably do something like

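from pyparsing import Literal, Word, alphas, alphanums, matchPreviousExpr

table_name = Word(alphas, alphanums)
# "..." is a placeholder for the body of the block; obj avoids shadowing the builtin object
obj = (Literal("OBJECT") + "=" + table_name + ... +
       Literal("END_OBJECT") + "=" + matchPreviousExpr(table_name))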
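
A fuller sketch of the same idea, with the block body filled in, might look like the following. The element syntax (identifier = value), the underscore allowed in names, and the helper name parses are assumptions made for illustration; they are not part of the original question. pyparsing 3 also offers snake_case spellings such as match_previous_expr and parse_string.

```python
from pyparsing import (Group, Keyword, ParseException, Suppress, Word,
                       ZeroOrMore, alphanums, alphas, matchPreviousExpr)

# Assumed token shapes: names may contain underscores, values are simple words.
table_name = Word(alphas, alphanums + "_")
# The negative lookahead keeps END_OBJECT from being read as an element key.
key = ~Keyword("END_OBJECT") + Word(alphas, alphanums + "_")
value = Word(alphanums + "_.")
element = Group(key + Suppress("=") + value)

block = (Keyword("OBJECT") + Suppress("=") + table_name
         + Group(ZeroOrMore(element))
         + Keyword("END_OBJECT") + Suppress("=")
         + matchPreviousExpr(table_name))  # must repeat the head name


def parses(text):
    """Return True if text is a block whose END_OBJECT name matches its OBJECT name."""
    try:
        block.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False
```

With this sketch, parses() accepts a block closed by the same name that opened it and rejects one closed by a different name, because the mismatch is reported as an ordinary parse failure.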

As a general rule, parsers are built as context-free parsing engines. If there is context sensitivity, it is grafted on after parsing (or at least after the relevant parsing steps are completed).

In your case, you want to write context-free grammar rules:

  head = 'OBJECT' '=' IDENTIFIER ;
  tail = 'END_OBJECT'  '=' IDENTIFIER ;
  element = IDENTIFIER '=' value ;
  element_list = element ;
  element_list = element_list element ;
  block = head element_list tail ;

The check that the head and tail constructs have matching identifiers isn't technically done by the parser.

Many parsers, however, allow a semantic action to occur when a syntactic element is recognized, often for the purpose of building tree nodes. In your case, you want to use this to enable additional checking. For element, you want to make sure the IDENTIFIER isn't a duplicate of something already in the block; this means for each element encountered, you'll want to capture the corresponding IDENTIFIER and make a block-specific list to enable duplicate checking. For block, you want to capture the head IDENTIFIER, and check that it matches the tail IDENTIFIER.
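
In pyparsing, semantic actions are called parse actions. As a rough sketch of the checks described above, the grammar can stay context-free while a parse action attached to the block does both comparisons. The results names (head_name, tail_name, elements), the value syntax, and the helper validate are hypothetical choices for this example.

```python
from pyparsing import (Group, Keyword, ParseException, Suppress, Word,
                       ZeroOrMore, alphanums, alphas)

IDENTIFIER = Word(alphas, alphanums + "_")
value = Word(alphanums + "_.")  # assumed value syntax

# Context-free rules, mirroring head / tail / element / block.
head = Suppress(Keyword("OBJECT") + "=") + IDENTIFIER("head_name")
tail = Suppress(Keyword("END_OBJECT") + "=") + IDENTIFIER("tail_name")
element = Group(~Keyword("END_OBJECT") + IDENTIFIER + Suppress("=") + value)
block = head + Group(ZeroOrMore(element))("elements") + tail


def check_block(s, loc, toks):
    """Parse action: the context-sensitive checks, run once a block is recognized."""
    if toks["head_name"] != toks["tail_name"]:
        raise ParseException(s, loc, "END_OBJECT name does not match OBJECT name")
    seen = set()  # block-specific record of element identifiers
    for name, _value in toks.get("elements", []):
        if name in seen:
            raise ParseException(s, loc, "duplicate element " + name)
        seen.add(name)


block.setParseAction(check_block)


def validate(text):
    """Return None if the block passes all checks, else the error message."""
    try:
        block.parseString(text, parseAll=True)
        return None
    except ParseException as exc:
        return exc.msg
```

Raising ParseException from the action makes a failed check look like any other parse error to the caller; collecting the problems in a list instead would be an equally reasonable design.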

This is easiest if you build a tree representing the parse as you go along, and hang the various context-sensitive values on the tree in various places (e.g., attach the actual IDENTIFIER value to the tree node for the head clause). At the point where you are building the tree node for the tail construct, it should be straightforward to walk up the tree, find the head tree, and then compare the identifiers.

This is easier to think about if you imagine the entire tree being built first, and then a post-processing pass over the tree being used to do this checking. Lazy people in fact do it this way :-} All we are doing is pushing work that could be done in the post-processing step into the tree-building steps attached to the semantic actions.
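
The "lazy" variant can be sketched without any parser library at all: assume the parser has already produced a tree, and run the checks as a separate pass over it. The Block node type and its field names below are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical tree node for one OBJECT ... END_OBJECT block."""
    head_name: str
    tail_name: str
    elements: list = field(default_factory=list)  # (identifier, value) pairs


def check_block(block):
    """Post-processing pass: return the context-sensitive errors in one block."""
    errors = []
    if block.head_name != block.tail_name:
        errors.append(
            f"END_OBJECT = {block.tail_name} does not close OBJECT = {block.head_name}"
        )
    seen = set()
    for name, _value in block.elements:
        if name in seen:
            errors.append(f"duplicate element {name}")
        seen.add(name)
    return errors
```

Because the pass runs after parsing, it can report every problem in a block at once rather than stopping at the first one.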

None of these concepts is Python-specific, and the details for pyparsing will vary somewhat.
