Matching trailing context in flex

百般思念 submitted on 2019-12-11 08:55:17

Question


The flex manual mentions a "trailing context" pattern (r/s), which means r, but only if followed by s. However, the following code doesn't compile (instead it gives an "unrecognized rule" error). Why?

LITERAL a/b
%%
{LITERAL} { }

Answer 1:


The simple answer is that unless you use the -l option, which is not recommended, you cannot put trailing context into a name definition. That's because flex:

  • doesn't allow trailing context inside parentheses; and

  • automatically surrounds expansions of definitions with parentheses, except in a few situations (see below).
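Given the two restrictions above, one way to keep named definitions and still use trailing context is to leave the / operator in the rule itself and put only the pieces on either side of it into definitions. The following is a minimal sketch, not the only possible fix; the name FOLLOW, the printed message, and the main function are assumptions added for illustration.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
LITERAL  a
FOLLOW   b
%%
{LITERAL}/{FOLLOW}  { printf("matched '%s' with trailing context\n", yytext); }
.|\n                { /* skip everything else */ }
%%
int main(void) {
    yylex();
    return 0;
}
```

Each definition is still expanded inside its own parentheses, giving (a)/(b), but the / operator now sits outside them, so the rule is accepted without -l. Inside the action, yytext holds only the text matched by {LITERAL}, since the trailing context is not consumed.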

The reason flex surrounds expansions with parentheses is that otherwise weird things happen. For example:

prefix        milli|centi
%%
{prefix}pede  return BUG;

Without the automatic parentheses, the pattern would expand to:

milli|centipede

which would not match millipede. (There's a similar problem with the various postfix operators. Consider {prefix}?pede, for example.)
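The prefix example above can be fleshed out into a small complete scanner to observe the effect of the automatic parentheses. This is only a sketch: the messages, the catch-all rule, and main are assumptions added to make the file self-contained.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
prefix  milli|centi
%%
{prefix}pede  { printf("arthropod: %s\n", yytext); }
.|\n          { /* ignore everything else */ }
%%
int main(void) {
    yylex();
    return 0;
}
```

Built with plain flex, the rule behaves as (milli|centi)pede, so both millipede and centipede reach the first action. Built with -l, the expansion would instead be the unparenthesized milli|centipede described above.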

Flex doesn't allow trailing context inside parentheses because many such expressions are harder to compile. In effect, you can end up writing patterns which are the intersection of two regular expressions. (For example, ({base}/{a}){b} matches {base} followed by a {b} which is either a prefix or a projection of an {a}.) These are still regular expressions, but they aren't contemplated by the Thompson algorithm for turning regular expressions into finite state machines. Since the feature is rarely if ever needed, no attempt was ever made to implement it.

Unfortunately, banning trailing context inside parentheses also bans redundant parentheses around patterns which include trailing context, and this includes definition expansions because definitions are expanded with possibly redundant parentheses.

The original AT&T lex did not add the parentheses, which is why forcing lex-compatibility with -l allows your flex file to compile. However, it may result in all sorts of other problems, as indicated above, so I wouldn't recommend it.

Also, "trailing context" here means either a full pattern of the form r/s or of the form r$. Putting r/s inside parentheses (whether explicitly or implicitly) produces an error message, but putting r$ inside parentheses just makes the $ match a $ character, instead of forcing the pattern to match at the end of a line. No error or warning is emitted in this case.

That would make it impossible to use $ (or ^) inside a name definition. However, at some point prior to version 2.3.53, a hack was inserted which suppresses the parentheses if the definition starts with ^ or ends with $. And, for reasons I don't fully understand, it also suppresses the parentheses if the expansion occurs at the end of trailing context. This might be a bug, and indeed there is a bug report relating to it.
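The difference between r$ written at the top level of a rule and r$ wrapped in explicit parentheses, as described above, can be sketched with two rules side by side. The pattern text end, the messages, and main are assumptions chosen for illustration; the behavior noted in the comments is the one described in this answer, not something guaranteed for every flex version.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
(end$)  { /* inside parentheses: $ is an ordinary character */
          printf("literal dollar: %s\n", yytext); }
end$    { /* top level: $ anchors the match to the end of a line */
          printf("at end of line: %s\n", yytext); }
.|\n    { /* ignore everything else */ }
%%
int main(void) {
    yylex();
    return 0;
}
```

The first rule is intended to match the four characters end$ wherever they appear, while the second matches end only when a newline follows.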




Answer 2:


I found the answer to your problem in the FAQ of the info pages of flex: "Your problem is that some of the definitions in the scanner use the '/' trailing context operator, and have it enclosed in ()'s. Flex does not allow this operator to be enclosed in ()'s because doing so allows undefined regular expressions such as "(a/b)+". So the solution is to remove the parentheses. Note that you must also be building the scanner with the -l option for AT&T lex compatibility. Without this option, flex automatically encloses the definitions in parentheses." (quote from Vern Paxson). See also FAQ trailing context

Trailing context is best avoided where possible. As described above, it is not allowed inside nested (parenthesized) expressions. Your example does work with the -l option.



Source: https://stackoverflow.com/questions/15558684/matching-trailing-context-in-flex
