Flex default rule

Submitted by 随声附和 on 2019-12-12 07:23:48

Question


How do I customize the default action in flex? I found something like <*>, but when I run it, it says "flex scanner jammed". Adding a . rule only adds another rule, so that does not work either. What I want is:

comment               "/*"[^"*/"]*"*/"

%%
{comment}             return 1;
{default}             return 0; 
<<EOF>>               return -1;

Is it possible to change flex's behavior from matching the longest rule to matching the first rule? If so, I would do something like this:

default               (.|\n)*

but because this almost always gives a longer match, it will hide the comment rule.

EDIT

I found the {-} operator in the manual; however, this example straight from the manual gives me "unrecognized rule":

[a-c]{-}[b-z]


Answer 1:


The flex default rule matches a single character and prints it on standard output. If you don't want that action, write an explicit rule which matches a single character and does something else.

The pattern (.|\n)* matches the entire input file as a single token, so that is a very bad idea. You're thinking that the default should be a long match, but in fact you want that to be as short as possible (but not empty).
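As a minimal sketch of that advice applied to the question's three return values (1 for a comment, 0 for anything else, -1 at end of input), the catch-all can be a one-character rule placed last. The comment pattern below is a commonly used regex for C-style comments, swapped in for the one in the question, and the main function is only an illustrative driver, not part of the original post.

```lex
%option noyywrap nodefault

%{
#include <stdio.h>
%}

COMMENT   "/*"([^*]|\*+[^*/])*\*+"/"

%%
{COMMENT}   { return 1;  /* a complete C-style comment */ }
.|\n        { return 0;  /* catch-all: exactly one character */ }
<<EOF>>     { return -1; /* end of input */ }
%%

/* Illustrative driver (an assumption, not from the question):
   counts comments and leftover characters. */
int main(void)
{
    int tok, comments = 0, others = 0;

    while ((tok = yylex()) != -1) {
        if (tok == 1)
            comments++;
        else
            others++;
    }
    printf("%d comment(s), %d other character(s)\n", comments, others);
    return 0;
}
```

Because flex prefers the longest match, a full comment always beats the one-character rule, so the catch-all never hides it. With %option nodefault, flex drops its built-in echo rule, which is safe here because .|\n already covers every possible input character.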

The purpose of the default rule is to do something when there is no match for any of the tokens in the input language. When lex is used for tokenizing a language, such a situation is almost always erroneous because it means that the input begins with a character which is not the start of any valid token of the language.

Thus, a "catch any character" rule is coded as a form of error recovery. The idea is to discard the bad character (just one) and try tokenizing from the character after that one. This is only a guess, but it's a good guess because it's based on what is known: namely that there is one bad character in the input.

The recovery rule can be wrong. For instance suppose that no token of the language begins with @, and the programmer wanted to write the string literal "@abc". Only, she forgot the opening " and wrote @abc". The right fix is to insert the missing ", not to discard the @. But that would require a much more clever set of rules in the lexer.

Anyway, when discarding a bad character you usually want to issue an error message such as "skipping invalid character '~' in line 42, column 3".
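A rough sketch of such a recovery rule follows. The word rule is a hypothetical stand-in for a real language's tokens, and the message reports only the line number, using flex's yylineno option. Reporting a column as well would need extra bookkeeping that is not shown here.

```lex
%option noyywrap nodefault yylineno

%{
#include <stdio.h>
%}

%%
[ \t\n]+      { /* skip whitespace between tokens */ }
[A-Za-z_]+    { printf("word: %s\n", yytext); /* hypothetical token */ }
.             { fprintf(stderr,
                        "skipping invalid character '%s' in line %d\n",
                        yytext, yylineno); }
%%

int main(void)
{
    yylex();
    return 0;
}
```

The . rule comes last and matches only one character, so it fires only when no real token can start at the current position. Scanning then resumes at the very next character.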

The default rule/action of copying the unmatched character to standard output is useful when lex is used for text filtering. The default rule then brings about the semantics of a regex search (as opposed to a regex match): the idea is to search the input for matches of the lexer's token-recognizing state machine, while printing all material that is skipped by that search.

So for instance, a lex specification containing just the rule:

 "foo" { printf("bar"); }

will implement the equivalent of

 sed -e 's/foo/bar/g'
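For reference, a complete file built around that one rule might look like the sketch below. The file name filter.l and the build command are assumptions for illustration.

```lex
%option noyywrap

%{
#include <stdio.h>
%}

%%
"foo"    { printf("bar"); /* replace each match */ }
%%

/* No other rules: flex's default rule echoes every unmatched character. */
int main(void)
{
    yylex();
    return 0;
}
```

Built with something like flex filter.l && cc lex.yy.c -o filter, running echo "foo food" | ./filter prints "bar bard". Each foo is replaced and everything else is copied through by the default rule.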



Answer 2:


There's no such thing. This sounds suspiciously like an XY problem: you've asked us how to customise flex's default action (Y), but what you really want is to achieve some other end, X.

What's X?

Re: your question, why does adding "." not do the trick? An action can only run when some input has been matched, so the question as asked may not make sense. flex won't do anything if there is no match, so to add a "default" rule, just make it match something.




Answer 3:


I solved the problem manually instead of trying to match the complement of a rule. This works fine because the matching pattern involved in this case is quite simple.
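The answer does not show what the manual solution looked like. One common way to handle comments by hand in flex, given here only as an assumed illustration and not as the poster's actual code, is an exclusive start condition that consumes the comment body one character at a time:

```lex
%option noyywrap nodefault

%{
#include <stdio.h>
%}

%x COMMENT

%%
"/*"              { BEGIN(COMMENT); /* enter comment mode */ }
<COMMENT>"*/"     { BEGIN(INITIAL); return 1; /* whole comment seen */ }
<COMMENT>.|\n     { /* discard one character of the comment body */ }
<COMMENT><<EOF>>  { return -1; /* unterminated comment */ }
.|\n              { return 0;  /* any other single character */ }
<<EOF>>           { return -1; /* end of input */ }
%%

/* Illustrative driver: reports each complete comment. */
int main(void)
{
    int tok;

    while ((tok = yylex()) != -1)
        if (tok == 1)
            puts("comment");
    return 0;
}
```

Inside the COMMENT state, the two-character "*/" rule wins over the single-character rule by longest match, so no complement pattern is needed.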



Source: https://stackoverflow.com/questions/10267307/flex-default-rule
