Flex default rule

渐次进展 2021-01-06 11:19

How do I customize the default action for flex? I found something like <*>, but when I run it, it says "flex scanner jammed". Also, the . rule only adds a rule, so it does

3 answers
  • 2021-01-06 12:01

    There's no such thing. This sounds suspiciously like an XY problem: you've asked us how to customise flex's default action (Y), but what you really want is to achieve some other end, X.

    What's X?

    Re: your question, why doesn't adding "." do the trick? You can't perform an action when nothing has been matched, so the question as asked may make no sense. flex won't do anything if there is no match, so to add a "default" rule, just make it match something.

  • 2021-01-06 12:01

    I solved the problem manually instead of trying to match the complement of a rule. This works fine because the matching pattern involved in this case is quite simple.

  • 2021-01-06 12:08

    The flex default rule matches a single character and prints it on standard output. If you don't want that action, write an explicit rule which matches a single character and does something else.

    The pattern (.|\n)* matches the entire input file as a single token, so that is a very bad idea. You're thinking that the default should be a long match, but in fact you want that to be as short as possible (but not empty).

    The purpose of the default rule is to do something when there is no match for any of the tokens in the input language. When lex is used for tokenizing a language, such a situation is almost always erroneous because it means that the input begins with a character which is not the start of any valid token of the language.

    Thus, a "catch any character" rule is coded as a form of error recovery. The idea is to discard the bad character (just one) and try tokenizing from the character after that one. This is only a guess, but it's a good guess because it's based on what is known: namely that there is one bad character in the input.

    The recovery rule can be wrong. For instance suppose that no token of the language begins with @, and the programmer wanted to write the string literal "@abc". Only, she forgot the opening " and wrote @abc". The right fix is to insert the missing ", not to discard the @. But that would require a much more clever set of rules in the lexer.

    Usually, when discarding a bad character, you also want to issue an error message such as "skipping invalid character '~' in line 42, column 3".

    The default rule/action of copying the unmatched character to standard output is useful when lex is used for text filtering. The default rule then brings about the semantics of a regex search (as opposed to a regex match): the idea is to search the input for matches of the lexer's token-recognizing state machine, while printing all material that is skipped by that search.

    So for instance, a lex specification containing just the rule:

     "foo" { printf("bar"); }
    

    will implement the equivalent of

     sed -e 's/foo/bar/g'
    