adjacency as an operator - can any lexer handle it?

眼角桃花 2021-01-21 17:08

Say a language defines the adjacency of two mathematical Unicode alphanumeric symbols as an operator. Say, …

3 answers
  •  迷失自我
    2021-01-21 17:49

    Invisible operators cannot be recognized by lexical analysis, for a fairly obvious reason: an operator made of no characters produces no token, so the scanner only ever sees two operand tokens in a row. You can only deduce the presence of an invisible operator by analyzing the syntactic context, which is the role of a parser.
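A small illustration of the point: the regex-based tokenizer below is a sketch, and the token kinds `NUMBER`, `ID` and `OP` and the ASCII-only identifier pattern are assumptions made for this example, not part of the question. It emits an explicit `OP` token for `x*y`, but for `x y` or `2x` it can only emit two operand tokens back to back. Nothing in the token stream marks the adjacency; a parser has to infer it.

```python
import re

# Hypothetical token kinds for a tiny ASCII expression language.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[-+*/()]))")

def tokenize(text):
    """Return a list of (kind, lexeme) pairs; whitespace is discarded."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        kind = m.lastgroup          # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens

# "x*y" yields an operator token; "x y" and "2x" yield only operands.
print(tokenize("x*y"))  # [('ID', 'x'), ('OP', '*'), ('ID', 'y')]
print(tokenize("x y"))  # [('ID', 'x'), ('ID', 'y')]
print(tokenize("2x"))   # [('NUMBER', '2'), ('ID', 'x')]
```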

    Of course, most lexical analysis tools allow arbitrary code to be executed for each recognized token, so nothing stops you from building a state machine, or even a complete parser, into the lexical scanner. That is rarely good design.
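For comparison, this is roughly what the "state machine in the scanner" approach looks like. The sketch below uses hypothetical token kinds and a made-up synthetic `('OP', 'ADJ')` token: it remembers the previous token and inserts an explicit adjacency token whenever a token that can end an operand is followed by one that can start an operand. It works for simple arithmetic, but the sets `ENDS_OPERAND` and `STARTS_OPERAND` are grammar knowledge copied into the lexer.

```python
import re

# Hypothetical token kinds (ASCII only, for illustration).
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[-+*/()]))")

# The scanner must "know" which tokens can end or start an operand:
# this is grammar knowledge leaking into the lexer.
ENDS_OPERAND = {"NUMBER", "ID", ")"}
STARTS_OPERAND = {"NUMBER", "ID", "("}

def category(kind, lexeme):
    # Parentheses are OP tokens; use the lexeme to tell them apart.
    return lexeme if kind == "OP" else kind

def tokenize_with_adjacency(text):
    """Tokenize `text`, inserting ('OP', 'ADJ') between adjacent operands."""
    text = text.rstrip()
    tokens, prev, pos = [], None, 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        kind = m.lastgroup
        lexeme = m.group(kind)
        cat = category(kind, lexeme)
        if prev in ENDS_OPERAND and cat in STARTS_OPERAND:
            tokens.append(("OP", "ADJ"))   # synthetic token for the invisible operator
        tokens.append((kind, lexeme))
        prev = cat
        pos = m.end()
    return tokens
```

With this scanner, `2x` becomes `2 ADJ x` and `x -4` stays a subtraction, but `f(y)` also becomes `f ADJ ( y )`, which is wrong if the language has call syntax; the lexer alone cannot tell which reading the grammar intends.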

    If your language is unambiguous, there is no problem handling adjacency in your grammar, but some care must be taken. For example, you would rarely want x-4 to be parsed as the multiplication of x and -4, but a naive grammar that included, e.g.,

    expr -> term | expr '-' term
    term -> factor | term factor | term '*' factor
    factor -> ID | NUMBER | '(' expr ')' | '-' factor
    

    would include that ambiguity. To resolve it, you need to prevent the second operand of the adjacency production from starting with a unary operator:

    expr -> term | expr '-' term
    term -> factor | term item | term '*' factor
    factor -> item | '-' factor
    item -> ID | NUMBER | '(' expr ')'
    

    Note the difference between term -> term '*' factor, which allows x * - y, and term -> term item, which does not allow x - y to be parsed as adjacency (expr -> expr '-' term recognizes x - y as a subtraction).
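To check the claim mechanically, the sketch below counts the parse trees each grammar assigns to a token sequence. It is a brute-force, memoized counter written only for this illustration; the dictionary encoding of the grammars and the token kinds `ID` and `NUMBER` are assumptions. It relies on neither grammar having empty productions or cycles of unit productions, so every recursive call either covers a shorter span or moves down the chain of unit productions.

```python
from functools import lru_cache

# The two grammars from the answer, encoded as {nonterminal: [right-hand sides]}.
# Any symbol that is not a key is a terminal (a token kind).
NAIVE = {
    "expr":   [["term"], ["expr", "-", "term"]],
    "term":   [["factor"], ["term", "factor"], ["term", "*", "factor"]],
    "factor": [["ID"], ["NUMBER"], ["(", "expr", ")"], ["-", "factor"]],
}

FIXED = {
    "expr":   [["term"], ["expr", "-", "term"]],
    "term":   [["factor"], ["term", "item"], ["term", "*", "factor"]],
    "factor": [["item"], ["-", "factor"]],
    "item":   [["ID"], ["NUMBER"], ["(", "expr", ")"]],
}

def count_parses(grammar, tokens, start="expr"):
    """Count the distinct parse trees for `tokens` (a list of token kinds)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def sym(symbol, i, j):
        # Ways `symbol` can derive tokens[i:j]; spans are always non-empty.
        if symbol not in grammar:
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(seq(tuple(rhs), i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Ways the symbol sequence `rhs` can derive tokens[i:j].
        if len(rhs) == 1:
            return sym(rhs[0], i, j)
        total = 0
        # First symbol takes tokens[i:k]; leave one token per remaining symbol.
        for k in range(i + 1, j - len(rhs) + 2):
            left = sym(rhs[0], i, k)
            if left:
                total += left * seq(rhs[1:], k, j)
        return total

    return sym(start, 0, len(tokens)) if tokens else 0

print(count_parses(NAIVE, ["ID", "-", "NUMBER"]))   # 2: x minus 4, or x times -4
print(count_parses(FIXED, ["ID", "-", "NUMBER"]))   # 1: only the subtraction
print(count_parses(FIXED, ["ID", "*", "-", "ID"]))  # 1: x * -y is still accepted
```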

    For examples of context-free grammars that allow adjacency as an operator, see Awk, in which adjacency represents string concatenation, and Haskell, in which it represents function application.
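For the Haskell case, the adjacency rule is essentially `app -> atom | app atom`, i.e. left-associative application. Here is a minimal recursive-descent sketch; the tiny identifier-and-parentheses language and the `('app', function, argument)` tuples are invented for this example, and real Haskell has far more syntax. The left recursion becomes a loop: as long as the next token can start an atom, there is an invisible application operator.

```python
import re

# Hypothetical mini-language: identifiers and parentheses only.
# Juxtaposition means function application, left-associative as in Haskell.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|[()])")

def tokenize(text):
    """Return the list of identifiers and parentheses in `text`."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(text):
    """Parse `text` into nested ('app', function, argument) tuples."""
    tokens = tokenize(text)
    pos = 0

    def starts_atom():
        # Anything except ')' (or end of input) can begin an atom here.
        return pos < len(tokens) and tokens[pos] != ")"

    def atom():
        # atom -> ID | '(' app ')'
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = app()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return inner
        return tok

    def app():
        # app -> atom | app atom, with the left recursion written as a loop.
        if not starts_atom():
            raise SyntaxError("expected an expression")
        result = atom()
        while starts_atom():
            # Two atoms side by side: the invisible application operator.
            result = ("app", result, atom())
        return result

    tree = app()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("f x (g y)"))  # ('app', ('app', 'f', 'x'), ('app', 'g', 'y'))
```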


    Since this question comes up from time to time, there are a number of relevant answers already on SO. Here are a few:

    • "Parsing a sequence of expressions using yacc": invisible function application operator. Uses yacc/bison; includes both explicit and precedence-based solutions.

    • "yacc - Precedence of a rule with no operator?": invisible string concatenation operator. Uses PLY (Python parser generator).

    • "Concatenation shift-reduce conflict": another invisible concatenation operator. Uses JavaCUP.

    • "Parsing a sequence of expressions using yacc": invisible function application operator. Uses fsyacc (F# parser generator).

    • "Using yacc precedence for rules with no terminals, only non-terminals": adjacency in ordinary mathematical expressions. Uses yacc/bison with precedence rules.

    • "bison/yacc - limits of precedence settings": Haskell-like function application adjacency. Uses yacc/bison with precedence rules.
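Several of those answers rely on yacc/bison precedence declarations. A rough hand-written analogue, swapped in here for illustration and not how yacc itself resolves conflicts, is a Pratt (precedence-climbing) parser in which adjacency is treated as an operator with its own binding power. In this sketch the binding powers, the `'adj'`/`'neg'` node tags, and the rule that an operand-starting token with no operator before it means adjacency are all assumptions; `-` after an operand is always binary, so `x - 4` is a subtraction, matching the unambiguous grammar above.

```python
import re

# Hypothetical tokens: integers, ASCII identifiers, and - + * / ( )
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[-+*/()]))")

def tokenize(text):
    """Return the list of lexemes in `text` (whitespace discarded)."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(m.group(m.lastgroup))
        pos = m.end()
    return tokens

# Binding powers: higher binds tighter. Adjacency is (by assumption) as tight as '*'.
BINARY = {"+": 10, "-": 10, "*": 20, "/": 20}
ADJACENCY = 20
UNARY_MINUS = 30

def parse(text):
    """Parse `text` into nested tuples such as ('+', l, r) or ('adj', l, r)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def starts_operand(tok):
        # An identifier, a number or '(' can begin an operand.
        return tok is not None and tok not in BINARY and tok != ")"

    def prefix():
        # One operand: number, identifier, (expr), or unary minus applied to an operand.
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        if tok == "-":
            return ("neg", expr(UNARY_MINUS))
        if tok == "(":
            inner = expr(0)
            if peek() != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return inner
        if not starts_operand(tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def expr(min_bp):
        # Extend `left` while the next (visible or invisible) operator binds tighter.
        nonlocal pos
        left = prefix()
        while True:
            tok = peek()
            if tok in BINARY and BINARY[tok] > min_bp:
                pos += 1
                left = (tok, left, expr(BINARY[tok]))
            elif starts_operand(tok) and ADJACENCY > min_bp:
                # No operator token here: the invisible adjacency operator.
                left = ("adj", left, expr(ADJACENCY))
            else:
                return left

    result = expr(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(parse("2x + y"))  # ('+', ('adj', '2', 'x'), 'y')
```

Giving adjacency the same binding power as `*` makes `a b c` group as `(a b) c`; choosing a higher value would instead make adjacency bind tighter than explicit multiplication.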
