Regex required: it should match the following patterns

遇见更好的自我 2021-01-25 11:25

Valid:

  1. ((int)10)
  2. (int)10
  3. ((char)((x+y)&1))
  4. ((int *)1)

Invalid:

3 Answers
  • 2021-01-25 11:38

    Adding to aioobe's answer:

    Looks like you're trying to write an expression parser. As already said in the other answer, this is not possible with a regex. You should consider using an expression parser such as JEP, or writing one yourself using JavaCC.

  • 2021-01-25 11:43

    The language of (balanced) parenthesized expressions is not regular, i.e., you can't write a regular expression matching these kinds of strings.

    See the SO question Why are regular expressions called "regular" expressions? and the Wikipedia article Regular Languages.
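    To make the non-regularity argument concrete, here is a minimal sketch in Java (chosen because the answers assume Java; the class name BalanceCheck is hypothetical). It checks balance alone, using one integer depth counter. Because the nesting depth, and so the counter, can grow without limit, no machine with a fixed, finite set of states can track it, which is the intuition behind the language not being regular. Note it knows nothing about casts or types.

```java
// Hypothetical helper: checks only that parentheses balance, ignoring all other characters.
class BalanceCheck {

    /** Returns true if every ')' closes an earlier '(' and no '(' is left open. */
    static boolean isBalanced(String s) {
        int depth = 0; // unbounded counter: the memory a finite automaton does not have
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // a ')' with no matching '('
                }
            }
        }
        return depth == 0; // every '(' was closed
    }

    public static void main(String[] args) {
        String[] inputs = {"((int)10)", "(int)10", "((char)((x+y)&1))", "((int *)1)", "((int)10", ")("};
        for (String s : inputs) {
            System.out.println(s + " -> " + isBalanced(s));
        }
    }
}
```

    A single counter suffices here only because there is one kind of bracket; the CFG-based parsers mentioned next effectively use a full stack.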

    You need a more capable parsing technique, such as a context-free grammar (CFG) fed to a parser generator like ANTLR.

    You could start with something like:

    CastedExpression ::= Cast Expression | LPAR CastedExpression RPAR
    Cast             ::= LPAR Type RPAR
    Expression       ::= Sum | Product | Literal | LPAR Expression RPAR | ...
    Type             ::= char | int | Type ASTERISK | ...
    

    (Feel free to edit the grammar above if you find any obvious improvements.)
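    A hand-written recursive-descent recognizer is one way to try the grammar above without a parser generator. The Java sketch below covers only an assumed subset, since the grammar leaves its "..." open: Type is int or char followed by any number of *, and Expression is identifiers, integer literals, parenthesized expressions and the binary operators + and &, handled flatly with no precedence because it only accepts or rejects. The class and method names are hypothetical. The left-recursive rule Type ::= Type ASTERISK becomes a loop, and peeking one token past "(" for a type name chooses between the two CastedExpression alternatives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical recognizer for an assumed subset of the CastedExpression grammar.
class CastRecognizer {
    private static final Set<String> TYPES = Set.of("int", "char"); // assumed base types
    private final List<String> tokens;
    private int pos = 0;

    private CastRecognizer(List<String> tokens) { this.tokens = tokens; }

    /** True if the whole input is exactly one CastedExpression. */
    static boolean matches(String input) {
        List<String> tokens = tokenize(input);
        if (tokens == null) return false; // unexpected character
        CastRecognizer r = new CastRecognizer(tokens);
        return r.castedExpr() && r.pos == tokens.size();
    }

    // Tokens: "(", ")", "*", "+", "&", and runs of letters, digits or '_'.
    private static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if ("()*+&".indexOf(c) >= 0) {
                out.add(String.valueOf(c));
                i++;
            } else if (isWordChar(c)) {
                int start = i;
                while (i < s.length() && isWordChar(s.charAt(i))) i++;
                out.add(s.substring(start, i));
            } else {
                return null;
            }
        }
        return out;
    }

    private static boolean isWordChar(char c) { return Character.isLetterOrDigit(c) || c == '_'; }

    private String peek(int ahead) {
        int i = pos + ahead;
        return i < tokens.size() ? tokens.get(i) : "";
    }

    private boolean accept(String t) {
        if (!peek(0).equals(t)) return false;
        pos++;
        return true;
    }

    // CastedExpression ::= Cast Expression | LPAR CastedExpression RPAR
    private boolean castedExpr() {
        if (!peek(0).equals("(")) return false;
        if (TYPES.contains(peek(1))) return cast() && expr(); // "(" then a type name: a Cast
        pos++; // otherwise LPAR CastedExpression RPAR
        return castedExpr() && accept(")");
    }

    // Cast ::= LPAR Type RPAR, Type ::= (int | char) ASTERISK*  (left recursion as a loop)
    private boolean cast() {
        if (!accept("(") || !TYPES.contains(peek(0))) return false;
        pos++;
        while (accept("*")) { /* one more pointer level */ }
        return accept(")");
    }

    // Expression ::= Term (("+" | "&") Term)*  (flat: recognizes, never evaluates)
    private boolean expr() {
        if (!term()) return false;
        while (peek(0).equals("+") || peek(0).equals("&")) {
            pos++;
            if (!term()) return false;
        }
        return true;
    }

    // Term ::= identifier | integer literal | LPAR Expression RPAR
    private boolean term() {
        String t = peek(0);
        if (t.equals("(")) {
            pos++;
            return expr() && accept(")");
        }
        boolean literal = t.matches("[0-9]+");
        boolean identifier = t.matches("[A-Za-z_][A-Za-z0-9_]*") && !TYPES.contains(t);
        if (!literal && !identifier) return false;
        pos++;
        return true;
    }

    public static void main(String[] args) {
        String[] inputs = {"((int)10)", "(int)10", "((char)((x+y)&1))", "((int *)1)", "(int)", "((int)10", "(x)10"};
        for (String s : inputs) System.out.println(s + " -> " + matches(s));
    }
}
```

    With these assumptions, CastRecognizer.matches accepts the four valid inputs from the question and rejects, for example, (int), ((int)10 and (x)10.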

  • 2021-01-25 11:58

    This statement:

    The language of (balanced) parenthesized expressions is not regular, i.e., you can't write a regular expression matching these kinds of strings.

    is true only of classic regular expressions in the pathologically formal sense. It does not apply to the practical patterns many of us use daily, some of which (Perl's among them) can recurse.

    For example, using the third string from the original list of valid inputs, this Perl code:

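    use strict;
    use warnings;
    use 5.010;   # named captures and (?&NAME) recursion need Perl 5.10+

    my $str = "((char)((x+y)&1))";
    my $w   = length length $str;   # digits in the string's length: the field width
    my $rx  = qr{ (?<PAREN>                # one balanced group, named PAREN
                    \(
                       (?:
                           [^()] +         # a run of non-paren characters
                         |
                           (?&PAREN)       # or a nested group: recurse into PAREN
                       ) *
                    \)
                  )
              }x;

    # The zero-width lookahead lets the /g loop retry at every position,
    # so each opening paren reports its own balanced group.
    while ($str =~ /(?=$rx)/g) {
        printf "Matched from %*d to %*d: %s%s\n" =>
            $w => pos($str),
            $w => pos($str) + length($+{PAREN})-1,
            " " x pos($str)   =>     $+{PAREN};
    }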
    

    quite handily produces the following output:

    Matched from  0 to 16: ((char)((x+y)&1))
    Matched from  1 to  6:  (char)
    Matched from  7 to 15:        ((x+y)&1)
    Matched from  8 to 12:         (x+y)
    

    I can't tell from eyeballing the original inputs what makes one valid and another invalid. Still, I'm sure some elaboration of the code above would work perfectly well.

    However, you will have to write it in Perl, as Java's patterns just aren't powerful enough: java.util.regex supports neither recursion nor (?&NAME) subroutine calls. ☹
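    If you are stuck with Java, one workaround (not part of the answer above) is to cap the nesting depth. Parentheses nested at most N levels deep do form a regular language, so the recursion in the Perl pattern can be unrolled by hand into an ordinary java.util.regex pattern. A minimal sketch under that assumption; the class name BoundedParens, its methods and the maxDepth parameter are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical port of the Perl example above, for a bounded nesting depth only.
class BoundedParens {

    /** Regex for one balanced (...) group nested at most maxDepth levels deep. */
    static String nested(int maxDepth) {
        if (maxDepth < 1) throw new IllegalArgumentException("maxDepth must be >= 1");
        String p = "\\([^()]*\\)";                 // depth 1: no parens inside
        for (int d = 2; d <= maxDepth; d++) {
            p = "\\((?:[^()]|" + p + ")*\\)";      // wrap: allow one more level
        }
        return p;
    }

    /** Mimics the Perl loop: one line per '(' that starts a balanced group. */
    static List<String> report(String s, int maxDepth) {
        // Zero-width lookahead + find() tries every start position, like /(?=$rx)/g.
        Matcher m = Pattern.compile("(?=(" + nested(maxDepth) + "))").matcher(s);
        int w = String.valueOf(s.length()).length(); // same width as `length length $str`
        String fmt = "Matched from %" + w + "d to %" + w + "d: %s%s";
        List<String> out = new ArrayList<>();
        while (m.find()) {
            int start = m.start(1);
            out.add(String.format(fmt, start, m.end(1) - 1, " ".repeat(start), m.group(1)));
        }
        return out;
    }

    public static void main(String[] args) {
        report("((char)((x+y)&1))", 3).forEach(System.out::println);
    }
}
```

    With maxDepth 3 this prints the same four lines as the Perl run above; with maxDepth 2 the outermost group of ((char)((x+y)&1)), which is three levels deep, is no longer found. The pattern grows with every level, so this only makes sense when the maximum depth is small and known in advance.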
