Is there a regular expression to detect a valid regular expression?

天命终不由人 2020-11-22 09:48

Is it possible to detect a valid regular expression with another regular expression? If so, please give example code below.

8 answers
  • 2020-11-22 10:18

    Good question.

    True regular languages cannot recognize arbitrarily deeply nested, well-formed parentheses. Suppose your alphabet contains '(' and ')' and the goal is to decide whether a string of them is properly matched: no regular language can do that for unbounded nesting depth. Since a validator for regular expressions has to check exactly this (groups can nest arbitrarily deep), the answer is no.

    However, if you loosen the requirement and allow recursion, you can probably do it. The reason is that recursion can act as a stack, letting you "count" the current nesting depth by pushing onto that stack.
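
    To make the "counting" idea concrete, here is a minimal Java sketch (the class and method names are made up for illustration). It uses an integer depth counter as a stand-in for the stack that recursion would provide. A finite automaton has only a fixed number of states, so it cannot keep a counter like this for unbounded depth.

```java
// Minimal sketch: a depth counter plays the role of the stack.
// BalancedParens and isBalanced are hypothetical names, not from the answer.
final class BalancedParens {

    // Returns true if every '(' has a matching ')' and no ')' comes too early.
    static boolean isBalanced(String s) {
        int depth = 0;                          // current nesting depth
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;                        // "push"
            } else if (c == ')') {
                if (depth == 0) {
                    return false;               // closing with nothing open
                }
                depth--;                        // "pop"
            }
            // any other character is ignored in this sketch
        }
        return depth == 0;                      // everything opened was closed
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(a(b)c)")); // true
        System.out.println(isBalanced("(a(b c"));  // false
    }
}
```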

    Russ Cox wrote "Regular Expression Matching Can Be Simple And Fast" which is a wonderful treatise on regex engine implementation.

  • 2020-11-22 10:18

    Though it is perfectly possible to use a recursive regex, as MizardX has posted, a parser is much more useful for this kind of thing. Regexes were originally intended for regular languages; recursion and balancing groups are just patches on top.
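
    In practice, the simplest parser to reach for is often the regex engine's own. As a rough sketch, assuming the dialect you care about is Java's java.util.regex (the class and method names below are made up for illustration), you can let Pattern.compile do the parsing and treat a PatternSyntaxException as "invalid":

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Sketch only: validates against the java.util.regex dialect, nothing else.
// RegexValidator and isValidRegex are hypothetical names.
final class RegexValidator {

    // Returns true if the engine's own parser accepts the pattern.
    static boolean isValidRegex(String candidate) {
        try {
            Pattern.compile(candidate);
            return true;
        } catch (PatternSyntaxException e) {
            return false;   // e.getDescription() says what went wrong
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("a(b|c)*")); // true
        System.out.println(isValidRegex("(a"));      // false
    }
}
```

    This only answers "is it valid for this engine"; other regex flavors accept different syntax.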

    The language of valid regexes is actually context-free, so you should use an appropriate parser to handle it. Here is an example from a university project that parses simple regexes (without most constructs). It uses JavaCC. And yes, the comments are in Spanish, though the method names are pretty self-explanatory.

    SKIP :
    {
        " "
    |   "\r"
    |   "\t"
    |   "\n"
    }
    TOKEN : 
    {
        < DIGITO: ["0" - "9"] >
    |   < MAYUSCULA: ["A" - "Z"] >
    |   < MINUSCULA: ["a" - "z"] >
    |   < LAMBDA: "LAMBDA" >
    |   < VACIO: "VACIO" >
    }
    
    IRegularExpression Expression() :
    {
        IRegularExpression r; 
    }
    {
        r=Alternation() { return r; }
    }
    
    // Matchea disyunciones: ER | ER  (matches alternations: RE | RE)
    IRegularExpression Alternation() :
    {
        IRegularExpression r1 = null, r2 = null; 
    }
    {
        r1=Concatenation() ( "|" r2=Alternation() )?
        { 
            if (r2 == null) {
                return r1;
            } else {
                return createAlternation(r1,r2);
            } 
        }
    }
    
    // Matchea concatenaciones: ER.ER  (matches concatenations: RE.RE)
    IRegularExpression Concatenation() :
    {
        IRegularExpression r1 = null, r2 = null; 
    }
    {
        r1=Repetition() ( "." r2=Repetition() { r1 = createConcatenation(r1,r2); } )*
        { return r1; }
    }
    
    // Matchea repeticiones: ER*  (matches repetitions: RE*)
    IRegularExpression Repetition() :
    {
        IRegularExpression r; 
    }
    {
        r=Atom() ( "*" { r = createRepetition(r); } )*
        { return r; }
    }
    
    // Matchea regex atomicas: (ER), Terminal, Vacio, Lambda  (matches atomic regexes: (RE), terminal, empty, lambda)
    IRegularExpression Atom() :
    {
        String t;
        IRegularExpression r;
    }
    {
        ( "(" r=Expression() ")" {return r;}) 
        | t=Terminal() { return createTerminal(t); }
        | <LAMBDA> { return createLambda(); }
        | <VACIO> { return createEmpty(); }
    }
    
    // Matchea un terminal (digito o minuscula) y devuelve su valor  (matches a terminal, a digit or lowercase letter, and returns its value)
    String Terminal() :
    {
        Token t;
    }
    {
        ( t=<DIGITO> | t=<MINUSCULA> ) { return t.image; }
    }
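
    If you do not want to pull in JavaCC, the same grammar can be recognized by a small hand-written recursive-descent parser. The following is only a sketch under a few assumptions: it answers valid/invalid instead of building IRegularExpression objects, it requires the whole input to be consumed (the grammar above does not mention end of input), and the class and method names are made up for illustration.

```java
// Sketch of a recursive-descent recognizer for the toy grammar above.
// SimpleRegexRecognizer and its methods are hypothetical names.
final class SimpleRegexRecognizer {
    private final String in;
    private int pos = 0;

    private SimpleRegexRecognizer(String in) { this.in = in; }

    // True if the whole text is one valid expression of the toy grammar.
    static boolean isValid(String text) {
        SimpleRegexRecognizer p = new SimpleRegexRecognizer(text);
        return p.alternation() && p.atEnd();
    }

    // Mirrors the SKIP section: space, \r, \t and \n are ignored between tokens.
    private void skipWhitespace() {
        while (pos < in.length() && " \r\t\n".indexOf(in.charAt(pos)) >= 0) {
            pos++;
        }
    }

    private boolean atEnd() {
        skipWhitespace();
        return pos == in.length();
    }

    // Consumes the literal if it comes next; otherwise leaves the position alone.
    private boolean accept(String literal) {
        skipWhitespace();
        if (in.startsWith(literal, pos)) {
            pos += literal.length();
            return true;
        }
        return false;
    }

    // Alternation := Concatenation ( "|" Alternation )?
    private boolean alternation() {
        if (!concatenation()) return false;
        return !accept("|") || alternation();
    }

    // Concatenation := Repetition ( "." Repetition )*
    private boolean concatenation() {
        if (!repetition()) return false;
        while (accept(".")) {
            if (!repetition()) return false;
        }
        return true;
    }

    // Repetition := Atom ( "*" )*
    private boolean repetition() {
        if (!atom()) return false;
        while (accept("*")) {
            // each star wraps the previous result; nothing to build here
        }
        return true;
    }

    // Atom := "(" Expression ")" | Terminal | LAMBDA | VACIO
    private boolean atom() {
        if (accept("(")) {
            return alternation() && accept(")");  // recursion tracks nesting
        }
        if (accept("LAMBDA") || accept("VACIO")) return true;
        skipWhitespace();
        if (pos < in.length()) {
            char c = in.charAt(pos);
            if ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'z')) {
                pos++;                            // Terminal: digit or lowercase
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isValid("(a|b)*.c")); // true
        System.out.println(isValid("(a|b"));     // false
    }
}
```

    Note that concatenation is explicit in this toy language, so "a.b" is valid while "ab" is not.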
    