RegEx Removing Methods from Code

-上瘾入骨i 2021-01-06 08:49

With regular expressions, I'm trying to remove all the methods/functions from the following code, leaving the "global scope" alone. However, I can't manage to make it match…

2 answers
  •  挽巷 (OP)
    2021-01-06 09:36

    You can't do this reliably with regex. You need a parser that correctly handles comments, string literals and nested brackets.

    Regex cannot cope with these cases:

    class Hello
    {
      function foo()
      {
        echo '} <- that is not the closing bracket!';
        // and this: } bracket isn't the closing bracket either!
        /*
        } and that one isn't as well...
        */
      }
    }
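A quick way to see the failure is to run a naive pattern against the class above. This sketch is not from the original answer, and the pattern is only a hypothetical example of the kind of regex one might try. It stops at the first `}`, which sits inside the string literal, whereas PHP's `token_get_all()` reports every stray `}` as part of a single string or comment token.

```php
<?php
// The example class from above, as a nowdoc so nothing inside it is interpolated.
$code = <<<'CODE'
<?php
class Hello
{
  function foo()
  {
    echo '} <- that is not the closing bracket!';
    // and this: } bracket isn't the closing bracket either!
    /*
    } and that one isn't as well...
    */
  }
}
CODE;

// Hypothetical naive pattern: "function name(...) {" up to the first "}".
$naive = '/function\s+\w+\s*\([^)]*\)\s*\{[^}]*\}/';
preg_match($naive, $code, $m);

// The match stops at the brace inside the string literal, not at the method's end.
echo $m[0], "\n\n";

// The tokenizer keeps every stray "}" inside a string or comment token.
$braceTokens = [];
foreach (token_get_all($code) as $token) {
    if (is_array($token) && strpos($token[1], '}') !== false) {
        $braceTokens[] = token_name($token[0]);
    }
}
echo implode("\n", $braceTokens), "\n";
```

The real closing braces come back from the tokenizer as plain one-character strings, which is what makes the brace counting in the demo below possible.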
    

    EDIT

    Here's a little demo of how to use the tokenizer function mentioned by XUE Can (`token_get_all()`):

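    $source = <<<'BLOCK'
    <?php
    …global();
    
    function asodaosdo() {
    
    }
    
    ?>
    BLOCK;
    
    // Compatibility shim for PHP 4 (T_ML_COMMENT / T_DOC_COMMENT); these constants are not used below.
    if (!defined('T_ML_COMMENT')) {
       define('T_ML_COMMENT', T_COMMENT);
    } 
    else {
       define('T_DOC_COMMENT', T_ML_COMMENT);
    }
    
    // Tokenize the source
    $tokens = token_get_all($source);
    
    // Some flags and counters
    $tFunction = false;
    $functionBracketBalance = 0;
    $buffer = '';
    
    // Iterate over all tokens
    foreach ($tokens as $token) {
        // Single-character tokens.
        if (is_string($token)) {
            if (!$tFunction) {
                // Flush a buffered 'public'/'protected'/'private' first
                // (e.g. before the '?' of a nullable property type).
                echo $buffer . $token;
                $buffer = '';
            }
            if ($tFunction && $token == '{') {
                // Increase the bracket counter (not the class brackets: `$tFunction` must be true!)
                $functionBracketBalance++;
            }
            if ($tFunction && $token == '}') {
                // Decrease the bracket counter (not the class brackets: `$tFunction` must be true!)
                $functionBracketBalance--;
                if ($functionBracketBalance == 0) {
                    // If it's the closing bracket of the function, reset `$tFunction`
                    $tFunction = false;
                }
            }
            if ($tFunction && $token == ';' && $functionBracketBalance == 0) {
                // A declaration without a body (abstract or interface method) ends at its ';'
                $tFunction = false;
            }
        }
        // Tokens consisting of (possibly) more than one character.
        else {
            list($id, $text) = $token;
            switch ($id) {
                case T_PUBLIC:
                case T_PROTECTED:
                case T_PRIVATE:
                    // Don't immediately echo 'public', 'protected' or 'private'
                    // before we know if it's part of a variable or method.
                    if (!$tFunction) {
                        $buffer = $text;
                    }
                    break;
                case T_WHITESPACE:
                    // Only output whitespace if we're outside a function. After a buffered
                    // modifier, keep it in the buffer so it is dropped along with a method.
                    if (!$tFunction) {
                        if ($buffer !== '') {
                            $buffer .= $text;
                        } else {
                            echo $text;
                        }
                    }
                    break;
                case T_FUNCTION:
                    // If we encounter the keyword 'function', flip the `tFunction` flag to
                    // true and reset the `buffer`
                    $tFunction = true;
                    $buffer = '';
                    break;
                case T_CURLY_OPEN:
                case T_DOLLAR_OPEN_CURLY_BRACES:
                    // "{$x}" and "${x}" inside strings open with one of these tokens but
                    // close with a plain '}', so they must be counted as well.
                    if ($tFunction) {
                        $functionBracketBalance++;
                    }
                    // Fall through: outside a function, echo them like any other token.
                default:
                    // Echo all other tokens if we're not in a function and prepend a possible
                    // 'public', 'protected' or 'private' previously put in the `buffer`.
                    if (!$tFunction) {
                        echo "$buffer$text";
                        $buffer = '';
                    }
            }
        }
    }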
    

    which will print:

    <?php
    …global();
    
    
    
    ?>
    

    which is the original source, only without functions.
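For reuse, the same brace-counting idea can be wrapped in a function. This is a sketch, not the answer's code: `strip_functions()` is a hypothetical name, and beyond the loop above it also buffers `static`, `abstract` and `final` alongside the visibility keywords, counts the `${` opener used in string interpolation, and ends a body-less declaration at its `;`.

```php
<?php
/**
 * Hypothetical helper (not from the answer above): returns $source with every
 * function/method declaration removed, using the same brace-counting idea.
 */
function strip_functions(string $source): string
{
    $out = '';
    $inFunction = false;
    $depth = 0;        // brace depth inside the current function
    $modifiers = '';   // buffered modifier keywords plus the whitespace after them

    foreach (token_get_all($source) as $token) {
        $id   = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        if ($inFunction) {
            // T_CURLY_OPEN ("{$x}") has the text "{"; T_DOLLAR_OPEN_CURLY_BRACES is "${".
            if ($text === '{' || $id === T_DOLLAR_OPEN_CURLY_BRACES) {
                $depth++;
            } elseif ($text === '}') {
                if (--$depth === 0) {
                    $inFunction = false;
                }
            } elseif ($text === ';' && $depth === 0) {
                $inFunction = false;   // body-less declaration (abstract/interface method)
            }
            continue;
        }

        if ($id === T_FUNCTION) {
            $inFunction = true;
            $modifiers = '';           // drop "public static " etc. along with the method
        } elseif (in_array($id, [T_PUBLIC, T_PROTECTED, T_PRIVATE, T_STATIC, T_ABSTRACT, T_FINAL], true)) {
            $modifiers .= $text;
        } elseif ($id === T_WHITESPACE && $modifiers !== '') {
            $modifiers .= $text;
        } else {
            $out .= $modifiers . $text;
            $modifiers = '';
        }
    }
    return $out . $modifiers;
}

$src = <<<'SRC'
<?php
$a = 1;
class Hello
{
    public $name = 'x';
    public static function foo()
    {
        echo '} not the end';
        echo "{$this->name}";
    }
}
function bar() { return 2; }
echo $a;
SRC;

$stripped = strip_functions($src);
echo $stripped, "\n";
```

Known limitations of this sketch: a closure assigned at the global scope, such as `$f = function () { ... };`, is cut out of its expression and leaves `$f = ;` behind, and arrow functions (`fn`) are not removed at all.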
