Match the body of a function using Regex

暖寄归人 2021-01-14 20:42

Given a dummy function as such:

public function handle()
{
  if (isset($input['data'])) {
    switch($data) {
      ...
    }
  } else {
    switch($data) {         


        
2 Answers
  •  臣服心动
    2021-01-14 21:41

    Update #2

    Updated according to others' comments:

    ^\s*[\w\s]+\(.*\)\s*\K({((?>"(?:[^"\\]*+|\\.)*"|'(?:[^'\\]*+|\\.)*'|//.*$|/\*[\s\S]*?\*/|#.*$|<<<\s*["']?(\w+)["']?[^;]+\3;$|[^{}<'"/#]++|[^{}]++|(?1))*)})
    

    Note: A short RegEx, i.e. {((?>[^{}]++|(?R))*)}, is enough if you know your input contains no { or } outside of PHP syntax (for example inside strings or comments).
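
A minimal sketch of how the short pattern could be used from PHP. The helper name extractBodyShort and the sample input are hypothetical, and the ~ delimiters are just one possible choice.

```php
<?php
// Hypothetical helper (not from the answer): returns the text between the
// first balanced {...} pair, or null when nothing matches.
function extractBodyShort(string $code): ?string
{
    // (?R) recurses into the whole pattern, so every nested {...} pair
    // has to be balanced before the outer closing brace can match.
    $pattern = '~{((?>[^{}]++|(?R))*)}~';

    return preg_match($pattern, $code, $m) === 1 ? $m[1] : null;
}

// Hypothetical input: every brace here belongs to PHP syntax.
$code = <<<'PHP'
public function handle()
{
    if ($ok) {
        return 1;
    }
    return 2;
}
PHP;

echo extractBodyShort($code);
```

Group 1 holds everything between the outermost braces, including the nested if block.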

    So in which evil cases is the long RegEx needed?

    1. You have [{}] in a string between quotation marks ["']
    2. You have those quotation marks escaped inside one another
    3. You have [{}] in a comment: //... or /*...*/ or #...
    4. You have [{}] in a heredoc or nowdoc: <<<STR or <<<['"]STR['"]

    Otherwise, all it needs is a balanced pair of opening and closing braces; the depth of nested braces does not matter.
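
To make the first case concrete, here is a rough comparison of the short and the long pattern on an input whose string literal contains a stray closing brace. The sample input is hypothetical, and the m flag (so that ^ and $ match at line boundaries) is an assumption about how the long pattern is meant to be run.

```php
<?php
// Hypothetical input: the string literal contains a stray closing brace.
$code = <<<'PHP'
public function handle()
{
    $s = "}";
    return $s;
}
PHP;

// Short pattern: counts every brace, even the one inside the string.
$short = '~{((?>[^{}]++|(?R))*)}~';

// Long pattern from the answer, wrapped in ~ delimiters.
// Assumption: the m flag makes ^ and $ match at every line boundary.
$long = <<<'REGEX'
~^\s*[\w\s]+\(.*\)\s*\K({((?>"(?:[^"\\]*+|\\.)*"|'(?:[^'\\]*+|\\.)*'|//.*$|/\*[\s\S]*?\*/|#.*$|<<<\s*["']?(\w+)["']?[^;]+\3;$|[^{}<'"/#]++|[^{}]++|(?1))*)})~m
REGEX;

preg_match($short, $code, $a);
preg_match($long, $code, $b);

$shortBody = $a[1]; // cut off at the brace inside the string
$longBody  = $b[2]; // group 2 of the long pattern is the body

var_dump(strpos($shortBody, 'return') !== false); // bool(false)
var_dump(strpos($longBody, 'return') !== false);  // bool(true)
```

The short pattern closes the match at the quoted brace, so the return statement never makes it into its capture; the long pattern consumes the whole string literal first and only then looks for the closing brace.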

    Is there a case where it fails?

    No, unless you have a Martian living inside your code.

     ^ \s* [\w\s]+ \( .* \) \s* \K               # how it matches a function definition
     (                             # (1 start)
          {                                      # opening brace
          (                             # (2 start)
               (?>                               # atomic grouping (for its non-capturing purpose only)
                    "(?: [^"\\]*+ | \\ . )*"     # double quoted strings
                 |  '(?: [^'\\]*+ | \\ . )*'     # single quoted strings
                 |  // .* $                      # a single-line comment starting with //
                 |  /\* [\s\S]*? \*/             # a multi-line comment block /*...*/
                 |  \# .* $                      # a single-line comment starting with #
                 |  <<< \s* ["']?                # heredocs and nowdocs
                    ( \w+ )                      # (3) ^
                    ["']? [^;]+ \3 ; $           # ^
                 |  [^{}<'"/#]++                 # force engine to backtrack if it encounters special characters [<'"/#] (possessive)
                 |  [^{}]++                      # default matching behaviour (possessive)
                 |  (?1)                         # recurse 1st capturing group
               )*                                # zero to many times of atomic group
          )                             # (2 end)
          }                                      # closing brace
     )                             # (1 end)
    

    Formatting is done by @sln's RegexFormatter software.

    What did I provide in the live demo?

    Laravel's Eloquent Model.php file (~3500 lines), picked at random, is given as input. Check it out: Live demo
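
A rough sketch of running the long pattern over a source with several methods, in the spirit of the demo. The Demo class below is a hypothetical stand-in for Model.php, and the m flag is again an assumption.

```php
<?php
// Long pattern from the answer, wrapped in ~ delimiters.
// Assumption: the m flag makes ^ and $ match at every line boundary.
$long = <<<'REGEX'
~^\s*[\w\s]+\(.*\)\s*\K({((?>"(?:[^"\\]*+|\\.)*"|'(?:[^'\\]*+|\\.)*'|//.*$|/\*[\s\S]*?\*/|#.*$|<<<\s*["']?(\w+)["']?[^;]+\3;$|[^{}<'"/#]++|[^{}]++|(?1))*)})~m
REGEX;

// Hypothetical source: one method hides a brace in a comment,
// the other hides one in a single-quoted string.
$source = <<<'PHP'
class Demo
{
    public function a()
    {
        // a stray } in a comment
        return 1;
    }

    public function b()
    {
        return '{';
    }
}
PHP;

// With the default PREG_PATTERN_ORDER, $m[2] lists every captured body.
$count = preg_match_all($long, $source, $m);

foreach ($m[2] as $i => $body) {
    echo "--- body ", $i + 1, " ---", $body, "\n";
}
```

The class braces themselves are not reported, because the prefix of the pattern only accepts a line containing a parenthesised argument list before the opening brace.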
