Composed Regular Expressions - breaking a regex down into a readable form

不思量自难忘° 2021-01-26 07:39

I was reading an article put together by Martin Fowler regarding Composed Regular Expressions. This is where you might take code such as this:

const string patt         


        
3 Answers
  • 2021-01-26 08:13

    I deal with this in PHP by using associative arrays and strtr, PHP's version of the tr function (I assume a similar data structure and function exist in other languages too).

    The array looks like this:

    $mappings = array(
      'a' => '[a-z0-9]', // lowercase letter or digit
      'd' => '[0-9]',    // digit
      's' => '\s+',      // one or more whitespace characters
      // and so on
    );
    

    Then, to use them, I just pass the shorthand string and the array to strtr. Mapped characters get converted, and unmapped ones fall through unchanged:

    $regexp = strtr($simplified_string, $mappings);
    

    Bear in mind that this approach can just as easily overcomplicate things as simplify them. You're still writing out patterns; you've just abstracted one pattern into another. Nevertheless, these poor man's character classes can be useful when outsourcing regexps to developers or spec writers who don't speak the language.
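For readers outside PHP, here is a rough Python sketch of the same idea. Python has no strtr that takes an array, so the hypothetical helper `expand_shorthand` below emulates it with `re.sub`: like PHP's strtr, it tries longer keys first and substitutes in a single pass, so replaced text is never translated again. Because the keys are single letters, every `a`, `d` or `s` in the shorthand gets expanded, so this assumes the shorthand is made of mapping keys and literal punctuation only.

```python
import re

# Shorthand classes from the answer above (the key letters are arbitrary)
MAPPINGS = {
    "a": "[a-z0-9]",  # lowercase letter or digit
    "d": "[0-9]",     # digit
    "s": r"\s+",      # one or more whitespace characters
}


def expand_shorthand(shorthand: str, mappings: dict) -> str:
    """Hypothetical helper mimicking PHP's strtr() with an array argument."""
    if not mappings:
        return shorthand
    # Longest keys first, as strtr does; re.escape keeps keys literal
    keys = sorted(mappings, key=len, reverse=True)
    finder = re.compile("|".join(re.escape(k) for k in keys))
    # A callable replacement is inserted verbatim (no backslash processing),
    # and text that has been replaced is not scanned again
    return finder.sub(lambda m: mappings[m.group(0)], shorthand)


regexp = expand_shorthand("dsa", MAPPINGS)
print(regexp)                              # [0-9]\s+[a-z0-9]
print(bool(re.fullmatch(regexp, "7  x")))  # True
```

Unmapped characters pass through, so `expand_shorthand("d-d", MAPPINGS)` yields `[0-9]-[0-9]`.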

  • 2021-01-26 08:21

    Yes, absolutely. Regexes are powerful, but their terse syntax makes them extremely hard to read. When I read a comment such as "this matches a URI", it doesn't help me figure out how it does that, or where to look if I need to, for example, fix a bug where it fails to match some obscure corner case in the query string. A regex is code; document it as you would document a function. If it's short and reasonably clear, a single comment for the entire regex is fine. If it's complicated, clearly highlight and comment the individual parts. If it's really complex, split it into several regexes.
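A minimal Python sketch of commenting a regex part by part, in the spirit of the composed regular expressions from the question: each piece is a named, documented constant, and the full pattern is assembled from them. The constant names are hypothetical, and the input format is borrowed from the "score ... for ... nights at ..." example in the last answer.

```python
import re

# Each building block is named and documented once
SPACE = r"\s+"          # one or more whitespace characters
NUMBER = r"(\d+)"       # captured run of digits
REST_OF_LINE = r"(.*)"  # captured remainder of the line

# The full pattern reads close to the text it matches:
# "score <n> for <n> night(s) at <hotel>"
SCORE_PATTERN = re.compile(
    "^score" + SPACE + NUMBER + SPACE
    + "for" + SPACE + NUMBER + SPACE
    + "nights?" + SPACE + "at" + SPACE + REST_OF_LINE
)

match = SCORE_PATTERN.match("score 8 for 5 nights at name of hotel")
print(match.groups())  # ('8', '5', 'name of hotel')
```

A bug in, say, number handling now points at one named constant rather than at a position inside a long pattern string.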

  • 2021-01-26 08:25

    It is fairly easy to read if you can use extended syntax (Perl's /x modifier), which ignores unescaped whitespace in the pattern.

    /^
      score   \s+ (\d+) \s+
      for     \s+ (\d+) \s+
      nights? \s+  at   \s+ (.*)
    /x
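For comparison, a sketch of the same pattern in Python, whose `re.VERBOSE` flag plays the role of Perl's /x. As with /x, literal spaces in the input have to be matched explicitly (here with `\s+`), and the comments are an addition for illustration.

```python
import re

# re.VERBOSE: unescaped whitespace in the pattern is ignored and "#" starts
# a comment, much like Perl's /x modifier
SCORE_RE = re.compile(r"""
    ^
    score   \s+ (\d+) \s+    # review score
    for     \s+ (\d+) \s+    # number of nights
    nights? \s+ at \s+ (.*)  # hotel name: rest of the line
""", re.VERBOSE)

result = SCORE_RE.match("score 8 for 5 nights at name of hotel")
print(result.groups())  # ('8', '5', 'name of hotel')
```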
    

    I personally prefer Perl 6 (now Raku) style regexes; I find them easier to read.

    my rule pattern {
      score        $<score>=[<.digit>+]
      for          $<nights>=[<.digit>+]
      night[s]? at $<hotel>=[.+]
    }
    

    After you match a string against that rule, $/ holds the resulting Match object, and the named captures can be read from it.

    So something like this:

    "score 8 for 5 nights at name of hotel" ~~ &pattern;
    say "Hotel $/<hotel>";
    say $/.perl;
    

    would output something like this:

    Hotel name of hotel
    {
      'hotel'  => 'name of hotel',
      'nights' => 5,
      'score'  => 8
    }
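Python has no direct counterpart to Raku rules, but named groups give similar access to the captured parts by name. A minimal sketch, assuming an input string consistent with the output above; unlike that output, Python returns every capture as a string, so numbers need an explicit `int()`.

```python
import re

# (?P<name>...) is Python's named-group syntax
HOTEL_RE = re.compile(r"""
    score   \s+ (?P<score>\d+)  \s+
    for     \s+ (?P<nights>\d+) \s+
    nights? \s+ at \s+ (?P<hotel>.+)
""", re.VERBOSE)

m = HOTEL_RE.match("score 8 for 5 nights at name of hotel")
if m:
    print(f"Hotel {m['hotel']}")  # Hotel name of hotel
    print(m.groupdict())  # {'score': '8', 'nights': '5', 'hotel': 'name of hotel'}
```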
    