Simplifying regex OR patterns

后悔当初 2021-02-09 23:57

I was asked today if there was a library to take a list of strings and to compute the most efficient regex to match only those strings. I think it's an NP-complete problem by i

3 answers
  • 2021-02-10 00:12

    Regexp::Assemble::Compressed and Regexp::Assemble know far more tricks than Regex::PreSuf. Regexp::Assemble also comes with the command-line tool assemble (not installed by default), which makes building regexes even easier.
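
As a rough illustration of the module mentioned above, here is a minimal sketch of calling Regexp::Assemble from a script. It assumes the module is installed from CPAN (it is not part of core Perl), and the host names are sample data borrowed from elsewhere in this thread. Because add() takes patterns rather than literal strings, the sketch escapes each string with quotemeta first.

```perl
use strict;
use warnings;
use Regexp::Assemble;    # CPAN module, assumed to be installed

# Sample input (an assumption for illustration): literal host names.
my @hosts = qw(appserver1.domain.tld appserver2.domain.tld appserver3.domain.tld);

my $ra = Regexp::Assemble->new;

# add() expects regex patterns, so escape the literal dots first.
$ra->add( map { quotemeta } @hosts );

# re() returns the assembled pattern as a compiled qr// object.
my $re = $ra->re;

print "$re\n";
print "match\n" if 'appserver2.domain.tld' =~ /^$re$/;
```

The exact text of the assembled pattern depends on the module version, so the sketch prints it rather than assuming a particular form.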

  • 2021-02-10 00:21

    The Perl regex compiler builds a branching trie data structure out of patterns with parts in common across alternatives:

    $ perl -Mre=debug -ce '"whatever" =~ /appserver1\.domain\.tld|appserver2\.domain\.tld|appserver3\.domain\.tld/'
    Compiling REx "appserver1\.domain\.tld|appserver2\.domain\.tld|appserver3\."...
    Final program:
       1: EXACT <appserver> (5)
       5: TRIEC-EXACT[123] (25)
          <1.domain.tld> 
          <2.domain.tld> 
          <3.domain.tld> 
      25: END (0)
    anchored "appserver" at 0 (checking anchored) minlen 21 
    -e syntax OK
    Freeing REx: "appserver1\.domain\.tld|appserver2\.domain\.tld|appserver3\."...
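
To make the trie idea concrete, the following is a small hand-rolled sketch, not the regex engine's actual implementation. It builds a prefix trie from literal strings and serializes it back into a single pattern. The function names build_trie and trie_to_regex are hypothetical, and the sketch only merges common prefixes; it makes no attempt to share suffixes or form character classes the way Regex::PreSuf or Regexp::Assemble do.

```perl
use strict;
use warnings;

# Build a nested-hash trie from literal strings.
# The empty-string key marks "a word ends at this node".
sub build_trie {
    my @words = @_;
    my %trie;
    for my $word (@words) {
        my $node = \%trie;
        for my $ch ( split //, $word ) {
            $node = $node->{$ch} //= {};
        }
        $node->{''} = 1;
    }
    return \%trie;
}

# Turn the trie back into a regex string.
sub trie_to_regex {
    my ($node) = @_;
    my $is_end = exists $node->{''};

    my @alts;
    for my $ch ( sort grep { $_ ne '' } keys %$node ) {
        push @alts, quotemeta($ch) . trie_to_regex( $node->{$ch} );
    }
    return '' unless @alts;    # leaf node: nothing left to match

    # A single branch with no word ending here is plain concatenation;
    # otherwise wrap the branches in a non-capturing group.
    my $body = ( @alts == 1 && !$is_end )
        ? $alts[0]
        : '(?:' . join( '|', @alts ) . ')';

    # If a word also ends here, everything after this point is optional.
    return $is_end ? "$body?" : $body;
}

my $re = trie_to_regex( build_trie(qw(foobar fooxar foozap)) );
print "$re\n";    # prints foo(?:bar|xar|zap)
```

Anchor the result when using it, for example /^(?:$re)$/, so that it matches only the listed strings.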
    
  • 2021-02-10 00:32

    The Regex::PreSuf module is designed to do exactly this.

    To quote the Synopsis:

    use Regex::PreSuf;
    
    my $re = presuf(qw(foobar fooxar foozap));
    
    # $re should be now 'foo(?:zap|[bx]ar)'
    