How do I build Perl regular expressions dynamically?

-上瘾入骨i 2021-02-08 15:24

I have a Perl script that traverses a directory hierarchy using File::Next::files. It will only return to the script files that end in ".avi", ".flv", ".mp3", ".mp4", a

6 answers
  • 2021-02-08 16:00

    If you want to build a potentially large regexp and don't want to bother debugging the parentheses, use a Perl module to build it for you!

    use strict;
    use Regexp::Assemble;
    
    my $re = Regexp::Assemble->new->add(qw(avi flv mp3 mp4 wmv));
    
    ...
    
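    # Anchor the pattern to a literal dot at the end of the name;
    # unanchored, it would also match "avi" inside "navigate.txt".
    if ($file =~ /\.$re$/) {
        # a match!
    }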
    
    print "$re\n"; # (?:(?:fl|wm)v|mp[34]|avi)
    
  • 2021-02-08 16:08

    Let's say that you use Config::General for your config file and that it contains these lines:

    <MyApp>
        extensions    avi flv mp3 mp4 wmv
        unwanted      frames svn
    </MyApp>
    

    You could then use it like so (see the Config::General documentation for more):

    use Config::General;
    
    # getall() returns a hash, so assign it to %conf rather than a scalar.
    my %conf = Config::General->new('/path/to/myapp.conf')->getall();
    my $extension_string = $conf{'MyApp'}{'extensions'};
    
    # split ' ' splits on any run of whitespace.
    my @extensions = split ' ', $extension_string;
    
    # Some sanity checks maybe...
    
    my $regex_builder = join '|', @extensions;
    
    # Escape the dot so that it matches a literal period.
    $regex_builder = '\.(' . $regex_builder . ')$';
    
    my $regex = qr/$regex_builder/;
    
    if($file =~ m{$regex}) {
        # Do something.
    }
    
    
    my $uw_regex_builder = '\.(' . join('|', split(' ', $conf{'MyApp'}{'unwanted'})) . ')$';
    my $unwanted_regex = qr/$uw_regex_builder/;
    
    if($File::Next::dir !~ m{$unwanted_regex}) {
        # Do something. (Note that this does not enforce /^\.svn$/. You
        # will need some kind of agreed syntax in your conf-file for that.)
    }
    

    (This is completely untested.)
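
    The "sanity checks" hinted at in the code comment could look something like the sketch below. It uses only core Perl; the function name regex_from_list and the rule that entries must be plain alphanumeric words are assumptions made for illustration, not part of Config::General or File::Next.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a whitespace-separated config value into an
# anchored regex. Entries that are not plain alphanumeric words are
# rejected instead of being passed into the pattern.
sub regex_from_list {
    my ($list_string) = @_;

    my @items = split ' ', $list_string;
    die "empty list\n" unless @items;

    for my $item (@items) {
        die "unexpected entry '$item'\n"
            unless $item =~ /\A[A-Za-z0-9]+\z/;
    }

    my $alternation = join '|', @items;
    return qr/\.(?:$alternation)\z/;
}

my $extension_regex = regex_from_list('avi flv mp3 mp4 wmv');

for my $name (qw(movie.avi movie.txt movieavi)) {
    printf "%-10s %s\n", $name, $name =~ $extension_regex ? 'wanted' : 'skipped';
}
```

    Rejecting odd entries up front means a typo in the config file fails loudly instead of silently producing a pattern that matches too much.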

  • 2021-02-08 16:12

    Although File::Find::Rule already has ways to deal with this, in similar cases you don't really want a regex. The regex doesn't buy you much here because you're looking for a fixed sequence of characters at the end of each filename. You want to know if that fixed sequence is in a list of sequences that interest you. Store all the extensions in a hash and look in that hash:

    my( $extension ) = $filename =~ m/\.([^.]+)$/;
    if( exists $hash{$extension} ) { ... }
    

    You don't need to build up a regular expression, and you don't need to go through several possible regex alternations to check every extension you have to examine.
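
    A self-contained version of the hash lookup might look like this. The extension list, the %wanted hash, and the is_wanted function are illustrative names; lowercasing the extension is an extra assumption that the original snippet does not make.

```perl
use strict;
use warnings;

# Hypothetical list of wanted extensions; in practice it would come
# from the configuration file.
my %wanted = map { ( lc($_) => 1 ) } qw(avi flv mp3 mp4 wmv);

# Return 1 if the filename ends in one of the wanted extensions, else 0.
sub is_wanted {
    my ($filename) = @_;

    # Capture everything after the last dot.
    my ($extension) = $filename =~ m/\.([^.]+)\z/;
    return 0 unless defined $extension;    # no extension at all

    return exists $wanted{ lc $extension } ? 1 : 0;
}

for my $name (qw(movie.avi song.MP3 notes.txt README)) {
    printf "%-10s %s\n", $name, is_wanted($name) ? 'wanted' : 'skipped';
}
```

    Checking that $extension is defined avoids an "uninitialized value" warning for names with no dot in them.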

  • 2021-02-08 16:15

    It's reasonably straightforward with File::Find::Rule; it's just a case of creating the list beforehand.

    use strict;
    use warnings;
    use aliased 'File::Find::Rule';
    
    
    # name() accepts both compiled regexes and shell-style globs.
    my @ignoredDirs = (qr/^\.svn/,  '*.frames' );
    my @wantExt = qw( *.avi *.flv *.mp3 );
    
    my $finder = Rule->or( 
        Rule->new->directory->name(@ignoredDirs)->prune->discard, 
        Rule->new->file->name(@wantExt)
    );
    
    $finder->start('./');
    
    while( my $file = $finder->match() ){
        # Matching file.
    }
    

    Then it's just a case of populating those arrays. (Note: the above code is also untested, but will likely work.) I'd generally use YAML for this; it makes life easier.

    use strict;
    use warnings;
    use aliased 'File::Find::Rule';
    use YAML::XS;
    
    my $config = YAML::XS::Load(<<'EOF');
    ---
    ignoredir:
    - !!perl/regexp (?-xism:^\.svn)
    - '*.frames'
    want:
    - '*.avi'
    - '*.flv'
    - '*.mp3'
    EOF
    
    my $finder = Rule->or( 
        Rule->new->directory->name(@{ $config->{ignoredir} })->prune->discard, 
        Rule->new->file->name(@{ $config->{want} })
    );
    
    $finder->start('./');
    
    while( my $file = $finder->match() ){
        # Matching file.
    }
    

    Note: this uses the handy module 'aliased.pm', which imports "File::Find::Rule" for me as "Rule".

    • File::Find::Rule - Alternative interface to File::Find
    • YAML::XS - Perl YAML Serialization using XS and libyaml
    • aliased - Use shorter versions of class names.
  • 2021-02-08 16:20

    Build it like you would a normal string and then use interpolation at the end to turn it into a compiled regex. Also, be careful: you are not escaping the . or putting it in a character class, so it matches any character (rather than a literal period).

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my (@ext, $dir, $dirp);
    while (<DATA>) {
        next unless my ($key, $val) = /^ \s* (ext|dirp|dir) \s* = \s* (\S+)$/x;
        push @ext, $val if $key eq 'ext';
        $dir = $val     if $key eq 'dir';
        $dirp = $val    if $key eq 'dirp';
    }
    
    my $re = join "|", @ext;
    $re = qr/[.]($re)$/;
    
    print "$re\n";
    
    while (<>) {
        print /$re/ ? "matched" : "didn't match", "\n";
    }
    
    __DATA__
    ext = avi
    ext = flv
    ext = mp3
    dir = .svn
    dirp= .frames
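
    The same escaping concern applies to the configured values themselves. As a minimal sketch in core Perl, quotemeta can be applied to each extension before joining; the function name build_extension_regex and the "c++" entry are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build an anchored extension regex from plain
# strings. quotemeta escapes any regex metacharacters in each value.
sub build_extension_regex {
    my @extensions = @_;

    # An empty alternation would match any name ending in a dot.
    die "no extensions given\n" unless @extensions;

    my $alternation = join '|', map { quotemeta($_) } @extensions;
    return qr/[.](?:$alternation)\z/;
}

my $extension_re = build_extension_regex(qw(avi flv mp3 c++));

for my $name (qw(clip.avi clipXavi main.c++ main.cpp)) {
    printf "%-10s %s\n", $name, $name =~ $extension_re ? 'matched' : "didn't match";
}
```

    Without quotemeta, an entry such as "c++" would be a syntax error in the pattern (nested quantifiers) rather than a literal match.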
    
  • 2021-02-08 16:24

    Assuming that you've parsed the configuration file to get a list of extensions and ignored directories, you can build the regular expression as a string and then use the qr operator to compile it into a regular expression:

    my @extensions = qw(avi flv mp3 mp4 wmv);  # parsed from file
    my $pattern    = '\.(' . join('|', @extensions) . ')$';
    my $regex      = qr/$pattern/;
    
    if ($file =~ $regex) {
        # do something
    }
    

    The compilation isn't strictly necessary; you can use the string pattern directly:

    if ($file =~ /$pattern/) {
        # do something
    }
    

    Directories are a little harder because you have two different situations: full names and suffixes. Your configuration file will have to use different keys to make it clear which is which, for example "dir_name" and "dir_suffix". For full names I'd just build a hash:

    my %ignore = ('.svn' => 1);
    

    Suffixed directories can be done the same way as file extensions:

    my $dir_pattern = '(?:' . join('|', map {quotemeta} @dir_suffix) . ')$';
    my $dir_regex   = qr/$dir_pattern/;
    

    You could even build the patterns into anonymous subroutines to avoid referencing global variables:

    my $file_filter    = sub { $_ =~ $regex };
    my $descend_filter = sub {
        # Use !~ here: "! $x =~ $re" would negate $x before matching.
        ! $ignore{$File::Next::dir} &&
        $File::Next::dir !~ $dir_regex;
    };
    
    my $iter = File::Next::files({
        file_filter    => $file_filter,
        descend_filter => $descend_filter,
    }, $directory);
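
    To try the directory logic without File::Next installed, the filter can be sketched as a plain function that takes the directory as an argument. Everything here is an assumption for illustration: the names should_descend, %ignore_name and @dir_suffixes, and the choice to compare against the last path component via File::Basename. Note the use of !~ style logic written as separate tests, because "! $dir =~ $regex" would negate $dir before the match is attempted.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical values standing in for the parsed configuration.
my %ignore_name  = map { ( $_ => 1 ) } qw(.svn);
my @dir_suffixes = qw(.frames);

my $suffix_alternation = join '|', map { quotemeta($_) } @dir_suffixes;
my $dir_suffix_regex   = qr/(?:$suffix_alternation)\z/;

# Return 1 if the directory should be descended into, else 0.
# Only the last path component is checked (an assumption of this sketch).
sub should_descend {
    my ($dir) = @_;
    my $name = basename($dir);

    return 0 if $ignore_name{$name};          # exact name, e.g. ".svn"
    return 0 if $name =~ $dir_suffix_regex;   # suffix, e.g. "clip.frames"
    return 1;
}

for my $dir (qw(videos project/.svn media/clip.frames)) {
    printf "%-18s %s\n", $dir, should_descend($dir) ? 'descend' : 'skip';
}
```

    A closure such as sub { should_descend($File::Next::dir) } could then be passed as descend_filter, assuming that variable holds the directory being considered.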
    