I have a Perl script that traverses a directory hierarchy using File::Next::files. It will only return to the script files that end in ".avi", ".flv", ".mp3", ".mp4", a
If you want to build a potentially large regexp and don't want to bother debugging the parentheses, use a Perl module to build it for you!
use strict;
use Regexp::Assemble;
my $re = Regexp::Assemble->new->add(qw(avi flv mp3 mp4 wmv));
...
if ($file =~ /$re/) {
# a match!
}
print "$re\n"; # (?:(?:fl|wm)v|mp[34]|avi)
Let's say that you use Config::General for your config file and that it contains these lines:
<MyApp>
extensions avi flv mp3 mp4 wmv
unwanted frames svn
</MyApp>
You could then use it like so (see the Config::General documentation for more):
use Config::General;
# getall() returns a hash, so store it in %conf.
my %conf = Config::General->new('/path/to/myapp.conf')->getall();
my $extension_string = $conf{'MyApp'}{'extensions'};
my @extensions = split m{ }, $extension_string;
# Some sanity checks maybe...
my $regex_builder = join '|', map { quotemeta($_) } @extensions;
# Escape the dot so that it matches a literal period.
$regex_builder = '\.(' . $regex_builder . ')$';
my $regex = qr/$regex_builder/;
if ($file =~ m{$regex}) {
# Do something.
}
my $uw_regex_builder = '\.(' . join('|', map { quotemeta($_) } split(m{ }, $conf{'MyApp'}{'unwanted'})) . ')$';
my $unwanted_regex = qr/$uw_regex_builder/;
if ($File::Next::dir !~ m{$unwanted_regex}) {
# Do something. (Note that this does not enforce /^\.svn$/. You
# will need some kind of agreed syntax in your conf-file for that.)
}
(This is completely untested.)
Although File::Find::Rule already has ways to deal with this, in similar cases you don't really want a regex. The regex doesn't buy you much here because you're looking for a fixed sequence of characters at the end of each filename. You want to know if that fixed sequence is in a list of sequences that interest you. Store all the extensions in a hash and look in that hash:
my( $extension ) = $filename =~ m/\.([^.]+)$/;
if( exists $hash{$extension} ) { ... }
You don't need to build up a regular expression, and you don't need to go through several possible regex alternations to check every extension you have to examine.
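A minimal sketch of that hash lookup, assuming the extension list has already been read from the configuration. The names %wanted and is_wanted are hypothetical, and lowercasing the extension is an added assumption to make the check case-insensitive:

```perl
use strict;
use warnings;

# Hypothetical list; in practice this would come from the config file.
my %wanted;
$wanted{ lc($_) } = 1 for qw(avi flv mp3 mp4 wmv);

# Returns 1 if the filename ends in one of the wanted extensions, else 0.
sub is_wanted {
    my ($filename) = @_;
    my ($extension) = $filename =~ m/\.([^.]+)$/;
    return 0 unless defined $extension;    # no extension at all
    return exists $wanted{ lc($extension) } ? 1 : 0;
}

for my $name (qw(movie.avi notes.txt SONG.MP3 README)) {
    printf "%-10s %s\n", $name, is_wanted($name) ? 'wanted' : 'skipped';
}
```

The lookup cost stays the same no matter how many extensions are configured, which is the main attraction over a long alternation.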
It's reasonably straightforward with File::Find::Rule; it's just a case of creating the list beforehand.
use strict;
use warnings;
use aliased 'File::Find::Rule';
# name() accepts both regexes and globs.
my @ignoredDirs = (qr/^\.svn/, '*.frames' );
my @wantExt = qw( *.avi *.flv *.mp3 );
my $finder = Rule->or(
Rule->new->directory->name(@ignoredDirs)->prune->discard,
Rule->new->file->name(@wantExt)
);
$finder->start('./');
while( my $file = $finder->match() ){
# Matching file.
}
Then it's just a case of populating those arrays. (Note: the above code is also untested, but will likely work.) I'd generally use YAML for this; it makes life easier.
use strict;
use warnings;
use aliased 'File::Find::Rule';
use YAML::XS;
my $config = YAML::XS::Load(<<'EOF');
---
ignoredir:
- !!perl/regexp (?-xism:^\.svn)
- '*.frames'
want:
- '*.avi'
- '*.flv'
- '*.mp3'
EOF
my $finder = Rule->or(
Rule->new->directory->name(@{ $config->{ignoredir} })->prune->discard,
Rule->new->file->name(@{ $config->{want} })
);
$finder->start('./');
while( my $file = $finder->match() ){
# Matching file.
}
Note: this uses the handy module 'aliased.pm', which imports "File::Find::Rule" for me as "Rule".
Build it like you would a normal string and then use interpolation at the end to turn it into a compiled regex. Also be careful: you are not escaping . or putting it in a character class, so it matches any character (rather than a literal period).
#!/usr/bin/perl
use strict;
use warnings;
my (@ext, $dir, $dirp);
while (<DATA>) {
next unless my ($key, $val) = /^ \s* (ext|dirp|dir) \s* = \s* (\S+)$/x;
push @ext, $val if $key eq 'ext';
$dir = $val if $key eq 'dir';
$dirp = $val if $key eq 'dirp';
}
my $re = join "|", @ext;
$re = qr/[.]($re)$/;
print "$re\n";
while (<>) {
print /$re/ ? "matched" : "didn't match", "\n";
}
__DATA__
ext = avi
ext = flv
ext = mp3
dir = .svn
dirp= .frames
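To illustrate the warning about the unescaped dot, here is a small sketch using made-up filenames. The variable names $loose and $strict are hypothetical; the first pattern leaves the dot unescaped and the second puts it in a character class, as in the script above:

```perl
use strict;
use warnings;

my @ext = qw(avi flv mp3);
my $alternation = join '|', map { quotemeta($_) } @ext;

my $loose  = qr/.($alternation)$/;      # "." matches any character
my $strict = qr/[.]($alternation)$/;    # "[.]" matches only a literal period

# 'clipXavi' has no extension, yet the loose pattern still accepts it.
for my $name ('clip.avi', 'clipXavi') {
    printf "%-9s loose=%s strict=%s\n", $name,
        ( $name =~ $loose  ? 'yes' : 'no' ),
        ( $name =~ $strict ? 'yes' : 'no' );
}
```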
Assuming that you've parsed the configuration file to get a list of extensions and ignored directories, you can build the regular expression as a string and then use the qr operator to compile it into a regular expression:
my @extensions = qw(avi flv mp3 mp4 wmv); # parsed from file
my $pattern = '\.(' . join('|', @extensions) . ')$';
my $regex = qr/$pattern/;
if ($file =~ $regex) {
# do something
}
The compilation isn't strictly necessary; you can use the string pattern directly:
if ($file =~ /$pattern/) {
# do something
}
Directories are a little harder because you have two different situations: full names and suffixes. Your configuration file will have to use different keys to make it clear which is which, e.g. "dir_name" and "dir_suffix". For full names I'd just build a hash:
my %ignore = ('.svn' => 1);
Suffixed directories can be done the same way as file extensions:
my $dir_pattern = '(?:' . join('|', map {quotemeta} @dir_suffix) . ')$';
my $dir_regex = qr/$dir_pattern/;
You could even build the patterns into anonymous subroutines to avoid referencing global variables:
my $file_filter = sub { $_ =~ $regex };
my $descend_filter = sub {
! $ignore{$File::Next::dir} &&
$File::Next::dir !~ $dir_regex; # "!" binds tighter than "=~", so use "!~"
};
my $iter = File::Next::files({
file_filter => $file_filter,
descend_filter => $descend_filter,
}, $directory);
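The directory check can also be tried on its own, without File::Next, as in the sketch below. It assumes the check is given the bare directory name; should_descend and the sample names are hypothetical. It uses `!~` for the negated match, because a leading `!` would apply to the variable before `=~` is evaluated:

```perl
use strict;
use warnings;

# Hypothetical values; in practice these come from the config file.
my %ignore     = ( '.svn' => 1 );
my @dir_suffix = ('.frames');

my $dir_pattern = '(?:' . join( '|', map { quotemeta($_) } @dir_suffix ) . ')$';
my $dir_regex   = qr/$dir_pattern/;

# Returns 1 if a directory with this name should be descended into, else 0.
sub should_descend {
    my ($dir_name) = @_;
    return 0 if $ignore{$dir_name};            # exact-name match
    return $dir_name !~ $dir_regex ? 1 : 0;    # suffix match
}

for my $dir (qw(.svn shot01.frames videos)) {
    printf "%-14s %s\n", $dir, should_descend($dir) ? 'descend' : 'skip';
}
```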