Sorting a directory in perl, taking numbers into account

风流意气都作罢 提交于 2019-12-12 09:28:08

问题


I think I need some sort of Schwartzian Transform to get this working, but I'm having trouble figuring it out, as perl isn't my strongest language.

I have a directory with contents as such:

album1.htm
album2.htm
album3.htm
....
album99.htm
album100.htm

I'm trying to get the album with the highest number from this directory (in this case, album100.htm). Note that timestamps on the files are not a reliable means of determining things, as people are adding old "missing" albums after the fact.

The previous developer simply used the code snippet below, but this clearly breaks down once there are more than 9 albums in a directory.

opendir(DIR, PATH) || print $!;
@files = readdir(DIR);
foreach $file ( sort(@files) ) {
    if ( $file =~ /album/ ) {
        $last_file = $file;
    }
}

回答1:


If you just need to find the album with the highest number, you don't really need to sort the list, just run through it and keep track of the maximum.

#!/usr/bin/perl 

use strict;
use warnings;

my $max = 0;

while ( <DATA> ) {
    my ($album) = $_ =~ m/album(\d+)/;
    $max = $album if $album > $max;
}

print "album$max.htm";

__DATA__
album1.htm
album100.htm
album2.htm
album3.htm
album99.htm



回答2:


To find the highest number, try a custom sort...

sub sort_files {
    (my $num_a = $a) =~ s/^album(\d+)\.htm$/$1/;
    (my $num_b = $b) =~ s/^album(\d+)\.htm$/$1/;
    return $num_a <=> $num_b;
}

my @sorted = sort \&sort_files @files;
my $last = pop @sorted;

Also, take a look at the File::Next module. It will let you pick out just the files that begin with the word "album". I find it a little easier than readdir.




回答3:


The reason why you're encountering difficulties is the operator, <=> is the numeric comparison, cmp is the default and it is string comparison.

$ perl -E'say for sort qw/01 1 02 200/';
01
02
1
200

With a slight modification we get something much closer to correct:

$ perl -E'say for sort { $a <=> $b } qw/01 1 02 200/';
01
1
02
200

However, in your case you need to remove the non digits.

$ perl -E'say for sort { my $s1 = $a =~ m/(\d+)/; my $s2 = $b =~ /(\d+)/; $s1 <=> $s2  } qw/01 1 02 200/';
01
1
02
200

Here is it more pretty:

sort {
  my $s1 = $a =~ m/(\d+)/;
  my $s2 = $b =~ /(\d+)/;
  $s1 <=> $s2
}

This isn't flawless, but it should give you a good idea of your issue with sort.

Oh, and as a follow up, the Shcwartzian Transform solves a different problem: it stops you from having to run a complex task (unlike the one you're needing -- a regex) multiple times in the search algorithm. It comes at a memory cost of having to cache the results (not to be unexpected). Essentially, what you do is map the input of the problem, to the output (typically in an array) [$input, $output] then you sort on the outputs $a->[1] <=> $b->[1]. With your stuff now sorted you map back over to get your original inputs $_->[0].

map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, fn($_) ]
, qw/input list here/
;

It is cool because it is so compact while being so efficient.




回答4:


Here you go, using Schwartzian Transform:

my @files = <DATA>;

print join '',
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ m/album(\d+)/, $_ ] }
    @files;


 __DATA__
album12.htm
album1.htm
album2.htm
album10.htm



回答5:


Here's an alternative solution using reduce:

use strict;
use warnings;
use List::Util 'reduce';

my $max = reduce {
    my ($aval, $bval) = ($a =~ m/album(\d+)/, $b =~ m/album(\d+)/);
    $aval > $bval ? $a : $b
} <DATA>;
print "max album is $max\n";

__DATA__
album1.htm
album100.htm
album2.htm
album3.htm
album99.htm



回答6:


Here's a generic solution:

my @sorted_list
    = map  { $_->[0] } # we stored it at the head of the list, so we can pull it out
      sort {
          # first test a normalized version
          my $v = $a->[1] cmp $b->[1];
          return $v if $v;

          my $lim = @$a > @$b ? @$a : @$b;

          # we alternate between ascii sections and numeric
          for ( my $i = 2; $i < $lim; $i++ ) {
              $v  =  ( $a->[$i] || '' ) cmp ( $b->[$i] || '' );
              return $v if $v;

              $i++;
              $v = ( $a->[$i] || 0 ) <=> ( $b->[$i] || 0 );
              return $v if $v;
          }
          return 0;

      }
      map {
          # split on digits and retain captures in place.
          my @parts = split /(\d+)/;
          my $nstr  = join( '', map { m/\D/ ? $_ : '0' x length() } @parts );
          [ $_, $nstr, @parts ];
      } @directory_names
      ;


来源:https://stackoverflow.com/questions/2960488/sorting-a-directory-in-perl-taking-numbers-into-account

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!