Sorting a directory in perl, taking numbers into account

I think I need some sort of Schwartzian Transform to get this working, but I'm having trouble figuring it out, as perl isn't my strongest language.

I have a directory with contents as such:

album1.htm
album2.htm
album3.htm
....
album99.htm
album100.htm

I'm trying to get the album with the highest number from this directory (in this case, album100.htm). Note that timestamps on the files are not a reliable means of determining things, as people are adding old "missing" albums after the fact.

The previous developer simply used the code snippet below, but this clearly breaks down once there are more than 9 albums in a directory.

opendir(DIR, PATH) || print $!;
@files = readdir(DIR);
foreach $file ( sort(@files) ) {
    if ( $file =~ /album/ ) {
        $last_file = $file;
    }
}

If you just need to find the album with the highest number, you don't really need to sort the list, just run through it and keep track of the maximum.

#!/usr/bin/perl 

use strict;
use warnings;

my $max = 0;

while ( <DATA> ) {
    my ($album) = $_ =~ m/album(\d+)/;
    $max = $album if $album > $max;
}

print "album$max.htm";

__DATA__
album1.htm
album100.htm
album2.htm
album3.htm
album99.htm

To find the highest number, try a custom sort...

sub sort_files {
    (my $num_a = $a) =~ s/^album(\d+)\.htm$/$1/;
    (my $num_b = $b) =~ s/^album(\d+)\.htm$/$1/;
    return $num_a <=> $num_b;
}

my @sorted = sort \&sort_files @files;
my $last = pop @sorted;

Also, take a look at the File::Next module. It will let you pick out just the files that begin with the word "album". I find it a little easier than readdir.

The reason why you're encountering difficulties is the operator, <=> is the numeric comparison, cmp is the default and it is string comparison.

$ perl -E'say for sort qw/01 1 02 200/';
01
02
1
200

With a slight modification we get something much closer to correct:

$ perl -E'say for sort { $a <=> $b } qw/01 1 02 200/';
01
1
02
200

However, in your case you need to remove the non digits.

$ perl -E'say for sort { my $s1 = $a =~ m/(\d+)/; my $s2 = $b =~ /(\d+)/; $s1 <=> $s2  } qw/01 1 02 200/';
01
1
02
200

Here is it more pretty:

sort {
  my $s1 = $a =~ m/(\d+)/;
  my $s2 = $b =~ /(\d+)/;
  $s1 <=> $s2
}

This isn't flawless, but it should give you a good idea of your issue with sort.

Oh, and as a follow up, the Shcwartzian Transform solves a different problem: it stops you from having to run a complex task (unlike the one you're needing -- a regex) multiple times in the search algorithm. It comes at a memory cost of having to cache the results (not to be unexpected). Essentially, what you do is map the input of the problem, to the output (typically in an array) [$input, $output] then you sort on the outputs $a->[1] <=> $b->[1]. With your stuff now sorted you map back over to get your original inputs $_->[0].

map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, fn($_) ]
, qw/input list here/
;

It is cool because it is so compact while being so efficient.

Here you go, using Schwartzian Transform:

my @files = <DATA>;

print join '',
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ m/album(\d+)/, $_ ] }
    @files;


 __DATA__
album12.htm
album1.htm
album2.htm
album10.htm

Here's an alternative solution using reduce:

use strict;
use warnings;
use List::Util 'reduce';

my $max = reduce {
    my ($aval, $bval) = ($a =~ m/album(\d+)/, $b =~ m/album(\d+)/);
    $aval > $bval ? $a : $b
} <DATA>;
print "max album is $max\n";

__DATA__
album1.htm
album100.htm
album2.htm
album3.htm
album99.htm

Here's a generic solution:

my @sorted_list
    = map  { $_->[0] } # we stored it at the head of the list, so we can pull it out
      sort {
          # first test a normalized version
          my $v = $a->[1] cmp $b->[1];
          return $v if $v;

          my $lim = @$a > @$b ? @$a : @$b;

          # we alternate between ascii sections and numeric
          for ( my $i = 2; $i < $lim; $i++ ) {
              $v  =  ( $a->[$i] || '' ) cmp ( $b->[$i] || '' );
              return $v if $v;

              $i++;
              $v = ( $a->[$i] || 0 ) <=> ( $b->[$i] || 0 );
              return $v if $v;
          }
          return 0;

      }
      map {
          # split on digits and retain captures in place.
          my @parts = split /(\d+)/;
          my $nstr  = join( '', map { m/\D/ ? $_ : '0' x length() } @parts );
          [ $_, $nstr, @parts ];
      } @directory_names
      ;

来源：https://stackoverflow.com/questions/2960488/sorting-a-directory-in-perl-taking-numbers-into-account

标签

perl

sorting

readdir