I\'m sure I\'ve done this in the past and there is something small I\'m forgetting, but how can I sort a CSV file on a certain column? I\'m interested in answers with and wi
There is also DBD::CSV:
#!/usr/bin/perl
use strict; use warnings;
use DBI;
my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
RaiseError => 1,
f_ext => '.csv',
csv_tables => { test => { col_names => [qw' name age sex '] } },
});
my $sth = $dbh->prepare(q{
SELECT name, age, sex FROM test ORDER BY age
});
$sth->execute;
while ( my @row = $sth->fetchrow_array ) {
print join(',' => @row), "\n";
}
$sth->finish;
$dbh->disconnect;
Output:
name,21,male name,24,male name,25,female name,27,female
The original poster asked for no third-party modules (which I take to mean nothing from CPAN). Whilst this is restriction that will horribly limit your ability to write good modern Perl code, in this instance it's possible using the (core) Text::ParseWords module in place of the (non-core) Text::CSV. So, borrowing heavily from Alan's example, we get:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords;
my @rows;
while (<DATA>) {
push @rows, [ parse_line(',', 0, $_) ];
}
@rows = sort { $a->[1] <=> $b->[1] } @rows;
foreach (@rows) {
print join ',', @$_;
}
__DATA__
name,25,female
name,24,male
name,27,female
name,21,male
When you provide your own comparison code, you can sort on anything. Just extract the desired element with a regex, or probably a split in this case, and then compare on that. If you have a lot of elements, I would parse the data into a list of lists and then the comparison code can access it without parsing. That would eliminate parsing the same row over and over as it's compared with other rows.
using Raku (née Perl6)
This is a fairly quick-and-dirty solution, mainly intended for "hand-rolled" CSV. The code works as long as there's only one (1) age-per-row: Read lines $a
, comb for 1-to-3 <digit>
surrounded by commas and assign to @b
, derive sorting index $c
, use $c
to reorder lines $a
:
~$ raku -e 'my $a=lines(); my @b=$a.comb(/ \, <(\d**1..3)> \, /).pairs; my $c=@b.sort(*.values)>>.keys.flat; $a[$c.flat]>>.put;' sort_age.txt
name,21,male
name,24,male
name,25,female
name,27,female
I prepended a few dummy lines to the OP's input file see how the code above reacts with 1). a blank age field, 2). a blank "" string for age, 3). a bogus "9999" for age, and 4). a bogus "NA" for age. The code above fails catastrophically. To fix this you have to write a ternary that inserts a numeric placeholder value (e.g. zero) whenever the regex fails to match a line.
Below is a longer but more robust solution. Note--I use a placeholder value of 999
to move lines with blank/invalid ages to the bottom:
~$ raku -e 'my @a=lines(); my @b = do for @a {if $_ ~~ m/ \, <(\d**1..3)> \, / -> { +$/ } else { 999 }; }; my $c=@b.pairs.sort(*.values)>>.keys.flat; @a[$c.flat]>>.put;' sort_age.txt
name,21,male
name,24,male
name,25,female
name,27,female
name,,male
name,"",female
name,9999,male
name,NA,male
To sort in reverse, add .reverse
to the end of the method chain that creates $c
. Again, change the else
placeholder argument to move lines absent a valid age to the top or to the bottom. Also, creation of @b
above can be written using the ternary operator: my @b = do for @a {(m/ \, <(\d**1..3)> \, /) ?? +$/ !! 999 };
, as an alternative.
Here's the unsorted input file for posterity:
$ cat sort_age.txt
name,,male
name,"",female
name,9999,male
name,NA,male
name,25,female
name,24,male
name,27,female
name,21,male
HTH.
https://raku.org/
In the spirit of there always being another way to do it, bear in mind that plain old GNU sort might be enough.
$ sort -t, -k2 -n unsorted.txt
name,21,male
name,24,male
name,25,female
name,27,female
Where the command line args are:
-t, # use comma as the record separator
-k2 # sort on the second key (record) in the line
-n # sort using numerical comparison (like using <=> instead of cmp in perl)
If you want a Perl solution, wrap it in qx() ;-)
I would do something like this:
#!/usr/bin/perl
use warnings;
use strict;
my @rows = map { chomp; [split /[,\s]+/, $_] } <DATA>; #read each row into an array
my @sorted = sort { $a->[1] <=> $b->[1] } @rows; # sort the rows (numerically) by second column
for (@sorted) {
print join(', ', @$_) . "\n"; # print them out as CSV
}
__DATA__
name,25,female
name,24,male
name,27,female
name,21,male