Sort CSV based on a certain column?

前端未结

关注

 7  922

I\'m sure I\'ve done this in the past and there is something small I\'m forgetting, but how can I sort a CSV file on a certain column? I\'m interested in answers with and wi

相关标签:

7条回答

情话喂你

2020-12-18 10:27

There is also DBD::CSV:

#!/usr/bin/perl

use strict; use warnings;
use DBI;

my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
    RaiseError => 1,
    f_ext => '.csv',
    csv_tables => { test => { col_names => [qw' name age sex '] } },
});

my $sth = $dbh->prepare(q{
    SELECT name, age, sex FROM test ORDER BY age
});

$sth->execute;

while ( my @row = $sth->fetchrow_array ) {
    print join(',' => @row), "\n";
}

$sth->finish;
$dbh->disconnect;

Output:

name,21,male
name,24,male
name,25,female
name,27,female

0 讨论(0)

野的像风

2020-12-18 10:33
The original poster asked for no third-party modules (which I take to mean nothing from CPAN). Whilst this is restriction that will horribly limit your ability to write good modern Perl code, in this instance it's possible using the (core) Text::ParseWords module in place of the (non-core) Text::CSV. So, borrowing heavily from Alan's example, we get:
```
#!/usr/bin/env perl

use strict;
use warnings;

use Text::ParseWords;

my @rows;

while (<DATA>) {
    push @rows, [ parse_line(',', 0, $_) ];
}

@rows = sort { $a->[1] <=> $b->[1] } @rows;

foreach (@rows) {
    print join ',', @$_;
}

__DATA__
name,25,female
name,24,male
name,27,female
name,21,male
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
一向

2020-12-18 10:35

When you provide your own comparison code, you can sort on anything. Just extract the desired element with a regex, or probably a split in this case, and then compare on that. If you have a lot of elements, I would parse the data into a list of lists and then the comparison code can access it without parsing. That would eliminate parsing the same row over and over as it's compared with other rows.

0 讨论(0)
发布评论:

提交评论
- 加载中...
梦毁少年i

2020-12-18 10:36
using Raku (née Perl6)

This is a fairly quick-and-dirty solution, mainly intended for "hand-rolled" CSV. The code works as long as there's only one (1) age-per-row: Read lines $a, comb for 1-to-3 <digit> surrounded by commas and assign to @b, derive sorting index $c, use $c to reorder lines $a:
```
~$ raku -e 'my $a=lines();  my @b=$a.comb(/ \, <(\d**1..3)> \, /).pairs;  my $c=@b.sort(*.values)>>.keys.flat;  $a[$c.flat]>>.put;' sort_age.txt
name,21,male
name,24,male
name,25,female
name,27,female
```
I prepended a few dummy lines to the OP's input file see how the code above reacts with 1). a blank age field, 2). a blank "" string for age, 3). a bogus "9999" for age, and 4). a bogus "NA" for age. The code above fails catastrophically. To fix this you have to write a ternary that inserts a numeric placeholder value (e.g. zero) whenever the regex fails to match a line.

Below is a longer but more robust solution. Note--I use a placeholder value of 999 to move lines with blank/invalid ages to the bottom:
```
~$ raku -e 'my @a=lines(); my @b = do for @a {if $_ ~~ m/ \, <(\d**1..3)> \, / -> { +$/ } else { 999 }; }; my $c=@b.pairs.sort(*.values)>>.keys.flat;  @a[$c.flat]>>.put;' sort_age.txt
name,21,male
name,24,male
name,25,female
name,27,female
name,,male
name,"",female
name,9999,male
name,NA,male
```
To sort in reverse, add .reverse to the end of the method chain that creates $c. Again, change the else placeholder argument to move lines absent a valid age to the top or to the bottom. Also, creation of @b above can be written using the ternary operator: my @b = do for @a {(m/ \, <(\d**1..3)> \, /) ?? +$/ !! 999 };, as an alternative.

Here's the unsorted input file for posterity:
```
$ cat sort_age.txt
name,,male
name,"",female
name,9999,male
name,NA,male
name,25,female
name,24,male
name,27,female
name,21,male
```
HTH.

https://raku.org/
0 讨论(0)
发布评论:

提交评论
- 加载中...
不知归路

2020-12-18 10:40
In the spirit of there always being another way to do it, bear in mind that plain old GNU sort might be enough.
```
$ sort -t, -k2 -n unsorted.txt
name,21,male
name,24,male
name,25,female
name,27,female
```
Where the command line args are:
```
-t, # use comma as the record separator
-k2 # sort on the second key (record) in the line
-n  # sort using numerical comparison (like using <=> instead of cmp in perl)
```
If you want a Perl solution, wrap it in qx() ;-)
0 讨论(0)
发布评论:

提交评论
- 加载中...

情书的邮戳

2020-12-18 10:48

I would do something like this:

#!/usr/bin/perl
use warnings;
use strict;

my @rows = map { chomp; [split /[,\s]+/, $_] } <DATA>; #read each row into an array
my @sorted = sort { $a->[1] <=> $b->[1] } @rows; # sort the rows (numerically) by second column

for (@sorted) {
  print join(', ', @$_) . "\n"; # print them out as CSV
}

__DATA__
name,25,female
name,24,male
name,27,female
name,21,male

0 讨论(0)

1 2 下一页