Sort CSV based on a certain column?

自闭症患者 2020-12-18 09:50

I'm sure I've done this in the past and there is something small I'm forgetting, but how can I sort a CSV file on a certain column? I'm interested in answers with and wi

7 answers
  • 2020-12-18 10:27

    There is also DBD::CSV:

    #!/usr/bin/perl
    
    use strict; use warnings;
    use DBI;
    
    my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
        RaiseError => 1,
        f_ext => '.csv',
        csv_tables => { test => { col_names => [qw' name age sex '] } },
    });
    
    my $sth = $dbh->prepare(q{
        SELECT name, age, sex FROM test ORDER BY age
    });
    
    $sth->execute;
    
    while ( my @row = $sth->fetchrow_array ) {
        print join(',' => @row), "\n";
    }
    
    $sth->finish;
    $dbh->disconnect;
    

    Output:

    name,21,male
    name,24,male
    name,25,female
    name,27,female
  • 2020-12-18 10:33

    The original poster asked for no third-party modules (which I take to mean nothing from CPAN). Whilst this is a restriction that will horribly limit your ability to write good modern Perl code, in this instance it's possible using the (core) Text::ParseWords module in place of the (non-core) Text::CSV. So, borrowing heavily from Alan's example, we get:

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    
    use Text::ParseWords;
    
    my @rows;
    
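    while (my $line = <DATA>) {
        chomp $line;                                # drop the trailing newline from the last field
        push @rows, [ parse_line(',', 0, $line) ];  # 0 = strip quotes and backslashes from fields
    }
    
    # Sort numerically on the second column (index 1)
    @rows = sort { $a->[1] <=> $b->[1] } @rows;
    
    foreach my $row (@rows) {
        print join(',', @$row), "\n";
    }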
    
    __DATA__
    name,25,female
    name,24,male
    name,27,female
    name,21,male
    
  • 2020-12-18 10:35

    When you provide your own comparison code, you can sort on anything. Just extract the desired element with a regex, or probably a split in this case, and then compare on that. If you have a lot of elements, I would parse the data into a list of lists and then the comparison code can access it without parsing. That would eliminate parsing the same row over and over as it's compared with other rows.
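
    A minimal sketch of that idea, assuming simple rows with no quoted commas (the sample data in @lines is hypothetical and stands in for lines read from a file). Each row is split once into an array reference, so the comparison block only indexes into pre-parsed data:

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample rows; in practice these would come from a file handle.
    my @lines = ("name,25,female", "name,24,male", "name,27,female", "name,21,male");

    # Parse each line once into an array reference (a list of lists).
    # A plain split is only safe when no field contains a quoted comma.
    my @rows = map { [ split /,/, $_, -1 ] } @lines;

    # The comparison block only looks up the second column of each parsed row.
    my @sorted = sort { $a->[1] <=> $b->[1] } @rows;

    print join(',', @$_), "\n" for @sorted;
    ```

    Swapping <=> for cmp would sort the column as strings instead of numbers.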

  • 2020-12-18 10:36

    Using Raku (née Perl 6):

    This is a fairly quick-and-dirty solution, mainly intended for "hand-rolled" CSV. The code works as long as there is only one age per row: read the lines into $a, comb for one to three <digit> characters surrounded by commas and assign the matches to @b, derive the sorting index $c, then use $c to reorder the lines in $a:

    ~$ raku -e 'my $a=lines();  my @b=$a.comb(/ \, <(\d**1..3)> \, /).pairs;  my $c=@b.sort(*.values)>>.keys.flat;  $a[$c.flat]>>.put;' sort_age.txt
    name,21,male
    name,24,male
    name,25,female
    name,27,female
    

    I prepended a few dummy lines to the OP's input file to see how the code above reacts to 1) a blank age field, 2) a blank "" string for age, 3) a bogus "9999" for age, and 4) a bogus "NA" for age. The code above fails catastrophically. To fix this you have to write a ternary that inserts a numeric placeholder value (e.g. zero) whenever the regex fails to match a line.

    Below is a longer but more robust solution. Note that I use a placeholder value of 999 to move lines with blank or invalid ages to the bottom:

    ~$ raku -e 'my @a=lines(); my @b = do for @a {if $_ ~~ m/ \, <(\d**1..3)> \, / -> { +$/ } else { 999 }; }; my $c=@b.pairs.sort(*.values)>>.keys.flat;  @a[$c.flat]>>.put;' sort_age.txt
    name,21,male
    name,24,male
    name,25,female
    name,27,female
    name,,male
    name,"",female
    name,9999,male
    name,NA,male
    

    To sort in reverse, add .reverse to the end of the method chain that creates $c. Again, change the else placeholder argument to move lines absent a valid age to the top or to the bottom. Also, creation of @b above can be written using the ternary operator: my @b = do for @a {(m/ \, <(\d**1..3)> \, /) ?? +$/ !! 999 };, as an alternative.

    Here's the unsorted input file for posterity:

    $ cat sort_age.txt
    name,,male
    name,"",female
    name,9999,male
    name,NA,male
    name,25,female
    name,24,male
    name,27,female
    name,21,male
    

    HTH.

    https://raku.org/

  • 2020-12-18 10:40

    In the spirit of there always being another way to do it, bear in mind that plain old GNU sort might be enough.

    $ sort -t, -k2 -n unsorted.txt
    name,21,male
    name,24,male
    name,25,female
    name,27,female
    

    Where the command line args are:

    -t, # use comma as the field separator
    -k2 # sort on a key that starts at the second field (use -k2,2 to stop the key at that field)
    -n  # sort using numerical comparison (like using <=> instead of cmp in perl)
    

    If you want a Perl solution, wrap it in qx() ;-)
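
    Taken literally, that could look like the sketch below. It assumes a sort command that accepts -t and -k is on the PATH; the temporary file and its sample rows are hypothetical stand-ins for unsorted.txt:

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Write hypothetical sample rows to a temporary file that is removed on exit.
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} "name,25,female\nname,24,male\nname,27,female\nname,21,male\n";
    close $fh or die "close failed: $!";

    # qx() runs the command through the shell; in list context it returns
    # one element per output line, newline included.
    my @sorted = qx(sort -t, -k2,2n "$path");
    die "sort exited with status $?" if $? != 0;

    print @sorted;
    ```

    This trades portability for brevity, since the script now depends on an external program.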

  • 2020-12-18 10:48

    I would do something like this:

    #!/usr/bin/perl
    use warnings;
    use strict;
    
    # Read each row into an array ref; a plain split does not handle quoted fields
    my @rows = map { chomp; [ split /,/, $_, -1 ] } <DATA>;
    # Sort the rows numerically by the second column
    my @sorted = sort { $a->[1] <=> $b->[1] } @rows;
    
    for my $row (@sorted) {
        print join(',', @$row), "\n"; # print them out as CSV
    }
    
    __DATA__
    name,25,female
    name,24,male
    name,27,female
    name,21,male
    