Extract multiple columns from a txt file in Perl

情歌与酒 2021-01-17 05:29

I have a txt file like this:

#Genera columnA columnB columnC columnD columnN
x1       1       3       7      0.9      2
x2       5       3       13     7            


        
3 Answers
  • 2021-01-17 06:13

    You can simplify this by using hash slices.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    
    my @wanted = ( '#Genera' , qw (  columnA columnC columnN ));
    
    open my $input, '<', "file.txt" or die $!;
    
    chomp ( my @header = split ' ', <$input> ); 
    
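    # Parenthesise join so that "\n" is not joined as one more field
    print join( "\t", @wanted ), "\n";
    while ( <$input> ) { 
       my %row;
       @row{@header} = split;    # hash slice: header name => field on this line
       print join( "\t", @row{@wanted} ), "\n";
    }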
    

    Which outputs:

    #Genera columnA columnC columnN 
    x1  1   7   2   
    x2  5   13  5   
    x3  0.1 7   0.4 
    

    If you want to exactly match your indentation then add sprintf to the mix:

    E.g.:

    # Parenthesise join so that "\n" is neither padded by sprintf nor joined as a field
    print join( "\t", map { sprintf "%8s", $_ } @wanted ), "\n";
    while ( <$input> ) { 
       my %row;
       @row{@header} = split; 
       print join( "\t", map { sprintf "%8s", $_ } @row{@wanted} ), "\n";
    }
    

    Which then gives:

     #Genera     columnA     columnC     columnN           
          x1           1           7           2           
          x2           5          13           5           
          x3         0.1           7         0.4    
    
  • 2021-01-17 06:14

    There are command line switches that are used for this kind of application:

    perl -lnae 'print join "\t", @F[1,3,5]' file.txt
    

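    The -a switch (used together with -n) splits each input line on whitespace into the array @F, and -l takes care of the line endings. Indices are zero-based, so @F[1,3,5] is an array slice of the second, fourth, and sixth fields; add 0 to the list if you also want the #Genera column.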

    The downside of this, of course, is that you have to use the column numbers instead of the names.

  • 2021-01-17 06:16

    This program does as you ask. It expects the path to the input file as a parameter on the command line, which can then be read using the empty "diamond operator" <> without explicitly opening it.

    Each non-blank line of the file is split into fields, and the header line is identified by its first field starting with a hash symbol #.

    A call to map converts the @wanted_fields array into a list of indexes into @fields where those column headers appear, and stores it in the array @idx, with index 0 prepended so that the first column is always kept.

    This array is then used to slice the wanted columns from @fields for every line of input, including the header itself. The fields are printed, separated by tabs.

    use strict;
    use warnings 'all';
    
    use List::Util 'first';
    
    my @wanted_fields = qw/ columnA columnC columnN /;
    
    my @idx;
    
    while ( <> ) {
        next unless /\S/;
    
        my @fields = split;
    
        if ( $fields[0] =~ /^#/ ) {
    
            @idx = ( 0, map {
                my $wanted = $_;
                first { $fields[$_] eq $wanted } 0 .. $#fields;
            } @wanted_fields );
        }
    
        print join( "\t", @fields[@idx] ), "\n" if @idx;
    }
    

    output

    #Genera columnA columnC columnN
    x1  1   7   2
    x2  5   13  5
    x3  0.1 7   0.4
    