I have a txt file like this:
#Genera columnA columnB columnC columnD columnN
x1 1 3 7 0.9 2
x2 5 3 13 7
You can simplify this by using hash slices.
#!/usr/bin/env perl
use strict;
use warnings;
my @wanted = ( '#Genera' , qw ( columnA columnC columnN ));
open my $input, '<', "file.txt" or die $!;
chomp ( my @header = split ' ', <$input> );
print join( "\t", @wanted ), "\n";
while ( <$input> ) {
my %row;
@row{@header} = split;
print join( "\t", @row{@wanted} ), "\n";
}
Which outputs:
#Genera columnA columnC columnN
x1 1 7 2
x2 5 13 5
x3 0.1 7 0.4
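To see the hash-slice idea in isolation, here is a minimal sketch that works on a single in-memory line rather than a file. The helper name pick_columns is hypothetical, introduced only for illustration; the header and the sample row are taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the named columns out of one
# whitespace-separated line.
sub pick_columns {
    my ( $header, $wanted, $line ) = @_;
    my %row;
    @row{@$header} = split ' ', $line;    # hash slice: store every field under its column name
    return @row{@$wanted};                # hash slice: read back only the wanted names, in order
}

my @header = ( '#Genera', qw( columnA columnB columnC columnD columnN ) );
my @wanted = ( '#Genera', qw( columnA columnC columnN ) );

# Prints the fields x1, 1, 7 and 2 separated by tabs.
print join( "\t", pick_columns( \@header, \@wanted, 'x1 1 3 7 0.9 2' ) ), "\n";
```

Because the lookup is by name, the order of @wanted decides the output order, independent of the column order in the file.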
If you want to match your indentation exactly, add sprintf to the mix, e.g.:
print join( "\t", map { sprintf "%8s", $_ } @wanted ), "\n";
while ( <$input> ) {
my %row;
@row{@header} = split;
print join( "\t", map { sprintf "%8s", $_ } @row{@wanted} ), "\n";
}
Which then gives:
#Genera columnA columnC columnN
x1 1 7 2
x2 5 13 5
x3 0.1 7 0.4
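As a small sketch of how the padding behaves: "%8s" right-justifies each value in a field of width 8, while a negative width such as "%-8s" left-justifies it. The helper name format_row is hypothetical, and the width of 8 is just the value used above.

```perl
use strict;
use warnings;

# Hypothetical helper: pad every field to a fixed width, then join with tabs.
# A positive width right-justifies; a negative width left-justifies.
sub format_row {
    my ( $width, @fields ) = @_;
    return join "\t", map { sprintf "%${width}s", $_ } @fields;
}

print format_row(  8, 'x1', 1, 7, 2 ), "\n";    # right-justified columns
print format_row( -8, 'x1', 1, 7, 2 ), "\n";    # left-justified columns
```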
Perl has command-line switches intended for this kind of task:
perl -lnae 'print join "\t", @F[1,3,5]' file.txt
The -a switch automatically splits each line on whitespace into the array @F, so @F[1,3,5] is an array slice of elements 1, 3, and 5. Indexes are zero-based, so add 0 to the slice if you also want the first column.
The downside of this, of course, is that you have to use the column numbers instead of the names.
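For comparison, here is a rough longhand sketch of what the one-liner does for a single line: split on whitespace, then slice by zero-based index. The helper name slice_fields is hypothetical, and index 0 is added here on the assumption that the first column should be kept as well.

```perl
use strict;
use warnings;

# Hypothetical helper: do for one line what -a does (split on whitespace
# into fields), then return a slice by zero-based index.
sub slice_fields {
    my ( $line, @indexes ) = @_;
    my @F = split ' ', $line;
    return @F[@indexes];
}

# Sample row taken from the question; index 0 keeps the first column.
# Prints the fields x1, 1, 7 and 2 separated by tabs.
print join( "\t", slice_fields( 'x1 1 3 7 0.9 2', 0, 1, 3, 5 ) ), "\n";
```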
This program does as you ask. It expects the path to the input file as a parameter on the command line, which can then be read using the empty "diamond operator" <> without explicitly opening it.
Each non-blank line of the file is split into fields, and the header line is identified by its first field starting with a hash symbol #. A call to map converts the @wanted_fields array into a list of indexes into @fields where those column headers appear, and stores it in array @idx with index 0 prepended so that the first column is kept. This array is then used to slice the wanted columns from @fields for every line of input. The fields are printed, separated by tabs.
use strict;
use warnings 'all';
use List::Util 'first';
my @wanted_fields = qw/ columnA columnC columnN /;
my @idx;
while ( <> ) {
next unless /\S/;
my @fields = split;
if ( $fields[0] =~ /^#/ ) {
@idx = ( 0, map {
my $wanted = $_;
first { $fields[$_] eq $wanted } 0 .. $#fields;
} @wanted_fields );
}
print join( "\t", @fields[@idx] ), "\n" if @idx;
}
Which outputs:
#Genera columnA columnC columnN
x1 1 7 2
x2 5 13 5
x3 0.1 7 0.4
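The name-to-index lookup can also be sketched with a hash instead of first. This is an alternative, not the method used above: the helper name column_indexes is hypothetical, and dying on a missing column is an assumption about the desired behaviour. Note that with duplicate header names the hash keeps the last position, whereas first returns the earliest.

```perl
use strict;
use warnings;

# Hypothetical helper: map wanted column names to their zero-based
# positions in the header, dying if any name is absent.
sub column_indexes {
    my ( $header, $wanted ) = @_;
    my %pos;
    @pos{@$header} = 0 .. $#$header;    # column name => index
    my @missing = grep { !exists $pos{$_} } @$wanted;
    die "Missing column(s): @missing\n" if @missing;
    return @pos{@$wanted};
}

my @header = ( '#Genera', qw( columnA columnB columnC columnD columnN ) );
my @idx    = ( 0, column_indexes( \@header, [qw( columnA columnC columnN )] ) );
print "@idx\n";    # prints "0 1 3 5"
```

The resulting @idx can be used in the same way as before, for example @fields[@idx].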