I would like your help with trimming a file by removing the columns that hold the same value on every line.
# the file I have (tab-delimited, millions of columns)
jack 1 5 9
joh
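#!/usr/bin/perl
use strict;
use warnings;
# Read tab-separated records instead of lines.
$/ = "\t";
open(my $in, "<", "/tmp/filename") || die "Cannot open /tmp/filename: $!";
while (<$in>)
{
    # A 4-column line holds 3 tabs, so it yields 3 records: the last field of
    # one line and the first field of the next are read as a single record.
    # Every third record is therefore the third column.
    next if (($. % 3) == 0);
    print;
}
close($in);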
Well, this assumes the column to drop is the third one. If you want to drop by value instead:
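#!/usr/bin/perl
use strict;
use warnings;
open(my $in, "<", "/tmp/filename") || die "Cannot open /tmp/filename: $!";
while (my $line = <$in>)
{
    chomp $line;
    # Drop every field whose value is exactly 5, whatever column it is in.
    # Reading by line keeps the last field of a line from merging with the next line.
    my @fields = grep { $_ ne "5" } split(/\t/, $line, -1);
    print join("\t", @fields), "\n";
}
close($in);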
After the question edit, it is clear what the OP wants. How about:
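#!/usr/bin/perl
use strict;
use warnings;

my $file = "/tmp/filename";

# Pass 1: find the columns whose value is identical on every line.
open(my $in, "<", $file) || die "Cannot open $file: $!";
my @cols;
while (<$in>)
{
    chomp;
    my @this = split(/\t/, $_, -1);
    if ($. == 1)
    {
        # Start by assuming every column is constant.
        @cols = @this;
    }
    else
    {
        for (my $x = 0; $x <= $#cols; $x++)
        {
            # A column stops being a candidate once any line differs.
            if (defined($cols[$x]) && !(defined($this[$x]) && $cols[$x] eq $this[$x]))
            {
                $cols[$x] = undef;
            }
        }
    }
}
close($in);

# Collect the constant columns. Each index is shifted down by the number of
# columns already queued, because splice() below removes them left to right.
my @del;
print "Deleting columns: ";
for (my $x = 0; $x <= $#cols; $x++)
{
    if (defined($cols[$x]))
    {
        print "$x ($cols[$x]), ";
        push(@del, $x - scalar(@del));
    }
}
print "\n";

# Pass 2: print every line without the constant columns.
open($in, "<", $file) || die "Cannot open $file: $!";
while (<$in>)
{
    chomp;
    my @this = split(/\t/, $_, -1);
    foreach my $col (@del)
    {
        splice(@this, $col, 1) if $col < @this;
    }
    print join("\t", @this) . "\n";
}
close($in);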
As I understand it, you want to go through each line and check whether the values in a column have no variance, and if so remove that column.
If that is the case, I have a suggestion rather than a ready-made script, but I think you'll be able to figure it out. You should look at cut. It extracts parts of each line. You can use it to extract, e.g., column one, then run uniq on the output; if only one value is left after uniq, all values in that column are identical. This way you can collect the numbers of the columns that have no variance. You will need a shell script that finds out how many columns your file has (I guess using head -n 1 and counting the delimiters), runs this procedure on every column while storing the column numbers in an array, and at the end builds a cut command line that removes the columns of no interest. Granted, it's not awk or perl, but it should work and would use only traditional Unix tools. Well, you can call them from a perl script if you want :)
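The cut and uniq procedure described above could look roughly like the sketch below. It is only an illustration under a few assumptions: the function name drop_constant_columns is made up, the file is tab-delimited with the same number of fields on every line, and the column list is kept in a comma-separated string instead of an array so that it runs in plain sh. Since cut selects columns rather than deleting them, the sketch collects the columns that vary and prints only those.

```shell
#!/bin/sh
# Sketch only: drop_constant_columns is a hypothetical helper name.
# Prints the tab-delimited FILE without the columns that hold the same
# value on every line.
drop_constant_columns() {
    file=$1

    # Number of columns = number of tabs on the first line + 1.
    ncols=$(( $(head -n 1 "$file" | tr -cd '\t' | wc -c) + 1 ))

    keep=""
    col=1
    while [ "$col" -le "$ncols" ]; do
        # uniq collapses runs of equal adjacent lines, so a constant column
        # shrinks to exactly one line and any other column to more than one.
        distinct=$(( $(cut -f "$col" "$file" | uniq | wc -l) ))
        if [ "$distinct" -gt 1 ]; then
            # Append the column number to the comma-separated keep list.
            keep="${keep:+$keep,}$col"
        fi
        col=$((col + 1))
    done

    # cut selects the columns to keep; it needs a non-empty list.
    if [ -n "$keep" ]; then
        cut -f "$keep" "$file"
    fi
}
```

It would be called as drop_constant_columns /tmp/filename > /tmp/trimmed. Note that this reads the whole file once per column, so on a file with millions of columns it is likely to be far slower than a two-pass script, and a very long column list may exceed the command-line length limit.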
Well, and if I misunderstood the question, maybe cut will still be useful :) It seems to be one of the lesser-known tools.