I have two text files that contain columnar data of the form position-value, sorted by position.
Here is an example of the
This looks like a problem one is likely to run into, for example with database table data made of keys and values. Here's an implementation of the pseudocode provided by rjp.
#!/usr/bin/perl
use strict;
use warnings;

# Read one line from $fh and return [position, value], or nothing at EOF
sub read_file_line {
    my $fh = shift;
    if ($fh and defined(my $line = <$fh>)) {
        chomp $line;
        return [ split(/\t/, $line) ];
    }
    return;
}

sub compute {
    my ($val1, $val2) = @_;
    # do something with the 2 values
}

open(my $f1, '<', 'file1') or die "Cannot open file1: $!";
open(my $f2, '<', 'file2') or die "Cannot open file2: $!";

my $pair1 = read_file_line($f1);
my $pair2 = read_file_line($f2);

# Advance whichever file is behind; on equal positions, process and advance both
while ($pair1 and $pair2) {
    if ($pair1->[0] < $pair2->[0]) {
        $pair1 = read_file_line($f1);
    } elsif ($pair2->[0] < $pair1->[0]) {
        $pair2 = read_file_line($f2);
    } else {
        compute($pair1->[1], $pair2->[1]);
        $pair1 = read_file_line($f1);
        $pair2 = read_file_line($f2);
    }
}

close($f1);
close($f2);
Hope this helps!
If the files are sorted, step through them together, always advancing the file whose current position is lower.
Pseudocode:
read Apos, Aval from A          # initial values
read Bpos, Bval from B
while both reads succeeded      # eof() is already true after the last line is read, so test the reads instead
    if Apos == Bpos then
        compare()
        read Apos, Aval from A  # advance both files to get a new position
        read Bpos, Bval from B
    else if Apos < Bpos then
        read Apos, Aval from A
    else
        read Bpos, Bval from B
    fi
end
You could also use join(1) to isolate the lines with common positions and process that at your leisure.
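Driving join(1) from Perl could look like the minimal sketch below. The helper name joined_pairs and the tab-separated layout are assumptions for illustration. One caveat: join expects both inputs sorted on the join field in its own string (collating) order, not numeric order, so numerically sorted files whose positions have different widths (say 9 and 10) may need re-sorting first.

```perl
use strict;
use warnings;

# Hypothetical helper: run join(1) on two tab-separated files and return
# [position, value1, value2] for each position present in both files.
# Assumes both files are sorted on the first field in the order join expects.
sub joined_pairs {
    my ($path1, $path2) = @_;
    # List-form pipe open: no shell is involved, so paths need no quoting.
    # -t "\t" makes the tab the field separator for input and output.
    open(my $join, '-|', 'join', '-t', "\t", $path1, $path2)
        or die "Cannot run join: $!";
    my @rows;
    while (my $line = <$join>) {
        chomp $line;
        # join prints: position, remaining fields of file 1, remaining fields of file 2
        push @rows, [ split(/\t/, $line) ];
    }
    close($join) or die "join exited with status $?";
    return @rows;
}

# Usage (file names are placeholders):
# for my $row (joined_pairs('file1', 'file2')) { print join("\t", @$row), "\n" }
```

Letting join do the matching keeps the Perl side down to parsing one combined line per common position.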
For looping through files you can use the core Tie::File module. It represents a regular text file as an array.
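A minimal sketch of that idea, assuming tab-separated files with numeric positions sorted in ascending order: tie each file read-only and walk both arrays with two indexes, using the same stepping rule as rjp's pseudocode. The helper name common_positions is hypothetical.

```perl
use strict;
use warnings;
use Tie::File;
use Fcntl 'O_RDONLY';

# Hypothetical helper: treat two position-sorted, tab-separated files as
# arrays via Tie::File and walk them with two indexes, returning
# [position, value1, value2] for every position found in both.
sub common_positions {
    my ($path1, $path2) = @_;
    # O_RDONLY: open read-only so the data files are never modified.
    # Tie::File strips the trailing newline from each record by default.
    tie my @lines1, 'Tie::File', $path1, mode => O_RDONLY
        or die "Cannot tie $path1: $!";
    tie my @lines2, 'Tie::File', $path2, mode => O_RDONLY
        or die "Cannot tie $path2: $!";
    my ($i, $j) = (0, 0);
    my @common;
    while ($i < @lines1 and $j < @lines2) {
        my ($pos1, $val1) = split(/\t/, $lines1[$i]);
        my ($pos2, $val2) = split(/\t/, $lines2[$j]);
        if ($pos1 < $pos2) {
            $i++;    # file 1 is behind: advance it
        } elsif ($pos2 < $pos1) {
            $j++;    # file 2 is behind: advance it
        } else {
            push @common, [ $pos1, $val1, $val2 ];
            $i++;    # match: advance both
            $j++;
        }
    }
    untie @lines1;
    untie @lines2;
    return @common;
}
```

Indexing a tied array is convenient, but it is not free: Tie::File reads and caches records behind the scenes, so for a strictly sequential walk a plain filehandle loop does the same work more cheaply.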
Here is a quick solution. If the data in both files is pretty much equivalent (e.g. the same number of lines), you don't really need to store it in hash tables. But I thought it would be helpful in case the data is scrambled.
Code:
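#!/usr/bin/perl
use strict;
use warnings;
open(my $f1, '<', 'data1') or die "Cannot open data1: $!";
open(my $f2, '<', 'data2') or die "Cannot open data2: $!";
# initialize hashes (position => value)
my %data1;
my %data2;
# read both files line by line, in lockstep, until either one runs out
while (defined(my $line1 = <$f1>) and defined(my $line2 = <$f2>)) {
    chomp($line1);
    chomp($line2);
    # split fields 1 and 2 into an array
    my @LINE1 = split(/\t/, $line1);
    my @LINE2 = split(/\t/, $line2);
    # store data into hashes
    $data1{$LINE1[0]} = $LINE1[1];
    $data2{$LINE2[0]} = $LINE2[1];
    # compare column 2 (skipped when either value has not been seen yet)
    if (defined $data1{$LINE2[0]} and defined $data2{$LINE1[0]}
        and $data1{$LINE2[0]} == $data2{$LINE1[0]}) {
        # compute something
        my $new_val = $data1{$LINE2[0]} + $data2{$LINE1[0]};
        print $LINE1[0] . "\t" . $new_val . "\n";
    } else {
        print $LINE1[0] . "\t" . $data1{$LINE1[0]} . "\n";
    }
}
close($f1);
close($f2);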
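If the data really is scrambled (unsorted, or with different numbers of lines), a more robust variant of the hash idea is to load one file completely into a hash and then stream the other, looking each position up. A minimal sketch, assuming tab-separated position/value lines; hash_join is a hypothetical name, and the whole first file is held in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: load position => value from the first file into a
# hash, then stream the second file and look each position up, so neither
# file needs to be sorted or to have the same number of lines.
# Returns [position, value1, value2] in the order positions appear in file 2.
sub hash_join {
    my ($path1, $path2) = @_;
    my %value1;
    open(my $f1, '<', $path1) or die "Cannot open $path1: $!";
    while (my $line = <$f1>) {
        chomp $line;
        my ($pos, $val) = split(/\t/, $line);
        $value1{$pos} = $val;
    }
    close($f1);

    my @common;
    open(my $f2, '<', $path2) or die "Cannot open $path2: $!";
    while (my $line = <$f2>) {
        chomp $line;
        my ($pos, $val) = split(/\t/, $line);
        # only keep positions that also occur in the first file
        push @common, [ $pos, $value1{$pos}, $val ] if exists $value1{$pos};
    }
    close($f2);
    return @common;
}
```

Passing the smaller file as the first argument keeps memory use down, since only that file is stored in the hash.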
I hope it helps; let me know if it's useful.