I am mostly a Matlab user and a Perl n00b. This is my first Perl script.
I have a large fixed width data file that I would like to process into a binary file with a
First off, this piece of code causes the input file to be read once for every param. Which is quite in-efficient.
foreach $current_param (@param_name) {
...
seek(INFILE,$data_start_pos,0); #Jump to data start
while ($line = ) { ... }
...
}
Also there is very rarely a reason to use a continue
block. This is more style / readability, then a real problem.
Now on to make it more performant.
I packed the sections individually, so that I could process a line exactly once. To prevent it from using up tons of RAM, I used File::Temp to store the data until I was ready for it. Then I used File::Copy to append those sections into the binary file.
This is a quick implementation. If I were to add much more to it, I would split it up more than it is now.
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp 'tempfile';
use File::Copy 'copy';
use autodie qw':default copy';
use 5.10.1;
my $input_filename = shift @ARGV;
open my $input, '<', $input_filename;
my @param_names;
my $template = ''; # stop uninitialized warning
my @field_names;
my $field_name_line;
while( <$input> ){
chomp;
next if /^\s*$/;
if( my ($param) = /^\s*(.+?)\s+filter = ALL_VALUES\s*$/ ){
push @param_names, $param;
}elsif( /^[\s-]+$/ ){
my @fields = split /(\s+)/;
my $pos = 0;
for my $field (@fields){
my $length = length $field;
if( substr($field, 0, 1) eq '-' ){
$template .= "\@${pos}A$length ";
}
$pos += $length;
}
last;
}else{
$field_name_line = $_;
}
}
@field_names = unpack $template, $field_name_line;
for( @field_names ){
s(^\s+){};
$_ = lc $_;
$_ = 'type' if substr('type', 0, length $_) eq $_;
}
my %temp_files;
for my $param ( @param_names ){
for(qw'time data'){
my $fh = tempfile 'temp_XXXX', UNLINK => 1;
binmode $fh, ':raw';
$temp_files{$param}{$_} = $fh;
}
}
my %convert = (
TXT => sub{ pack 'A*', join "\n", @_ },
D => sub{ pack 'd*', @_ },
UI => sub{ pack 'L*', @_ },
);
sub print_time{
my($param,$time) = @_;
my $fh = $temp_files{$param}{time};
print {$fh} $convert{D}->($time);
}
sub print_data{
my($param,$format,$data) = @_;
my $fh = $temp_files{$param}{data};
print {$fh} $convert{$format}->($data);
}
my %data_type;
while( my $line = <$input> ){
next if $line =~ /^\s*$/;
my %fields;
@fields{@field_names} = unpack $template, $line;
print_time( @fields{(qw'name time')} );
print_data( @fields{(qw'name type value')} );
$data_type{$fields{name}} //= $fields{type};
}
close $input;
open my $bin, '>:raw', $input_filename.".bin";
open my $toc, '>', $input_filename.".toc";
for my $param( @param_names ){
my $data_fh = $temp_files{$param}{data};
my $time_fh = $temp_files{$param}{time};
seek $data_fh, 0, 0;
seek $time_fh, 0, 0;
my @toc_line = ( $param, $data_type{$param}, 0+sysseek($bin, 0, 1) );
copy( $time_fh, $bin, 8*1024 );
close $time_fh;
push @toc_line, sysseek($bin, 0, 1);
copy( $data_fh, $bin, 8*1024 );
close $data_fh;
push @toc_line, sysseek($bin, 0, 1);
say {$toc} join ',', @toc_line, '';
}
close $bin;
close $toc;