Question
I'm writing a Perl script to run through and grab various data elements such as:
1253592000
1253678400 86400 6183.000000
1253764800 86400 4486.000000
1253851200 36.000000 86400 10669.000000
1253937600 0.000000 86400 9126.000000
1254024000 0.000000 86400 2930.000000
1254110400 0.000000 86400 2895.000000
1254196800 0.000000 8828.000000
I can grab each line of this text file no problem.
I have working regex to grab each of those fields. Once I have the line in a variable, i.e. $line - how can I grab each of those fields and place them into their own variables even though they have different delimiters?
Answer 1:
This example illustrates how to parse the line either with whitespace as the delimiter (split) or with a fixed-column layout (unpack). With unpack, if you use upper-case formats (A10 etc.), trailing whitespace will be removed for you. Note: as brian d foy points out, the split approach does not work well when fields are missing (for example, the second line of data), because the field position information is lost; unpack is the way to go here, unless we are misunderstanding your data.
use strict;
use warnings;
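while (my $line = <DATA>) {
    chomp $line;
    # Whitespace-delimited: simple, but blank columns shift the later fields left.
    my @fields_whitespace = split m'\s+', $line;
    # Fixed-width: column positions survive even when a field is blank.
    my @fields_fixed = unpack('a10 a10 a12 a28', $line);
}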
__DATA__
1253592000
1253678400 86400 6183.000000
1253764800 86400 4486.000000
1253851200 36.000000 86400 10669.000000
1253937600 0.000000 86400 9126.000000
1254024000 0.000000 86400 2930.000000
1254110400 0.000000 86400 2895.000000
1254196800 0.000000 8828.000000
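To make the difference between the two approaches concrete, here is a minimal sketch. The column widths (12, 12, 8, 14) and the sample rows are assumptions made up for illustration, since the exact spacing of the original file is not known; parse_fixed is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical layout: these widths are assumed for illustration only.
my $template = 'A12 A12 A8 A14';

# Split one fixed-width line into its four columns.
# 'A' strips trailing spaces, so a blank column comes back as ''.
sub parse_fixed {
    my ($line) = @_;
    return unpack($template, $line);
}

# Sample rows padded to the assumed widths; the second row has an empty
# second column, like the rows with missing fields in the question.
my @rows = (
    sprintf('%-12s%-12s%-8s%-14s', '1253851200', '36.000000', '86400', '10669.000000'),
    sprintf('%-12s%-12s%-8s%-14s', '1253678400', '',          '86400', '6183.000000'),
);

for my $row (@rows) {
    my @by_column = parse_fixed($row);
    my @by_space  = split ' ', $row;
    printf "unpack: %d fields [%s]\n", scalar @by_column, join('|', @by_column);
    printf "split:  %d fields [%s]\n", scalar @by_space,  join('|', @by_space);
}
```

For the second row, unpack still returns four fields with an empty string in the second position, while split returns only three, so the remaining values end up under the wrong index.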
Answer 2:
Use my module DataExtract::FixedWidth. It is the most full-featured and well-tested module for working with fixed-width columns in Perl. If this isn't fast enough, you can pass in an unpack_string and eliminate the need for heuristic detection of boundaries.
#!/usr/bin/env perl
use strict;
use warnings;
use DataExtract::FixedWidth;
use feature ':5.10';
my @rows = <DATA>;
my $de = DataExtract::FixedWidth->new({
    heuristic => \@rows
    , header_row => undef
});
say join ('|', @{$de->parse($_)}) for @rows;
--alternatively if you want header info--
my @rows = <DATA>;
my $de = DataExtract::FixedWidth->new({
    heuristic => \@rows
    , header_row => undef
    , cols => [qw/timestamp field2 period field4/]
});
use Data::Dumper;
warn Dumper $de->parse_hash($_) for @rows;
__DATA__
1253592000
1253678400 86400 6183.000000
1253764800 86400 4486.000000
1253851200 36.000000 86400 10669.000000
1253937600 0.000000 86400 9126.000000
1254024000 0.000000 86400 2930.000000
1254110400 0.000000 86400 2895.000000
1254196800 0.000000 8828.000000
Answer 3:
I'm unsure of the column names and formatting, but you should be able to adjust this recipe to your liking using Text::FixedWidth:
use strict;
use warnings;
use Text::FixedWidth;
my $fw = Text::FixedWidth->new;
$fw->set_attributes(
    qw(
        timestamp undef %10s
        field2    undef %10s
        period    undef %12s
        field4    undef %28s
    )
);
while (<DATA>) {
    $fw->parse( string => $_ );
    print $fw->get_timestamp . "\n";
}
__DATA__
1253592000
1253678400 86400 6183.000000
1253764800 86400 4486.000000
1253851200 36.000000 86400 10669.000000
1253937600 0.000000 86400 9126.000000
1254024000 0.000000 86400 2930.000000
1254110400 0.000000 86400 2895.000000
1254196800 0.000000 8828.000000
Answer 4:
You can split the line. It appears that your delimiter is just whitespace? You can do something on the order of:
@line = split(" ", $line);
This splits on any run of whitespace and ignores leading whitespace. You can then do bounds checking and access each field via $line[0], $line[1], etc.
Split can also take a regular expression rather than a string as the delimiter:
@line = split(/\s+/, $line);
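This does almost the same thing. The difference is that /\s+/ produces an empty first field when the line begins with whitespace, whereas " " skips it.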
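A minimal sketch comparing the two forms of split on a line that starts with whitespace. The sample line is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input line with two leading spaces.
my $line = "  1253937600 0.000000 86400";

# The special single-space pattern skips leading whitespace (awk-style).
my @special = split ' ', $line;

# A regex pattern keeps an empty field in front of the leading whitespace.
my @regex = split /\s+/, $line;

print scalar(@special), " fields with ' '\n";
print scalar(@regex),   " fields with /\\s+/\n";
```

Either form still drops blank columns in the middle of a line, so neither preserves field positions when a value is missing.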
Answer 5:
If all fields have the same fixed width and are padded with spaces, you can use the following split:
@array = split / {1,N}/, $line;
where N is the width of the field (substitute the actual number). This will yield an empty string for each empty field.
Answer 6:
Fixed width delimiting can be done like this:
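my @cols;
my %header = (
    field1 => 0,     # char position of first char in field
    field2 => 12,
    field3 => 15,
);
while (my $line = <IN>) {
    chomp $line;
    # substr takes (string, offset, length), so the length of field2 is
    # the distance from its own offset to the start of field3.
    print substr($line, $header{field2}, $header{field3} - $header{field2}), "\n";   # value of field2
}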
My Perl is very rusty, so there may be mistakes in there, but that is the gist of it.
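The offset-based idea can also be sketched with each field's start and length stored side by side, which avoids computing lengths from neighbouring offsets. The field names, offsets and lengths below are assumptions for illustration, and extract_fields is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical layout: [name, start offset, length], assumed for illustration.
my @layout = (
    [ timestamp => 0,  12 ],
    [ value     => 12, 12 ],
    [ period    => 24, 8  ],
);

# Return a hash of field name => value for one fixed-width line.
sub extract_fields {
    my ($line) = @_;
    my %record;
    for my $field (@layout) {
        my ($name, $start, $length) = @$field;
        # Lines shorter than the layout get '' instead of an out-of-range substr.
        my $raw = $start < length($line) ? substr($line, $start, $length) : '';
        $raw =~ s/^\s+|\s+$//g;    # trim the padding
        $record{$name} = $raw;
    }
    return %record;
}

my %rec = extract_fields(sprintf('%-12s%-12s%-8s', '1253937600', '0.000000', '86400'));
print "$_ = $rec{$_}\n" for sort keys %rec;
```

Checking the start offset against the line length keeps short lines, such as one that holds only a timestamp, from triggering a "substr outside of string" warning.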
Source: https://stackoverflow.com/questions/1494611/how-can-i-extract-columns-from-a-fixed-width-format-in-perl