Parse fixed-width files

南方客 2021-01-12 00:09

I have a lot of text files with fixed-width fields:

            
Dave    Thomas    123 Main
Dan     Anderson  456 Center
Wilma   R         


        
3 Answers
  • 2021-01-12 00:33

    As user604939 mentions, unpack is the tool to use for fixed-width fields. However, unpack needs a template to work with. Since you say the widths of your fields can change, the solution is to build this template from the first line of your file:

    use strict;
    use warnings;
    use Data::Dumper;

    my @template = map { 'A' . length }     # convert each to 'A##'
                   <DATA> =~ /(\S+\s*)/g;   # split first line into segments
    $template[-1] = 'A*';                   # set the last segment to be slurpy

    my $template = "@template";
    print "template: $template\n";

    my @data;
    while (<DATA>) {
        push @data, [unpack $template, $_];
    }

    print Dumper \@data;
    
    __DATA__
    <c>     <c>       <c>
    Dave    Thomas    123 Main
    Dan     Anderson  456 Center
    Wilma   Rainbow   789 Street
    

    which prints:

    template: A8 A10 A*
    $VAR1 = [
              [
                'Dave',
                'Thomas',
                '123 Main'
              ],
              [
                'Dan',
                'Anderson',
                '456 Center'
              ],
              [
                'Wilma',
                'Rainbow',
                '789 Street'
              ]
            ];
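
    The same idea can be wrapped in a small subroutine so that it works on any header line, not only on the DATA section. This is a minimal sketch: the name template_from_header is hypothetical, and it assumes that every column starts with a non-space token in the header line.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: build an unpack template from a header line.
    # Assumes each column begins with a non-space token in the header.
    sub template_from_header {
        my ($header) = @_;
        # Each "token plus trailing spaces" segment becomes 'A<width>'.
        my @fields = map { 'A' . length($_) } $header =~ /(\S+\s*)/g;
        $fields[-1] = 'A*' if @fields;   # last column takes the rest of the line
        return "@fields";
    }

    my $template = template_from_header("<c>     <c>       <c>\n");
    print "$template\n";                 # prints: A8 A10 A*

    my @row = unpack $template, "Dan     Anderson  456 Center\n";
    print join('|', @row), "\n";         # prints: Dan|Anderson|456 Center
    ```

    Note that a header label containing a space would be split into two columns by this regex, so the approach only fits headers with one token per column.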
    
  • 2021-01-12 00:47

    Just use Perl's unpack function. Something like this:

    while (<FILE>) {
        my ($first,$last,$street) = unpack("A9A25A50",$_);
    
        # ... do something with $first, $last and $street ...
    }
    

    In the unpack template, each A is followed by the width of its field (the "A###" form). There are other format letters that you can mix in for other kinds of fields, such as integers. If the file really is fixed-width, like mainframe files, this should be the easiest approach.
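
    To make the template syntax concrete, here is a small sketch run against an in-memory line instead of a file. The widths (8, 10, then the rest of the line) are assumptions chosen to match the sample data in the question, not values taken from a real file layout.

    ```perl
    use strict;
    use warnings;

    # Sample record; the widths used below are assumed from the question's data.
    my $line = "Dave    Thomas    123 Main\n";

    # 'A' strips trailing whitespace from each field; 'A*' takes whatever is left.
    my ($first, $last, $street) = unpack 'A8 A10 A*', $line;
    print "[$first] [$last] [$street]\n";   # prints: [Dave] [Thomas] [123 Main]

    # 'a' returns the field exactly as stored, padding included.
    my ($raw_first) = unpack 'a8', $line;
    print "[$raw_first]\n";                 # prints: [Dave    ]

    # A line shorter than the template still unpacks: each field gets
    # whatever characters are available.
    my @short = unpack 'A8 A10 A*', "Wilma   R\n";
    print "[$short[0]] [$short[1]]\n";
    ```

    Whitespace inside the template is ignored, so 'A8 A10 A*' and 'A8A10A*' are equivalent; the spaced form is simply easier to read.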

  • 2021-01-12 00:55

    CPAN to the rescue!

    DataExtract::FixedWidth not only parses fixed-width files but, judging by its POD, also appears to be smart enough to work out the column widths from the header line by itself!
